PoDoFo  1.0.0-dev
PoDoFo::PdfTokenizer Class Reference

A simple tokenizer for PDF files and PDF content streams.

#include <PdfTokenizer.h>

Inheritance diagram for PoDoFo::PdfTokenizer:
PoDoFo::PdfPostScriptTokenizer

Public Member Functions

 PdfTokenizer (const PdfTokenizerOptions &options={ })
 
 PdfTokenizer (const std::shared_ptr< charbuff > &buffer, const PdfTokenizerOptions &options={ })
 
bool TryReadNextToken (InputStreamDevice &device, std::string_view &token)
 Reads the next token from the current file position ignoring all comments.
 
bool TryReadNextToken (InputStreamDevice &device, std::string_view &token, PdfTokenType &tokenType)
 
bool TryPeekNextToken (InputStreamDevice &device, std::string_view &token)
 Tries to peek at the next token from the current file position, ignoring all comments, without consuming it.
 
bool TryPeekNextToken (InputStreamDevice &device, std::string_view &token, PdfTokenType &tokenType)
 
int64_t ReadNextNumber (InputStreamDevice &device)
 Read the next number from the current file position ignoring all comments.
 
bool TryReadNextNumber (InputStreamDevice &device, int64_t &value)
 
void ReadNextVariant (InputStreamDevice &device, PdfVariant &variant, const PdfStatefulEncrypt *encrypt={ })
 Read the next variant from the current file position ignoring all comments.
 
bool TryReadNextVariant (InputStreamDevice &device, PdfVariant &variant, const PdfStatefulEncrypt *encrypt={ })
 

Static Public Member Functions

static bool IsWhitespace (char ch)
 Returns true if the given character is a whitespace character according to the PDF reference.
 
static bool IsDelimiter (char ch)
 Returns true if the given character is a delimiter according to the PDF reference.
 
static bool IsTokenDelimiter (char ch, PdfTokenType &tokenType)
 Returns true if the given character is a token delimiter.
 
static bool IsRegular (char ch)
 True if the passed character is a regular character according to the PDF reference (Section 3.1.1, Character Set); i.e., it is neither a white-space nor a delimiter character.
 
static bool IsPrintable (char ch)
 True if the passed character is within the generally accepted "printable" ASCII range.
 

Static Public Attributes

static constexpr unsigned BufferSize = 4096
 

Protected Types

enum class  PdfLiteralDataType {
  Unknown = 0, Bool, Number, Real,
  String, HexString, Name, Array,
  Dictionary, Null, Reference
}
 

Protected Member Functions

void ReadNextVariant (InputStreamDevice &device, const std::string_view &token, PdfTokenType tokenType, PdfVariant &variant, const PdfStatefulEncrypt *encrypt)
 Read the next variant from the current file position ignoring all comments.
 
bool TryReadNextVariant (InputStreamDevice &device, const std::string_view &token, PdfTokenType tokenType, PdfVariant &variant, const PdfStatefulEncrypt *encrypt)
 
void EnqueueToken (const std::string_view &token, PdfTokenType type)
 Add a token to the queue of tokens.
 
void ReadDictionary (InputStreamDevice &device, PdfVariant &variant, const PdfStatefulEncrypt *encrypt)
 Read a dictionary from the input device and store it into a variant.
 
void ReadArray (InputStreamDevice &device, PdfVariant &variant, const PdfStatefulEncrypt *encrypt)
 Read an array from the input device and store it into a variant.
 
void ReadString (InputStreamDevice &device, PdfVariant &variant, const PdfStatefulEncrypt *encrypt)
 Read a string from the input device and store it into a variant.
 
void ReadHexString (InputStreamDevice &device, PdfVariant &variant, const PdfStatefulEncrypt *encrypt)
 Read a hex string from the input device and store it into a variant.
 
void ReadName (InputStreamDevice &device, PdfVariant &variant)
 Read a name from the input device and store it into a variant.
 
PdfLiteralDataType DetermineDataType (InputStreamDevice &device, const std::string_view &token, PdfTokenType tokenType, PdfVariant &variant)
 Determine the possible datatype of a token.
 

Detailed Description

A simple tokenizer for PDF files and PDF content streams.

Member Function Documentation

◆ DetermineDataType()

PdfTokenizer::PdfLiteralDataType PdfTokenizer::DetermineDataType ( InputStreamDevice &  device,
const std::string_view &  token,
PdfTokenType  tokenType,
PdfVariant &  variant 
)
protected

Determine the possible datatype of a token.

Numbers, reals, bools and null values are parsed directly by this function and saved to the variant.

Returns
the expected datatype
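
The classification step can be illustrated with a standalone sketch. It is not PoDoFo's implementation: `LiteralKind` and `ClassifyToken` are hypothetical names, and only the literal kinds that the text says are parsed directly (bool, number, real, null) are recognized. Everything else is reported as unknown so that a caller could dispatch on the token type instead.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the literal kinds that are parsed directly.
enum class LiteralKind { Unknown, Bool, Number, Real, Null };

// Classify a bare token by its spelling alone (assumption: no stream access needed).
LiteralKind ClassifyToken(std::string_view token)
{
    if (token == "true" || token == "false")
        return LiteralKind::Bool;
    if (token == "null")
        return LiteralKind::Null;
    if (token.empty())
        return LiteralKind::Unknown;

    // Optional sign, then digits with at most one decimal point.
    std::size_t i = (token[0] == '+' || token[0] == '-') ? 1 : 0;
    bool sawDigit = false;
    bool sawDot = false;
    for (; i < token.size(); i++)
    {
        char ch = token[i];
        if (ch >= '0' && ch <= '9')
            sawDigit = true;
        else if (ch == '.' && !sawDot)
            sawDot = true;
        else
            return LiteralKind::Unknown;
    }

    if (!sawDigit)
        return LiteralKind::Unknown; // a lone sign or a lone '.' is not a number
    return sawDot ? LiteralKind::Real : LiteralKind::Number;
}
```

In this sketch `ClassifyToken("42")` yields `Number` and `ClassifyToken("-3.5")` yields `Real`, while a keyword such as `obj` falls through to `Unknown`.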

◆ EnqueueToken()

void PdfTokenizer::EnqueueToken ( const std::string_view &  token,
PdfTokenType  type 
)
protected

Add a token to the queue of tokens.

TryReadNextToken() will return all enqueued tokens first before reading new tokens from the input device.

Parameters
token  string of the token
type  type of the token
See also
TryReadNextToken
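
The "queued tokens first" behaviour described above can be sketched in isolation. The class below is hypothetical and much simpler than PdfTokenizer: it splits on white-space only, stores tokens as `std::string`, and assumes the queue is drained in first-in, first-out order.

```cpp
#include <deque>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical miniature tokenizer; white-space separated tokens only.
class QueueingTokenizer
{
public:
    // Store a token so that the next read returns it before touching the stream.
    void EnqueueToken(std::string token)
    {
        m_queue.push_back(std::move(token));
    }

    // Return queued tokens first (FIFO), then fall back to the input stream.
    bool TryReadNextToken(std::istream& input, std::string& token)
    {
        if (!m_queue.empty())
        {
            token = std::move(m_queue.front());
            m_queue.pop_front();
            return true;
        }
        return static_cast<bool>(input >> token);
    }

private:
    std::deque<std::string> m_queue;
};
```

A queue like this lets a parser read a few tokens ahead, decide they belong to a different construct, and hand them back without seeking in the input device.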

◆ IsWhitespace()

bool PdfTokenizer::IsWhitespace ( char  ch)
static

Returns true if the given character is a whitespace character according to the PDF reference.

Returns
true if it is a whitespace character, otherwise false
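
For orientation, the character classes defined by the PDF specification can be written out as a small standalone sketch. The function names below are hypothetical and the code is not taken from PoDoFo; it only encodes the six white-space characters (NUL, HT, LF, FF, CR, SP) and the ten delimiter characters listed in the specification's "Character Set" section.

```cpp
// White-space characters per the PDF specification: NUL, HT, LF, FF, CR, SP.
constexpr bool IsPdfWhitespace(char ch)
{
    switch (ch)
    {
        case '\0': case '\t': case '\n': case '\f': case '\r': case ' ':
            return true;
        default:
            return false;
    }
}

// Delimiter characters per the PDF specification: ( ) < > [ ] { } / %
constexpr bool IsPdfDelimiter(char ch)
{
    switch (ch)
    {
        case '(': case ')': case '<': case '>': case '[': case ']':
        case '{': case '}': case '/': case '%':
            return true;
        default:
            return false;
    }
}

// A regular character is anything that is neither white-space nor a delimiter.
constexpr bool IsPdfRegular(char ch)
{
    return !IsPdfWhitespace(ch) && !IsPdfDelimiter(ch);
}

// Compile-time spot checks.
static_assert(IsPdfWhitespace('\0'), "NUL counts as PDF white-space");
static_assert(!IsPdfWhitespace('\v'), "vertical tab is not PDF white-space");
static_assert(IsPdfRegular('a') && !IsPdfRegular('/'), "regular excludes delimiters");
```

Note that this set differs from `std::isspace`: NUL is white-space in PDF, while the vertical tab is not.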

◆ ReadArray()

void PdfTokenizer::ReadArray ( InputStreamDevice &  device,
PdfVariant &  variant,
const PdfStatefulEncrypt *  encrypt 
)
protected

Read an array from the input device and store it into a variant.

Parameters
variant  store the array into this variable
encrypt  an encryption object which is used to decrypt strings during parsing

◆ ReadDictionary()

void PdfTokenizer::ReadDictionary ( InputStreamDevice &  device,
PdfVariant &  variant,
const PdfStatefulEncrypt *  encrypt 
)
protected

Read a dictionary from the input device and store it into a variant.

Parameters
variant  store the dictionary into this variable
encrypt  an encryption object which is used to decrypt strings during parsing

◆ ReadHexString()

void PdfTokenizer::ReadHexString ( InputStreamDevice &  device,
PdfVariant &  variant,
const PdfStatefulEncrypt *  encrypt 
)
protected

Read a hex string from the input device and store it into a variant.

Parameters
variant  store the hex string into this variable
encrypt  an encryption object which is used to decrypt strings during parsing
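
As a rough illustration of what decoding a hex string involves, here is a standalone sketch based on the PDF specification's rules: white-space between digits is ignored, and an odd number of digits is treated as if a final 0 followed. `HexValue` and `DecodeHexString` are hypothetical helpers, not PoDoFo functions, and the sketch ignores decryption entirely.

```cpp
#include <string>
#include <string_view>

// Value of one hex digit, or -1 if ch is not a hex digit.
int HexValue(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// Decode the text between '<' and '>' into raw bytes.
// Returns false on a character that is neither a hex digit nor PDF white-space.
bool DecodeHexString(std::string_view body, std::string& out)
{
    out.clear();
    int pending = -1; // high nibble waiting for its partner, or -1 if none
    for (char ch : body)
    {
        if (ch == '\0' || ch == '\t' || ch == '\n' || ch == '\f' || ch == '\r' || ch == ' ')
            continue; // white-space inside a hex string is ignored

        int value = HexValue(ch);
        if (value < 0)
            return false;

        if (pending < 0)
        {
            pending = value;
        }
        else
        {
            out.push_back(static_cast<char>((pending << 4) | value));
            pending = -1;
        }
    }

    // Odd digit count: the missing final digit is assumed to be 0.
    if (pending >= 0)
        out.push_back(static_cast<char>(pending << 4));
    return true;
}
```

With this sketch, the body `901FA` decodes to the three bytes 0x90, 0x1F and 0xA0.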

◆ ReadName()

void PdfTokenizer::ReadName ( InputStreamDevice &  device,
PdfVariant &  variant 
)
protected

Read a name from the input device and store it into a variant.

Throws UnexpectedEOF if there is nothing to read.

Parameters
variant  store the name into this variable
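
Names in PDF may contain `#xx` escapes, where `xx` is a two-digit hexadecimal character code. The standalone sketch below shows one way to unescape the characters that follow the leading `/`. `HexDigit` and `DecodeName` are hypothetical helpers, and rejecting a malformed escape is a choice made for this sketch, not a statement about how PoDoFo reacts.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Value of one hex digit, or -1 if ch is not a hex digit.
int HexDigit(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// Unescape the raw characters of a name (without the leading '/').
// Returns false if a '#' is not followed by two hex digits (sketch-specific policy).
bool DecodeName(std::string_view raw, std::string& out)
{
    out.clear();
    for (std::size_t i = 0; i < raw.size(); i++)
    {
        if (raw[i] != '#')
        {
            out.push_back(raw[i]);
            continue;
        }

        if (i + 2 >= raw.size())
            return false; // truncated escape

        int hi = HexDigit(raw[i + 1]);
        int lo = HexDigit(raw[i + 2]);
        if (hi < 0 || lo < 0)
            return false;

        out.push_back(static_cast<char>(hi * 16 + lo));
        i += 2; // skip the two digits just consumed
    }
    return true;
}
```

For example, the raw name `A#20B` decodes to `A B`, because 0x20 is the space character.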

◆ ReadNextNumber()

int64_t PdfTokenizer::ReadNextNumber ( InputStreamDevice &  device)

Read the next number from the current file position ignoring all comments.

Raises a NoNumber exception if the next token is not a number, and an UnexpectedEOF exception if no token could be read. No token is consumed if NoNumber is thrown.

Returns
a number read from the input device.
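
The guarantee that no token is consumed when the next token is not a number can be sketched without PoDoFo. The example below uses a hypothetical `TokenQueue` of already-split tokens and mirrors the non-throwing `TryReadNextNumber` style. It relies on `std::from_chars`, which does not accept a leading `+`, so it is stricter than PDF number syntax.

```cpp
#include <charconv>
#include <cstdint>
#include <deque>
#include <string>
#include <system_error>

// Hypothetical token source: a queue of already-split tokens.
using TokenQueue = std::deque<std::string>;

// Parse the front token as int64_t; remove it only when the whole token is a number.
bool TryReadNextNumber(TokenQueue& tokens, std::int64_t& value)
{
    if (tokens.empty())
        return false;

    const std::string& token = tokens.front();
    const char* first = token.data();
    const char* last = first + token.size();

    std::int64_t parsed = 0;
    std::from_chars_result result = std::from_chars(first, last, parsed);
    if (result.ec != std::errc() || result.ptr != last)
        return false; // not a number: leave the token in place for the caller

    value = parsed;
    tokens.pop_front();
    return true;
}
```

A caller that sees `false` with a non-empty queue can still read the offending token through another path, which is the property the description above relies on.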

◆ ReadNextVariant() [1/2]

void PoDoFo::PdfTokenizer::ReadNextVariant ( InputStreamDevice &  device,
const std::string_view &  token,
PdfTokenType  tokenType,
PdfVariant &  variant,
const PdfStatefulEncrypt *  encrypt 
)
protected

Read the next variant from the current file position ignoring all comments.

Raises an exception if there is no variant left in the file.

Parameters
token  a token that has already been read
tokenType  type of the passed token
variant  write the read variant to this value
encrypt  an encryption object which is used to decrypt strings during parsing

◆ ReadNextVariant() [2/2]

void PdfTokenizer::ReadNextVariant ( InputStreamDevice &  device,
PdfVariant &  variant,
const PdfStatefulEncrypt *  encrypt = { } 
)

Read the next variant from the current file position ignoring all comments.

Raises an UnexpectedEOF exception if there is no variant left in the file.

Parameters
variant  write the read variant to this value
encrypt  an encryption object which is used to decrypt strings during parsing

◆ ReadString()

void PdfTokenizer::ReadString ( InputStreamDevice &  device,
PdfVariant &  variant,
const PdfStatefulEncrypt *  encrypt 
)
protected

Read a string from the input device and store it into a variant.

Parameters
variant  store the string into this variable
encrypt  an encryption object which is used to decrypt strings during parsing

◆ TryPeekNextToken()

bool PoDoFo::PdfTokenizer::TryPeekNextToken ( InputStreamDevice &  device,
std::string_view &  token 
)

Tries to peek at the next token from the current file position, ignoring all comments, without consuming it.

Returns
false if EOF
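
Peeking without consuming can be pictured as a one-slot lookahead. The sketch below is a hypothetical, standalone class that splits on white-space only; it is not how PdfTokenizer is implemented, and the member names are made up for illustration.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical token source with a single token of lookahead.
class PeekableTokens
{
public:
    explicit PeekableTokens(const std::string& text) : m_input(text) { }

    // Return the next token without consuming it; false at end of input.
    bool TryPeekNextToken(std::string& token)
    {
        if (!m_lookahead.has_value())
        {
            std::string next;
            if (!(m_input >> next))
                return false;
            m_lookahead = std::move(next);
        }
        token = *m_lookahead;
        return true;
    }

    // Return the next token and consume it; false at end of input.
    bool TryReadNextToken(std::string& token)
    {
        if (!TryPeekNextToken(token))
            return false;
        m_lookahead.reset();
        return true;
    }

private:
    std::istringstream m_input;
    std::optional<std::string> m_lookahead;
};
```

Repeated peeks return the same token until a read consumes it, which is the contract the description above states.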

◆ TryReadNextToken()

bool PoDoFo::PdfTokenizer::TryReadNextToken ( InputStreamDevice &  device,
std::string_view &  token 
)

Reads the next token from the current file position ignoring all comments.

Parameters
[out] token  On true return, set to a view of the read token. The view refers to memory owned by PdfTokenizer, which must NOT be freed. The contents are invalidated on the next call to TryReadNextToken(..) and by the destruction of the PdfTokenizer. Undefined on false return.
[out] tokenType  On true return, the type of the read token is stored into this parameter (only in the overload taking a PdfTokenType &). Undefined on false return.
Returns
True if a token was read, false if there are no more tokens to read.
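
To make the comment skipping and the lifetime of the returned token concrete, here is a standalone sketch of a miniature tokenizer. `MiniTokenizer` is hypothetical and far simpler than PdfTokenizer: it reads from an in-memory string rather than an InputStreamDevice, reports no token type, and treats every delimiter as its own token except for `<<` and `>>`.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

class MiniTokenizer
{
public:
    explicit MiniTokenizer(std::string input) : m_input(std::move(input)) { }

    // Read the next token, skipping white-space and '%' comments.
    // The view points into m_buffer and is invalidated by the next call.
    bool TryReadNextToken(std::string_view& token)
    {
        SkipWhitespaceAndComments();
        if (m_pos >= m_input.size())
            return false;

        m_buffer.clear();
        char first = m_input[m_pos];
        if (IsDelimiter(first))
        {
            m_buffer.push_back(first);
            m_pos++;
            // "<<" and ">>" are two-character tokens.
            if ((first == '<' || first == '>') && m_pos < m_input.size() && m_input[m_pos] == first)
                m_buffer.push_back(m_input[m_pos++]);
        }
        else
        {
            // A run of regular characters forms one token.
            while (m_pos < m_input.size() && !IsWhitespace(m_input[m_pos]) && !IsDelimiter(m_input[m_pos]))
                m_buffer.push_back(m_input[m_pos++]);
        }

        token = m_buffer;
        return true;
    }

private:
    static bool IsWhitespace(char ch)
    {
        return ch == '\0' || ch == '\t' || ch == '\n' || ch == '\f' || ch == '\r' || ch == ' ';
    }

    static bool IsDelimiter(char ch)
    {
        return std::string_view("()<>[]{}/%").find(ch) != std::string_view::npos;
    }

    void SkipWhitespaceAndComments()
    {
        while (m_pos < m_input.size())
        {
            char ch = m_input[m_pos];
            if (IsWhitespace(ch))
            {
                m_pos++;
            }
            else if (ch == '%')
            {
                // A comment runs up to, but not including, the end-of-line marker.
                while (m_pos < m_input.size() && m_input[m_pos] != '\n' && m_input[m_pos] != '\r')
                    m_pos++;
            }
            else
            {
                break;
            }
        }
    }

    std::string m_input;
    std::string m_buffer;
    std::size_t m_pos = 0;
};
```

Because each call overwrites the internal buffer, a caller that needs a token after the next read has to copy it first, for example into a `std::string`.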

The documentation for this class was generated from the following files: