A simple tokenizer for PDF files and PDF content streams.
#include <PdfTokenizer.h>
|
| PdfTokenizer (const PdfTokenizerOptions &options={ }) |
|
| PdfTokenizer (const std::shared_ptr< charbuff > &buffer, const PdfTokenizerOptions &options={ }) |
|
bool | TryReadNextToken (InputStreamDevice &device, std::string_view &token) |
| Reads the next token from the current file position ignoring all comments.
|
|
bool | TryReadNextToken (InputStreamDevice &device, std::string_view &token, PdfTokenType &tokenType) |
|
bool | TryPeekNextToken (InputStreamDevice &device, std::string_view &token) |
| Tries to peek at the next token from the current file position, ignoring all comments and without consuming it.
|
|
bool | TryPeekNextToken (InputStreamDevice &device, std::string_view &token, PdfTokenType &tokenType) |
|
int64_t | ReadNextNumber (InputStreamDevice &device) |
| Read the next number from the current file position ignoring all comments.
|
|
bool | TryReadNextNumber (InputStreamDevice &device, int64_t &value) |
|
void | ReadNextVariant (InputStreamDevice &device, PdfVariant &variant, const PdfStatefulEncrypt *encrypt={ }) |
| Read the next variant from the current file position ignoring all comments.
|
|
bool | TryReadNextVariant (InputStreamDevice &device, PdfVariant &variant, const PdfStatefulEncrypt *encrypt={ }) |
|
|
static bool | IsWhitespace (char ch) |
| Returns true if the given character is a whitespace character according to the PDF reference.
|
|
static bool | IsDelimiter (char ch) |
| Returns true if the given character is a delimiter according to the PDF reference.
|
|
static bool | IsTokenDelimiter (char ch, PdfTokenType &tokenType) |
| Returns true if the given character is a token delimiter.
|
|
static bool | IsRegular (char ch) |
| True if the passed character is a regular character according to the PDF reference (Section 3.1.1, Character Set), i.e. it is neither a white-space nor a delimiter character.
|
|
static bool | IsPrintable (char ch) |
| True if the passed character is within the generally accepted "printable" ASCII range.
|
|
|
static constexpr unsigned | BufferSize = 4096 |
|
|
enum class | PdfLiteralDataType { Unknown = 0, Bool, Number, Real, String, HexString, Name, Array, Dictionary, Null, Reference } |
|
|
void | ReadNextVariant (InputStreamDevice &device, const std::string_view &token, PdfTokenType tokenType, PdfVariant &variant, const PdfStatefulEncrypt *encrypt) |
| Read the next variant from the current file position ignoring all comments.
|
|
bool | TryReadNextVariant (InputStreamDevice &device, const std::string_view &token, PdfTokenType tokenType, PdfVariant &variant, const PdfStatefulEncrypt *encrypt) |
|
void | EnqueueToken (const std::string_view &token, PdfTokenType type) |
| Add a token to the queue of tokens.
|
|
void | ReadDictionary (InputStreamDevice &device, PdfVariant &variant, const PdfStatefulEncrypt *encrypt) |
| Read a dictionary from the input device and store it into a variant.
|
|
void | ReadArray (InputStreamDevice &device, PdfVariant &variant, const PdfStatefulEncrypt *encrypt) |
| Read an array from the input device and store it into a variant.
|
|
void | ReadString (InputStreamDevice &device, PdfVariant &variant, const PdfStatefulEncrypt *encrypt) |
| Read a string from the input device and store it into a variant.
|
|
void | ReadHexString (InputStreamDevice &device, PdfVariant &variant, const PdfStatefulEncrypt *encrypt) |
| Read a hex string from the input device and store it into a variant.
|
|
void | ReadName (InputStreamDevice &device, PdfVariant &variant) |
| Read a name from the input device and store it into a variant.
|
|
PdfLiteralDataType | DetermineDataType (InputStreamDevice &device, const std::string_view &token, PdfTokenType tokenType, PdfVariant &variant) |
| Determine the possible datatype of a token.
|
|
◆ DetermineDataType()
PdfTokenizer::PdfLiteralDataType PdfTokenizer::DetermineDataType (InputStreamDevice &device, const std::string_view &token, PdfTokenType tokenType, PdfVariant &variant)
protected
Determine the possible datatype of a token.
Numbers, reals, bools and null values are parsed directly by this function and saved to the variant.
- Returns
- the expected datatype
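As a rough illustration of how the simple literals can be recognised from the token text alone, the following standalone sketch classifies a token. LiteralKind and ClassifyToken are hypothetical names and are not part of PoDoFo; the sketch does not parse the value into a variant and ignores the device parameter that the real method takes.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the literal kinds that can be decided from the token text alone
enum class LiteralKind { Unknown, Bool, Number, Real, Null };

// Classify a single token by its spelling.
// Strings, names, arrays, dictionaries and references are reported as Unknown here.
inline LiteralKind ClassifyToken(std::string_view token)
{
    if (token == "true" || token == "false")
        return LiteralKind::Bool;
    if (token == "null")
        return LiteralKind::Null;

    // PDF numbers may start with a single '+' or '-'
    std::size_t i = 0;
    if (i < token.size() && (token[i] == '+' || token[i] == '-'))
        i++;

    bool sawDigit = false;
    bool sawDot = false;
    for (; i < token.size(); i++)
    {
        unsigned char ch = static_cast<unsigned char>(token[i]);
        if (std::isdigit(ch))
            sawDigit = true;
        else if (ch == '.' && !sawDot)
            sawDot = true;
        else
            return LiteralKind::Unknown;
    }

    if (!sawDigit)
        return LiteralKind::Unknown;
    // A decimal point makes it a real, otherwise it is an integer
    return sawDot ? LiteralKind::Real : LiteralKind::Number;
}
```

Anything reported as Unknown by this sketch would need further handling by the caller.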
◆ EnqueueToken()
void PdfTokenizer::EnqueueToken (const std::string_view &token, PdfTokenType type)
protected
Add a token to the queue of tokens.
TryReadNextToken() returns all enqueued tokens first before reading new tokens from the input device.
- Parameters
-
token | string of the token |
type | type of the token |
- See also
- TryReadNextToken
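The "enqueued tokens first" behaviour can be pictured with a small standalone model. QueueingTokenSource is a hypothetical class, not a PoDoFo type; a vector of strings stands in for the input device, and first-in, first-out ordering of the queue is an assumption of this sketch.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of a token reader with a pending-token queue
class QueueingTokenSource
{
public:
    explicit QueueingTokenSource(std::vector<std::string> source)
        : m_source(std::move(source)) { }

    // Push a token so that it is returned before anything still in the source
    void Enqueue(std::string token) { m_queue.push_back(std::move(token)); }

    // Returns false when both the queue and the source are exhausted
    bool TryReadNext(std::string& token)
    {
        if (!m_queue.empty())
        {
            // Drain the queue first (FIFO order assumed)
            token = std::move(m_queue.front());
            m_queue.pop_front();
            return true;
        }
        if (m_pos == m_source.size())
            return false;
        token = m_source[m_pos++];
        return true;
    }

private:
    std::deque<std::string> m_queue;
    std::vector<std::string> m_source;
    std::size_t m_pos = 0;
};
```

A queue like this lets a parser put back tokens it has read ahead, so that later reads see them again.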
◆ IsWhitespace()
bool PdfTokenizer::IsWhitespace (char ch)
static
Returns true if the given character is a whitespace character according to the PDF reference.
- Returns
- true if it is a whitespace character, false otherwise
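For orientation, the character classes behind IsWhitespace(), IsDelimiter() and IsRegular() can be written out as follows. This is a standalone sketch based on the character set in the PDF specification; the function names are illustrative and this is not PoDoFo's implementation.

```cpp
// White-space characters: NUL, horizontal tab, line feed, form feed, carriage return, space
constexpr bool IsPdfWhitespace(char ch)
{
    return ch == '\0' || ch == '\t' || ch == '\n' || ch == '\f' || ch == '\r' || ch == ' ';
}

// Delimiter characters: ( ) < > [ ] { } / %
constexpr bool IsPdfDelimiter(char ch)
{
    return ch == '(' || ch == ')' || ch == '<' || ch == '>' || ch == '['
        || ch == ']' || ch == '{' || ch == '}' || ch == '/' || ch == '%';
}

// Regular characters are everything that is neither white-space nor a delimiter
constexpr bool IsPdfRegular(char ch)
{
    return !IsPdfWhitespace(ch) && !IsPdfDelimiter(ch);
}

static_assert(IsPdfWhitespace(' ') && IsPdfDelimiter('/') && IsPdfRegular('a'),
    "basic sanity check of the three character classes");
```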
◆ ReadArray()
Read an array from the input device and store it into a variant.
- Parameters
-
variant | store the array into this variable |
encrypt | an encryption object which is used to decrypt strings during parsing |
◆ ReadDictionary()
Read a dictionary from the input device and store it into a variant.
- Parameters
-
variant | store the dictionary into this variable |
encrypt | an encryption object which is used to decrypt strings during parsing |
◆ ReadHexString()
Read a hex string from the input device and store it into a variant.
- Parameters
-
variant | store the hex string into this variable |
encrypt | an encryption object which is used to decrypt strings during parsing |
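The decoding rules for hex strings in the PDF specification (white-space is ignored, and an odd number of digits is padded with a trailing 0) can be sketched as below. HexValue and DecodeHexString are hypothetical helpers, not PoDoFo functions; the sketch works on the text between '<' and '>' and leaves out decryption entirely.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Value of one hex digit, or -1 if `ch` is not a hex digit
constexpr int HexValue(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// Decode the body of a hex string (without the enclosing '<' and '>').
// Returns std::nullopt on a character that is neither a hex digit nor white-space.
inline std::optional<std::string> DecodeHexString(std::string_view body)
{
    std::string out;
    int high = -1; // pending high nibble, -1 if none
    for (char ch : body)
    {
        if (ch == '\0' || ch == '\t' || ch == '\n' || ch == '\f' || ch == '\r' || ch == ' ')
            continue; // white-space inside a hex string is ignored

        int value = HexValue(ch);
        if (value < 0)
            return std::nullopt;

        if (high < 0)
        {
            high = value;
        }
        else
        {
            out.push_back(static_cast<char>((high << 4) | value));
            high = -1;
        }
    }

    // Odd digit count: the missing final digit is taken to be 0
    if (high >= 0)
        out.push_back(static_cast<char>(high << 4));
    return out;
}
```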
◆ ReadName()
Read a name from the input device and store it into a variant.
Throws UnexpectedEOF if there is nothing to read.
- Parameters
-
variant | store the name into this variable |
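In PDF, names may contain "#xx" escapes, where xx is a two-digit hexadecimal character code. Whether ReadName() expands them itself is not stated here; the sketch below only illustrates the escape rule. NameHexDigit and DecodeName are hypothetical helpers, and copying malformed escapes through unchanged is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Value of one hex digit, or -1 if `ch` is not a hex digit
inline int NameHexDigit(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// Expand "#xx" escapes in the text that follows the leading '/' of a name
inline std::string DecodeName(std::string_view raw)
{
    std::string out;
    for (std::size_t i = 0; i < raw.size(); i++)
    {
        // An escape needs '#' plus two more characters
        if (raw[i] == '#' && i + 2 < raw.size())
        {
            int high = NameHexDigit(raw[i + 1]);
            int low = NameHexDigit(raw[i + 2]);
            if (high >= 0 && low >= 0)
            {
                out.push_back(static_cast<char>((high << 4) | low));
                i += 2; // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(raw[i]);
    }
    return out;
}
```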
◆ ReadNextNumber()
Read the next number from the current file position ignoring all comments.
Raises a NoNumber exception if the next token is not a number, and UnexpectedEOF if no token could be read. No token is consumed if NoNumber is thrown.
- Returns
- a number read from the input device.
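The guarantee that no token is consumed on failure can be modelled with a standalone sketch that parses first and advances only on success. TokenCursor, TryReadNumber and ReadNumber are hypothetical names; std::runtime_error stands in for the library's own exceptions, and std::from_chars does not accept a leading '+', so this sketch is stricter than PDF number syntax.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical token cursor standing in for the tokenizer plus its device
struct TokenCursor
{
    std::vector<std::string> tokens;
    std::size_t pos = 0;
};

// Returns false and leaves the cursor untouched if the next token
// is missing or is not an integer
inline bool TryReadNumber(TokenCursor& cursor, std::int64_t& value)
{
    if (cursor.pos == cursor.tokens.size())
        return false;

    const std::string& token = cursor.tokens[cursor.pos];
    const char* first = token.data();
    const char* last = first + token.size();
    std::int64_t parsed = 0;
    auto result = std::from_chars(first, last, parsed);
    if (result.ec != std::errc() || result.ptr != last)
        return false; // not a number: nothing consumed

    value = parsed;
    cursor.pos++; // consume only after a successful parse
    return true;
}

// Throwing variant, analogous in shape to ReadNextNumber()
inline std::int64_t ReadNumber(TokenCursor& cursor)
{
    if (cursor.pos == cursor.tokens.size())
        throw std::runtime_error("unexpected EOF");

    std::int64_t value = 0;
    if (!TryReadNumber(cursor, value))
        throw std::runtime_error("no number");
    return value;
}
```

Because the cursor is untouched after a failed read, the caller can go on to interpret the same token as something else, for example a keyword.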
◆ ReadNextVariant() [1/2]
void PoDoFo::PdfTokenizer::ReadNextVariant (InputStreamDevice &device, const std::string_view &token, PdfTokenType tokenType, PdfVariant &variant, const PdfStatefulEncrypt *encrypt)
protected
Read the next variant from the current file position ignoring all comments.
Raises an exception if there is no variant left in the file.
- Parameters
-
token | a token that has already been read |
tokenType | type of the passed token |
variant | write the read variant to this value |
encrypt | an encryption object which is used to decrypt strings during parsing |
◆ ReadNextVariant() [2/2]
Read the next variant from the current file position ignoring all comments.
Raises an UnexpectedEOF exception if there is no variant left in the file.
- Parameters
-
variant | write the read variant to this value |
encrypt | an encryption object which is used to decrypt strings during parsing |
◆ ReadString()
Read a string from the input device and store it into a variant.
- Parameters
-
variant | store the string into this variable |
encrypt | an encryption object which is used to decrypt strings during parsing |
◆ TryPeekNextToken()
bool PoDoFo::PdfTokenizer::TryPeekNextToken (InputStreamDevice &device, std::string_view &token)
Tries to peek at the next token from the current file position, ignoring all comments and without consuming it.
- Returns
- false if EOF
◆ TryReadNextToken()
bool PoDoFo::PdfTokenizer::TryReadNextToken (InputStreamDevice &device, std::string_view &token)
Reads the next token from the current file position ignoring all comments.
- Parameters
-
[out] | token | On true return, set to a view of the read token. The view refers to memory owned by PdfTokenizer and must NOT be freed. Its contents are invalidated by the next call to TryReadNextToken() and by the destruction of the PdfTokenizer. Undefined on false return. |
[out] | tokenType | On true return, the type of the read token is stored in this parameter (only in the overload that takes a PdfTokenType reference). Undefined on false return. |
- Returns
- True if a token was read, false if there are no more tokens to read.
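Because the returned token only stays valid until the next read, callers that need to keep a token have to copy it. The standalone sketch below shows the pattern with a hypothetical BufferedWordReader that reuses one internal buffer; it is not a PoDoFo class and splits on spaces only.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical reader that hands out views into a single reused buffer,
// so each view is only valid until the next call to TryReadNext()
class BufferedWordReader
{
public:
    explicit BufferedWordReader(std::string input) : m_input(std::move(input)) { }

    bool TryReadNext(std::string_view& token)
    {
        while (m_pos < m_input.size() && m_input[m_pos] == ' ')
            m_pos++;
        if (m_pos == m_input.size())
            return false;

        m_buffer.clear(); // overwrites what the previous view referred to
        while (m_pos < m_input.size() && m_input[m_pos] != ' ')
            m_buffer.push_back(m_input[m_pos++]);
        token = m_buffer;
        return true;
    }

private:
    std::string m_input;
    std::string m_buffer;
    std::size_t m_pos = 0;
};

// Collect all tokens by copying each view before reading the next one
inline std::vector<std::string> ReadAll(BufferedWordReader& reader)
{
    std::vector<std::string> result;
    std::string_view token;
    while (reader.TryReadNext(token))
        result.emplace_back(token); // copy: the view is stale after the next read
    return result;
}
```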