Snap! Websites
An Open Source CMS System in C++
#include <lexer.h>
The constructor of the Lexer expects a valid pointer to an Input stream.
It optionally accepts an Options pointer. If the pointer is null, all the options are assumed to be set to zero (0), so all extensions are turned off.
This function determines the type of a character.
The function first uses a switch because most of the characters used in JavaScript are ASCII characters, which are well defined and can have their type determined in a snap.
Unicode characters make use of a table to convert the character to a type. Unicode characters are viewed either as letters (CHAR_LETTER) or as punctuation (CHAR_PUNCTUATION).
The exceptions are the characters viewed as either line terminators or white space characters. Those are captured by the switch.
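The two-step classification described above can be sketched as follows. This is only an illustration: the flag values, the letter_range_t table and its contents, the treatment of control characters, and the function name classify() are assumptions made for the example, not the code found in lexer.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

// Hypothetical flag values; the real definitions are in lexer.h.
enum char_type_t : std::uint32_t
{
    CHAR_LETTER          = 0x0001,
    CHAR_DIGIT           = 0x0002,
    CHAR_PUNCTUATION     = 0x0004,
    CHAR_WHITE_SPACE     = 0x0008,
    CHAR_LINE_TERMINATOR = 0x0010,
    CHAR_HEXDIGIT        = 0x0020,
    CHAR_INVALID         = 0x8000
};

// A tiny, deliberately incomplete table of Unicode letter ranges,
// sorted by code point so it can be searched with a binary search.
struct letter_range_t
{
    char32_t f_min;
    char32_t f_max;
};

constexpr letter_range_t g_letters[] =
{
    { 0x00C0, 0x00D6 },     // Latin-1 letters before the multiply sign
    { 0x00D8, 0x00F6 },     // Latin-1 letters after the multiply sign
    { 0x0391, 0x03A1 },     // Greek capital letters
    { 0x0410, 0x044F }      // basic Cyrillic letters
};

std::uint32_t classify(char32_t c)
{
    assert(c <= 0x10FFFF);  // must be a valid Unicode code point

    // line terminators and white space are caught first, ASCII or not
    switch(c)
    {
    case U'\n': case U'\r': case 0x2028: case 0x2029:
        return CHAR_LINE_TERMINATOR;
    case U' ': case U'\t': case U'\v': case U'\f': case 0x00A0:
        return CHAR_WHITE_SPACE;
    case U'_': case U'$':
        return CHAR_LETTER;
    default:
        break;
    }

    if(c < 0x80)
    {
        // remaining ASCII characters have a fixed, well known type
        if(c >= U'0' && c <= U'9') return CHAR_DIGIT | CHAR_HEXDIGIT;
        bool const hex((c >= U'a' && c <= U'f') || (c >= U'A' && c <= U'F'));
        if(hex) return CHAR_LETTER | CHAR_HEXDIGIT;
        if((c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z')) return CHAR_LETTER;
        // rejecting control characters is an assumption of this sketch
        return (c < 0x20 || c == 0x7F) ? CHAR_INVALID : CHAR_PUNCTUATION;
    }

    // other Unicode characters: find the first range ending at or after c
    auto const it(std::lower_bound(
            std::begin(g_letters), std::end(g_letters), c,
            [](letter_range_t const & r, char32_t ch) { return r.f_max < ch; }));
    if(it != std::end(g_letters) && it->f_min <= c)
    {
        return CHAR_LETTER;
    }
    return CHAR_PUNCTUATION;
}
```

With this table, classify(0x00E9) is a letter while classify(0x00D7), the multiply sign that sits between two letter ranges, falls through to punctuation.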
Definition at line 883 of file lexer.cpp.
References CHAR_DIGIT, CHAR_HEXDIGIT, CHAR_INVALID, CHAR_LETTER, CHAR_LINE_TERMINATOR, CHAR_PUNCTUATION, CHAR_WHITE_SPACE, as2js::anonymous_namespace{lexer.cpp}::identifier_characters_t::f_min, as2js::anonymous_namespace{lexer.cpp}::g_identifier_characters_size, and as2js::String::STRING_CONTINUATION.
Referenced by getc(), read_identifier(), and read_number().
This function reads the next few characters and transforms them into one escape sequence character.
Some characters are extensions and require the extended escape sequences to be turned on in order to be accepted. These are marked as an extension in the list below.
The function supports:
Any other character generates an error message if it appears after a backslash (\).
Definition at line 1187 of file lexer.cpp.
References as2js::AS_ERR_UNKNOWN_ESCAPE_SEQUENCE, f_input, getc(), has_option_set(), as2js::MESSAGE_LEVEL_ERROR, as2js::Options::OPTION_EXTENDED_ESCAPE_SEQUENCES, read_hex(), read_octal(), as2js::String::STRING_CONTINUATION, and ungetc().
Referenced by read_identifier(), and read_string().
This helper function creates a new node at the current position. This is useful internally and in the parser when creating nodes to build the input tree and in order for the new node to get the correct position according to the current lexer position.
Definition at line 2226 of file lexer.cpp.
References f_position.
This function reads one token from the input stream and transforms it into a Node. The Node is automatically assigned the position after the token was read.
Definition at line 2244 of file lexer.cpp.
References as2js::AS_ERR_INVALID_NUMBER, CHAR_LETTER, f_char_type, f_input, f_position, f_result_float64, f_result_int64, f_result_string, f_result_type, get_token(), as2js::MESSAGE_LEVEL_ERROR, as2js::Node::NODE_FLOAT64, as2js::Node::NODE_IDENTIFIER, as2js::Node::NODE_INT64, as2js::Node::NODE_REGULAR_EXPRESSION, and as2js::Node::NODE_STRING.
This function reads one token from the input stream. It reads one character, determines the type of token (identifier, string, number, etc.), and then reads the whole token.
The main purpose of the function is to read characters from the stream and determine which token they represent. It uses many sub-functions to read more complex tokens such as identifiers and numbers.
If the end of the input stream is reached, the function returns with a NODE_EOF. The function can be called any number of times after the end of the input is reached.
Only useful tokens are returned. Comments and white spaces (space, tab, new line, line feed, etc.) are all skipped silently.
The function detects invalid characters; it first emits an error and then ignores them.
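The behavior described above (skip white space and comments, return only useful tokens, keep returning an end of file token once the input is exhausted) can be sketched with a toy tokenizer. Everything here is an assumption made for the illustration: TinyLexer, Token, and the token_t values are not as2js types, and strings, regular expressions, multi-character operators, and error reporting are left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical token types, much coarser than the NODE_... values.
enum class token_t { TOKEN_EOF, TOKEN_IDENTIFIER, TOKEN_NUMBER, TOKEN_PUNCTUATION };

struct Token
{
    token_t     f_type;
    std::string f_text;
};

class TinyLexer
{
public:
    explicit TinyLexer(std::string input) : f_input(std::move(input)) {}

    Token get_token()
    {
        skip_blanks_and_comments();
        if(f_pos >= f_input.size())
        {
            // end of input: calling again simply returns this token again
            return { token_t::TOKEN_EOF, "" };
        }
        std::size_t const start(f_pos);
        if(is_letter(f_input[f_pos]))
        {
            while(f_pos < f_input.size()
               && (is_letter(f_input[f_pos]) || is_digit(f_input[f_pos]))) ++f_pos;
            return { token_t::TOKEN_IDENTIFIER, f_input.substr(start, f_pos - start) };
        }
        if(is_digit(f_input[f_pos]))
        {
            while(f_pos < f_input.size() && is_digit(f_input[f_pos])) ++f_pos;
            return { token_t::TOKEN_NUMBER, f_input.substr(start, f_pos - start) };
        }
        ++f_pos;    // anything else is a one character punctuation here
        return { token_t::TOKEN_PUNCTUATION, f_input.substr(start, 1) };
    }

private:
    static bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
    static bool is_letter(char c)
    {
        return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_' || c == '$';
    }

    void skip_blanks_and_comments()
    {
        for(;;)
        {
            while(f_pos < f_input.size()
               && std::isspace(static_cast<unsigned char>(f_input[f_pos])) != 0) ++f_pos;
            if(f_input.compare(f_pos, 2, "//") == 0)
            {
                // line comment: skip up to the end of the line
                while(f_pos < f_input.size() && f_input[f_pos] != '\n') ++f_pos;
            }
            else if(f_input.compare(f_pos, 2, "/*") == 0)
            {
                // block comment: skip past the closing marker, if any
                std::size_t const end(f_input.find("*/", f_pos + 2));
                f_pos = end == std::string::npos ? f_input.size() : end + 2;
            }
            else
            {
                return;
            }
        }
    }

    std::string f_input;
    std::size_t f_pos = 0;
};
```

A caller would typically loop on get_token() until it sees TOKEN_EOF; because the end of file token is returned on every further call, the loop does not need to guard against reading past the end.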
This is the function that handles the case of a regular expression written between slashes (/.../). One can also use the backward quotes (`...`) for regular expressions to avoid potential confusion with the divide character.
Definition at line 2327 of file lexer.cpp.
References as2js::AS_ERR_NOT_ALLOWED, as2js::AS_ERR_UNEXPECTED_PUNCTUATION, CHAR_DIGIT, CHAR_INVALID, CHAR_LETTER, CHAR_LINE_TERMINATOR, CHAR_WHITE_SPACE, f_char_type, f_input, f_options, f_position, f_result_float64, f_result_string, f_result_type, getc(), has_option_set(), as2js::MESSAGE_LEVEL_ERROR, as2js::Node::NODE_ADD, as2js::Node::NODE_ASSIGNMENT, as2js::Node::NODE_ASSIGNMENT_ADD, as2js::Node::NODE_ASSIGNMENT_BITWISE_AND, as2js::Node::NODE_ASSIGNMENT_BITWISE_OR, as2js::Node::NODE_ASSIGNMENT_BITWISE_XOR, as2js::Node::NODE_ASSIGNMENT_DIVIDE, as2js::Node::NODE_ASSIGNMENT_LOGICAL_AND, as2js::Node::NODE_ASSIGNMENT_LOGICAL_OR, as2js::Node::NODE_ASSIGNMENT_LOGICAL_XOR, as2js::Node::NODE_ASSIGNMENT_MAXIMUM, as2js::Node::NODE_ASSIGNMENT_MINIMUM, as2js::Node::NODE_ASSIGNMENT_MODULO, as2js::Node::NODE_ASSIGNMENT_MULTIPLY, as2js::Node::NODE_ASSIGNMENT_POWER, as2js::Node::NODE_ASSIGNMENT_ROTATE_LEFT, as2js::Node::NODE_ASSIGNMENT_ROTATE_RIGHT, as2js::Node::NODE_ASSIGNMENT_SHIFT_LEFT, as2js::Node::NODE_ASSIGNMENT_SHIFT_RIGHT, as2js::Node::NODE_ASSIGNMENT_SHIFT_RIGHT_UNSIGNED, as2js::Node::NODE_ASSIGNMENT_SUBTRACT, as2js::Node::NODE_BITWISE_AND, as2js::Node::NODE_BITWISE_NOT, as2js::Node::NODE_BITWISE_OR, as2js::Node::NODE_BITWISE_XOR, as2js::Node::NODE_CLOSE_CURVLY_BRACKET, as2js::Node::NODE_CLOSE_PARENTHESIS, as2js::Node::NODE_CLOSE_SQUARE_BRACKET, as2js::Node::NODE_COLON, as2js::Node::NODE_COMMA, as2js::Node::NODE_COMPARE, as2js::Node::NODE_CONDITIONAL, as2js::Node::NODE_DECREMENT, as2js::Node::NODE_DIVIDE, as2js::Node::NODE_EOF, as2js::Node::NODE_EQUAL, as2js::Node::NODE_FLOAT64, as2js::Node::NODE_GREATER, as2js::Node::NODE_GREATER_EQUAL, as2js::Node::NODE_INCREMENT, as2js::Node::NODE_LESS, as2js::Node::NODE_LESS_EQUAL, as2js::Node::NODE_LOGICAL_AND, as2js::Node::NODE_LOGICAL_NOT, as2js::Node::NODE_LOGICAL_OR, as2js::Node::NODE_LOGICAL_XOR, as2js::Node::NODE_MATCH, as2js::Node::NODE_MAXIMUM, as2js::Node::NODE_MEMBER, as2js::Node::NODE_MINIMUM, as2js::Node::NODE_MODULO, 
as2js::Node::NODE_MULTIPLY, as2js::Node::NODE_NOT_EQUAL, as2js::Node::NODE_NOT_MATCH, as2js::Node::NODE_OPEN_CURVLY_BRACKET, as2js::Node::NODE_OPEN_PARENTHESIS, as2js::Node::NODE_OPEN_SQUARE_BRACKET, as2js::Node::NODE_POWER, as2js::Node::NODE_RANGE, as2js::Node::NODE_REGULAR_EXPRESSION, as2js::Node::NODE_REST, as2js::Node::NODE_ROTATE_LEFT, as2js::Node::NODE_ROTATE_RIGHT, as2js::Node::NODE_SCOPE, as2js::Node::NODE_SEMICOLON, as2js::Node::NODE_SHIFT_LEFT, as2js::Node::NODE_SHIFT_RIGHT, as2js::Node::NODE_SHIFT_RIGHT_UNSIGNED, as2js::Node::NODE_SMART_MATCH, as2js::Node::NODE_STRICTLY_EQUAL, as2js::Node::NODE_STRICTLY_NOT_EQUAL, as2js::Node::NODE_SUBTRACT, as2js::Node::NODE_UNKNOWN, as2js::Options::OPTION_EXTENDED_OPERATORS, read(), read_identifier(), read_number(), read_string(), as2js::Float64::set_infinity(), as2js::Float64::set_NaN(), and ungetc().
Referenced by get_next_token().
This function reads one character of input and returns it.
If the character is a newline, linefeed, etc. it affects the current line number, page number, etc. as required. The following characters have such an effect:
If the ungetc() function was called before a call to getc(), then that last character is returned instead of a new character from the input stream. In that case, the character has no effect on the line number, page number, etc.
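The interplay between getc() and ungetc() can be sketched as below. CharReader, END_OF_INPUT, and line() are hypothetical names for this example, and only '\n' is counted as a line terminator here, whereas the real lexer handles more characters and also tracks pages.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the lexer's character input; not an as2js class.
class CharReader
{
public:
    static constexpr int END_OF_INPUT = -1;

    explicit CharReader(std::string input) : f_input(std::move(input)) {}

    int getc()
    {
        // ungotten characters come back first, most recent first, and
        // they were already counted so the line number is left alone
        if(!f_unget.empty())
        {
            int const c(f_unget.back());
            f_unget.pop_back();
            return c;
        }
        if(f_pos >= f_input.size())
        {
            return END_OF_INPUT;
        }
        int const c(static_cast<unsigned char>(f_input[f_pos++]));
        if(c == '\n')
        {
            ++f_line;   // only brand new characters move the position
        }
        return c;
    }

    void ungetc(int c)
    {
        assert(c >= END_OF_INPUT);  // a character or the end marker
        if(c != END_OF_INPUT)
        {
            f_unget.push_back(c);
        }
    }

    int line() const { return f_line; }

private:
    std::string      f_input;
    std::size_t      f_pos = 0;
    std::vector<int> f_unget;
    int              f_line = 1;
};
```

Reading a newline, pushing it back, and reading it again leaves line() unchanged the second time, which is the reason the sketch keeps its own unget buffer instead of rewinding the input.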
Definition at line 753 of file lexer.cpp.
References CHAR_LINE_TERMINATOR, char_type(), CHAR_WHITE_SPACE, f_char_type, f_input, f_unget, and ungetc().
Referenced by escape_sequence(), get_token(), read(), read_binary(), read_hex(), read_identifier(), read_number(), read_octal(), and read_string().
Because the lexer checks options in many places, it makes use of this helper function to simplify the many tests in the rest of the code.
This function checks whether the specified option is set. If so, then it returns true, otherwise it returns false.
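A helper of this kind might look like the sketch below. The Options class shown here is a simplified stand-in written for the example (its storage, set_option(), get_option(), and OPTION_max are assumptions), and MiniLexer only keeps the part of the lexer that matters for the test. A null options pointer is treated as "every option is zero".

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Simplified stand-in for as2js::Options: one integer per option, 0 = off.
class Options
{
public:
    enum option_t
    {
        OPTION_BINARY,
        OPTION_EXTENDED_ESCAPE_SEQUENCES,
        OPTION_EXTENDED_OPERATORS,
        OPTION_OCTAL,
        OPTION_max      // hypothetical end marker used to size the array
    };

    void set_option(option_t option, std::int64_t value)
    {
        assert(option < OPTION_max);
        f_values[option] = value;
    }

    std::int64_t get_option(option_t option) const
    {
        assert(option < OPTION_max);
        return f_values[option];
    }

private:
    std::array<std::int64_t, OPTION_max> f_values{};    // all zero by default
};

class MiniLexer
{
public:
    explicit MiniLexer(std::shared_ptr<Options> options = nullptr)
        : f_options(std::move(options))
    {
    }

    // true only when options were supplied and the option is not zero
    bool has_option_set(Options::option_t option) const
    {
        return f_options != nullptr && f_options->get_option(option) != 0;
    }

private:
    std::shared_ptr<Options> f_options;
};
```

Callers can then write a single test such as has_option_set(Options::OPTION_OCTAL) without checking the pointer each time.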
Definition at line 2964 of file lexer.cpp.
References f_options.
Referenced by escape_sequence(), get_token(), and read_number().
This function reads all the characters as long as their type matches the specified flags. The result is saved in the str parameter.
At the time the function is called, c is expected to be the first character to be added to str.
The first character that does not satisfy the flags is pushed back in the input stream so one can call getc() again to retrieve it.
[in] flags: The flags that must match each character, including the character type of c.
[in,out] str: The resulting string. It is expected to be empty on call but does not need to be (it does not get cleared).
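As an illustration of the loop described above, here is a minimal sketch. ReadSketch, its flag values, and its simplified getc(), ungetc(), and char_type() are assumptions made for the example; in particular the return type of read() is not stated in this page, so the sketch returns nothing.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical flag values for the example.
constexpr unsigned CHAR_LETTER = 0x01;
constexpr unsigned CHAR_DIGIT  = 0x02;

class ReadSketch
{
public:
    explicit ReadSketch(std::string input) : f_input(std::move(input)) {}

    int getc()
    {
        return f_pos < f_input.size()
                    ? static_cast<unsigned char>(f_input[f_pos++])
                    : -1;       // end of input
    }

    // this sketch can only push back the character that was just read
    void ungetc(int c)
    {
        if(c >= 0)
        {
            assert(f_pos > 0);
            --f_pos;
        }
    }

    static unsigned char_type(int c)
    {
        if(c < 0) return 0;     // the end of input matches no flag
        if(std::isdigit(c) != 0) return CHAR_DIGIT;
        if(std::isalpha(c) != 0 || c == '_' || c == '$') return CHAR_LETTER;
        return 0;
    }

    // c is appended first, then every following character matching flags
    void read(int c, unsigned flags, std::string & str)
    {
        do
        {
            str += static_cast<char>(c);
            c = getc();
        }
        while((char_type(c) & flags) != 0);
        ungetc(c);              // the caller can getc() it again
    }

private:
    std::string f_input;
    std::size_t f_pos = 0;
};
```

Because str is appended to and never cleared, a caller may pass a string that already holds a prefix.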
Definition at line 1309 of file lexer.cpp.
References CHAR_INVALID, f_char_type, getc(), and ungetc().
Referenced by get_token(), and read_number().
This function reads 0's and 1's up until another character is found or max digits were read. That other character is ungotten so the next call to getc() will return that non-binary character.
Since the function is called without an introducing digit, the number could end up being empty. If that happens, an error is generated and the function returns -1 (although -1 is a valid number assuming you accept all 64 bits.)
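The digit loop and the empty-number case can be sketched as follows. To keep the example self-contained it works on a string and a position instead of getc() and ungetc(); leaving pos on the first non-binary character plays the role of ungetting it. The signature is an assumption, and the error message emitted by the real function is reduced to a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Reads at most max binary digits from input, starting at pos.
// On return, pos indexes the first character that was not consumed.
std::int64_t read_binary(std::string const & input, std::size_t & pos, unsigned long max)
{
    assert(max >= 1 && max <= 64);

    std::uint64_t result(0);
    unsigned long count(0);
    while(count < max
       && pos < input.size()
       && (input[pos] == '0' || input[pos] == '1'))
    {
        result = (result << 1) | static_cast<std::uint64_t>(input[pos] - '0');
        ++pos;
        ++count;
    }

    if(count == 0)
    {
        // no digit at all: the real lexer reports AS_ERR_INVALID_NUMBER here
        return -1;
    }

    return static_cast<std::int64_t>(result);
}
```

Since 64 digits set to 1 also convert to -1 on common two's complement platforms, a caller that needs to tell the two cases apart has to rely on the error having been reported rather than on the returned value alone.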
Definition at line 1098 of file lexer.cpp.
References as2js::AS_ERR_INVALID_NUMBER, f_input, getc(), as2js::MESSAGE_LEVEL_ERROR, and ungetc().
Referenced by read_number().
This function reads hexadecimal digits up until another character is found or max digits were read. That other character is ungotten so the next call to getc() will return that non-hexadecimal character.
Since the function is called without an introducing digit, the number could end up being empty. If that happens, an error is generated and the function returns -1 (although -1 is a valid number assuming you accept all 64 bits.)
Definition at line 1044 of file lexer.cpp.
References as2js::AS_ERR_INVALID_NUMBER, CHAR_HEXDIGIT, f_char_type, f_input, getc(), as2js::MESSAGE_LEVEL_ERROR, and ungetc().
Referenced by escape_sequence(), and read_number().
This function reads an identifier and checks whether that identifier is a keyword.
The list of reserved keywords as defined in ECMAScript is defined below. Note that it includes all versions (1 through 5); we mark all of these identifiers as keywords and we are NOT flexible at all with those (i.e. JavaScript allows keywords to be used as object field names, as in 'myObj.break = 123;', and we do not).
The function sets the f_result_type and f_result_string as required.
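The keyword check can be pictured with the following sketch. It uses a hash map for brevity; this page does not say how the real lookup is implemented, so that choice, the function name identifier_type(), and the reduced node_t enumeration are all assumptions. Only a handful of keywords are listed.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical subset of node types; the real ones belong to as2js::Node.
enum class node_t
{
    NODE_IDENTIFIER,
    NODE_BREAK,
    NODE_CASE,
    NODE_FUNCTION,
    NODE_RETURN,
    NODE_VAR,
    NODE_WHILE
};

// Returns the keyword type of name, or NODE_IDENTIFIER for any other name.
node_t identifier_type(std::string const & name)
{
    assert(!name.empty());

    static std::unordered_map<std::string, node_t> const g_keywords =
    {
        { "break",    node_t::NODE_BREAK    },
        { "case",     node_t::NODE_CASE     },
        { "function", node_t::NODE_FUNCTION },
        { "return",   node_t::NODE_RETURN   },
        { "var",      node_t::NODE_VAR      },
        { "while",    node_t::NODE_WHILE    }
    };

    // the match is exact and case sensitive: "Break" is a plain identifier
    auto const it(g_keywords.find(name));
    return it == g_keywords.end() ? node_t::NODE_IDENTIFIER : it->second;
}
```

Because the lookup does not depend on where the identifier appears, a name such as break is always reported as a keyword, which matches the strict behavior described above.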
We also understand additional keywords as defined here:
We also support the special names:
Definition at line 1423 of file lexer.cpp.
References CHAR_DIGIT, CHAR_INVALID, CHAR_LETTER, char_type(), escape_sequence(), f_char_type, f_input, f_result_float64, f_result_int64, f_result_string, f_result_type, getc(), as2js::Node::NODE_ABSTRACT, as2js::Node::NODE_AS, as2js::Node::NODE_BOOLEAN, as2js::Node::NODE_BREAK, as2js::Node::NODE_BYTE, as2js::Node::NODE_CASE, as2js::Node::NODE_CATCH, as2js::Node::NODE_CHAR, as2js::Node::NODE_CLASS, as2js::Node::NODE_CONST, as2js::Node::NODE_CONTINUE, as2js::Node::NODE_DEBUGGER, as2js::Node::NODE_DEFAULT, as2js::Node::NODE_DELETE, as2js::Node::NODE_DO, as2js::Node::NODE_DOUBLE, as2js::Node::NODE_ELSE, as2js::Node::NODE_ENSURE, as2js::Node::NODE_ENUM, as2js::Node::NODE_EXPORT, as2js::Node::NODE_EXTENDS, as2js::Node::NODE_FALSE, as2js::Node::NODE_FINAL, as2js::Node::NODE_FINALLY, as2js::Node::NODE_FLOAT, as2js::Node::NODE_FLOAT64, as2js::Node::NODE_FOR, as2js::Node::NODE_FUNCTION, as2js::Node::NODE_GOTO, as2js::Node::NODE_IDENTIFIER, as2js::Node::NODE_IF, as2js::Node::NODE_IMPLEMENTS, as2js::Node::NODE_IMPORT, as2js::Node::NODE_IN, as2js::Node::NODE_INLINE, as2js::Node::NODE_INSTANCEOF, as2js::Node::NODE_INT64, as2js::Node::NODE_INTERFACE, as2js::Node::NODE_INVARIANT, as2js::Node::NODE_IS, as2js::Node::NODE_LONG, as2js::Node::NODE_NAMESPACE, as2js::Node::NODE_NATIVE, as2js::Node::NODE_NEW, as2js::Node::NODE_NULL, as2js::Node::NODE_PACKAGE, as2js::Node::NODE_PRIVATE, as2js::Node::NODE_PROTECTED, as2js::Node::NODE_PUBLIC, as2js::Node::NODE_REQUIRE, as2js::Node::NODE_RETURN, as2js::Node::NODE_SHORT, as2js::Node::NODE_STATIC, as2js::Node::NODE_STRING, as2js::Node::NODE_SUPER, as2js::Node::NODE_SWITCH, as2js::Node::NODE_SYNCHRONIZED, as2js::Node::NODE_THEN, as2js::Node::NODE_THIS, as2js::Node::NODE_THROW, as2js::Node::NODE_THROWS, as2js::Node::NODE_TRANSIENT, as2js::Node::NODE_TRUE, as2js::Node::NODE_TRY, as2js::Node::NODE_TYPEOF, as2js::Node::NODE_UNDEFINED, as2js::Node::NODE_UNKNOWN, as2js::Node::NODE_USE, as2js::Node::NODE_VAR, as2js::Node::NODE_VOID, 
as2js::Node::NODE_VOLATILE, as2js::Node::NODE_WHILE, as2js::Node::NODE_WITH, as2js::Node::NODE_YIELD, as2js::Float64::set_infinity(), as2js::Float64::set_NaN(), and ungetc().
Referenced by get_token().
This function is called whenever a digit is found in the input stream. It may also be called if a period was read (the rules are a little more complicated for the period.)
The function checks the following character, if it is:
The result is directly saved in the necessary f_result_... variables.
Definition at line 1987 of file lexer.cpp.
References CHAR_DIGIT, char_type(), f_char_type, f_result_float64, f_result_int64, f_result_type, getc(), has_option_set(), as2js::Node::NODE_FLOAT64, as2js::Node::NODE_INT64, as2js::Options::OPTION_BINARY, as2js::Options::OPTION_OCTAL, read(), read_binary(), read_hex(), read_octal(), as2js::String::to_float64(), as2js::String::to_utf8(), and ungetc().
Referenced by get_token().
This function reads octal digits up until a character other than a valid octal digit is found or max digits were read. That character is ungotten so the next call to getc() will return that non-octal character.
Definition at line 1135 of file lexer.cpp.
References getc(), and ungetc().
Referenced by escape_sequence(), and read_number().
This function reads one string from the input stream.
The function expects quote as an input parameter representing the opening quote. It will read the input stream up to the next line terminator (unless escaped) or the closing quote.
Note that we support backquoted "strings" which actually represent regular expressions. These cannot be continued on the following line.
This function sets the result type to NODE_STRING. It is changed by the caller when a regular expression was found instead.
Definition at line 2181 of file lexer.cpp.
References as2js::AS_ERR_UNTERMINATED_STRING, CHAR_LINE_TERMINATOR, escape_sequence(), f_char_type, f_input, f_result_string, f_result_type, getc(), as2js::MESSAGE_LEVEL_ERROR, as2js::Node::NODE_STRING, and as2js::String::STRING_CONTINUATION.
Referenced by get_token().
When reading a token, the end of the token is most often discovered by reading one character too many. This function is used to push that character back in the input stream.
Although the stream implementation also includes an unget, we do not use that unget. The reason is that the getc() function needs to know whether the character is a brand new character from that input stream or the last ungotten character. The difference is important to know whether the character has to have an effect on the line number, page number, etc.
The getc() function first returns the last character sent via ungetc() (i.e. LIFO).
Definition at line 843 of file lexer.cpp.
References f_unget.
Referenced by escape_sequence(), get_token(), getc(), read(), read_binary(), read_hex(), read_identifier(), read_number(), and read_octal().
Definition at line 63 of file lexer.h.
Referenced by char_type(), get_token(), read_identifier(), and read_number().
Definition at line 67 of file lexer.h.
Referenced by char_type(), and read_hex().
Definition at line 68 of file lexer.h.
Referenced by char_type(), get_token(), read(), and read_identifier().
Definition at line 62 of file lexer.h.
Referenced by char_type(), get_next_token(), get_token(), and read_identifier().
Definition at line 66 of file lexer.h.
Referenced by char_type(), get_token(), getc(), and read_string().
Definition at line 64 of file lexer.h.
Referenced by char_type().
Definition at line 65 of file lexer.h.
Referenced by char_type(), get_token(), and getc().
Definition at line 87 of file lexer.h.
Referenced by get_next_token(), get_token(), getc(), read(), read_hex(), read_identifier(), read_number(), and read_string().
Definition at line 85 of file lexer.h.
Referenced by escape_sequence(), get_input(), get_next_token(), get_token(), getc(), Lexer(), read_binary(), read_hex(), read_identifier(), and read_string().
Definition at line 86 of file lexer.h.
Referenced by get_token(), has_option_set(), and Lexer().
Definition at line 88 of file lexer.h.
Referenced by get_new_node(), get_next_token(), and get_token().
Definition at line 93 of file lexer.h.
Referenced by get_next_token(), get_token(), read_identifier(), and read_number().
Definition at line 92 of file lexer.h.
Referenced by get_next_token(), read_identifier(), and read_number().
Definition at line 91 of file lexer.h.
Referenced by get_next_token(), get_token(), read_identifier(), and read_string().
Definition at line 90 of file lexer.h.
Referenced by get_next_token(), get_token(), read_identifier(), read_number(), and read_string().
This document is part of the Snap! Websites Project.
Copyright by Made to Order Software Corp.