NOTE: This page contains deprecated information.
Semantic Verifications for Lexer Specifications
Here are the semantic verifications that must be done:
- Detect redefinitions within a single names space. The name spaces are:
- Helpers, groups, and tokens.
- Selectors and investigators.
- States.
- Detect the use of an undefined identifier (within the proper name space).
- Detect the improper use of an identifier (e.g. the use of a group name in a place where the name of a token or helper is expected).
- Detect recursive token and helper definitions.
- Detect attempts to use incomplete tokens as helpers.
- Detect inconsistent token collections. A token collection is inconsistent if any of the following is detected:
- The same token or group appears more than once in the same identifier list (identifiers or exluded identifiers)
- A token and its group both appear in the same identifier list.
- A group appears in the excluded identifiers of a subset token collection.
- The same token appears in both lists of a subset token collection.
- A token appears in the excluded identifiers, but its group does not appear in the identifiers of a subset token collection.
- All the tokens of a group appear in the excluded identifiers of a subset token collection.
- The token collection is empty.
- Detect inconsistent priorities. A token priority is inconsistent if any of the following is detected:
- It involves tokens that are not matched in the current state.
- It is transitively cyclic.
- Its high-priority or low-priority token collection includes at least one, but not all of the tokens in a token selection.
- Detect invalid ignored token collections. An ignored token collection is invalid if it contains tokens that are not matched in the current lexer state.
- Detect invalid transitions. A transition is invalid if any of the following is detected:
- The destination state is not an internal state.
- The transition token collection contains tokens that are not matched in the source state.
- The source state of a back transition is a normal state.
- The token collection of a transition from a normal source state includes a complete token.
- The token collection of a transition from an internal source state includes an incomplete token.
- Detect that either all or none of the tokens, in a token selection, are matched in a lexer state.
- Detect invalid internal states:
- An internal state is invalid if it does not transitively lead to a back transition. In other words, there does not exist a chain:
S->T, T->U, U->V, ..., Z->Back
of transitions for an internal state S. - An internal state is also invalid if it matches an incomplete token.
- An internal state is invalid if it does not transitively lead to a back transition. In other words, there does not exist a chain:
- Detect useless constructs:
- Useless helpers. A helper is useless if it is not transitively used by a token.
- Useless tokens. A token can be useless in two situations:
- The token is defined, but it is never used by any state.
- The token is declared to be matched in a state (or the implicit default state), yet it is not effectively matched by the lexer DFA of that state.
- (We do not detect useless groups, as we think that groups, even if unused, provide useful information to users).
- Useless priorities. A priority is useless if it is not effectively used to resolve a multiple match.
- Useless states. A state is useless if it is:
- An internal state that cannot be transitively reached from a normal state.
- A normal state that is not referenced by the parser.
- Detect an interval with a lower bound greater than its upper bound.
- Detect an exponent range with a lower bound greater than its upper bound.
- Detect out of bound exponent numbers. Numbers, in an exponent, are limited to non-negative int values in Java. Precisely: 0..2147483647.
- Detect invalid lookahead usage:
- Lookahead Infinite was specified, yet the lexer does not require infinite lookahead.
- Lookahead None was specified, yet the lexer does require lookahead.
Implementation Notes
- Recursive token and helper definitions can be detected using a depth first search (DFS).
- See TokenPriorityDesign for notes on detecting inconsistent token priorities.
- Invalid internal states can be detected by doing a DFS on the reverse transition graph, starting with the Back node. Any unreached internal state is invalid.
- Detecting useless priorities is tricky. Yet, it can be done easily by recording the usefulness of all priorities on the back path of a search that leads to the discovery of a token involved in a multiple match. An involved token is discovered once in every (valid) multiple match DFS.
