NOTE: This page contains deprecated information.

Semantic Verifications for Lexer Specifications

Here are the semantic verifications that must be done:

  1. Detect redefinitions within a single names space. The name spaces are:
    1. Helpers, groups, and tokens.
    2. Selectors and investigators.
    3. States.
  2. Detect the use of an undefined identifier (within the proper name space).
  3. Detect the improper use of an identifier (e.g. the use of a group name in a place where the name of a token or helper is expected).
  4. Detect recursive token and helper definitions.
  5. Detect attempts to use incomplete tokens as helpers.
  6. Detect inconsistent token collections. A token collection is inconsistent if any of the following is detected:
    1. The same token or group appears more than once in the same identifier list (identifiers or exluded identifiers)
    2. A token and its group both appear in the same identifier list.
    3. A group appears in the excluded identifiers of a subset token collection.
    4. The same token appears in both lists of a subset token collection.
    5. A token appears in the excluded identifiers, but its group does not appear in the identifiers of a subset token collection.
    6. All the tokens of a group appear in the excluded identifiers of a subset token collection.
    7. The token collection is empty.
  7. Detect inconsistent priorities. A token priority is inconsistent if any of the following is detected:
    1. It involves tokens that are not matched in the current state.
    2. It is transitively cyclic.
    3. Its high-priority or low-priority token collection includes at least one, but not all of the tokens in a token selection.
  8. Detect invalid ignored token collections. An ignored token collection is invalid if it contains tokens that are not matched in the current lexer state.
  9. Detect invalid transitions. A transition is invalid if any of the following is detected:
    1. The destination state is not an internal state.
    2. The transition token collection contains tokens that are not matched in the source state.
    3. The source state of a back transition is a normal state.
    4. The token collection of a transition from a normal source state includes a complete token.
    5. The token collection of a transition from an internal source state includes an incomplete token.
  10. Detect that either all or none of the tokens, in a token selection, are matched in a lexer state.
  11. Detect invalid internal states:
    1. An internal state is invalid if it does not transitively lead to a back transition. In other words, there does not exist a chain:
      S->T, T->U, U->V, ..., Z->Back
      
      of transitions for an internal state S.
    2. An internal state is also invalid if it matches an incomplete token.
  12. Detect useless constructs:
    1. Useless helpers. A helper is useless if it is not transitively used by a token.
    2. Useless tokens. A token can be useless in two situations:
      1. The token is defined, but it is never used by any state.
      2. The token is declared to be matched in a state (or the implicit default state), yet it is not effectively matched by the lexer DFA of that state.
    3. (We do not detect useless groups, as we think that groups, even if unused, provide useful information to users).
    4. Useless priorities. A priority is useless if it is not effectively used to resolve a multiple match.
    5. Useless states. A state is useless if it is:
      1. An internal state that cannot be transitively reached from a normal state.
      2. A normal state that is not referenced by the parser.
  13. Detect an interval with a lower bound greater than its upper bound.
  14. Detect an exponent range with a lower bound greater than its upper bound.
  15. Detect out of bound exponent numbers. Numbers, in an exponent, are limited to non-negative int values in Java. Precisely: 0..2147483647.
  16. Detect invalid lookahead usage:
    1. Lookahead Infinite was specified, yet the lexer does not require infinite lookahead.
    2. Lookahead None was specified, yet the lexer does require lookahead.

Implementation Notes

  1. Recursive token and helper definitions can be detected using a depth first search (DFS).
  2. See TokenPriorityDesign for notes on detecting inconsistent token priorities.
  3. Invalid internal states can be detected by doing a DFS on the reverse transition graph, starting with the Back node. Any unreached internal state is invalid.
  4. Detecting useless priorities is tricky. Yet, it can be done easily by recording the usefulness of all priorities on the back path of a search that leads to the discovery of a token involved in a multiple match. An involved token is discovered once in every (valid) multiple match DFS.