NOTE: This page contains deprecated information.

Design of Token Priorities

Token priorities are used by SableCC to select a token when many tokens match a string (in a given lexer state).

Design Alternatives

There were various choices for interpreting token priorities:

  1. Are explicit priorities required between all tokens involved in a multiple match?
  2. Are transitive priorities taken into account? In other words, does (a Over b) and (b Over c) imply (a Over c)?
  3. What is considered as conflicting priorities?
  4. In what context is consistency between priorities verified? Globally, or only between priorities involved in a multiple match?
  5. etc.

Design Decisions

After giving it some thought, it has been decided to take the approach that seems most intuitive:

  1. Transitive priorities are taken into account.
  2. Priorities must be globally consistent; cyclic priorities are prohibited.
  3. On multiple match, one token must have (transitive) priority over all other tokens involved. No priorities are required between these other tokens.

Implementation

Note that priorities are lexer-state specific. So, the data structure, below, is constructed for every lexer state.

Priority Graph Construction

First, a directed graph, called priority graph, is constructed as follows:

  1. One node is added to the graph for every token.
  2. For every token collection C on the left side of a token priority (e.g. C in C Over D), a node N is added, and a directed edge is added from the node of every token included in C to the new node N.
  3. For every token collection D on the right side of a token priority (e.g. D in C Over D), a node N is added, and a directed edge is added from the new node N to the the node of every token included in C.
  4. For every token priority C Over D, a directed edge is added from the node of C to the node of D.

Consistency Verification

Globally consistent priorities have no cycles. Cyclic priorities result in a cyclic priority graph. Therefore, to verify consistency, it is sufficient to look for cycles in the priority graph. This is best done using a depth-first search (DFS).

In prevision of future uses of the priority graph, a record is kept of the finish time, the time (e.g. a sequential number) at which the visit of a node finished during the DFS. This is based on the following observation: if node A has a lower transitive priority than node B, it will necessarily have a smaller finish time.

Multiple Match Resolution

Sometimes a string is matched by more than one token, in other words, the lexer DFA has an accept state which corresponds to many tokens. When this happens, SableCC must find which token, if any, has priority over all the other tokens.

This is done as follows:

  1. The token with the biggest finish time is selected as a candidate token.
  2. A DFS is started from the node of the candidate token.
  3. The candidate is chosen as the matching token if the DFS reached all other tokens involved in the multiple match.
  4. If a token involved in the multiple match is not reached by the DFS, a priority error is reported.

Thus, the cost, per multiple match, is mostly a single partial DFS search of the priority graph.

Justification

It would be tempting to pre-compute all pairwise priorities between tokens, in the hope of reducing the cost of multiple match resolution. But, the storage cost would be quadratic in the number of tokens. As we anticipate that priority graphs will usually be quite small, our solution above will be cheaper.