public final class NFATraceFinderGenerator extends Object
| Modifier and Type | Method and Description |
|---|---|
| static NFA | generateTraceFinder(NFA nfa) Generates an NFA that can be used to generate a backward-searching DFA that can find the result (capture group offsets) of a regex match found by a forward-searching DFA. |
public static NFA generateTraceFinder(NFA nfa)
The idea behind this is the following: if a regular expression does not contain any loops (+ and *), its NFA is a directed acyclic graph, which can be converted to a tree. If the search pattern is a tree, we can find all possible results (capture group offsets) ahead of time by a simple tree traversal. Knowing all possible results in advance saves us the work of updating individual result indices during the search. The only problem is that we generally have to find the pattern somewhere inside a string: the match will not necessarily start at the index we start searching from.
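The pre-calculation described above can be sketched as a small traversal over a loop-free expression. This is an illustration only, not the TRegex implementation: the class LoopFreeResults and its helpers lit, seq, alt, group and results are hypothetical names, and the expression is built by hand instead of being parsed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: enumerates all capture group results of a loop-free regex. */
final class LoopFreeResults {

    /** Receives the position and offsets reached after a sub-expression has matched. */
    interface Continuation {
        void accept(int pos, int[] offsets);
    }

    /** A loop-free sub-expression; walk() reports every way it can match starting at pos. */
    interface Node {
        void walk(int pos, int[] offsets, Continuation next);
    }

    /** A literal consumes one character; the offsets do not depend on which one. */
    static Node lit(char c) {
        return (pos, offsets, next) -> next.accept(pos + 1, offsets);
    }

    static Node seq(Node... parts) {
        return (pos, offsets, next) -> walkSeq(parts, 0, pos, offsets, next);
    }

    private static void walkSeq(Node[] parts, int i, int pos, int[] offsets, Continuation next) {
        if (i == parts.length) {
            next.accept(pos, offsets);
        } else {
            parts[i].walk(pos, offsets, (p, o) -> walkSeq(parts, i + 1, p, o, next));
        }
    }

    /** Branches are tried in order, so results are reported in priority order. */
    static Node alt(Node... branches) {
        return (pos, offsets, next) -> {
            for (Node branch : branches) {
                branch.walk(pos, offsets, next);
            }
        };
    }

    /** Records start and end of capture group {@code index} on copies of the offsets array. */
    static Node group(int index, Node body) {
        return (pos, offsets, next) -> {
            int[] opened = offsets.clone();
            opened[2 * index] = pos;
            body.walk(pos, opened, (p, o) -> {
                int[] closed = o.clone();
                closed[2 * index + 1] = p;
                next.accept(p, closed);
            });
        };
    }

    /** Returns the distinct results in the order they are first reached. */
    static List<String> results(Node root, int groupCount) {
        int[] initial = new int[2 * groupCount];
        Arrays.fill(initial, -1); // assumption: -1 marks a group that did not take part
        Set<String> seen = new LinkedHashSet<>();
        group(0, root).walk(0, initial, (p, o) -> seen.add(Arrays.toString(o)));
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // /a(b|c)d(e|fg)/ built by hand
        Node re = seq(lit('a'), group(1, alt(lit('b'), lit('c'))), lit('d'),
                group(2, alt(lit('e'), seq(lit('f'), lit('g')))));
        System.out.println(results(re, 3));
        // prints [[0, 4, 1, 2, 3, 4], [0, 5, 1, 2, 3, 5]]
    }
}
```

The branches b and c lead to the same offsets, which is why four paths through the expression collapse into two distinct results.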
Doing an un-anchored search with a DFA requires that we add a loop at the beginning. For
example, in order to find /a(b|cd)/ in "____acd____", we have to prepend the
expression with [\u0000-\uffff]*, and the resulting NFA can no longer be seen as a
tree. In order to still gain some performance from the ability to pre-calculate all results,
we can do the following:
Example:

    regular expression: /a(b|c)d(e|fg)/
    this expression has two possible results:
    (in the form: [start CG 0, end CG 0, start CG 1, end CG 1, ...])
    0: [0, 4, 1, 2, 3, 4]
    1: [0, 5, 1, 2, 3, 5]

    NFA in reverse tree form:
          I
         / \
        e   g
        |   |
        d   f
       /|   |
      b c   d
      | |   |\
      a a   b c
            | |
            a a

When searching for the correct result, we can immediately determine it based on the last character:
e -> result 0
g -> result 1
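The backward lookup can be sketched with a reverse tree keyed by characters. This is a simplified stand-in for the backward-searching DFA, not the generated automaton itself: the class ReverseTree and its methods add and lookup are hypothetical, the matched strings are listed by hand, and the sketch assumes the forward search has already confirmed that a match ends at the given index.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: picks a pre-calculated result by reading the match back to front. */
final class ReverseTree {
    private static final int UNSET = -2, AMBIGUOUS = -1;

    private final Map<Character, ReverseTree> children = new HashMap<>();
    // Result index shared by every match passing through this node, or a marker value.
    private int result = UNSET;

    /** Inserts a matched string in reverse and tags each node on the way. */
    void add(String match, int resultIndex) {
        ReverseTree node = this;
        for (int i = match.length() - 1; i >= 0; i--) {
            node = node.children.computeIfAbsent(match.charAt(i), c -> new ReverseTree());
            if (node.result == UNSET) {
                node.result = resultIndex;
            } else if (node.result != resultIndex) {
                node.result = AMBIGUOUS;
            }
        }
    }

    /**
     * Walks backward from end (exclusive) and stops at the first node that
     * identifies a single result. Returns -1 if no result can be determined.
     */
    int lookup(CharSequence input, int end) {
        ReverseTree node = this;
        for (int i = end - 1; i >= 0; i--) {
            node = node.children.get(input.charAt(i));
            if (node == null) {
                return -1;
            }
            if (node.result >= 0) {
                return node.result;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ReverseTree tree = new ReverseTree();
        // All strings matched by /a(b|c)d(e|fg)/ with their result index.
        tree.add("abde", 0);
        tree.add("acde", 0);
        tree.add("abdfg", 1);
        tree.add("acdfg", 1);
        System.out.println(tree.lookup("__acdfg__", 7)); // prints 1
        System.out.println(tree.lookup("__abde", 6));    // prints 0
    }
}
```

In this example the nodes for e and g already carry a single result, so the lookup returns after reading one character.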
We also have to take care of the order in which results are to be found. For example, in the
expression /a(b)c|ab(c)/, we always have to return the result created from taking the
first branch, never the second. Therefore, we create the reverse tree-shaped NFA while
traversing the original NFA in priority order.

Parameters:
nfa - the NFA used for the forward-searching DFA.
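The priority rule for /a(b)c|ab(c)/ can be illustrated with a small first-wins table. This is a sketch under stated assumptions rather than the generator's actual logic: the class PriorityOrder and the method firstWins are hypothetical, the two branches are listed by hand in priority order, and -1 is assumed to mark a capture group that did not take part in the match.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: the first result registered for a matched string wins. */
final class PriorityOrder {

    /**
     * matched[i] is the string matched by the i-th path in priority order and
     * results[i] is that path's capture group offsets.
     */
    static Map<String, int[]> firstWins(String[] matched, int[][] results) {
        Map<String, int[]> byString = new LinkedHashMap<>();
        for (int i = 0; i < matched.length; i++) {
            // A later, lower-priority path must not overwrite an earlier one.
            byString.putIfAbsent(matched[i], results[i]);
        }
        return byString;
    }

    public static void main(String[] args) {
        // Both branches of /a(b)c|ab(c)/ match "abc" but capture different groups.
        String[] matched = {"abc", "abc"};
        int[][] results = {
            {0, 3, 1, 2, -1, -1}, // first branch: group 1 captures "b"
            {0, 3, -1, -1, 2, 3}, // second branch: group 2 captures "c"
        };
        System.out.println(Arrays.toString(firstWins(matched, results).get("abc")));
        // prints [0, 3, 1, 2, -1, -1]
    }
}
```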