public final class PythonRegexLexer extends RegexLexer
Nested classes/interfaces inherited from class RegexLexer: RegexLexer.ClassSetOperator, RegexLexer.ParseGroupNameResult, RegexLexer.ParseGroupNameResultState

Fields inherited from class RegexLexer: compilationBuffer, DEFAULT_WHITESPACE, namedCaptureGroups, pattern, position, PREDEFINED_CHAR_CLASSES, source

| Constructor and Description |
|---|
| PythonRegexLexer(RegexSource source, PythonREMode mode, CompilationBuffer compilationBuffer) |

| Modifier and Type | Method and Description |
|---|---|
| void | addGlobalFlags(PythonFlags newGlobalFlags) |
| protected long | boundedQuantifierMaxValue(): The maximum value allowed while parsing bounded quantifiers. |
| protected ClassSetContents | caseFoldClassSetAtom(ClassSetContents classSetContents): Case folds an atom in a class set expression. |
| protected void | caseFoldUnfold(CodePointSetAccumulator charClass): Updates a character set by expanding it to the set of characters that case fold to the same characters as the characters currently in the set. |
| protected void | checkClassSetCharacter(int codePoint): Checks whether codePoint can appear as an unescaped literal class set character. |
| protected CodePointSet | complementClassSet(CodePointSet codePointSet): Returns the complement of a class set element. |
| protected boolean | featureEnabledAZPositionAssertions(): Returns true if \A and \Z position assertions are supported. |
| protected boolean | featureEnabledBoundedQuantifierEmptyMin(): Returns true if empty minimum values in bounded quantifiers (e.g. {,1}) are allowed and treated as zero. |
| protected boolean | featureEnabledCharClassFirstBracketIsLiteral(): Returns true if the first character in a character class must be interpreted as part of the character set, even if it is the closing bracket ']'. |
| protected boolean | featureEnabledClassSetExpressions(): Returns true if class set expressions (e.g. [[\w\q{abc\|xyz}]--[a-cx-z]]) are supported. |
| protected boolean | featureEnabledForwardReferences(): Returns true if forward references are allowed. |
| protected boolean | featureEnabledGroupComments(): Returns true if group comments (e.g. (# ... )) are supported. |
| protected boolean | featureEnabledIgnoreCase(): Returns true if ignore-case mode is currently enabled. |
| protected boolean | featureEnabledIgnoreWhiteSpace(): Returns true if white space in the pattern is ignored. |
| protected boolean | featureEnabledLineComments(): Returns true if line comments (e.g. # ... ) are supported. |
| protected boolean | featureEnabledNestedCharClasses(): Returns true if nested character classes are supported. |
| protected boolean | featureEnabledOctalEscapes(): Returns true if octal escapes (e.g. \012) are supported. |
| protected boolean | featureEnabledPOSIXCharClasses(): Returns true if POSIX character classes, character equivalence classes, and the POSIX Collating Element Operator are supported. |
| protected boolean | featureEnabledPossessiveQuantifiers(): Returns true if possessive quantifiers (+ suffix) are allowed. |
| protected boolean | featureEnabledSpecialGroups(): Returns true if any constructs that alter a capture group's function, such as non-capturing groups (?:) or look-around assertions (?=), are supported. |
| protected boolean | featureEnabledUnicodePropertyEscapes(): Returns true if Unicode property escapes (e.g. \p{...}) are supported. |
| protected boolean | featureEnabledWordBoundaries(): Returns true if \b and \B word boundary position assertions are supported. |
| protected boolean | featureEnabledZLowerCaseAssertion(): Returns true if the \z position assertion is supported. |
| void | fixFlags() |
| protected CodePointSet | getDotCodePointSet(): Returns the code point set represented by the dot operator. |
| PythonFlags | getGlobalFlags() |
| protected CodePointSet | getIdContinue(): Returns the set of all codepoints a group identifier may continue with. |
| protected CodePointSet | getIdStart(): Returns the set of all codepoints a group identifier may begin with. |
| PythonLocaleData | getLocaleData() |
| PythonFlags | getLocalFlags() |
| protected int | getMaxBackReferenceDigits(): Returns the maximum number of digits to parse when parsing a back-reference. |
| protected CodePointSet | getPOSIXCharClass(String name): Returns the POSIX character class associated with the given name. |
| protected CodePointSet | getPredefinedCharClass(char c): Returns the CodePointSet associated with the given predefined character class (e.g. \d). |
| protected TBitSet | getWhitespace(): The set of codepoints to consider as whitespace in comments and "ignore white space" mode. |
| protected RegexSyntaxException | handleBoundedQuantifierOutOfOrder(): Handle bounded quantifiers whose bounds are out of order, e.g. {2,1}. |
| protected Token | handleBoundedQuantifierOverflow(long min, long max): Handle integer overflows in quantifier bounds, e.g. {2147483649}. |
| protected Token | handleBoundedQuantifierOverflowMin(long min, long max): Handle integer overflows in quantifier bounds, e.g. {2147483649}. |
| protected Token | handleBoundedQuantifierSyntaxError(): Handle syntax errors in bounded quantifiers (missing }, non-digit characters). |
| protected RegexSyntaxException | handleCCRangeOutOfOrder(int rangeStart): Handle out of order character class range elements, e.g. [b-a]. |
| protected void | handleCCRangeWithPredefCharClass(int rangeStart, ClassSetContents firstAtom, ClassSetContents secondAtom): Handle non-codepoint character class range elements, e.g. [\w-a]. |
| protected RegexSyntaxException | handleComplementOfStringSet(): Handle complement of class set expressions containing strings, e.g. [^\q{abc}] or \P{RGI_Emoji}. |
| protected RegexSyntaxException | handleEmptyGroupName(): Handle empty group name in group references. |
| protected void | handleGroupRedefinition(String name, int newId, int oldId) |
| protected void | handleIncompleteEscapeX(): Handle incomplete hex escapes, e.g. \x1. |
| protected void | handleInvalidBackReference(int reference): Handle group references to non-existent groups. |
| protected void | handleInvalidBackReference(String reference): Handle group references to non-existent groups. |
| protected RegexSyntaxException | handleInvalidCharInCharClass() |
| protected RegexSyntaxException | handleInvalidGroupBeginQ(): Handle groups starting with (? and invalid next char. |
| protected RegexSyntaxException | handleMissingClassSetOperand(RegexLexer.ClassSetOperator operator): Handle missing operands in class set expressions, e.g. [\s&&] or [\w--]. |
| protected RegexSyntaxException | handleMixedClassSetOperators(RegexLexer.ClassSetOperator leftOperator, RegexLexer.ClassSetOperator rightOperator): Handle class set expressions with mixed set operators in the same nested set. |
| protected void | handleOctalOutOfRange(): Handle octal values larger than 255. |
| protected RegexSyntaxException | handleRangeAsClassSetOperand(RegexLexer.ClassSetOperator operator): Handle character ranges as operands in class set expressions with operators other than union. |
| protected void | handleUnfinishedEscape(): Handle unfinished escape (e.g. \). |
| protected void | handleUnfinishedGroupComment(): Handle unfinished group comment (#...). |
| protected RegexSyntaxException | handleUnfinishedGroupQ(): Handle unfinished group with question mark (?. |
| protected RegexSyntaxException | handleUnfinishedRangeInClassSet(): Handle unfinished range in class set expression [a-]. |
| protected RegexSyntaxException | handleUnmatchedLeftBracket(): Handle unmatched [. |
| protected void | handleUnmatchedRightBrace(): Handle unmatched }. |
| protected void | handleUnmatchedRightBracket(): Handle unmatched ]. |
| protected int | parseCodePointInGroupName(): Parse the next codepoint in a group name and return it. |
| protected Token | parseCustomEscape(char c): Parse any escape sequence starting with \ and the argument c. |
| protected int | parseCustomEscapeChar(char c, boolean inCharClass): Parse an escape character sequence (inside character class, or other escapes have already been tried) starting with \ and the argument c. |
| protected int | parseCustomEscapeCharFallback(int c, boolean inCharClass): Parse an escape character sequence (inside character class, or other escapes have already been tried) starting with \ and the code point c. This method is called after all other means of parsing the escape sequence have been exhausted. |
| protected Token | parseCustomGroupBeginQ(char charAfterQuestionMark): Parse group starting with (?. |
| protected Token | parseGroupLt(): Parse group starting with (<. |
| void | popLocalFlags() |
| void | pushLocalFlags(PythonFlags localFlags) |
| RegexSyntaxException | syntaxErrorAtAbs(String msg, int i) |
| RegexSyntaxException | syntaxErrorHere(String msg) |
| protected void | validatePOSIXCollationElement(String sequence): Checks if the given string is a valid collation element. |
| protected void | validatePOSIXEquivalenceClass(String sequence): Checks if the given string is a valid equivalence class. |
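
Several of the hooks summarized above concern bounded quantifiers: an empty minimum such as {,1} may be treated as zero, out-of-order bounds such as {2,1} are an error, and oversized bounds such as {2147483649} hit a maximum value. The following is a minimal, self-contained sketch of how such hooks could interact. It is not the TRegex implementation: MiniLexer, Quantifier, lexer(...), the saturating behavior on overflow, and the use of IllegalArgumentException are all assumptions made for illustration.

```java
public class BoundedQuantifierSketch {

    /** Parsed {min,max}; max == UNBOUNDED means no upper limit was given. */
    public static final class Quantifier {
        public static final long UNBOUNDED = -1;
        public final long min;
        public final long max;

        Quantifier(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    /** Hypothetical stand-in for a lexer with overridable feature hooks. */
    public abstract static class MiniLexer {
        // Loosely corresponds to featureEnabledBoundedQuantifierEmptyMin().
        protected abstract boolean emptyMinAllowed();

        // Loosely corresponds to boundedQuantifierMaxValue().
        protected abstract long maxValue();

        public Quantifier parse(String text) {
            if (text.length() < 2 || text.charAt(0) != '{' || text.charAt(text.length() - 1) != '}') {
                throw new IllegalArgumentException("malformed bounded quantifier: " + text);
            }
            String body = text.substring(1, text.length() - 1);
            int comma = body.indexOf(',');
            String minText = comma < 0 ? body : body.substring(0, comma);
            long min;
            if (minText.isEmpty()) {
                if (comma < 0 || !emptyMinAllowed()) {
                    throw new IllegalArgumentException("missing minimum in " + text);
                }
                min = 0; // empty minimum is treated as zero
            } else {
                min = parseSaturated(minText);
            }
            long max;
            if (comma < 0) {
                max = min; // {n} means exactly n repetitions
            } else if (comma == body.length() - 1) {
                max = Quantifier.UNBOUNDED; // {n,} has no upper bound
            } else {
                max = parseSaturated(body.substring(comma + 1));
            }
            if (max != Quantifier.UNBOUNDED && min > max) {
                throw new IllegalArgumentException("bounds out of order in " + text);
            }
            return new Quantifier(min, max);
        }

        // Assumption of this sketch: overflowing values are clamped to maxValue().
        private long parseSaturated(String digits) {
            long value = 0;
            for (int i = 0; i < digits.length(); i++) {
                char c = digits.charAt(i);
                if (c < '0' || c > '9') {
                    throw new IllegalArgumentException("non-digit character '" + c + "'");
                }
                value = Math.min(value * 10 + (c - '0'), maxValue());
            }
            return value;
        }
    }

    /** Builds a lexer whose maximum bound is Integer.MAX_VALUE (an arbitrary choice). */
    public static MiniLexer lexer(boolean allowEmptyMin) {
        return new MiniLexer() {
            @Override
            protected boolean emptyMinAllowed() {
                return allowEmptyMin;
            }

            @Override
            protected long maxValue() {
                return Integer.MAX_VALUE;
            }
        };
    }

    public static void main(String[] args) {
        Quantifier q = lexer(true).parse("{,1}");
        System.out.println(q.min + ".." + q.max);
    }
}
```

Keeping the policy decisions in small overridable methods lets one parsing routine serve several regex dialects, which appears to be the purpose of the featureEnabled... and handle... methods in this class.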

Methods inherited from class RegexLexer: advance, advance, atEnd, consumeChar, consumingLookahead, consumingLookahead, count, count, countDecimalDigits, countFrom, countUpTo, curChar, findChars, finishSurrogatePair, getLastAtomPosition, getLastCharacterClassBeginPosition, getLastTokenPosition, getNamedCaptureGroups, hasNamedCaptureGroups, hasNext, inCharacterClass, isAscii, isCurCharClassInverted, isDecimalDigit, isEscaped, isHexDigit, isOctalDigit, isPredefCharClass, literalChar, lookahead, lookahead, lookbehind, next, numberOfCaptureGroupsSoFar, parseGroupName, parseIntSaturated, parseIntSaturated, parseOctal, registerNamedCaptureGroup, retreat, syntaxError, totalNumberOfCaptureGroups

public PythonRegexLexer(RegexSource source, PythonREMode mode, CompilationBuffer compilationBuffer)
public PythonLocaleData getLocaleData()
public void fixFlags()
public PythonFlags getGlobalFlags()
public void addGlobalFlags(PythonFlags newGlobalFlags)
public PythonFlags getLocalFlags()
public void pushLocalFlags(PythonFlags localFlags)
public void popLocalFlags()
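
The flag accessors above are listed without descriptions. One plausible reading is that global flags apply to the whole pattern while local flags form a stack scoped to groups such as (?i:...). The sketch below illustrates that reading only; FlagStackSketch, its int bitmasks, and the fallback from local to global flags are assumptions, and the real PythonFlags type is not modeled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class FlagStackSketch {
    // Hypothetical flag bits; the real PythonFlags class is not modeled here.
    public static final int IGNORE_CASE = 1;
    public static final int VERBOSE = 2;

    private int globalFlags;
    private final Deque<Integer> localFlags = new ArrayDeque<>();

    public FlagStackSketch(int globalFlags) {
        this.globalFlags = globalFlags;
    }

    public void addGlobalFlags(int newGlobalFlags) {
        globalFlags |= newGlobalFlags;
    }

    public int getGlobalFlags() {
        return globalFlags;
    }

    /** Returns the innermost local flags, or the global flags outside any scope. */
    public int getLocalFlags() {
        return localFlags.isEmpty() ? globalFlags : localFlags.peek();
    }

    /** Called when entering a group that changes flags. */
    public void pushLocalFlags(int flags) {
        localFlags.push(flags);
    }

    /** Called when leaving that group; restores the enclosing scope's flags. */
    public void popLocalFlags() {
        localFlags.pop();
    }

    public boolean ignoreCase() {
        return (getLocalFlags() & IGNORE_CASE) != 0;
    }

    public static void main(String[] args) {
        FlagStackSketch flags = new FlagStackSketch(0);
        flags.pushLocalFlags(IGNORE_CASE);      // entering a group like (?i:...)
        System.out.println(flags.ignoreCase()); // inside the group
        flags.popLocalFlags();                  // leaving the group
        System.out.println(flags.ignoreCase()); // back to the global setting
    }
}
```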
protected boolean featureEnabledIgnoreCase()
Description copied from class: RegexLexer
Returns true if ignore-case mode is currently enabled.
featureEnabledIgnoreCase in class RegexLexer

protected boolean featureEnabledAZPositionAssertions()
Description copied from class: RegexLexer
Returns true if \A and \Z position assertions are supported.
featureEnabledAZPositionAssertions in class RegexLexer

protected boolean featureEnabledZLowerCaseAssertion()
Description copied from class: RegexLexer
Returns true if the \z position assertion is supported.
featureEnabledZLowerCaseAssertion in class RegexLexer

protected boolean featureEnabledWordBoundaries()
Description copied from class: RegexLexer
Returns true if \b and \B word boundary position assertions are supported.
featureEnabledWordBoundaries in class RegexLexer

protected boolean featureEnabledBoundedQuantifierEmptyMin()
Description copied from class: RegexLexer
Returns true if empty minimum values in bounded quantifiers (e.g. {,1}) are allowed and treated as zero.
featureEnabledBoundedQuantifierEmptyMin in class RegexLexer

protected boolean featureEnabledPossessiveQuantifiers()
Description copied from class: RegexLexer
Returns true if possessive quantifiers (+ suffix) are allowed.
featureEnabledPossessiveQuantifiers in class RegexLexer

protected boolean featureEnabledCharClassFirstBracketIsLiteral()
Description copied from class: RegexLexer
Returns true if the first character in a character class must be interpreted as part of the character set, even if it is the closing bracket ']'.
featureEnabledCharClassFirstBracketIsLiteral in class RegexLexer

protected boolean featureEnabledNestedCharClasses()
Description copied from class: RegexLexer
Returns true if nested character classes are supported. This is required for RegexLexer.featureEnabledPOSIXCharClasses().
featureEnabledNestedCharClasses in class RegexLexer

protected boolean featureEnabledPOSIXCharClasses()
Description copied from class: RegexLexer
Returns true if POSIX character classes, character equivalence classes, and the POSIX Collating Element Operator are supported. Requires RegexLexer.featureEnabledNestedCharClasses().
featureEnabledPOSIXCharClasses in class RegexLexer

protected boolean featureEnabledForwardReferences()
Description copied from class: RegexLexer
Returns true if forward references are allowed.
featureEnabledForwardReferences in class RegexLexer

protected boolean featureEnabledGroupComments()
Description copied from class: RegexLexer
Returns true if group comments (e.g. (# ... )) are supported.
featureEnabledGroupComments in class RegexLexer

protected boolean featureEnabledLineComments()
Description copied from class: RegexLexer
Returns true if line comments (e.g. # ... ) are supported.
featureEnabledLineComments in class RegexLexer

protected boolean featureEnabledIgnoreWhiteSpace()
Description copied from class: RegexLexer
Returns true if white space in the pattern is ignored. This is relevant only if line comments are not supported.
featureEnabledIgnoreWhiteSpace in class RegexLexer

protected TBitSet getWhitespace()
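Description copied from class: RegexLexer
The set of codepoints to consider as whitespace in comments and "ignore white space" mode.
getWhitespace in class RegexLexer

protected boolean featureEnabledOctalEscapes()
Description copied from class: RegexLexer
Returns true if octal escapes (e.g. \012) are supported.
featureEnabledOctalEscapes in class RegexLexer

protected boolean featureEnabledSpecialGroups()
Description copied from class: RegexLexer
Returns true if any constructs that alter a capture group's function, such as non-capturing groups (?:) or look-around assertions (?=), are supported. If this flag is false, groups starting with a question mark (? do not have any special meaning.
featureEnabledSpecialGroups in class RegexLexer

protected boolean featureEnabledUnicodePropertyEscapes()
Description copied from class: RegexLexer
Returns true if Unicode property escapes (e.g. \p{...}) are supported.
featureEnabledUnicodePropertyEscapes in class RegexLexer

protected boolean featureEnabledClassSetExpressions()
Description copied from class: RegexLexer
Returns true if class set expressions (e.g. [[\w\q{abc|xyz}]--[a-cx-z]]) are supported.
featureEnabledClassSetExpressions in class RegexLexer

protected CodePointSet getDotCodePointSet()
Description copied from class: RegexLexer
Returns the code point set represented by the dot operator.
getDotCodePointSet in class RegexLexer

protected CodePointSet getIdContinue()
Description copied from class: RegexLexer
Returns the set of all codepoints a group identifier may continue with.
getIdContinue in class RegexLexer

protected CodePointSet getIdStart()
Description copied from class: RegexLexer
Returns the set of all codepoints a group identifier may begin with.
getIdStart in class RegexLexer

protected int getMaxBackReferenceDigits()
Description copied from class: RegexLexer
Returns the maximum number of digits to parse when parsing a back-reference.
getMaxBackReferenceDigits in class RegexLexer

protected void caseFoldUnfold(CodePointSetAccumulator charClass)
Description copied from class: RegexLexer
Updates a character set by expanding it to the set of characters that case fold to the same characters as the characters currently in the set.
caseFoldUnfold in class RegexLexer

protected CodePointSet complementClassSet(CodePointSet codePointSet)
Description copied from class: RegexLexer
Returns the complement of a class set element.
complementClassSet in class RegexLexer

protected ClassSetContents caseFoldClassSetAtom(ClassSetContents classSetContents)
Description copied from class: RegexLexer
Case folds an atom in a class set expression.
caseFoldClassSetAtom in class RegexLexer

protected CodePointSet getPredefinedCharClass(char c)
Description copied from class: RegexLexer
Returns the CodePointSet associated with the given predefined character class (e.g. \d). Note that the CodePointSet returned by this function has already been case-folded and negated.
getPredefinedCharClass in class RegexLexer

protected void checkClassSetCharacter(int codePoint) throws RegexSyntaxException
Description copied from class: RegexLexer
Checks whether codePoint can appear as an unescaped literal class set character.
checkClassSetCharacter in class RegexLexer
Throws: RegexSyntaxException

protected long boundedQuantifierMaxValue()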
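Description copied from class: RegexLexer
The maximum value allowed while parsing bounded quantifiers. See RegexLexer.handleBoundedQuantifierOverflow(long, long).
boundedQuantifierMaxValue in class RegexLexer

protected RegexSyntaxException handleBoundedQuantifierOutOfOrder()
Description copied from class: RegexLexer
Handle bounded quantifiers whose bounds are out of order, e.g. {2,1}.
handleBoundedQuantifierOutOfOrder in class RegexLexer

protected Token handleBoundedQuantifierSyntaxError() throws RegexSyntaxException
Description copied from class: RegexLexer
Handle syntax errors in bounded quantifiers (missing }, non-digit characters).
handleBoundedQuantifierSyntaxError in class RegexLexer
Throws: RegexSyntaxException

protected Token handleBoundedQuantifierOverflow(long min, long max)
Description copied from class: RegexLexer
Handle integer overflows in quantifier bounds, e.g. {2147483649}. If this method returns a non-null value, it will be returned instead of the current quantifier.
handleBoundedQuantifierOverflow in class RegexLexer

protected Token handleBoundedQuantifierOverflowMin(long min, long max)
Description copied from class: RegexLexer
Handle integer overflows in quantifier bounds, e.g. {2147483649}. If this method returns a non-null value, it will be returned instead of the current quantifier. This method is called when no explicit max value is present.
handleBoundedQuantifierOverflowMin in class RegexLexer

protected RegexSyntaxException handleCCRangeOutOfOrder(int rangeStart)
Description copied from class: RegexLexer
Handle out of order character class range elements, e.g. [b-a].
handleCCRangeOutOfOrder in class RegexLexer

protected void handleCCRangeWithPredefCharClass(int rangeStart, ClassSetContents firstAtom, ClassSetContents secondAtom)
Description copied from class: RegexLexer
Handle non-codepoint character class range elements, e.g. [\w-a].
handleCCRangeWithPredefCharClass in class RegexLexer

protected CodePointSet getPOSIXCharClass(String name)
Description copied from class: RegexLexer
Returns the POSIX character class associated with the given name.
getPOSIXCharClass in class RegexLexer

protected void validatePOSIXCollationElement(String sequence)
Description copied from class: RegexLexer
Checks if the given string is a valid collation element.
validatePOSIXCollationElement in class RegexLexer

protected void validatePOSIXEquivalenceClass(String sequence)
Description copied from class: RegexLexer
Checks if the given string is a valid equivalence class.
validatePOSIXEquivalenceClass in class RegexLexer

protected RegexSyntaxException handleComplementOfStringSet()
Description copied from class: RegexLexer
Handle complement of class set expressions containing strings, e.g. [^\q{abc}] or \P{RGI_Emoji}.
handleComplementOfStringSet in class RegexLexer

protected RegexSyntaxException handleEmptyGroupName()
Description copied from class: RegexLexer
Handle empty group name in group references.
handleEmptyGroupName in class RegexLexer

protected void handleGroupRedefinition(String name, int newId, int oldId)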
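handleGroupRedefinition in class RegexLexer

protected void handleIncompleteEscapeX()
Description copied from class: RegexLexer
Handle incomplete hex escapes, e.g. \x1.
handleIncompleteEscapeX in class RegexLexer

protected void handleInvalidBackReference(int reference)
Description copied from class: RegexLexer
Handle group references to non-existent groups.
handleInvalidBackReference in class RegexLexer

protected void handleInvalidBackReference(String reference)
Description copied from class: RegexLexer
Handle group references to non-existent groups.
handleInvalidBackReference in class RegexLexer

protected RegexSyntaxException handleInvalidCharInCharClass()
handleInvalidCharInCharClass in class RegexLexer

protected RegexSyntaxException handleInvalidGroupBeginQ()
Description copied from class: RegexLexer
Handle groups starting with (? and invalid next char.
handleInvalidGroupBeginQ in class RegexLexer

protected RegexSyntaxException handleMixedClassSetOperators(RegexLexer.ClassSetOperator leftOperator, RegexLexer.ClassSetOperator rightOperator)
Description copied from class: RegexLexer
Handle class set expressions with mixed set operators in the same nested set.
handleMixedClassSetOperators in class RegexLexer

protected RegexSyntaxException handleMissingClassSetOperand(RegexLexer.ClassSetOperator operator)
Description copied from class: RegexLexer
Handle missing operands in class set expressions, e.g. [\s&&] or [\w--].
handleMissingClassSetOperand in class RegexLexer

protected void handleOctalOutOfRange()
Description copied from class: RegexLexer
Handle octal values larger than 255.
handleOctalOutOfRange in class RegexLexer

protected RegexSyntaxException handleRangeAsClassSetOperand(RegexLexer.ClassSetOperator operator)
Description copied from class: RegexLexer
Handle character ranges as operands in class set expressions with operators other than union.
handleRangeAsClassSetOperand in class RegexLexer

protected void handleUnfinishedEscape()
Description copied from class: RegexLexer
Handle unfinished escape (e.g. \).
handleUnfinishedEscape in class RegexLexer

protected void handleUnfinishedGroupComment()
Description copied from class: RegexLexer
Handle unfinished group comment (#...).
handleUnfinishedGroupComment in class RegexLexer

protected RegexSyntaxException handleUnfinishedGroupQ()
Description copied from class: RegexLexer
Handle unfinished group with question mark (?.
handleUnfinishedGroupQ in class RegexLexer

protected RegexSyntaxException handleUnfinishedRangeInClassSet()
Description copied from class: RegexLexer
Handle unfinished range in class set expression [a-].
handleUnfinishedRangeInClassSet in class RegexLexer

protected void handleUnmatchedRightBrace()
Description copied from class: RegexLexer
Handle unmatched }.
handleUnmatchedRightBrace in class RegexLexer

protected RegexSyntaxException handleUnmatchedLeftBracket()
Description copied from class: RegexLexer
Handle unmatched [.
handleUnmatchedLeftBracket in class RegexLexer

protected void handleUnmatchedRightBracket()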
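Description copied from class: RegexLexer
Handle unmatched ].
handleUnmatchedRightBracket in class RegexLexer

protected int parseCodePointInGroupName() throws RegexSyntaxException
Description copied from class: RegexLexer
Parse the next codepoint in a group name and return it.
parseCodePointInGroupName in class RegexLexer
Throws: RegexSyntaxException

protected Token parseCustomEscape(char c)
Description copied from class: RegexLexer
Parse any escape sequence starting with \ and the argument c.
parseCustomEscape in class RegexLexer

protected int parseCustomEscapeChar(char c, boolean inCharClass)
Description copied from class: RegexLexer
Parse an escape character sequence (inside character class, or other escapes have already been tried) starting with \ and the argument c.
parseCustomEscapeChar in class RegexLexer

protected int parseCustomEscapeCharFallback(int c, boolean inCharClass)
Description copied from class: RegexLexer
Parse an escape character sequence (inside character class, or other escapes have already been tried) starting with \ and the code point c. This method is called after all other means of parsing the escape sequence have been exhausted.
parseCustomEscapeCharFallback in class RegexLexer

protected Token parseCustomGroupBeginQ(char charAfterQuestionMark)
Description copied from class: RegexLexer
Parse group starting with (?.
parseCustomGroupBeginQ in class RegexLexer

protected Token parseGroupLt()
Description copied from class: RegexLexer
Parse group starting with (<.
parseGroupLt in class RegexLexer

public RegexSyntaxException syntaxErrorAtAbs(String msg, int i)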
public RegexSyntaxException syntaxErrorHere(String msg)
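
Both syntaxErrorAtAbs and syntaxErrorHere return a RegexSyntaxException rather than declaring it as thrown, which suggests the caller writes throw syntaxErrorHere(...) itself. The sketch below shows that factory pattern in isolation. PatternSyntaxError, the "at position" message format, and checkBrackets are hypothetical; the real exception type and the meaning of the i parameter (assumed here to be an absolute offset into the pattern) are not documented in this page.

```java
public class SyntaxErrorSketch {

    /** Hypothetical exception carrying the offset at which lexing failed. */
    public static final class PatternSyntaxError extends RuntimeException {
        public final int position;

        PatternSyntaxError(String msg, int position) {
            super(msg + " at position " + position);
            this.position = position;
        }
    }

    private final String pattern;
    private int position;

    public SyntaxErrorSketch(String pattern) {
        this.pattern = pattern;
    }

    /** Builds an error for an explicit offset (assumed meaning of the "i" parameter). */
    public PatternSyntaxError syntaxErrorAtAbs(String msg, int i) {
        return new PatternSyntaxError(msg, i);
    }

    /** Builds an error for the current lexer position. */
    public PatternSyntaxError syntaxErrorHere(String msg) {
        return syntaxErrorAtAbs(msg, position);
    }

    /** Toy check: reports a trailing backslash or an unclosed '['. */
    public void checkBrackets() {
        int open = -1;
        for (position = 0; position < pattern.length(); position++) {
            char c = pattern.charAt(position);
            if (c == '\\') {
                if (position + 1 >= pattern.length()) {
                    throw syntaxErrorHere("bad escape (end of pattern)");
                }
                position++; // skip the escaped character
            } else if (c == '[' && open < 0) {
                open = position;
            } else if (c == ']' && open >= 0) {
                open = -1;
            }
        }
        if (open >= 0) {
            // The error points back at the bracket, not at the current position.
            throw syntaxErrorAtAbs("unterminated character set", open);
        }
    }

    public static void main(String[] args) {
        try {
            new SyntaxErrorSketch("a[bc").checkBrackets();
        } catch (PatternSyntaxError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Returning the exception keeps the throw statement visible at the call site, so the compiler can see that control flow ends there.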