Class Parser<T>
- Direct Known Subclasses:
Parser.Rule
Unlike most parser combinators (such as Haskell's Parsec), this class rules out a common source
of bugs: an infinite loop or StackOverflowError caused by an accidental zero-consumption rule
inside many() or a recursive grammar. It does so by requiring every parser to consume at least
one character. An optional suffix is expressed with built-in combinators such as
optionallyFollowedBy() and postfix(), or with the zeroOrMore(), zeroOrMoreDelimitedBy(),
orElse() and optional() fluent chains.
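To make the zero-consumption hazard concrete, here is a minimal plain-Java sketch that does not use this class at all. The names ProgressGuardSketch, many, digit and maybeDigit are hypothetical. It shows a hand-written repetition loop that would spin forever on a zero-width match unless it checks for progress, which is the situation the consume-at-least-one-character requirement is meant to rule out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical illustration only; not part of the Parser API. */
final class ProgressGuardSketch {
  /**
   * Repeatedly applies step, which takes (input, index) and returns the index after
   * its match, or -1 for no match. Collects each matched substring.
   */
  static List<String> many(String input, BiFunction<String, Integer, Integer> step) {
    List<String> matches = new ArrayList<>();
    int index = 0;
    while (index < input.length()) {
      int next = step.apply(input, index);
      if (next < 0) {
        break; // no match: stop repeating
      }
      if (next == index) {
        // Without this check, a zero-width match would repeat forever at the same index.
        throw new IllegalStateException("matched without consuming input at " + index);
      }
      matches.add(input.substring(index, next));
      index = next;
    }
    return matches;
  }

  /** Matches exactly one digit; never succeeds without consuming a character. */
  static int digit(String input, int index) {
    return Character.isDigit(input.charAt(index)) ? index + 1 : -1;
  }

  /** Matches zero or one digit; can "succeed" without consuming anything. */
  static int maybeDigit(String input, int index) {
    return Character.isDigit(input.charAt(index)) ? index + 1 : index;
  }
}
```

With these definitions, many("12a", ProgressGuardSketch::digit) yields ["1", "2"], while passing maybeDigit trips the guard at index 2.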
For simplicity, or() and anyOf() always backtrack upon failure.
It is more efficient, however, to factor out a common left prefix. For example, instead of
anyOf(expr.followedBy(";"), expr), use expr.optionallyFollowedBy(";").
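The cost of backtracking over a shared prefix can be sketched in plain Java without this class. The names CommonPrefixSketch, expr, withBacktracking and factored are hypothetical stand-ins; expr simply matches a run of letters and counts how often it is invoked.

```java
/** Hypothetical illustration only; not part of the Parser API. */
final class CommonPrefixSketch {
  /** Counts how many times the stand-in expr rule has been run. */
  static int exprCalls = 0;

  /** Stand-in for an expensive expr rule: one or more letters. Returns end index or -1. */
  static int expr(String in, int i) {
    exprCalls++;
    int end = i;
    while (end < in.length() && Character.isLetter(in.charAt(end))) {
      end++;
    }
    return end > i ? end : -1;
  }

  /** Shape of anyOf(expr.followedBy(";"), expr): a failed first branch re-runs expr. */
  static int withBacktracking(String in) {
    int end = expr(in, 0);
    if (end >= 0 && in.startsWith(";", end)) {
      return end + 1;
    }
    return expr(in, 0); // second alternative starts over from index 0
  }

  /** Shape of expr.optionallyFollowedBy(";"): expr runs once, then the suffix is checked. */
  static int factored(String in) {
    int end = expr(in, 0);
    if (end < 0) {
      return -1;
    }
    return in.startsWith(";", end) ? end + 1 : end;
  }
}
```

On the input "abc", withBacktracking runs expr twice and factored runs it once; both return 3.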
WARNING: be careful when using this class to parse user-provided input, or in performance-critical hot paths. Parser combinators are not known for optimal performance, and recursive grammars can hit a stack overflow error on maliciously crafted input (think of 10K left parentheses).
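One common way to defend a recursive routine against deeply nested input is an explicit depth cap. The following plain-Java sketch illustrates that general idea only; it is not something this class is documented to do, and DepthLimitSketch, nestingDepth and maxDepth are hypothetical names.

```java
/** Hypothetical illustration only; not part of the Parser API. */
final class DepthLimitSketch {
  /** Returns the nesting depth of input such as "((()))", rejecting nesting beyond maxDepth. */
  static int nestingDepth(String input, int maxDepth) {
    int[] pos = {0};
    int depth = parse(input, pos, 0, maxDepth);
    if (pos[0] != input.length()) {
      throw new IllegalArgumentException("unexpected input at " + pos[0]);
    }
    return depth;
  }

  private static int parse(String input, int[] pos, int current, int maxDepth) {
    if (pos[0] < input.length() && input.charAt(pos[0]) == '(') {
      if (current >= maxDepth) {
        // Fail fast instead of recursing until the call stack overflows.
        throw new IllegalArgumentException("nesting deeper than " + maxDepth);
      }
      pos[0]++;
      int inner = parse(input, pos, current + 1, maxDepth);
      if (pos[0] >= input.length() || input.charAt(pos[0]) != ')') {
        throw new IllegalArgumentException("missing ')' at " + pos[0]);
      }
      pos[0]++;
      return inner + 1;
    }
    return 0; // no parenthesis here: depth contribution is zero
  }
}
```

With a cap of 100, ten thousand left parentheses are rejected with an IllegalArgumentException after 100 nested calls rather than exhausting the stack.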
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
final class: Fluent API for parsing while skipping patterns around lexical tokens.
final class: Facilitates a fluent chain for matching the current parser optionally.
static class: Thrown if parsing failed.
static final class: A forward-declared grammar rule, to be used for recursive grammars.
-
Method Summary
Modifier and Type, Method, Description
static <T> Parser<T>: Matches if any of the given parsers match.
Returns a parser that applies this parser at least once, greedily.
atLeastOnce(BinaryOperator<T> reducer): Returns a parser that applies this parser at least once, greedily, and reduces the results using the reducer function.
final <A,R> Parser<R> atLeastOnce(Collector<? super T, A, ? extends R> collector): Returns a parser that applies this parser at least once, greedily, and collects the return values using collector.
atLeastOnceDelimitedBy(String delimiter): Returns a parser that matches the current parser at least once, delimited by the given delimiter.
atLeastOnceDelimitedBy(String delimiter, BinaryOperator<T> reducer): Returns a parser that matches the current parser at least once, delimited by the given delimiter, using the given reducer function to reduce the results.
final <A,R> Parser<R> atLeastOnceDelimitedBy(String delimiter, Collector<? super T, A, ? extends R> collector): Returns a parser that matches the current parser at least once, delimited by the given delimiter.
Returns a parser that matches the current parser enclosed between prefix and suffix.
Returns a parser that matches the current parser enclosed between prefix and suffix, which are non-empty string delimiters.
chars(int n): Consumes exactly n consecutive characters.
consecutive(CharacterSet characterSet): Matches one or more consecutive characters contained in characterSet.
consecutive(CharPredicate matcher, String name): Matches one or more consecutive characters as specified by matcher.
static <T> Parser<T>: Defines a simple recursive grammar without needing to explicitly forward-declare a Parser.Rule.
digits(): One or more regex \d+ characters.
final <R> Parser<R>: If this parser matches, applies function f to get the next parser to match in sequence.
followedBy(Parser<?> suffix): If this parser matches, continue to match suffix.
followedBy(Parser<X>.OrEmpty suffix): If this parser matches, continue to match the optional suffix.
followedBy(String suffix): If this parser matches, continue to match suffix.
followedByOrEof(Parser<?> suffix): Specifies that the matched pattern must be either followed by suffix or EOF.
immediatelyBetween(String prefix, String suffix): Returns a parser that matches the current parser immediately enclosed between prefix and suffix (no skippable characters as specified by parseSkipping() in between).
static <T> Parser<T>: Returns an equivalent parser that suppresses character skipping that's otherwise applied if parseSkipping() or skipping() is called.
Specifies that the optional (or zero-or-more) rule should be matched literally even if parseSkipping() or skipping() is called.
final <R> Parser<R>: If this parser matches, returns the result of applying the given function to the match.
notFollowedBy(Parser<?> suffix, String name): A form of negative lookahead such that the match is rejected if followed by suffix.
notFollowedBy(String suffix): A form of negative lookahead such that the match is rejected if followed by suffix.
notImmediatelyFollowedBy(CharPredicate predicate, String name): A form of negative lookahead such that the match is rejected if immediately followed by (no skippable characters as specified by parseSkipping() in between) a character that matches predicate.
optional(): Starts a fluent chain for matching the current parser optionally.
optionallyFollowedBy(String suffix): Returns an equivalent parser except it allows suffix if present.
optionallyFollowedBy(String suffix, Function<? super T, ? extends T> op): If this parser matches, optionally applies the op function if the pattern is followed by suffix.
or(): Returns a collector that results in a parser that matches if any of the input parsers match.
Matches if this or that matches.
Matches if this or that matches.
Starts a fluent chain for matching the current parser optionally.
final T: Parses the entire input string and returns the result.
final T: Parses the input string starting from fromIndex and returns the result.
final T parseSkipping(Parser<?> skip, String input): Parses input while skipping patterns matched by skip around atomic matches.
final T parseSkipping(CharPredicate charsToSkip, String input): Parses input while skipping charsToSkip around atomic matches.
parseToStream(Reader input): Parses the input reader lazily by applying this parser repeatedly until the end of input.
parseToStream(String input): Parses the entire input string lazily by applying this parser repeatedly until the end of input.
parseToStream(String input, int fromIndex): Parses input starting from fromIndex to a lazy stream while skipping the skippable patterns around lexical tokens.
postfix(Parser<? extends UnaryOperator<T>> operator): Returns a parser that, after this parser succeeds, applies the operator parser zero or more times and applies the resulting unary operator functions iteratively.
prefix(Parser<? extends UnaryOperator<T>> operator): Returns a parser that applies the operator parser zero or more times before this and applies the resulting unary operator functions iteratively.
Lazily and iteratively matches input reader, until the input is exhausted or matching failed.
Lazily and iteratively matches input, until the input is exhausted or matching failed.
Lazily and iteratively matches input starting from fromIndex, skipping the skippable patterns, until the input is exhausted or matching failed.
quotedStringWithEscapes(char quoteChar, Parser<? extends CharSequence> escaped): String literal quoted by quoteChar with backslash escapes.
sequence(Parser<A>.OrEmpty left, Parser<B>.OrEmpty right, BiFunction<? super A, ? super B, ? extends C> combiner): Sequentially matches left then right, with both allowed to be optional, and then combines the results using the combiner function.
static <A,B,C> Parser<C> sequence(Parser<A> left, Parser<B>.OrEmpty right, BiFunction<? super A, ? super B, ? extends C> combiner): Sequentially matches left then right (which is allowed to be optional), and then combines the results using the combiner function.
static <A,B,C> Parser<C> sequence(Parser<A> left, Parser<B> right, BiFunction<? super A, ? super B, ? extends C> combiner): Sequentially matches left then right, and then combines the results using the combiner function.
single(CharPredicate matcher, String name): Matches a character as specified by matcher.
Starts a fluent chain for parsing inputs while skipping patterns matched by skip.
skipping(CharPredicate charsToSkip): Starts a fluent chain for parsing inputs while skipping charsToSkip.
source(): Returns a parser that matches the current parser and returns the matched string.
Matches a literal string.
If this parser matches, applies the given condition and disqualifies the match if the condition is false.
final <R> Parser<R>: If this parser matches, applies the given parser on the remaining input.
final <R> Parser<R>: If this parser matches, applies the given optional (or zero-or-more) parser on the remaining input.
final <R> Parser<R> thenReturn(R result): If this parser matches, returns the given result.
word(): One or more regex \w+ characters.
Starts a fluent chain for matching the current parser zero or more times.
zeroOrMore(CharPredicate charsToMatch, String name): Starts a fluent chain for matching consecutive charsToMatch zero or more times.
zeroOrMore(Collector<? super T, A, ? extends R> collector): Starts a fluent chain for matching the current parser zero or more times.
zeroOrMoreDelimited(Parser<A> first, Parser<B>.OrEmpty second, String delimiter, BiCollector<? super A, ? super B, R> collector): Applies first and the optional second patterns in order, for zero or more times, collecting the results using the provided BiCollector.
zeroOrMoreDelimited(Parser<A> first, Parser<B> second, String delimiter, BiCollector<? super A, ? super B, R> collector): Applies first and second patterns in order, for zero or more times, collecting the results using the provided BiCollector.
zeroOrMoreDelimitedBy(String delimiter): Starts a fluent chain for matching the current parser zero or more times, delimited by delimiter.
zeroOrMoreDelimitedBy(String delimiter, Collector<? super T, A, ? extends R> collector): Starts a fluent chain for matching the current parser zero or more times, delimited by delimiter.
-
Method Details
-
single
Matches a character as specified by matcher.
-
consecutive
Matches one or more consecutive characters as specified by matcher.
-
consecutive
Matches one or more consecutive characters contained in characterSet.
For example:
    import static com.google.common.labs.parse.CharacterSet.charsIn;

    Parser<Integer> hexNumber = consecutive(charsIn("[0-9A-Fa-f]"))
        .map(hex -> Integer.parseInt(hex, 16));
Since: 9.4
-
chars
-
word
-
digits
-
string
-
quotedStringWithEscapes
public static Parser<String> quotedStringWithEscapes(char quoteChar, Parser<? extends CharSequence> escaped)
String literal quoted by quoteChar with backslash escapes.
When a backslash is encountered, the escaped parser is used to parse the escaped character(s).
For example:
    quotedStringWithEscapes('"', chars(1)).parse("foo\\\\bar");
will treat the escaped character as literal and return "foo\\bar".
You can also support Unicode escaping:
    Parser<String> unicodeEscaped = string("u")
        .then(chars(4))
        .suchThat(charsIn("[0-9A-Fa-f]")::matchesAllOf, "4 hex digits")
        .map(digits -> Character.toString(Integer.parseInt(digits, 16)));
    quotedStringWithEscapes('"', unicodeEscaped.or(chars(1))).parse("foo\\uD83D");
-
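The escape rules described for quotedStringWithEscapes can be approximated by hand to show what result to expect. This is a plain-Java sketch under stated assumptions, not the library's implementation: EscapeSketch and unquote are hypothetical names, and the behavior mirrors the unicodeEscaped.or(chars(1)) combination (a backslash-u followed by 4 hex digits becomes that code unit; any other escaped character is taken literally).

```java
/** Hypothetical illustration only; not part of the Parser API. */
final class EscapeSketch {
  /** Strips the surrounding quotes from literal and resolves backslash escapes. */
  static String unquote(String literal, char quote) {
    int end = literal.length() - 1;
    if (end < 1 || literal.charAt(0) != quote || literal.charAt(end) != quote) {
      throw new IllegalArgumentException("not a quoted literal: " + literal);
    }
    StringBuilder out = new StringBuilder();
    for (int i = 1; i < end; i++) {
      char c = literal.charAt(i);
      if (c != '\\') {
        out.append(c);
        continue;
      }
      if (i + 1 >= end) {
        throw new IllegalArgumentException("dangling backslash");
      }
      char escaped = literal.charAt(++i);
      if (escaped == 'u' && i + 4 < end && isHex(literal, i + 1, i + 5)) {
        // Unicode escape: the 4 hex digits name one UTF-16 code unit.
        out.append((char) Integer.parseInt(literal.substring(i + 1, i + 5), 16));
        i += 4;
      } else {
        out.append(escaped); // any other escaped character stands for itself
      }
    }
    return out.toString();
  }

  private static boolean isHex(String s, int from, int to) {
    for (int i = from; i < to; i++) {
      if (Character.digit(s.charAt(i), 16) < 0) {
        return false;
      }
    }
    return true;
  }
}
```

Under these assumptions, a quoted literal containing an escaped quote or an escaped backslash yields the bare character, and a backslash followed by u0041 yields "A".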
sequence
public static <A,B,C> Parser<C> sequence(Parser<A> left, Parser<B> right, BiFunction<? super A, ? super B, ? extends C> combiner)
Sequentially matches left then right, and then combines the results using the combiner function.
-
sequence
public static <A,B,C> Parser<C> sequence(Parser<A> left, Parser<B>.OrEmpty right, BiFunction<? super A, ? super B, ? extends C> combiner)
Sequentially matches left then right (which is allowed to be optional), and then combines the results using the combiner function. If right is empty, the default value is passed to the combiner function.
-
sequence
public static <A,B,C> Parser<C>.OrEmpty sequence(Parser<A>.OrEmpty left, Parser<B>.OrEmpty right, BiFunction<? super A, ? super B, ? extends C> combiner)
Sequentially matches left then right, with both allowed to be optional, and then combines the results using the combiner function. If either is empty, the corresponding default value is passed to the combiner function.
-
anyOf
Matches if any of the given parsers match.
-
or
-
or
-
or
-
atLeastOnce
-
atLeastOnce
Returns a parser that applies this parser at least once, greedily, and reduces the results using the reducer function.
Since: 9.4
-
atLeastOnce
-
atLeastOnceDelimitedBy
Returns a parser that matches the current parser at least once, delimited by the given delimiter.
For example if you want to express the regex pattern (a|b|c), you can use:
    Parser.anyOf(string("a"), string("b"), string("c"))
        .atLeastOnceDelimitedBy("|")
-
atLeastOnceDelimitedBy
Returns a parser that matches the current parser at least once, delimited by the given delimiter, using the given reducer function to reduce the results.
Since: 9.4
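The effect of reducing delimited matches can be pictured with plain string splitting. This sketch does not use this class: DelimitedReduceSketch and reduceDelimited are hypothetical names, and integer elements are assumed purely for illustration.

```java
import java.util.function.BinaryOperator;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of the Parser API. */
final class DelimitedReduceSketch {
  /** Splits input on delimiter, parses each piece as an int, and folds left to right. */
  static int reduceDelimited(String input, String delimiter, BinaryOperator<Integer> reducer) {
    // Limit -1 keeps trailing empty pieces so that "1+" fails instead of being silently accepted.
    String[] parts = input.split(Pattern.quote(delimiter), -1);
    Integer result = null;
    for (String part : parts) {
      int value = Integer.parseInt(part); // an empty piece throws: at least one element is required
      result = (result == null) ? value : reducer.apply(result, value);
    }
    return result;
  }
}
```

For instance, reduceDelimited("1+2+3", "+", Integer::sum) folds the three elements into 6, and a single element such as "4" is returned unchanged because the reducer is never invoked.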
-
atLeastOnceDelimitedBy
public final <A,R> Parser<R> atLeastOnceDelimitedBy(String delimiter, Collector<? super T, A, ? extends R> collector)
Returns a parser that matches the current parser at least once, delimited by the given delimiter.
For example if you want to express the regex pattern (a|b|c), you can use:
    Parser.anyOf(string("a"), string("b"), string("c"))
        .atLeastOnceDelimitedBy("|", RegexPattern.asAlternation())
-
zeroOrMore
Starts a fluent chain for matching consecutive charsToMatch zero or more times. If no such character is found, the empty string is the result.
For example if you need to parse a quoted literal that's allowed to be empty:
    zeroOrMore(c -> c != '\'', "quoted").between("'", "'")
-
zeroOrMore
-
zeroOrMore
Starts a fluent chain for matching the current parser zero or more times. collector is used to collect the parsed results, and the empty collector result will be used if this parser matches zero times.
For example if you want to parse a list of statements between a pair of curly braces, you can use:
    statement.zeroOrMore(toBlock()).between("{", "}")
-
zeroOrMoreDelimitedBy
Starts a fluent chain for matching the current parser zero or more times, delimited by delimiter.
For example if you want to parse a list of names [a,b,c], you can use:
    consecutive(ALPHA, "item")
        .zeroOrMoreDelimitedBy(",")
        .between("[", "]")
-
zeroOrMoreDelimitedBy
public final <A,R> Parser<R>.OrEmpty zeroOrMoreDelimitedBy(String delimiter, Collector<? super T, A, ? extends R> collector)
Starts a fluent chain for matching the current parser zero or more times, delimited by delimiter. collector is used to collect the parsed results, and the empty collector result will be used if this parser matches zero times.
For example if you want to parse a set of names [a,b,c], you can use:
    consecutive(ALPHA, "item")
        .zeroOrMoreDelimitedBy(",", toImmutableSet())
        .between("[", "]")
-
zeroOrMoreDelimited
public static <A,B,R> Parser<R>.OrEmpty zeroOrMoreDelimited(Parser<A> first, Parser<B> second, String delimiter, BiCollector<? super A, ? super B, R> collector)
Applies first and second patterns in order, for zero or more times, collecting the results using the provided BiCollector.
Typically used to parse key-value pairs:
    import static com.google.mu.util.stream.BiCollectors.toMap;
    import static java.util.stream.Collectors.toList;

    Parser<Map<String, List<String>>> jsonMap =
        zeroOrMoreDelimited(
                word().followedBy(":"),
                quotedStringWithEscapes('"', chars(1)),
                ",",
                toMap(toList()))
            .followedBy(string(",").optional())  // only if you need to allow trailing comma
            .between("{", "}");
Since: 9.4
-
zeroOrMoreDelimited
public static <A,B,R> Parser<R>.OrEmpty zeroOrMoreDelimited(Parser<A> first, Parser<B>.OrEmpty second, String delimiter, BiCollector<? super A, ? super B, R> collector)
Applies first and the optional second patterns in order, for zero or more times, collecting the results using the provided BiCollector.
Typically used to parse key-value pairs:
    import static com.google.mu.util.stream.BiCollectors.toMap;

    Parser<Map<String, Integer>> keyValues =
        zeroOrMoreDelimited(
                word(),
                string("=").then(digits()).map(Integer::parseInt).orElse(0),
                ",",
                toMap())
            .followedBy(string(",").optional())  // only if you need to allow trailing comma
            .between("{", "}");
Since: 9.4
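To show the result shape the keyValues example above is aiming for, here is a rough plain-Java approximation that uses string splitting instead of this class. KeyValueSketch and its parse method are hypothetical, and details such as error reporting are assumptions rather than documented library behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of the Parser API. */
final class KeyValueSketch {
  /** Reads "{key=1,key2,key3=3}" into a map; a key without "=digits" gets the default 0. */
  static Map<String, Integer> parse(String input) {
    if (!input.startsWith("{") || !input.endsWith("}")) {
      throw new IllegalArgumentException("expected {...}: " + input);
    }
    Map<String, Integer> result = new LinkedHashMap<>();
    String body = input.substring(1, input.length() - 1);
    if (body.length() > 1 && body.endsWith(",")) {
      body = body.substring(0, body.length() - 1); // tolerate one trailing comma
    }
    if (body.isEmpty()) {
      return result; // zero entries is allowed
    }
    for (String entry : body.split(",", -1)) {
      int eq = entry.indexOf('=');
      String key = eq < 0 ? entry : entry.substring(0, eq);
      if (!key.matches("\\w+")) {
        throw new IllegalArgumentException("bad key: '" + key + "'");
      }
      // The optional second pattern falls back to its default when absent.
      int value = eq < 0 ? 0 : Integer.parseInt(entry.substring(eq + 1));
      result.put(key, value);
    }
    return result;
  }
}
```

With this sketch, "{a=1,b,c=3,}" maps a to 1, b to the default 0, and c to 3, while "{}" produces an empty map.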
-
prefix
Returns a parser that applies the operator parser zero or more times before this and applies the resulting unary operator functions iteratively.
For infix operator support, consider using OperatorTable.
-
postfix
Returns a parser that, after this parser succeeds, applies the operator parser zero or more times and applies the resulting unary operator functions iteratively.
This is useful for parsing postfix operators; in regex, for example, the quantifiers are usually postfix.
For infix operator support, consider using OperatorTable.
-
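The iterative application of postfix operators described above can be sketched by hand. This plain-Java example does not use this class; UnaryOperatorSketch, evalPostfix and the two operators ('!' for factorial, '+' for increment) are hypothetical choices made for illustration.

```java
import java.util.function.UnaryOperator;

/** Hypothetical illustration only; not part of the Parser API. */
final class UnaryOperatorSketch {
  /** Reads a run of digits, then applies each following postfix operator left to right. */
  static long evalPostfix(String input) {
    int i = 0;
    while (i < input.length() && Character.isDigit(input.charAt(i))) {
      i++;
    }
    if (i == 0) {
      throw new IllegalArgumentException("operand expected");
    }
    long value = Long.parseLong(input.substring(0, i));
    for (; i < input.length(); i++) {
      // Each operator character maps to a function applied to the value so far.
      value = operatorFor(input.charAt(i)).apply(value);
    }
    return value;
  }

  static UnaryOperator<Long> operatorFor(char c) {
    switch (c) {
      case '!':
        return UnaryOperatorSketch::factorial;
      case '+':
        return n -> n + 1;
      default:
        throw new IllegalArgumentException("unknown postfix operator: " + c);
    }
  }

  static Long factorial(Long n) {
    long result = 1;
    for (long k = 2; k <= n; k++) {
      result *= k;
    }
    return result;
  }
}
```

Order matters: "3!+" evaluates to 7 (factorial first, then increment), whereas "3+!" evaluates to 24.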
between
-
between
-
immediatelyBetween
Returns a parser that matches the current parser immediately enclosed between prefix and suffix (no skippable characters as specified by parseSkipping() in between).
-
map
-
flatMap
-
thenReturn
If this parser matches, returns the given result. -
then
-
then
-
suchThat
If this parser matches, applies the given condition and disqualifies the match if the condition is false.
For example if you are trying to parse a non-reserved word, you can use:
    Set<String> reservedWords = ...;
    Parser<String> identifier =
        Parser.word().suchThat(w -> !reservedWords.contains(w), "identifier");
Since: 9.4
-
followedBy
-
followedBy
-
followedBy
-
followedByOrEof
-
optionallyFollowedBy
-
optionallyFollowedBy
-
notFollowedBy
-
notFollowedBy
-
notImmediatelyFollowedBy
A form of negative lookahead such that the match is rejected if immediately followed by (no skippable characters as specified by parseSkipping() in between) a character that matches predicate. Useful for parsing keywords such as string("if").notImmediatelyFollowedBy(IDENTIFIER_CHAR, "identifier char").
-
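The keyword use case for this lookahead can be sketched in plain Java without this class. KeywordSketch and matchesKeyword are hypothetical names, and Character.isJavaIdentifierPart is assumed here as the stand-in for IDENTIFIER_CHAR.

```java
/** Hypothetical illustration only; not part of the Parser API. */
final class KeywordSketch {
  /** True if keyword occurs at index and the very next character cannot extend an identifier. */
  static boolean matchesKeyword(String input, int index, String keyword) {
    if (!input.startsWith(keyword, index)) {
      return false;
    }
    int next = index + keyword.length();
    // Negative lookahead: peek at the next character without consuming it.
    return next >= input.length() || !Character.isJavaIdentifierPart(input.charAt(next));
  }
}
```

So "if" is recognized at the start of "if (x)" but not at the start of "iffy", where it is merely the prefix of a longer identifier.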
orElse
Starts a fluent chain for matching the current parser optionally. defaultValue will be the result in case the current parser doesn't match.
For example if you want to parse an optional placeholder name enclosed by curly braces, you can use:
    consecutive(ALPHA, "placeholder name")
        .orElse(EMPTY_PLACEHOLDER)
        .between("{", "}")
-
optional
Starts a fluent chain for matching the current parser optionally. Optional.empty() will be the result in case the current parser doesn't match.
For example if you want to parse an optional placeholder name enclosed by curly braces, you can use:
    consecutive(ALPHA, "placeholder name")
        .optional()
        .between("{", "}")
-
source
-
literally
Returns an equivalent parser that suppresses character skipping that's otherwise applied if parseSkipping() or skipping() is called. For example, quoted string literals should not skip whitespace.
-
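The difference between skipping around tokens and matching literally can be pictured with a hand-written tokenizer. This plain-Java sketch does not use this class; SkippingSketch and tokens are hypothetical names, and the quoted-literal rule here has no escape handling.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of the Parser API. */
final class SkippingSketch {
  /** Splits input into words, quoted literals and single symbols, skipping whitespace between them. */
  static List<String> tokens(String input) {
    List<String> out = new ArrayList<>();
    int i = 0;
    while (true) {
      while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
        i++; // skippable characters around tokens are dropped
      }
      if (i >= input.length()) {
        return out;
      }
      int start = i;
      char c = input.charAt(i);
      if (c == '"') {
        // "Literal" region: whitespace inside the quotes is kept verbatim.
        int close = input.indexOf('"', i + 1);
        if (close < 0) {
          throw new IllegalArgumentException("unterminated quote at " + i);
        }
        i = close + 1;
      } else if (Character.isLetterOrDigit(c)) {
        while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
          i++;
        }
      } else {
        i++; // any other character is a one-character symbol token
      }
      out.add(input.substring(start, i));
    }
  }
}
```

Whitespace between tokens disappears, but the two spaces inside a quoted literal such as "hello  world" survive in the resulting token.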
literally
Specifies that the optional (or zero-or-more) rule should be matched literally even if parseSkipping() or skipping() is called.
-
skipping
-
skipping
Starts a fluent chain for parsing inputs while skipping charsToSkip.
For example:
    jsonRecord.skipping(whitespace()).parseToStream(input);
-
parseSkipping
-
parseSkipping
Parses input while skipping charsToSkip around atomic matches.
Equivalent to skipping(charsToSkip).parse(input).
-
parse
Parses the entire input string and returns the result. Upon successful return, the input is fully consumed.
Throws: Parser.ParseException - if the input cannot be parsed.
-
parse
Parses the input string starting from fromIndex and returns the result. Upon successful return, the input starting from fromIndex is fully consumed.
Throws: Parser.ParseException - if the input cannot be parsed.
-
parseToStream
-
parseToStream
-
parseToStream
Parses the input reader lazily by applying this parser repeatedly until the end of input. Results are returned in a lazy stream.
UncheckedIOException will be thrown if the underlying reader throws.
Characters are internally buffered, so you don't need to pass in a BufferedReader.
-
probe
Lazily and iteratively matches input, until the input is exhausted or matching failed.
Note that unlike parseToStream(), a matching failure terminates the stream without throwing an exception. This allows quick probing without fully parsing the input.
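The "terminate quietly on failure" behavior described for probe() can be sketched with a lazy stream in plain Java. This does not use this class; ProbeSketch and probeInts are hypothetical names, and comma-separated integers are assumed as the repeated pattern.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/** Hypothetical illustration only; not part of the Parser API. */
final class ProbeSketch {
  /** Lazily yields leading comma-separated integers, ending the stream at the first non-match. */
  static Stream<Integer> probeInts(String input) {
    Iterator<Integer> iterator = new Iterator<Integer>() {
      private int index = 0;

      /** Returns the end of the digit run starting at index, or -1 if there is none. */
      private int matchEnd() {
        int end = index;
        while (end < input.length() && Character.isDigit(input.charAt(end))) {
          end++;
        }
        return end > index ? end : -1;
      }

      @Override public boolean hasNext() {
        return matchEnd() >= 0; // a failed match ends the stream instead of throwing
      }

      @Override public Integer next() {
        int end = matchEnd();
        if (end < 0) {
          throw new NoSuchElementException();
        }
        int value = Integer.parseInt(input.substring(index, end));
        index = end;
        if (index < input.length() && input.charAt(index) == ',') {
          index++; // step over the delimiter
        }
        return value;
      }
    };
    return StreamSupport.stream(
        Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED), false);
  }
}
```

On "1,22,x,5" the stream yields 1 and 22 and then simply ends at "x"; nothing after the failure point is examined.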
-
probe
Lazily and iteratively matches input starting from fromIndex, skipping the skippable patterns, until the input is exhausted or matching failed.
Note that unlike parseToStream(), a matching failure terminates the stream without throwing an exception. This allows quick probing without fully parsing the input.
-
probe
Lazily and iteratively matches input reader, until the input is exhausted or matching failed.
Note that unlike parseToStream(), a matching failure terminates the stream without throwing an exception. This allows quick probing without fully parsing the input.
UncheckedIOException will be thrown if the underlying reader throws.
Characters are internally buffered, so you don't need to pass in a BufferedReader.
-
define
public static <T> Parser<T> define(Function<? super Parser<T>, ? extends Parser<? extends T>> definition)
Defines a simple recursive grammar without needing to explicitly forward-declare a Parser.Rule. Essentially a fixed point.
For example:
    Parser<Integer> atomic = ...;
    Parser<Integer> expression = define(
        expr -> anyOf(expr.between("(", ")"), atomic)
            .atLeastOnceDelimitedBy("+")
            .map(nums -> nums.stream().mapToInt(n -> n).sum()));
Since: 9.4
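The fixed-point idea behind define() can be sketched in a few lines of plain Java. This is not the library's implementation: DefineSketch, Matcher and NESTED are hypothetical names, and the Matcher interface (returning the index after a match, or -1) is a deliberately tiny stand-in for a parser.

```java
import java.util.function.Function;

/** Hypothetical illustration only; not part of the Parser API. */
final class DefineSketch {
  /** Minimal parser shape for this sketch: returns the index after the match, or -1. */
  interface Matcher {
    int match(String input, int index);
  }

  /** Ties the recursive knot: definition receives a placeholder that forwards to the result. */
  static Matcher define(Function<Matcher, Matcher> definition) {
    Matcher[] holder = new Matcher[1];
    // The placeholder only dereferences holder[0] at match time, after it has been assigned.
    Matcher forward = (input, index) -> holder[0].match(input, index);
    holder[0] = definition.apply(forward);
    return holder[0];
  }

  /** Grammar: nested := '(' nested ')' | 'x' */
  static final Matcher NESTED = define(self -> (input, index) -> {
    if (input.startsWith("x", index)) {
      return index + 1;
    }
    if (!input.startsWith("(", index)) {
      return -1;
    }
    int inner = self.match(input, index + 1); // recursive reference through the placeholder
    return inner >= 0 && input.startsWith(")", inner) ? inner + 1 : -1;
  });
}
```

Here NESTED.match("((x))", 0) consumes all 5 characters, while an unbalanced input such as "((x)" returns -1.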
-