Class Parser<T>

java.lang.Object
com.google.common.labs.parse.Parser<T>
Direct Known Subclasses:
Parser.Rule

public abstract class Parser<T> extends Object
A simple recursive descent parser combinator intended to parse simple grammars such as regex, csv format string patterns etc.

Different from most parser combinators (such as Haskell Parsec), a common source of bug (infinite loop or StackOverFlowError caused by accidental zero-consumption rule in the context of many() or recursive grammar) is made impossible by requiring all parsers to consume at least one character. Optional suffix is achieved through using the built-in combinators such as optionallyFollowedBy() and postfix(); or you can use the zeroOrMore(), zeroOrMoreDelimitedBy(), orElse() and optional() fluent chains.

For simplicity, or() and anyOf() will always backtrack upon failure. But it's more efficient to factor out common left prefix. For example instead of anyOf(expr.followedBy(";"), expr), use expr.optionallyFollowedBy(";")) instead.

WARNING: careful using this class to parse user-provided input, or in performance critical hot paths. Parser combinators are not known for optimal performance and recursive grammars can be subject to stack overflow error on maliciously crafted input (think of 10K left parens).

  • Method Details

    • single

      public static Parser<Character> single(CharPredicate matcher, String name)
      Matches a character as specified by matcher.
    • consecutive

      public static Parser<String> consecutive(CharPredicate matcher, String name)
      Matches one or more consecutive characters as specified by matcher.
    • consecutive

      public static Parser<String> consecutive(CharacterSet characterSet)
      Matches one or more consecutive characters contained in characterSet.

      For example:

      
       import static com.google.common.labs.parse.CharacterSet.charsIn;
      
       Parser<Integer> hexNumber = consecutive(charsIn("[0-9A-Fa-f]"))
           .map(hex -> Integer.parseInt(hex, 16));
       
      Since:
      9.4
    • chars

      public static Parser<String> chars(int n)
      Consumes exactly n consecutive characters. n must be positive.
      Since:
      9.4
    • word

      public static Parser<String> word()
      One or more regex \w+ characters.
      Since:
      9.4
    • digits

      public static Parser<String> digits()
      One or more regex \d+ characters.
      Since:
      9.4
    • string

      public static Parser<String> string(String value)
      Matches a literal string.
    • quotedStringWithEscapes

      public static Parser<String> quotedStringWithEscapes(char quoteChar, Parser<? extends CharSequence> escaped)
      String literal quoted by quoteChar with backslash escapes.

      When a backslash is encountered, the escaped parser is used to parse the escaped character(s).

      For example:

      
       quotedStringWithEscapes('"', chars(1)).parse("foo\\\\bar");
       
      will treat the escaped character as literal and return "foo\\bar".

      You can also support Unicode escaping:

      
       Parser<String> unicodeEscaped = string("u")
           .then(chars(4))
           .suchThat(charsIn("[0-9A-Fa-f]")::matchesAllOf, "4 hex digits")
           .map(digits -> Character.toString(Integer.parseInt(digits, 16)));
       quotedStringWithEscapes('"', unicodeEscaped.or(chars(1))).parse("foo\\uD83D");
       
    • sequence

      public static <A,B,C> Parser<C> sequence(Parser<A> left, Parser<B> right, BiFunction<? super A, ? super B, ? extends C> combiner)
      Sequentially matches left then right, and then combines the results using the combiner function.
    • sequence

      public static <A,B,C> Parser<C> sequence(Parser<A> left, Parser<B>.OrEmpty right, BiFunction<? super A, ? super B, ? extends C> combiner)
      Sequentially matches left then right (which is allowed to be optional), and then combines the results using the combiner function. If right is empty, the default value is passed to the combiner function.
    • sequence

      public static <A,B,C> Parser<C>.OrEmpty sequence(Parser<A>.OrEmpty left, Parser<B>.OrEmpty right, BiFunction<? super A, ? super B, ? extends C> combiner)
      Sequentially matches left then right, with both allowed to be optional, and then combines the results using the combiner function. If either is empty, the corresponding default value is passed to the combiner function.
    • anyOf

      @SafeVarargs public static <T> Parser<T> anyOf(Parser<? extends T>... parsers)
      Matches if any of the given parsers match.
    • or

      public static <T> Collector<Parser<? extends T>, ?, Parser<T>> or()
      Returns a collector that results in a parser that matches if any of the input parsers match.
    • or

      public final Parser<T> or(Parser<? extends T> that)
      Matches if this or that matches.
    • or

      public final Parser<T>.OrEmpty or(Parser<? extends T>.OrEmpty that)
      Matches if this or that matches. If both failed to match, use the default result specified in that.
      Since:
      9.4
    • atLeastOnce

      public final Parser<List<T>> atLeastOnce()
      Returns a parser that applies this parser at least once, greedily.
    • atLeastOnce

      public final Parser<T> atLeastOnce(BinaryOperator<T> reducer)
      Returns a parser that applies this parser at least once, greedily, and reduces the results using the reducer function.
      Since:
      9.4
    • atLeastOnce

      public final <A,R> Parser<R> atLeastOnce(Collector<? super T, A, ? extends R> collector)
      Returns a parser that applies this parser at least once, greedily, and collects the return values using collector.
    • atLeastOnceDelimitedBy

      public final Parser<List<T>> atLeastOnceDelimitedBy(String delimiter)
      Returns a parser that matches the current parser at least once, delimited by the given delimiter.

      For example if you want to express the regex pattern (a|b|c), you can use:

      
       Parser.anyOf(string("a"), string("b"), string("c"))
           .atLeastOnceDelimitedBy("|")
       
    • atLeastOnceDelimitedBy

      public final Parser<T> atLeastOnceDelimitedBy(String delimiter, BinaryOperator<T> reducer)
      Returns a parser that matches the current parser at least once, delimited by the given delimiter, using the given reducer function to reduce the results.
      Since:
      9.4
    • atLeastOnceDelimitedBy

      public final <A,R> Parser<R> atLeastOnceDelimitedBy(String delimiter, Collector<? super T, A, ? extends R> collector)
      Returns a parser that matches the current parser at least once, delimited by the given delimiter.

      For example if you want to express the regex pattern (a|b|c), you can use:

      
       Parser.anyOf(string("a"), string("b"), string("c"))
           .atLeastOnceDelimitedBy("|", RegexPattern.asAlternation())
       
    • zeroOrMore

      public static Parser<String>.OrEmpty zeroOrMore(CharPredicate charsToMatch, String name)
      Starts a fluent chain for matching consecutive charsToMatch zero or more times. If no such character is found, empty string is the result.

      For example if you need to parse a quoted literal that's allowed to be empty:

      
       zeroOrMore(c -> c != '\'', "quoted").between("'", "'")
       
    • zeroOrMore

      public final Parser<List<T>>.OrEmpty zeroOrMore()
      Starts a fluent chain for matching the current parser zero or more times.

      For example if you want to parse a list of statements between a pair of curly braces, you can use:

      
       statement.zeroOrMore().between("{", "}")
       
    • zeroOrMore

      public final <A,R> Parser<R>.OrEmpty zeroOrMore(Collector<? super T, A, ? extends R> collector)
      Starts a fluent chain for matching the current parser zero or more times. collector is used to collect the parsed results and the empty collector result will be used if this parser matches zero times.

      For example if you want to parse a list of statements between a pair of curly braces, you can use:

      
       statement.zeroOrMore(toBlock()).between("{", "}")
       
    • zeroOrMoreDelimitedBy

      public final Parser<List<T>>.OrEmpty zeroOrMoreDelimitedBy(String delimiter)
      Starts a fluent chain for matching the current parser zero or more times, delimited by delimiter.

      For example if you want to parse a list of names [a,b,c], you can use:

      
       consecutive(ALPHA, "item")
           .zeroOrMoreDelimitedBy(",")
           .between("[", "]")
       
    • zeroOrMoreDelimitedBy

      public final <A,R> Parser<R>.OrEmpty zeroOrMoreDelimitedBy(String delimiter, Collector<? super T, A, ? extends R> collector)
      Starts a fluent chain for matching the current parser zero or more times, delimited by delimiter. collector is used to collect the parsed results and the empty collector result will be used if this parser matches zero times.

      For example if you want to parse a set of names [a,b,c], you can use:

      
       consecutive(ALPHA, "item")
           .zeroOrMoreDelimitedBy(",", toImmutableSet())
           .between("[", "]")
       
    • zeroOrMoreDelimited

      public static <A,B,R> Parser<R>.OrEmpty zeroOrMoreDelimited(Parser<A> first, Parser<B> second, String delimiter, BiCollector<? super A, ? super B, R> collector)
      Applies first and second patterns in order, for zero or more times, collecting the results using the provided BiCollector.

      Typically used to parse key-value pairs:

      
       import static com.google.mu.util.stream.BiCollectors.toMap;
       import static java.util.stream.Collectors.toList;
      
       Parser<Map<String, List<String>>> jsonMap =
           zeroOrMoreDelimited(
                  word().followedBy(":"),
                  quotedStringWithEscapes('"', Object::toString)),
                  ",",
                  toMap(toList()))
               .followedBy(string(",").optional()) // only if you need to allow trailing comma
               .between("{", "}");
       
      Since:
      9.4
    • zeroOrMoreDelimited

      public static <A,B,R> Parser<R>.OrEmpty zeroOrMoreDelimited(Parser<A> first, Parser<B>.OrEmpty second, String delimiter, BiCollector<? super A, ? super B, R> collector)
      Applies first and the optional second patterns in order, for zero or more times, collecting the results using the provided BiCollector.

      Typically used to parse key-value pairs:

      
       import static com.google.mu.util.stream.BiCollectors.toMap;
      
       Parser<Map<String, Integer>> keyValues =
           zeroOrMoreDelimited(
                  word(),
                  string("=").then(digits()).map(Integer::parseInt).orElse(0),
                  ",",
                  toMap())
               .followedBy(string(",").optional()) // only if you need to allow trailing comma
               .between("{", "}");
       
      Since:
      9.4
    • prefix

      public final Parser<T> prefix(Parser<? extends UnaryOperator<T>> operator)
      Returns a parser that applies the operator parser zero or more times before this and applies the result unary operator functions iteratively.

      For infix operator support, consider using OperatorTable.

    • postfix

      public final Parser<T> postfix(Parser<? extends UnaryOperator<T>> operator)
      Returns a parser that after this parser succeeds, applies the operator parser zero or more times and apply the result unary operator function iteratively.

      This is useful to parse postfix operators such as in regex the quantifiers are usually postfix.

      For infix operator support, consider using OperatorTable.

    • between

      public final Parser<T> between(String prefix, String suffix)
      Returns a parser that matches the current parser enclosed between prefix and suffix, which are non-empty string delimiters.
    • between

      public final Parser<T> between(Parser<?> prefix, Parser<?> suffix)
      Returns a parser that matches the current parser enclosed between prefix and suffix.
    • immediatelyBetween

      public final Parser<T> immediatelyBetween(String prefix, String suffix)
      Returns a parser that matches the current parser immediately enclosed between prefix and suffix (no skippable characters as specified by parseSkipping() in between).
    • map

      public final <R> Parser<R> map(Function<? super T, ? extends R> f)
      If this parser matches, returns the result of applying the given function to the match.
    • flatMap

      public final <R> Parser<R> flatMap(Function<? super T, Parser<R>> f)
      If this parser matches, applies function f to get the next parser to match in sequence.
    • thenReturn

      public final <R> Parser<R> thenReturn(R result)
      If this parser matches, returns the given result.
    • then

      public final <R> Parser<R> then(Parser<R> next)
      If this parser matches, applies the given parser on the remaining input.
    • then

      public final <R> Parser<R> then(Parser<R>.OrEmpty next)
      If this parser matches, applies the given optional (or zero-or-more) parser on the remaining input.
    • suchThat

      public final Parser<T> suchThat(Predicate<? super T> condition, String name)
      If this parser matches, applies the given condition and disqualifies the match if the condition is false.

      For example if you are trying to parse a non-reserved word, you can use:

      
       Set<String> reservedWords = ...;
       Parser<String> identifier = Parser.WORD.suchThat(w -> !reservedWords.contains(w), "identifier");
       
      Since:
      9.4
    • followedBy

      public final Parser<T> followedBy(String suffix)
      If this parser matches, continue to match suffix.
    • followedBy

      public final Parser<T> followedBy(Parser<?> suffix)
      If this parser matches, continue to match suffix.
    • followedBy

      public final <X> Parser<T> followedBy(Parser<X>.OrEmpty suffix)
      If this parser matches, continue to match the optional suffix.
    • followedByOrEof

      public final Parser<T> followedByOrEof(Parser<?> suffix)
      Specifies that the matched pattern must be either followed by suffix or EOF. No other suffixes allowed.
      Since:
      9.4
    • optionallyFollowedBy

      public final Parser<T> optionallyFollowedBy(String suffix)
      Returns an equivalent parser except it allows suffix if present.
    • optionallyFollowedBy

      public final Parser<T> optionallyFollowedBy(String suffix, Function<? super T, ? extends T> op)
      If this parser matches, optionally applies the op function if the pattern is followed by suffix.
    • notFollowedBy

      public final Parser<T> notFollowedBy(String suffix)
      A form of negative lookahead such that the match is rejected if followed by suffix.
    • notFollowedBy

      public final Parser<T> notFollowedBy(Parser<?> suffix, String name)
      A form of negative lookahead such that the match is rejected if followed by suffix.
    • notImmediatelyFollowedBy

      public final Parser<T> notImmediatelyFollowedBy(CharPredicate predicate, String name)
      A form of negative lookahead such that the match is rejected if immediately followed by (no skippable characters as specified by parseSkipping() in between) a character that matches predicate. Useful for parsing keywords such as string("if").notImmediatelyFollowedBy(IDENTIFIER_CHAR, "identifier char").
    • orElse

      public final Parser<T>.OrEmpty orElse(T defaultValue)
      Starts a fluent chain for matching the current parser optionally. defaultValue will be the result in case the current parser doesn't match.

      For example if you want to parse an optional placeholder name enclosed by curly braces, you can use:

      
       consecutive(ALPHA, "placeholder name")
           .orElse(EMPTY_PLACEHOLDER)
           .between("{", "}")
       
    • optional

      public final Parser<Optional<T>>.OrEmpty optional()
      Starts a fluent chain for matching the current parser optionally. Optional.empty() will be the result in case the current parser doesn't match.

      For example if you want to parse an optional placeholder name enclosed by curly braces, you can use:

      
       consecutive(ALPHA, "placeholder name")
           .optional()
           .between("{", "}")
       
    • source

      public final Parser<String> source()
      Returns a parser that matches the current parser and returns the matched string.
    • literally

      public static <T> Parser<T> literally(Parser<T> parser)
      Returns an equivalent parser that suppresses character skipping that's otherwise applied if parseSkipping() or skipping() are called. For example quoted string literals should not skip whitespaces.
    • literally

      public static <T> Parser<T>.OrEmpty literally(Parser<T>.OrEmpty rule)
      Specifies that the optional (or zero-or-more) rule should be matched literally even if parseSkipping() or skipping() is called.
    • skipping

      public final Parser<T>.Lexical skipping(Parser<?> skip)
      Starts a fluent chain for parsing inputs while skipping patterns matched by skip.
    • skipping

      public final Parser<T>.Lexical skipping(CharPredicate charsToSkip)
      Starts a fluent chain for parsing inputs while skipping charsToSkip.

      For example:

      
       jsonRecord.skipping(whitespace()).parseToStream(input);
       
    • parseSkipping

      public final T parseSkipping(Parser<?> skip, String input)
      Parses input while skipping patterns matched by skip around atomic matches.

      Equivalent to skipping(skip).parse(input).

    • parseSkipping

      public final T parseSkipping(CharPredicate charsToSkip, String input)
      Parses input while charsToSkip around atomic matches.

      Equivalent to skipping(charsToSkip).parse(input).

    • parse

      public final T parse(String input)
      Parses the entire input string and returns the result. Upon successful return, the input is fully consumed.
      Throws:
      Parser.ParseException - if the input cannot be parsed.
    • parse

      public final T parse(String input, int fromIndex)
      Parses the input string starting from fromIndex and returns the result. Upon successful return, the input starting from fromIndex is fully consumed.
      Throws:
      Parser.ParseException - if the input cannot be parsed.
    • parseToStream

      public final Stream<T> parseToStream(String input)
      Parses the entire input string lazily by applying this parser repeatedly until the end of input. Results are returned in a lazy stream.
    • parseToStream

      public final Stream<T> parseToStream(String input, int fromIndex)
      Parses input starting from fromIndex to a lazy stream while skipping the skippable patterns around lexical tokens.
    • parseToStream

      public final Stream<T> parseToStream(Reader input)
      Parses the input reader lazily by applying this parser repeatedly until the end of input. Results are returned in a lazy stream.

      UncheckedIOException will be thrown if the underlying reader throws.

      Characters are internally buffered, so you don't need to pass in BufferedReader.

    • probe

      public final Stream<T> probe(String input)
      Lazily and iteratively matches input, until the input is exhausted or matching failed.

      Note that unlike parseToStream(), a matching failure terminates the stream without throwing exception.

      This allows quick probing without fully parsing it.

    • probe

      public final Stream<T> probe(String input, int fromIndex)
      Lazily and iteratively matches input starting from fromIndex, skipping the skippable patterns, until the input is exhausted or matching failed. Note that unlike parseToStream(), a matching failure terminates the stream without throwing exception.

      This allows quick probing without fully parsing it.

    • probe

      public final Stream<T> probe(Reader input)
      Lazily and iteratively matches input reader, until the input is exhausted or matching failed.

      Note that unlike parseToStream(), a matching failure terminates the stream without throwing exception.

      This allows quick probing without fully parsing it.

      UncheckedIOException will be thrown if the underlying reader throws.

      Characters are internally buffered, so you don't need to pass in BufferedReader.

    • define

      public static <T> Parser<T> define(Function<? super Parser<T>, ? extends Parser<? extends T>> definition)
      Defines a simple recursive grammar without needing to explicitly forward-declare a Parser.Rule. Essentially a fixed-point. For example:
      
       Parser<Expr> atomic = ...;
       Parser<Expr> expression = define(
           expr ->
               anyOf(expr.between("(", ")"), atomic)
                   .atLeastOnceDelimitedBy("+")
                   .map(nums -> nums.stream().mapToInt(n -> n).sum()));
       
      Since:
      9.4