Class Parser<T>

java.lang.Object
com.google.common.labs.parse.Parser<T>
Direct Known Subclasses:
Parser.Lazy

public abstract class Parser<T> extends Object
A simple recursive descent parser combinator intended to parse simple grammars such as regex, csv format string patterns etc.

Different from most parser combinators (such as Haskell Parsec), a common source of bug (infinite loop or StackOverFlowError caused by accidental zero-consumption rule in the context of many() or recursive grammar) is made impossible by requiring all parsers to consume at least one character. Optional suffix is achieved through using the built-in combinators such as optionallyFollowedBy() and postfix(); or you can use the zeroOrMore(), zeroOrMoreDelimitedBy(), orElse() and optional() fluent chains.

For simplicity, or() and anyOf() will always backtrack upon failure. But it's more efficient to factor out common left prefix. For example instead of anyOf(expr.followedBy(";"), expr), use expr.optionallyFollowedBy(";")) instead.

WARNING: do not use this class to parse user-provided input, or in performance critical hot paths. Parser combinators are not known for optimal performance and recursive grammars can be subject to stack overflow error on maliciously crafted input (think of 10K left parens).

  • Method Details

    • single

      public static Parser<Character> single(CharPredicate matcher, String name)
      Matches a character as specified by matcher.
    • consecutive

      public static Parser<String> consecutive(CharPredicate matcher, String name)
      Matches one or more consecutive characters as specified by matcher.
    • string

      public static Parser<String> string(String value)
      Matches a literal string.
    • sequence

      public static <A,B,C> Parser<C> sequence(Parser<A> left, Parser<B> right, BiFunction<? super A, ? super B, ? extends C> combiner)
      Sequentially matches left then right, and then combines the results using the combiner function.
    • sequence

      public static <A,B,C> Parser<C> sequence(Parser<A> left, Parser<B>.OrEmpty right, BiFunction<? super A, ? super B, ? extends C> combiner)
      Sequentially matches left then right (which is allowed to be optional), and then combines the results using the combiner function. If right is empty, the default value is passed to the combiner function.
    • sequence

      public static <A,B,C> Parser<C>.OrEmpty sequence(Parser<A>.OrEmpty left, Parser<B>.OrEmpty right, BiFunction<? super A, ? super B, ? extends C> combiner)
      Sequentially matches left then right, with both allowed to be optional, and then combines the results using the combiner function. If either is empty, the corresponding default value is passed to the combiner function.
    • anyOf

      @SafeVarargs public static <T> Parser<T> anyOf(Parser<? extends T>... parsers)
      Matches if any of the given parsers match.
    • or

      public static <T> Collector<Parser<? extends T>, ?, Parser<T>> or()
      Returns a collector that results in a parser that matches if any of the input parsers match.
    • or

      public final Parser<T> or(Parser<T> that)
      Matches if this or that matches.
    • atLeastOnce

      public final Parser<List<T>> atLeastOnce()
      Returns a parser that applies this parser at least once, greedily.
    • atLeastOnce

      public final <A,R> Parser<R> atLeastOnce(Collector<? super T, A, ? extends R> collector)
      Returns a parser that applies this parser at least once, greedily, and collects the return values using collector.
    • atLeastOnceDelimitedBy

      public final Parser<List<T>> atLeastOnceDelimitedBy(String delimiter)
      Returns a parser that matches the current parser at least once, delimited by the given delimiter.

      For example if you want to express the regex pattern (a|b|c), you can use:

      
       Parser.anyOf(string("a"), string("b"), string("c"))
           .atLeastOnceDelimitedBy("|")
       
    • atLeastOnceDelimitedBy

      public final <A,R> Parser<R> atLeastOnceDelimitedBy(String delimiter, Collector<? super T, A, ? extends R> collector)
      Returns a parser that matches the current parser at least once, delimited by the given delimiter.

      For example if you want to express the regex pattern (a|b|c), you can use:

      
       Parser.anyOf(string("a"), string("b"), string("c"))
           .atLeastOnceDelimitedBy("|", RegexPattern.asAlternation())
       
    • zeroOrMore

      public static Parser<String>.OrEmpty zeroOrMore(CharPredicate charsToMatch, String name)
      Starts a fluent chain for matching consecutive charsToMatch zero or more times. If no such character is found, empty string is the result.

      For example if you need to parse a quoted literal that's allowed to be empty:

      
       zeroOrMore(c -> c != '\'', "quoted").between("'", "'")
       
    • zeroOrMore

      public final Parser<List<T>>.OrEmpty zeroOrMore()
      Starts a fluent chain for matching the current parser zero or more times.

      For example if you want to parse a list of statements between a pair of curly braces, you can use:

      
       statement.zeroOrMore().between("{", "}")
       
    • zeroOrMore

      public final <A,R> Parser<R>.OrEmpty zeroOrMore(Collector<? super T, A, ? extends R> collector)
      Starts a fluent chain for matching the current parser zero or more times. collector is used to collect the parsed results and the empty collector result will be used if this parser matches zero times.

      For example if you want to parse a list of statements between a pair of curly braces, you can use:

      
       statement.zeroOrMore(toBlock()).between("{", "}")
       
    • zeroOrMoreDelimitedBy

      public final Parser<List<T>>.OrEmpty zeroOrMoreDelimitedBy(String delimiter)
      Starts a fluent chain for matching the current parser zero or more times, delimited by delimiter.

      For example if you want to parse a list of names [a,b,c], you can use:

      
       consecutive(ALPHA, "item")
           .zeroOrMoreDelimitedBy(",")
           .between("[", "]")
       
    • zeroOrMoreDelimitedBy

      public final <A,R> Parser<R>.OrEmpty zeroOrMoreDelimitedBy(String delimiter, Collector<? super T, A, ? extends R> collector)
      Starts a fluent chain for matching the current parser zero or more times, delimited by delimiter. collector is used to collect the parsed results and the empty collector result will be used if this parser matches zero times.

      For example if you want to parse a set of names [a,b,c], you can use:

      
       consecutive(ALPHA, "item")
           .zeroOrMoreDelimitedBy(",", toImmutableSet())
           .between("[", "]")
       
    • prefix

      public final Parser<T> prefix(Parser<? extends UnaryOperator<T>> operator)
      Returns a parser that applies the operator parser zero or more times before this and applies the result unary operator functions iteratively.

      For infix operator support, consider using OperatorTable.

    • postfix

      public final Parser<T> postfix(Parser<? extends UnaryOperator<T>> operator)
      Returns a parser that after this parser succeeds, applies the operator parser zero or more times and apply the result unary operator function iteratively.

      This is useful to parse postfix operators such as in regex the quantifiers are usually postfix.

      For infix operator support, consider using OperatorTable.

    • between

      public final Parser<T> between(String open, String close)
      Returns a parser that matches the current parser immediately enclosed between open and close, which are non-empty string delimiters.
    • between

      public final Parser<T> between(Parser<?> open, Parser<?> close)
      Returns a parser that matches the current parser immediately enclosed between open and close.
    • map

      public final <R> Parser<R> map(Function<? super T, ? extends R> f)
      If this parser matches, returns the result of applying the given function to the match.
    • flatMap

      public final <R> Parser<R> flatMap(Function<? super T, Parser<R>> f)
      If this parser matches, applies function f to get the next parser to match in sequence.
    • thenReturn

      public final <R> Parser<R> thenReturn(R result)
      If this parser matches, returns the given result.
    • then

      public final <R> Parser<R> then(Parser<R> next)
      If this parser matches, applies the given parser on the remaining input.
    • then

      public final <R> Parser<R> then(Parser<R>.OrEmpty next)
      If this parser matches, applies the given optional (or zero-or-more) parser on the remaining input.
    • followedBy

      public final Parser<T> followedBy(String suffix)
      If this parser matches, continue to match suffix.
    • followedBy

      public final Parser<T> followedBy(Parser<?> suffix)
      If this parser matches, continue to match suffix.
    • followedBy

      public final <X> Parser<T> followedBy(Parser<X>.OrEmpty suffix)
      If this parser matches, continue to match the optional suffix.
    • optionallyFollowedBy

      public final Parser<T> optionallyFollowedBy(String suffix)
      Returns an equivalent parser except it allows suffix if present.
    • optionallyFollowedBy

      public final Parser<T> optionallyFollowedBy(String suffix, Function<? super T, ? extends T> op)
      If this parser matches, optionally applies the op function if the pattern is followed by suffix.
    • optionallyFollowedBy

      public final Parser<T> optionallyFollowedBy(Parser<? extends UnaryOperator<T>> suffix)
      Returns an equivalent parser except it will optionally apply the unary operator resulting from suffix.
    • notFollowedBy

      public final Parser<T> notFollowedBy(String suffix)
      A form of negative lookahead such that the match is rejected if followed by suffix.
    • notFollowedBy

      public final Parser<T> notFollowedBy(Parser<?> suffix, String name)
      A form of negative lookahead such that the match is rejected if followed by suffix.
    • orElse

      public final Parser<T>.OrEmpty orElse(T defaultValue)
      Starts a fluent chain for matching the current parser optionally. defaultValue will be the result in case the current parser doesn't match.

      For example if you want to parse an optional placeholder name enclosed by curly braces, you can use:

      
       consecutive(ALPHA, "placeholder name")
           .orElse(EMPTY_PLACEHOLDER)
           .between("{", "}")
       
    • optional

      public final Parser<Optional<T>>.OrEmpty optional()
      Starts a fluent chain for matching the current parser optionally. Optional.empty() will be the result in case the current parser doesn't match.

      For example if you want to parse an optional placeholder name enclosed by curly braces, you can use:

      
       consecutive(ALPHA, "placeholder name")
           .optional()
           .between("{", "}")
       
    • expecting

      public final Parser<T> expecting(Predicate<? super T> predicate, String name)
      Returns an equivalent parser that fails if the parser result doesn't satisfy the given predicate. For example, you could use it to implement a parser for keywords:
      
       consecutive(ALPHA, "word").expecting(SUPPORTED_KEYWORDS::contains, "keyword")
       
    • literally

      public static <T> Parser<T> literally(Parser<T> parser)
      Returns an equivalent parser that suppresses character skipping that's otherwise applied if parseSkipping() or parseToStreamSkipping() are called. For example quoted string literals should not skip whitespaces.
    • literally

      public static <T> Parser<T>.OrEmpty literally(Parser<T>.OrEmpty rule)
      Specifies that the optional (or zero-or-more) rule should be matched literally even if parseSkipping() or parseToStreamSkipping() is called.
    • parseSkipping

      public final T parseSkipping(Parser<?> skip, String input)
      Parses input while skipping patterns matched by skip around atomic matches.
    • parseSkipping

      public final T parseSkipping(CharPredicate charsToSkip, String input)
      Parses input while charsToSkip around atomic matches.
    • parseToStreamSkipping

      public final Stream<T> parseToStreamSkipping(Parser<?> skip, String input)
      Parses input to a lazy stream while skipping patterns matched by skip around atomic matches.
    • parseToStreamSkipping

      public final Stream<T> parseToStreamSkipping(CharPredicate charsToSkip, String input)
      Parses input to a lazy stream while charsToSkip around atomic matches.
    • parse

      public final T parse(String input)
      Parses the entire input string and returns the result. Upon successful return, the input is fully consumed.
      Throws:
      Parser.ParseException - if the input cannot be parsed.
    • parseToStream

      public final Stream<T> parseToStream(String input)
      Parses the entire input string lazily by applying this parser repeatedly until the end of input. Results are returned in a lazy stream.