Class Substring

java.lang.Object
com.google.mu.util.Substring

public final class Substring extends Object
Utilities for creating patterns that attempt to match a substring in an input string. The matched substring can be extracted, removed, replaced, or used to divide the input string into parts.

For example, to strip off the "http://" prefix from a uri string if present:

   static String stripHttp(String uri) {
     return Substring.prefix("http://").removeFrom(uri);
   }
 
To strip off either an "http://" or "https://" prefix if present:
   import static com.google.mu.util.Substring.prefix;

   static String stripHttpOrHttps(String uri) {
     return prefix("http://").or(prefix("https://")).removeFrom(uri);
   }
 
To strip off a suffix starting with a dash (-) character:
   static String stripDashSuffix(String str) {
     return last('-').toEnd().removeFrom(str);
   }
 
To replace a trailing "//" with "/":
   static String fixTrailingSlash(String str) {
     return Substring.suffix("//").replaceFrom(str, "/");
   }
 
To extract the 'name' and 'value' from an input string in the format of "name:value":
   Substring.first(':')
       .split("name:joe")
       .map(NameValue::new)
       .orElseThrow(BadFormatException::new);
 
To parse key-value pairs:

 import static com.google.mu.util.stream.GuavaCollectors.toImmutableListMultimap;

 ImmutableListMultimap<String, String> tags =
     first(',')
         .repeatedly()
         .splitThenTrimKeyValuesAround(first('='), "k1=v1, k2=v2")  // => [(k1, v1), (k2, v2)]
         .collect(toImmutableListMultimap());
 
To replace the placeholders in a text with values (but consider using a proper templating framework, because this approach is a security vulnerability if the values come from untrusted sources such as user input):

 ImmutableMap<String, String> variables =
     ImmutableMap.of("who", "Arya Stark", "where", "Braavos");
 String rendered =
     spanningInOrder("{", "}")
         .repeatedly()
         .replaceAllFrom(
             "{who} went to {where}.",
             placeholder -> variables.get(placeholder.skip(1, 1).toString()));
 assertThat(rendered).isEqualTo("Arya Stark went to Braavos.");
 
Since:
2.0
  • Field Details

    • NONE

      public static final Substring.Pattern NONE
      Pattern that never matches any substring.
    • BEGINNING

      public static final Substring.Pattern BEGINNING
      Pattern that matches the empty substring at the beginning of the input string. Typically used to represent an optional delimiter. For example, the following pattern matches the substring after optional "header_name=":
       static final Substring.Pattern VALUE = Substring.after(first('=').or(BEGINNING));
       
    • END

      public static final Substring.Pattern END
      Pattern that matches the empty substring at the end of the input string. Typically used to represent an optional delimiter. For example, the following pattern matches the text between the first occurrence of the string "id=" and the end of that line, or the end of the string:
       static final Substring.Pattern ID =
           Substring.between(substring("id="), substring("\n").or(END));
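
       The "optional delimiter" idea behind END can be illustrated with a plain-JDK sketch. It does not use this library; the class and method names (OptionalDelimiterSketch, idValue) are hypothetical, and it only mimics the described behavior of falling back to the end of the string when no newline follows "id=":

```java
import java.util.Optional;

// Plain-JDK sketch (not the library's implementation): text between "id="
// and the end of that line, or the end of the string if there is no newline.
final class OptionalDelimiterSketch {
  static Optional<String> idValue(String input) {
    int open = input.indexOf("id=");
    if (open < 0) {
      return Optional.empty(); // the opening delimiter is required
    }
    int start = open + "id=".length();
    int newline = input.indexOf('\n', start);
    // The closing delimiter is optional: fall back to the end of the input.
    int end = newline >= 0 ? newline : input.length();
    return Optional.of(input.substring(start, end));
  }
}
```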
       
  • Method Details

    • prefix

      public static Substring.Prefix prefix(String prefix)
      Returns a Prefix pattern that matches strings starting with prefix.

      Typically, if you have a String constant representing a prefix, consider declaring a Substring.Prefix constant instead. The type is more explicit, and utility methods like Substring.Pattern.removeFrom(java.lang.String) and Substring.Pattern.from(java.lang.CharSequence) are easier to discover and use.

    • prefix

      public static Substring.Prefix prefix(char prefix)
      Returns a Prefix pattern that matches strings starting with prefix.

      Typically, if you have a char constant representing a prefix, consider declaring a Substring.Prefix constant instead. The type is more explicit, and utility methods like Substring.Pattern.removeFrom(java.lang.String) and Substring.Pattern.from(java.lang.CharSequence) are easier to discover and use.

    • suffix

      public static Substring.Suffix suffix(String suffix)
      Returns a Suffix pattern that matches strings ending with suffix.

      Typically, if you have a String constant representing a suffix, consider declaring a Substring.Suffix constant instead. The type is more explicit, and utility methods like Substring.Pattern.removeFrom(java.lang.String) and Substring.Pattern.from(java.lang.CharSequence) are easier to discover and use.

    • suffix

      public static Substring.Suffix suffix(char suffix)
      Returns a Suffix pattern that matches strings ending with suffix.

      Typically, if you have a char constant representing a suffix, consider declaring a Substring.Suffix constant instead. The type is more explicit, and utility methods like Substring.Pattern.removeFrom(java.lang.String) and Substring.Pattern.from(java.lang.CharSequence) are easier to discover and use.

    • first

      public static Substring.Pattern first(String str)
      Returns a Pattern that matches the first occurrence of str.
    • first

      public static Substring.Pattern first(char character)
      Returns a Pattern that matches the first occurrence of character.
    • first

      public static Substring.Pattern first(CharPredicate charMatcher)
      Returns a Pattern that matches the first character found by charMatcher.
      Since:
      6.0
    • last

      public static Substring.Pattern last(CharPredicate charMatcher)
      Returns a Pattern that matches the last character found by charMatcher.
      Since:
      6.0
    • first

      public static Substring.Pattern first(Pattern regexPattern)
      Returns a Pattern that matches the first occurrence of regexPattern.

      Unlike str.replaceFirst(regexPattern, replacement), first(regexPattern).replaceFrom(str, replacement) treats the replacement as a literal string, with no special handling of backslash (\) and dollar sign ($) characters.
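
      The difference between a literal replacement and the JDK's group-reference handling can be seen with a plain-JDK sketch. It does not call this library; the class and method names are hypothetical, and Matcher.quoteReplacement is used here only to mimic the literal behavior described above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-JDK sketch contrasting literal replacement with group references.
final class LiteralReplacementSketch {
  // Mimics the described behavior: '$' and '\' in the replacement are literal.
  static String replaceFirstLiterally(Pattern regex, String input, String replacement) {
    return regex.matcher(input).replaceFirst(Matcher.quoteReplacement(replacement));
  }

  // JDK default: "$1" in the replacement refers to capture group 1.
  static String replaceFirstWithGroupRefs(Pattern regex, String input, String replacement) {
    return regex.matcher(input).replaceFirst(replacement);
  }
}
```

      With the pattern (\d+) USD, the input "cost: 5 USD" and the replacement "$1", the literal variant yields "cost: $1" while the JDK default yields "cost: 5".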
    • word

      public static Substring.Pattern word()
      Returns a Pattern that matches the first occurrence of a word composed of [a-zA-Z0-9_] characters.
      Since:
      6.0
    • word

      public static Substring.Pattern word(String word)
      Returns a Pattern that matches the first occurrence of word that isn't immediately preceded or followed by another "word" ([a-zA-Z0-9_]) character.

      For example, if you are looking for the English word "cat" in the string "catchie has a cat", first("cat") won't work because it will match the first three letters of "catchie". Instead, use word("cat") to skip over "catchie".

      If your word boundary isn't equivalent to the regex \W character class, you can define your own word boundary CharMatcher and then use Substring.Pattern.separatedBy(com.google.mu.util.CharPredicate) instead. Say, if your word is lower-case alpha with dash ('-'), then:

      
       CharMatcher boundary = CharMatcher.inRange('a', 'z').or(CharMatcher.is('-')).negate();
       Substring.Pattern petFriendly = first("pet-friendly").separatedBy(boundary::matches);
       
      Since:
      6.0
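
      The "catchie has a cat" case can be reproduced with a plain-JDK sketch. It does not use this library; WordBoundarySketch and indexOfWord are hypothetical names, and the lookarounds are only one way to express "not immediately preceded or followed by a [a-zA-Z0-9_] character":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-JDK sketch: find a word that is not adjacent to another word character.
final class WordBoundarySketch {
  static int indexOfWord(String word, String input) {
    Pattern bounded = Pattern.compile(
        "(?<![a-zA-Z0-9_])" + Pattern.quote(word) + "(?![a-zA-Z0-9_])");
    Matcher matcher = bounded.matcher(input);
    return matcher.find() ? matcher.start() : -1; // -1 when no whole word is found
  }
}
```

      For "catchie has a cat", a plain indexOf("cat") finds index 0 (inside "catchie"), whereas the bounded search finds the standalone word at index 14.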
    • leading

      public static Substring.Pattern leading(CharPredicate matcher)
      Returns a Pattern that matches from the beginning of the input string, a non-empty sequence of leading characters identified by matcher.

      For example: leading(javaLetter()).from("System.err") will result in "System".

      Since:
      6.0
    • trailing

      public static Substring.Pattern trailing(CharPredicate matcher)
      Returns a Pattern that matches from the end of the input string, a non-empty sequence of trailing characters identified by matcher.

      For example: trailing(digit()).from("60612-3588") will result in "3588".

      Since:
      6.0
    • consecutive

      public static Substring.Pattern consecutive(CharPredicate matcher)
      Returns a Pattern that matches the first non-empty sequence of consecutive characters identified by matcher.

      For example: consecutive(javaLetter()).from("(System.out)") will find "System", and consecutive(javaLetter()).repeatedly().from("(System.out)") will produce ["System", "out"].

      As an equivalent of matcher.collapseFrom(string, replacement), you can use consecutive(matcher).repeatedly().replaceAllFrom(string, replacement). You can also do more than collapse these consecutive groups: for example, inspect their values and replace conditionally with consecutive(matcher).repeatedly().replaceAllFrom(string, group -> ...), or handle more sophisticated use cases such as building index maps of these subsequences.

      Since:
      6.0
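
      To make "non-empty sequences of consecutive characters" concrete, here is a plain-JDK sketch that collects every such run. It is not the library's implementation; ConsecutiveRunsSketch and runs are hypothetical names, and an IntPredicate stands in for the matcher:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Plain-JDK sketch: collect each maximal run of characters accepted by matcher.
final class ConsecutiveRunsSketch {
  static List<String> runs(String input, IntPredicate matcher) {
    List<String> result = new ArrayList<>();
    int i = 0;
    while (i < input.length()) {
      if (!matcher.test(input.charAt(i))) {
        i++; // skip characters outside any run
        continue;
      }
      int start = i;
      while (i < input.length() && matcher.test(input.charAt(i))) {
        i++; // extend the run as far as the matcher allows
      }
      result.add(input.substring(start, i));
    }
    return result;
  }
}
```

      Calling runs("(System.out)", Character::isLetter) yields the two runs "System" and "out", matching the repeatedly() example above.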
    • topLevelGroups

      public static Substring.RepeatingPattern topLevelGroups(Pattern regexPattern)
      Returns a repeating pattern representing all the top-level groups from regexPattern. If regexPattern has no capture group, the entire pattern is considered the only group.

      For example, topLevelGroups(compile("(g+)(o+)")).from("ggooo") will return ["gg", "ooo"].

      Nested capture groups are not taken into account. For example: topLevelGroups(compile("((foo)+(bar)*)(zoo)")).from("foofoobarzoo") will return ["foofoobar", "zoo"].

      Note that the top-level groups are statically determined by the regexPattern. Particularly, quantifiers on a capture group do not increase or decrease the number of captured groups. That is, when matching "(foo)+" against "foofoofoo", there will only be one top-level group, with "foo" as the value.

      Since:
      5.3
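
      The note about quantifiers can be checked against java.util.regex directly. The sketch below is plain JDK and lists all numbered groups of the first match, which coincides with the top-level groups only when the regex has no nested groups; RegexGroupsSketch and groupsOfFirstMatch are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-JDK sketch: numbered capture groups of the first match.
final class RegexGroupsSketch {
  static List<String> groupsOfFirstMatch(Pattern regex, String input) {
    List<String> groups = new ArrayList<>();
    Matcher matcher = regex.matcher(input);
    if (matcher.find()) {
      // groupCount() is fixed by the regex, not by how often a group repeats.
      for (int i = 1; i <= matcher.groupCount(); i++) {
        groups.add(matcher.group(i));
      }
    }
    return groups;
  }
}
```

      For (g+)(o+) against "ggooo" this gives "gg" and "ooo"; for (foo)+ against "foofoofoo" it gives a single group, "foo".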
    • first

      public static Substring.Pattern first(Pattern regexPattern, int group)
      Returns a Pattern that matches the first occurrence of regexPattern and then selects the capturing group identified by group.

      For example, the following pattern finds the shard number (12) from a string like 12-of-99:

         import java.util.regex.Pattern;
      
         private static final Substring.Pattern SHARD_NUMBER =
             Substring.first(Pattern.compile("(\\d+)-of-\\d+"), 1);
       
      Throws:
      IndexOutOfBoundsException - if group is negative or exceeds the number of capturing groups in regexPattern.
    • firstOccurrence

      public static Collector<Substring.Pattern,?,Substring.Pattern> firstOccurrence()
      Returns a Collector that collects the input candidate Substring.Pattern instances and results in a pattern that matches whichever candidate occurs first in the input string. For example, you can use it to find the first occurrence of any reserved word in a set:
      
       Substring.Pattern reserved =
           Stream.of("if", "else", "for", "public")
               .map(Substring::word)
               .collect(firstOccurrence());
       
      Since:
      6.1
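
      The "whichever occurs first" selection can be sketched in plain JDK. This simplified version uses String.indexOf on literal candidates rather than word-bounded patterns, so it is only an illustration of the selection rule; EarliestMatchSketch and earliest are hypothetical names:

```java
import java.util.List;
import java.util.Optional;

// Plain-JDK sketch: pick the candidate whose first occurrence is earliest.
final class EarliestMatchSketch {
  static Optional<String> earliest(List<String> candidates, String input) {
    String best = null;
    int bestIndex = -1;
    for (String candidate : candidates) {
      int index = input.indexOf(candidate);
      // Keep the candidate only if it occurs, and earlier than the current best.
      if (index >= 0 && (bestIndex < 0 || index < bestIndex)) {
        best = candidate;
        bestIndex = index;
      }
    }
    return Optional.ofNullable(best);
  }
}
```

      With the candidates "if", "else", "for" and "public" and the input "x = 1; else if (y)", the result is "else", because it starts at index 7 while "if" starts at index 12.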
    • spanningInOrder

      public static Substring.Pattern spanningInOrder(String stop1, String stop2, String... moreStops)
      Returns a Pattern that matches the first occurrence of stop1, followed by an occurrence of stop2, followed sequentially by occurrences of moreStops in order, including any characters between consecutive stops.

      If you have more than two stops and all of them are literals, you may want to use StringFormat.span() instead.

      For example, to find hyperlinks like <a href="...">...</a>, you can use StringFormat.span("<a href=\"{link}\">{...}</a>"), which is equivalent to spanningInOrder("<a href=\"", "\">", "</a>") but more self-documenting with proper placeholder names.

    • last

      public static Substring.Pattern last(String str)
      Returns a Pattern that matches the last occurrence of str.
    • last

      public static Substring.Pattern last(char character)
      Returns a Pattern that matches the last occurrence of character.
    • before

      public static Substring.Pattern before(Substring.Pattern delimiter)
      Returns a Pattern that covers the substring before delimiter. For example:
         String file = "/home/path/file.txt";
         String path = Substring.before(last('/')).from(file).orElseThrow(...);
         assertThat(path).isEqualTo("/home/path");
       
    • after

      public static Substring.Pattern after(Substring.Pattern delimiter)
      Returns a Pattern that covers the substring after delimiter. For example:
         String file = "/home/path/file.txt";
         String ext = Substring.after(last('.')).from(file).orElseThrow(...);
         assertThat(ext).isEqualTo("txt");
       
    • upToIncluding

      public static Substring.Pattern upToIncluding(Substring.Pattern pattern)
      Returns a Pattern that will match from the beginning of the original string up to the substring matched by pattern inclusively. For example:
         String uri = "http://google.com";
         String schemeStripped = upToIncluding(first("://")).removeFrom(uri);
         assertThat(schemeStripped).isEqualTo("google.com");
       

      To match from the start of pattern to the end of the original string, use Substring.Pattern.toEnd() instead.

    • between

      public static Substring.Pattern between(String open, String close)
      Returns a Pattern that will match the substring between the first open and the first close after it.

      If for example you need to find the substring between the first "<-" and the last "->", use between(first("<-"), last("->")) instead.

      Since:
      6.0
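
      The difference between "first close after the open" and "last close" can be illustrated with a plain-JDK sketch. It does not use this library, and its handling of edge cases is an assumption rather than the documented behavior; BetweenSketch and both method names are hypothetical:

```java
import java.util.Optional;

// Plain-JDK sketch: substring between delimiters, two ways of choosing the close.
final class BetweenSketch {
  // Between the first open and the first close that follows it.
  static Optional<String> betweenFirstAndNext(String input, String open, String close) {
    int openIndex = input.indexOf(open);
    if (openIndex < 0) {
      return Optional.empty();
    }
    int start = openIndex + open.length();
    int closeIndex = input.indexOf(close, start);
    return closeIndex < 0 ? Optional.empty() : Optional.of(input.substring(start, closeIndex));
  }

  // Between the first open and the last close in the input.
  static Optional<String> betweenFirstAndLast(String input, String open, String close) {
    int openIndex = input.indexOf(open);
    if (openIndex < 0) {
      return Optional.empty();
    }
    int start = openIndex + open.length();
    int closeIndex = input.lastIndexOf(close);
    return closeIndex < start ? Optional.empty() : Optional.of(input.substring(start, closeIndex));
  }
}
```

      For the input "a <-b-> c -> d" with "<-" and "->", the first variant yields "b" and the second yields "b-> c ".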
    • between

      public static Substring.Pattern between(String open, Substring.BoundStyle openBound, String close, Substring.BoundStyle closeBound)
      Similar to between(String, String) but allows alternative bound styles to include or exclude the delimiters at both ends.
      Since:
      7.2
    • between

      public static Substring.Pattern between(char open, char close)
      Returns a Pattern that will match the substring between the first open and the first close after it.

      If for example you need to find the substring between the first and the last '/', use between(first('/'), last('/')) instead.

      Since:
      6.0
    • between

      public static Substring.Pattern between(char open, Substring.BoundStyle openBound, char close, Substring.BoundStyle closeBound)
      Similar to between(char, char) but allows alternative bound styles to include or exclude the delimiters at both ends.
      Since:
      7.2
    • between

      public static Substring.Pattern between(Substring.Pattern open, Substring.Pattern close)
      Returns a Pattern that will match the substring between open and close. For example, the following pattern finds the path between the "//depot/" prefix and the last '/':
         private static final Substring.Pattern DEPOT_PATH =
             Substring.between(first("//depot/"), last('/'));
         assertThat(DEPOT_PATH.from("//depot/google3/foo/bar/baz.txt")).hasValue("google3/foo/bar");
       
    • between

      public static Substring.Pattern between(Substring.Pattern open, Substring.BoundStyle openBound, Substring.Pattern close, Substring.BoundStyle closeBound)
      Similar to between(Pattern, Pattern) but allows alternative bound styles to include or exclude the delimiters at both ends.
      Since:
      7.2