Class Substring
The matched substring can be extracted, removed, replaced, or used to divide the input string into parts.
For example, to strip off the "http://" prefix from a uri string if present:
static String stripHttp(String uri) {
  return Substring.prefix("http://").removeFrom(uri);
}
To strip off either an "http://" or "https://" prefix if present:
import static com.google.mu.util.Substring.prefix;
static String stripHttpOrHttps(String uri) {
  return prefix("http://").or(prefix("https://")).removeFrom(uri);
}
To strip off a suffix starting with a dash (-) character:
static String stripDashSuffix(String str) {
  return last('-').toEnd().removeFrom(str);
}
To replace a trailing "//" with "/":
static String fixTrailingSlash(String str) {
  return Substring.suffix("//").replaceFrom(str, "/");
}
To extract the 'name' and 'value' from an input string in the format of "name:value":
Substring.first(':')
    .split("name:joe")
    .map(NameValue::new)
    .orElseThrow(BadFormatException::new);
To parse key-value pairs:
import static com.google.mu.util.stream.GuavaCollectors.toImmutableListMultimap;
ImmutableListMultimap<String, String> tags =
first(',')
.repeatedly()
.splitThenTrimKeyValuesAround(first('='), "k1=v1, k2=v2") // => [(k1, v1), (k2, v2)]
.collect(toImmutableListMultimap());
To replace the placeholders in a text with values (but consider using a proper templating
framework instead, because naive substitution is a security vulnerability when the values come
from untrusted sources such as user input):
ImmutableMap<String, String> variables =
ImmutableMap.of("who", "Arya Stark", "where", "Braavos");
String rendered =
spanningInOrder("{", "}")
.repeatedly()
.replaceAllFrom(
"{who} went to {where}.",
placeholder -> variables.get(placeholder.skip(1, 1).toString()));
assertThat(rendered).isEqualTo("Arya Stark went to Braavos.");
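The substitution above can be approximated with the JDK alone. The following is a minimal sketch under stated assumptions, not how Substring implements it: the class name PlaceholderSketch and the method render are hypothetical, and Java 9 or later is assumed for Matcher.replaceAll(Function). Matcher.quoteReplacement keeps '$' and '\' in the values literal, and unknown placeholders are rejected rather than silently dropped.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderSketch {
  // Lazily spans from a '{' to the nearest following '}', roughly like spanningInOrder("{", "}").
  private static final Pattern PLACEHOLDER = Pattern.compile("\\{(.*?)\\}", Pattern.DOTALL);

  /** Replaces each {name} in template with variables.get(name); unknown names are rejected. */
  static String render(String template, Map<String, String> variables) {
    Matcher matcher = PLACEHOLDER.matcher(template);
    return matcher.replaceAll(match -> {
      String value = variables.get(match.group(1));
      if (value == null) {
        throw new IllegalArgumentException("Unknown placeholder: " + match.group());
      }
      // Quote the value so that '$' and '\' are not interpreted as group references.
      return Matcher.quoteReplacement(value);
    });
  }
}
```

Even in this form the values are inserted verbatim, so the caveat about untrusted input still applies.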
- Since:
- 2.0
-
Nested Class Summary
Modifier and Type, Class, Description
static enum Substring.BoundStyle
The style of the bounds of a match.
static final class Substring.Match
The result of successfully matching a Substring.Pattern against a string, providing access to the matched substring, to the parts of the string before and after it, and to copies with the matched substring removed or replaced.
static class Substring.Pattern
A pattern that can be matched against a string, finding a single substring from it.
static final class Substring.Prefix
An immutable string prefix Pattern with extra utilities such as Substring.Prefix.addToIfAbsent(String), Substring.Prefix.removeFrom(StringBuilder), Substring.Prefix.isIn(CharSequence) etc.
static class Substring.RepeatingPattern
A substring pattern to be applied repeatedly on the input string, each time over the remaining substring after the previous match.
static final class Substring.Suffix
An immutable string suffix Pattern with extra utilities such as Substring.Suffix.addToIfAbsent(String), Substring.Suffix.removeFrom(StringBuilder), Substring.Suffix.isIn(CharSequence) etc.
-
Field Summary
Modifier and Type, Field, Description
static final Substring.Pattern BEGINNING
Pattern that matches the empty substring at the beginning of the input string.
static final Substring.Pattern END
Pattern that matches the empty substring at the end of the input string.
static final Substring.Pattern NONE
Pattern that never matches any substring.
-
Method Summary
Modifier and Type, Method, Description
static Substring.Pattern after(Substring.Pattern delimiter)
Returns a Pattern that covers the substring after delimiter.
static Substring.Pattern before(Substring.Pattern delimiter)
Returns a Pattern that covers the substring before delimiter.
static Substring.Pattern between(char open, char close)
Returns a Pattern that will match the substring between the first open and the first close after it.
static Substring.Pattern between(char open, Substring.BoundStyle openBound, char close, Substring.BoundStyle closeBound)
Similar to between(char, char) but allows alternative bound styles to include or exclude the delimiters at both ends.
static Substring.Pattern between(Substring.Pattern open, Substring.BoundStyle openBound, Substring.Pattern close, Substring.BoundStyle closeBound)
Similar to between(Pattern, Pattern) but allows alternative bound styles to include or exclude the delimiters at both ends.
static Substring.Pattern between(Substring.Pattern open, Substring.Pattern close)
Returns a Pattern that will match the substring between open and close.
static Substring.Pattern between(String open, Substring.BoundStyle openBound, String close, Substring.BoundStyle closeBound)
Similar to between(String, String) but allows alternative bound styles to include or exclude the delimiters at both ends.
static Substring.Pattern between(String open, String close)
Returns a Pattern that will match the substring between the first open and the first close after it.
static Substring.Pattern consecutive(CharPredicate matcher)
Returns a Pattern that matches the first non-empty sequence of consecutive characters identified by matcher.
static Substring.Pattern first(char character)
Returns a Pattern that matches the first occurrence of character.
static Substring.Pattern first(CharPredicate charMatcher)
Returns a Pattern that matches the first character found by charMatcher.
static Substring.Pattern first(String str)
Returns a Pattern that matches the first occurrence of str.
static Substring.Pattern first(Pattern regexPattern)
Returns a Pattern that matches the first occurrence of regexPattern.
static Substring.Pattern first(Pattern regexPattern, int group)
Returns a Pattern that matches the first occurrence of regexPattern and then selects the capturing group identified by group.
static Collector<Substring.Pattern, ?, Substring.Pattern> firstOccurrence()
Returns a Collector that collects the input candidate Substring.Pattern objects and results in a pattern that matches whichever occurs first in the input string.
static Substring.Pattern last(char character)
Returns a Pattern that matches the last occurrence of character.
static Substring.Pattern last(CharPredicate charMatcher)
Returns a Pattern that matches the last character found by charMatcher.
static Substring.Pattern last(String str)
Returns a Pattern that matches the last occurrence of str.
static Substring.Pattern leading(CharPredicate matcher)
Returns a Pattern that matches, from the beginning of the input string, a non-empty sequence of leading characters identified by matcher.
static Substring.Prefix prefix(char prefix)
Returns a Prefix pattern that matches strings starting with prefix.
static Substring.Prefix prefix(String prefix)
Returns a Prefix pattern that matches strings starting with prefix.
static Substring.Pattern spanningInOrder(String stop1, String stop2, String... moreStops)
Returns a Pattern that matches the first occurrence of stop1, followed by an occurrence of stop2, followed sequentially by occurrences of moreStops in order, including any characters between consecutive stops.
static Substring.Suffix suffix(char suffix)
Returns a Suffix pattern that matches strings ending with suffix.
static Substring.Suffix suffix(String suffix)
Returns a Suffix pattern that matches strings ending with suffix.
static Substring.RepeatingPattern topLevelGroups(Pattern regexPattern)
Returns a repeating pattern representing all the top-level groups from regexPattern.
static Substring.Pattern trailing(CharPredicate matcher)
Returns a Pattern that matches, from the end of the input string, a non-empty sequence of trailing characters identified by matcher.
static Substring.Pattern upToIncluding(Substring.Pattern pattern)
Returns a Pattern that will match from the beginning of the original string up to the substring matched by pattern inclusively.
static Substring.Pattern word()
Returns a Pattern that matches the first occurrence of a word composed of [a-zA-Z0-9_] characters.
static Substring.Pattern word(String word)
Returns a Pattern that matches the first occurrence of word that isn't immediately preceded or followed by another "word" ([a-zA-Z0-9_]) character.
-
Field Details
-
NONE
Pattern that never matches any substring.
-
BEGINNING
Pattern that matches the empty substring at the beginning of the input string. Typically used to represent an optional delimiter. For example, the following pattern matches the substring after an optional "header_name=":
static final Substring.Pattern VALUE = Substring.after(first('=').or(BEGINNING));
-
END
Pattern that matches the empty substring at the end of the input string. Typically used to represent an optional delimiter. For example, the following pattern matches the text between the first occurrence of the string "id=" and the end of that line, or the end of the string:
static final Substring.Pattern ID = Substring.between(first("id="), first('\n').or(END));
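The "optional delimiter" idea behind BEGINNING and END can be illustrated with plain JDK string calls. This is only a sketch of the intended results, not the library's implementation; the class OptionalDelimiterSketch and its method names are hypothetical.

```java
import java.util.Optional;

final class OptionalDelimiterSketch {
  /** Text after the first '=', or the whole input when '=' is absent (cf. first('=').or(BEGINNING)). */
  static String valueOf(String input) {
    // indexOf returns -1 when '=' is absent, so the substring then starts at index 0.
    return input.substring(input.indexOf('=') + 1);
  }

  /** Text between the first "id=" and the next newline, or the end of the input (cf. or(END)). */
  static Optional<String> idOf(String input) {
    int open = input.indexOf("id=");
    if (open < 0) {
      return Optional.empty();
    }
    int start = open + "id=".length();
    int newline = input.indexOf('\n', start);
    // Fall back to the end of the input when no newline follows, which is the role END plays.
    int end = newline < 0 ? input.length() : newline;
    return Optional.of(input.substring(start, end));
  }
}
```

In both methods the fallback position (index 0 or the input length) stands in for the empty match that BEGINNING or END would contribute.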
-
-
Method Details
-
prefix
Returns a Prefix pattern that matches strings starting with prefix.
If you have a String constant representing a prefix, consider declaring a Substring.Prefix constant instead. The type is more explicit, and utility methods like Substring.Pattern.removeFrom(java.lang.String) and Substring.Pattern.from(java.lang.CharSequence) are easier to discover and use.
-
prefix
Returns a Prefix pattern that matches strings starting with prefix.
If you have a char constant representing a prefix, consider declaring a Substring.Prefix constant instead. The type is more explicit, and utility methods like Substring.Pattern.removeFrom(java.lang.String) and Substring.Pattern.from(java.lang.CharSequence) are easier to discover and use.
-
suffix
Returns a Suffix pattern that matches strings ending with suffix.
If you have a String constant representing a suffix, consider declaring a Substring.Suffix constant instead. The type is more explicit, and utility methods like Substring.Pattern.removeFrom(java.lang.String) and Substring.Pattern.from(java.lang.CharSequence) are easier to discover and use.
-
suffix
Returns a Suffix pattern that matches strings ending with suffix.
If you have a char constant representing a suffix, consider declaring a Substring.Suffix constant instead. The type is more explicit, and utility methods like Substring.Pattern.removeFrom(java.lang.String) and Substring.Pattern.from(java.lang.CharSequence) are easier to discover and use.
-
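To make the advice about Prefix and Suffix constants concrete, here is roughly the hand-written code that a plain String constant tends to require. It is a JDK-only sketch with hypothetical names (AffixSketch and its methods); the behavior attributed to addToIfAbsent is an assumption inferred from its name.

```java
final class AffixSketch {
  /** Returns input without the leading prefix, or input unchanged if the prefix is absent. */
  static String removePrefix(String input, String prefix) {
    return input.startsWith(prefix) ? input.substring(prefix.length()) : input;
  }

  /** Returns input without the trailing suffix, or input unchanged if the suffix is absent. */
  static String removeSuffix(String input, String suffix) {
    return input.endsWith(suffix)
        ? input.substring(0, input.length() - suffix.length())
        : input;
  }

  /** Prepends prefix unless input already starts with it (assumed meaning of addToIfAbsent). */
  static String addPrefixIfAbsent(String input, String prefix) {
    return input.startsWith(prefix) ? input : prefix + input;
  }
}
```

Each helper repeats the same check-then-slice pattern, which is what a typed constant with built-in methods avoids.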
first
Returns a Pattern that matches the first occurrence of str.
-
first
Returns a Pattern that matches the first occurrence of character.
-
first
Returns a Pattern that matches the first character found by charMatcher.
- Since:
- 6.0
-
last
Returns a Pattern that matches the last character found by charMatcher.
- Since:
- 6.0
-
first
Returns a Pattern that matches the first occurrence of regexPattern.
Unlike str.replaceFirst(regexPattern, replacement), first(regexPattern).replaceFrom(str, replacement) treats the replacement as a literal string, with no special handling of backslash (\) and dollar sign ($) characters.
-
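The difference from String.replaceFirst described for first(regexPattern) can be reproduced with the JDK alone. The sketch below shows what "literal replacement" means by quoting the replacement explicitly; the class and method names are hypothetical and this is not the library's implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralReplacementSketch {
  /** Replaces the first match of regex in input with replacement, taken literally. */
  static String replaceFirstLiterally(String input, Pattern regex, String replacement) {
    // quoteReplacement escapes '$' and '\' so they are not read as group references or escapes.
    return regex.matcher(input).replaceFirst(Matcher.quoteReplacement(replacement));
  }
}
```

By contrast, "a-b".replaceFirst("-", "$1") throws IndexOutOfBoundsException, because "$1" is interpreted as a reference to a capturing group that the regex does not have.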
word
Returns a Pattern that matches the first occurrence of a word composed of [a-zA-Z0-9_] characters.
- Since:
- 6.0
-
word
Returns a Pattern that matches the first occurrence of word that isn't immediately preceded or followed by another "word" ([a-zA-Z0-9_]) character.
For example, if you are looking for the English word "cat" in the string "catchie has a cat", first("cat") won't work because it will match the first three letters of "catchie". Instead, use word("cat") to skip over "catchie".
If your word boundary isn't equivalent to the regex \W character class, you can define your own word boundary CharMatcher and then use Substring.Pattern.separatedBy(com.google.mu.util.CharPredicate) instead. Say, if your words consist of lower-case letters and dashes ('-'), then:
CharMatcher boundary = CharMatcher.inRange('a', 'z').or(CharMatcher.is('-')).negate();
Substring.Pattern petFriendly = first("pet-friendly").separatedBy(boundary);
- Since:
- 6.0
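The "catchie has a cat" case can be checked with a small JDK-only sketch that applies the same boundary rule, using regex lookarounds over [a-zA-Z0-9_]. The names WordSketch and indexOfWord are hypothetical, and the approach is a stand-in for illustration rather than how word(String) is implemented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordSketch {
  /** Index of the first standalone occurrence of word in input, or -1 if there is none. */
  static int indexOfWord(String input, String word) {
    // The lookbehind and lookahead reject matches adjacent to another word character.
    Pattern standalone = Pattern.compile(
        "(?<![a-zA-Z0-9_])" + Pattern.quote(word) + "(?![a-zA-Z0-9_])");
    Matcher matcher = standalone.matcher(input);
    return matcher.find() ? matcher.start() : -1;
  }
}
```

For "catchie has a cat", a plain indexOf("cat") reports index 0 (inside "catchie"), while indexOfWord reports index 14, the standalone word at the end.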
-
leading
Returns a Pattern that matches, from the beginning of the input string, a non-empty sequence of leading characters identified by matcher.
For example: leading(javaLetter()).from("System.err") will result in "System".
- Since:
- 6.0
-
trailing
Returns a Pattern that matches, from the end of the input string, a non-empty sequence of trailing characters identified by matcher.
For example: trailing(digit()).from("60612-3588") will result in "3588".
- Since:
- 6.0
-
consecutive
Returns a Pattern that matches the first non-empty sequence of consecutive characters identified by matcher.
For example: consecutive(javaLetter()).from("(System.out)") will find "System", and consecutive(javaLetter()).repeatedly().from("(System.out)") will produce ["System", "out"].
As an equivalent of matcher.collapseFrom(string, replacement), you can use consecutive(matcher).repeatedly().replaceAllFrom(string, replacement). But you can also do more than collapse these consecutive groups, for example inspect their values and replace conditionally with consecutive(matcher).repeatedly().replaceAllFrom(string, group -> ...), or handle more sophisticated use cases like building index maps of these subsequences.
- Since:
- 6.0
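What consecutive(matcher).repeatedly() yields for "(System.out)" can be pictured as a single scan that collects each maximal run of matching characters. The following JDK-only sketch uses a hypothetical ConsecutiveSketch class and a java.util.function.IntPredicate in place of the library's CharPredicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class ConsecutiveSketch {
  /** Returns every maximal run of consecutive characters accepted by matcher, in order. */
  static List<String> runs(String input, IntPredicate matcher) {
    List<String> result = new ArrayList<>();
    int start = -1; // start index of the current run, or -1 when outside a run
    // Iterate one position past the end so that a run touching the end is flushed too.
    for (int i = 0; i <= input.length(); i++) {
      boolean matches = i < input.length() && matcher.test(input.charAt(i));
      if (matches && start < 0) {
        start = i;
      } else if (!matches && start >= 0) {
        result.add(input.substring(start, i));
        start = -1;
      }
    }
    return result;
  }
}
```

Calling runs("(System.out)", Character::isLetter) gives the two runs "System" and "out".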
-
topLevelGroups
Returns a repeating pattern representing all the top-level groups from regexPattern. If regexPattern has no capture group, the entire pattern is considered the only group.
For example, topLevelGroups(compile("(g+)(o+)")).from("ggooo") will return ["gg", "ooo"].
Nested capture groups are not taken into account. For example: topLevelGroups(compile("((foo)+(bar)*)(zoo)")).from("foofoobarzoo") will return ["foofoobar", "zoo"].
Note that the top-level groups are statically determined by the regexPattern. In particular, quantifiers on a capture group do not increase or decrease the number of captured groups. That is, when matching "(foo)+" against "foofoofoo", there will only be one top-level group, with "foo" as the value.
- Since:
- 5.3
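The point that group counts are fixed by the regex, not by the input, follows from how java.util.regex numbers groups. The sketch below (hypothetical GroupsSketch class) lists every capture group the JDK exposes, nested ones included, which makes the contrast with the top-level-only results described above visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupsSketch {
  /** All capture groups (nested ones included) of the first match, in group-number order. */
  static List<String> allGroups(String regex, String input) {
    Matcher matcher = Pattern.compile(regex).matcher(input);
    List<String> groups = new ArrayList<>();
    if (matcher.find()) {
      // groupCount() depends only on the regex, never on how often a group repeated.
      for (int i = 1; i <= matcher.groupCount(); i++) {
        groups.add(matcher.group(i));
      }
    }
    return groups;
  }
}
```

With "(foo)+" against "foofoofoo" the list has the single element "foo". With "((foo)+(bar)*)(zoo)" against "foofoobarzoo" the JDK reports four groups, "foofoobar", "foo", "bar" and "zoo", of which only the first and last are top-level.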
-
first
Returns a Pattern that matches the first occurrence of regexPattern and then selects the capturing group identified by group.
For example, the following pattern finds the shard number (12) from a string like 12-of-99:
import java.util.regex.Pattern;
private static final Substring.Pattern SHARD_NUMBER =
    Substring.first(Pattern.compile("(\\d+)-of-\\d+"), 1);
- Throws:
IndexOutOfBoundsException - if group is negative or exceeds the number of capturing groups in regexPattern.
-
firstOccurrence
Returns a Collector that collects the input candidate Substring.Pattern objects and results in a pattern that matches whichever occurs first in the input string. For example, you can use it to find the first occurrence of any reserved word in a set:
Substring.Pattern reserved =
    Stream.of("if", "else", "for", "public")
        .map(Substring::word)
        .collect(firstOccurrence());
- Since:
- 6.1
-
spanningInOrder
Returns a Pattern that matches the first occurrence of stop1, followed by an occurrence of stop2, followed sequentially by occurrences of moreStops in order, including any characters between consecutive stops.
Note that with more than two stops, and if all the stops are literals, you may want to use StringFormat.span() instead.
For example, to find hyperlinks like <a href="...">...</a>, you can use StringFormat.span("<a href=\"{link}\">{...}</a>"), which is equivalent to spanningInOrder("<a href=\"", "\">", "</a>") but more self-documenting with proper placeholder names.
-
last
Returns a Pattern that matches the last occurrence of str.
-
last
Returns a Pattern that matches the last occurrence of character.
-
before
Returns a Pattern that covers the substring before delimiter. For example:
String file = "/home/path/file.txt";
String path = Substring.before(last('/')).from(file).orElseThrow(...);
assertThat(path).isEqualTo("/home/path");
-
after
Returns a Pattern that covers the substring after delimiter. For example:
String file = "/home/path/file.txt";
String ext = Substring.after(last('.')).from(file).orElseThrow(...);
assertThat(ext).isEqualTo("txt");
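For comparison, the before(last('/')) and after(last('.')) examples correspond to the following index arithmetic in plain JDK code. This is a sketch with hypothetical names (PathSketch, beforeLast, afterLast); like the examples above, it reports a missing delimiter as an empty Optional.

```java
import java.util.Optional;

final class PathSketch {
  /** Text before the last occurrence of delimiter, or empty if delimiter is absent. */
  static Optional<String> beforeLast(String input, char delimiter) {
    int index = input.lastIndexOf(delimiter);
    return index < 0 ? Optional.empty() : Optional.of(input.substring(0, index));
  }

  /** Text after the last occurrence of delimiter, or empty if delimiter is absent. */
  static Optional<String> afterLast(String input, char delimiter) {
    int index = input.lastIndexOf(delimiter);
    return index < 0 ? Optional.empty() : Optional.of(input.substring(index + 1));
  }
}
```

The explicit index < 0 checks and the + 1 offset are the details that the composed patterns keep out of calling code.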
-
upToIncluding
Returns a Pattern that will match from the beginning of the original string up to the substring matched by pattern inclusively. For example:
String uri = "http://google.com";
String schemeStripped = upToIncluding(first("://")).removeFrom(uri);
assertThat(schemeStripped).isEqualTo("google.com");
To match from the start of pattern to the end of the original string, use Substring.Pattern.toEnd() instead.
-
between
Returns a Pattern that will match the substring between the first open and the first close after it.
If, for example, you need to find the substring between the first "<-" and the last "->", use between(first("<-"), last("->")) instead.
- Since:
- 6.0
-
between
public static Substring.Pattern between(String open, Substring.BoundStyle openBound, String close, Substring.BoundStyle closeBound)
Similar to between(String, String) but allows alternative bound styles to include or exclude the delimiters at both ends.
- Since:
- 7.2
-
between
Returns a Pattern that will match the substring between the first open and the first close after it.
If, for example, you need to find the substring between the first and the last '/', use between(first('/'), last('/')) instead.
- Since:
- 6.0
-
between
public static Substring.Pattern between(char open, Substring.BoundStyle openBound, char close, Substring.BoundStyle closeBound)
Similar to between(char, char) but allows alternative bound styles to include or exclude the delimiters at both ends.
- Since:
- 7.2
-
between
Returns a Pattern that will match the substring between open and close. For example, the following pattern finds the depot path, that is, the text between the "//depot/" prefix and the last '/':
private static final Substring.Pattern DEPOT_PATH =
    Substring.between(first("//depot/"), last('/'));
assertThat(DEPOT_PATH.from("//depot/google3/foo/bar/baz.txt")).hasValue("google3/foo/bar");
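The DEPOT_PATH example can be traced with JDK index calls, as in the sketch below. The class and method names are hypothetical, and the rule that the closing delimiter must come after the opening one is an assumption about the pattern's behavior rather than something stated above.

```java
import java.util.Optional;

final class BetweenSketch {
  /** Text between the first occurrence of open and the last occurrence of close. */
  static Optional<String> betweenFirstAndLast(String input, String open, char close) {
    int openIndex = input.indexOf(open);
    if (openIndex < 0) {
      return Optional.empty();
    }
    int start = openIndex + open.length();
    int end = input.lastIndexOf(close);
    // Assumption: a closing delimiter that overlaps or precedes the opening one is no match.
    if (end < start) {
      return Optional.empty();
    }
    return Optional.of(input.substring(start, end));
  }
}
```

For "//depot/google3/foo/bar/baz.txt" with open "//depot/" and close '/', the result is "google3/foo/bar", matching the assertion in the example.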
-
between
public static Substring.Pattern between(Substring.Pattern open, Substring.BoundStyle openBound, Substring.Pattern close, Substring.BoundStyle closeBound)
Similar to between(Pattern, Pattern) but allows alternative bound styles to include or exclude the delimiters at both ends.
- Since:
- 7.2
-