class RegularExpression
extends java.lang.Object
implements java.io.Serializable
RegularExpression re = new RegularExpression(regex);
if (re.matches(text)) { ... }
RegularExpression re = new RegularExpression(regex);
Match match = new Match();
if (re.matches(text, match)) {
... // You can refer to captured texts with the methods of the Match class.
}
RegularExpression re = new RegularExpression(regex, "i");
if (re.matches(text)) { ... }
You can specify options to RegularExpression(regex, options) or setPattern(regex, options).
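This options parameter consists of the following characters.
"i" : IGNORE_CASE
"m" : MULTIPLE_LINES
"s" : SINGLE_LINE
"u" : USE_UNICODE_CATEGORY
"w" : UNICODE_WORD_BOUNDARY
"," : SPECIAL_COMMA
"X" : XMLSCHEMA_MODE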
With the "X" option, the matches() method does not do substring matching but entire-string matching.
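The difference between substring matching and entire-string matching can be illustrated with the standard java.util.regex package, which is used here as a stand-in and is not this class. The class name MatchModes and both helper methods are hypothetical names chosen for this sketch.

```java
import java.util.regex.Pattern;

class MatchModes {
    // Substring matching: succeeds if the pattern occurs anywhere in the text.
    static boolean containsMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // Entire-string matching: succeeds only if the pattern covers the whole text.
    static boolean entireMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("b+", "abbbc")); // true
        System.out.println(entireMatch("b+", "abbbc"));   // false
        System.out.println(entireMatch("ab+c", "abbbc")); // true
    }
}
```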
Differences from the Perl 5 regular expression
Meta characters are `. * + ? { [ ( ) | \ ^ $'.
This range matches the character.
This range matches a character which has a code point that is >= C1's code point and <= C2's code point.
...
These expressions specify the same ranges as the following expressions.
Enumerated ranges are merged (union operation). [a-ec-z] is equivalent to [a-z]
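The union behaviour of enumerated ranges can be checked empirically. The sketch below uses java.util.regex rather than this class, on the assumption that both engines merge overlapping ranges the same way; RangeUnion and sameAsciiSet are hypothetical names.

```java
import java.util.regex.Pattern;

class RangeUnion {
    // Returns true if both character classes accept exactly the same ASCII characters.
    static boolean sameAsciiSet(String classA, String classB) {
        Pattern a = Pattern.compile(classA);
        Pattern b = Pattern.compile(classB);
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameAsciiSet("[a-ec-z]", "[a-z]")); // true
        System.out.println(sameAsciiSet("[a-e]", "[a-z]"));    // false
    }
}
```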
Captured texts can be referred to with the methods of the Match instance after matches(String,Match).
The 0th group means the whole of this regular expression.
The Nth group is the inside of the Nth left parenthesis.
For instance, if the regular expression is " *([^<:]*) +<([^>]*)> *" and the target text is "From: TAMURA Kent <kent@trl.ibm.co.jp>":
Match.getCapturedText(0): " TAMURA Kent <kent@trl.ibm.co.jp>"
Match.getCapturedText(1): "TAMURA Kent"
Match.getCapturedText(2): "kent@trl.ibm.co.jp"
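The group numbering in the example above can be reproduced with java.util.regex, used here only as a stand-in for this class on the assumption that both engines number groups by left parenthesis. CapturedGroups and capture are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapturedGroups {
    // Returns groups 0..groupCount of the first substring match, or null if there is none.
    static String[] capture(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return null;
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i < groups.length; i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] g = capture(" *([^<:]*) +<([^>]*)> *",
                             "From: TAMURA Kent <kent@trl.ibm.co.jp>");
        // Group 0 is the whole match, groups 1 and 2 are the parenthesized parts.
        System.out.println(Arrays.toString(g));
    }
}
```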
regex ::= ('(?' options ')')? term ('|' term)*
term ::= factor+
factor ::= anchors | atom (('*' | '+' | '?' | minmax ) '?'? )?
           | '(?#' [^)]* ')'
minmax ::= '{' ([0-9]+ | [0-9]+ ',' | ',' [0-9]+ | [0-9]+ ',' [0-9]+) '}'
atom ::= char | '.' | char-class | '(' regex ')' | '(?:' regex ')' | '\' [0-9]
         | '\w' | '\W' | '\d' | '\D' | '\s' | '\S' | category-block | '\X'
         | '(?>' regex ')' | '(?' options ':' regex ')'
         | '(?' ('(' [0-9] ')' | '(' anchors ')' | looks) term ('|' term)? ')'
options ::= [imsw]* ('-' [imsw]+)?
anchors ::= '^' | '$' | '\A' | '\Z' | '\z' | '\b' | '\B' | '\<' | '\>'
looks ::= '(?=' regex ')' | '(?!' regex ')'
          | '(?<=' regex ')' | '(?<!' regex ')'
char ::= '\\' | '\' [efnrtv] | '\c' [@-_] | code-point | character-1
category-block ::= '\' [pP] category-symbol-1
                   | ('\p{' | '\P{') (category-symbol | block-name | other-properties) '}'
category-symbol-1 ::= 'L' | 'M' | 'N' | 'Z' | 'C' | 'P' | 'S'
category-symbol ::= category-symbol-1 | 'Lu' | 'Ll' | 'Lt' | 'Lm' | 'Lo'
                    | 'Mn' | 'Me' | 'Mc' | 'Nd' | 'Nl' | 'No'
                    | 'Zs' | 'Zl' | 'Zp' | 'Cc' | 'Cf' | 'Cn' | 'Co' | 'Cs'
                    | 'Pd' | 'Ps' | 'Pe' | 'Pc' | 'Po'
                    | 'Sm' | 'Sc' | 'Sk' | 'So'
block-name ::= (See above)
other-properties ::= 'ALL' | 'ASSIGNED' | 'UNASSIGNED'
character-1 ::= (any character except meta-characters)
char-class ::= '[' ranges ']'
               | '(?[' ranges ']' ([-+&] '[' ranges ']')? ')'
ranges ::= '^'? (range ','?)+
range ::= '\d' | '\w' | '\s' | '\D' | '\W' | '\S' | category-block
          | range-char | range-char '-' range-char
range-char ::= '\[' | '\]' | '\\' | '\' [,-efnrtv] | code-point | character-2
code-point ::= '\x' hex-char hex-char
               | '\x{' hex-char+ '}'
               | '\v' hex-char hex-char hex-char hex-char hex-char hex-char
hex-char ::= [0-9a-fA-F]
character-2 ::= (any character except \[]-,)
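To make one production of the grammar concrete, the minmax rule can be transcribed almost literally into a java.util.regex pattern that accepts or rejects candidate quantifier strings. This is only a sketch of how to read the rule, not how this class parses it; MinMaxSyntax and isMinMax are hypothetical names.

```java
import java.util.regex.Pattern;

class MinMaxSyntax {
    // Transcription of: '{' ([0-9]+ | [0-9]+ ',' | ',' [0-9]+ | [0-9]+ ',' [0-9]+) '}'
    private static final Pattern MINMAX =
        Pattern.compile("\\{([0-9]+|[0-9]+,|,[0-9]+|[0-9]+,[0-9]+)\\}");

    // True if s is a complete minmax quantifier according to the production above.
    static boolean isMinMax(String s) {
        return MINMAX.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"{3}", "{3,}", "{,5}", "{2,5}", "{}", "{,}"}) {
            System.out.println(s + " -> " + isMinMax(s));
        }
    }
}
```

Under this reading, the forms "{3}", "{3,}", "{,5}" and "{2,5}" are accepted, while "{}" and "{,}" are rejected because every alternative requires at least one digit.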
Modifier and Type | Class and Description |
---|---|
(package private) static class | RegularExpression.Context |
Modifier and Type | Field and Description |
---|---|
(package private) static int | CARRIAGE_RETURN |
(package private) RegularExpression.Context | context |
(package private) static boolean | DEBUG |
(package private) static int | EXTENDED_COMMENT - "x" |
(package private) RangeToken | firstChar |
(package private) java.lang.String | fixedString |
(package private) boolean | fixedStringOnly |
(package private) int | fixedStringOptions |
(package private) BMPattern | fixedStringTable |
(package private) boolean | hasBackReferences |
(package private) static int | IGNORE_CASE - "i" |
(package private) static int | LINE_FEED |
(package private) static int | LINE_SEPARATOR |
(package private) int | minlength |
(package private) static int | MULTIPLE_LINES - "m" |
(package private) int | nofparen - The number of parentheses in the regular expression. |
(package private) int | numberOfClosures |
(package private) Op | operations |
(package private) int | options |
(package private) static int | PARAGRAPH_SEPARATOR |
(package private) static int | PROHIBIT_FIXED_STRING_OPTIMIZATION - "F" |
(package private) static int | PROHIBIT_HEAD_CHARACTER_OPTIMIZATION - "H" |
(package private) java.lang.String | regex - A regular expression. |
(package private) static int | SINGLE_LINE - "s" |
(package private) static int | SPECIAL_COMMA - ",". |
(package private) Token | tokentree - Internal representation of the regular expression. |
(package private) static int | UNICODE_WORD_BOUNDARY - An option. |
(package private) static int | USE_UNICODE_CATEGORY - This option redefines \d \D \w \W \s \S. |
private static int | WT_IGNORE |
private static int | WT_LETTER |
private static int | WT_OTHER |
(package private) static int | XMLSCHEMA_MODE - "X". |
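The relationship between the option characters, the int-valued option constants and the isSet(int options, int flag) helper can be sketched as a small bit-flag scheme. The bit values below are hypothetical, since the table does not give the real values, and parseOptions is an illustrative name that does not appear in this class. Only three of the options are covered.

```java
class OptionFlags {
    // Hypothetical bit values; the real constants' values are not given in this document.
    static final int IGNORE_CASE = 1 << 1;
    static final int SINGLE_LINE = 1 << 2;
    static final int MULTIPLE_LINES = 1 << 3;

    // Same shape as isSet(int options, int flag): true if every bit of flag is set.
    static boolean isSet(int options, int flag) {
        return (options & flag) == flag;
    }

    // Translates an option string such as "im" into a combined bit set.
    static int parseOptions(String opts) {
        int flags = 0;
        for (int i = 0; i < opts.length(); i++) {
            char c = opts.charAt(i);
            switch (c) {
                case 'i': flags |= IGNORE_CASE; break;
                case 'm': flags |= MULTIPLE_LINES; break;
                case 's': flags |= SINGLE_LINE; break;
                default: throw new IllegalArgumentException("Unknown option: " + c);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        int flags = parseOptions("im");
        System.out.println(isSet(flags, IGNORE_CASE));  // true
        System.out.println(isSet(flags, SINGLE_LINE));  // false
    }
}
```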
Constructor and Description |
---|
RegularExpression(java.lang.String regex) - Creates a new RegularExpression instance. |
RegularExpression(java.lang.String regex, java.lang.String options) - Creates a new RegularExpression instance with options. |
RegularExpression(java.lang.String regex, Token tok, int parens, boolean hasBackReferences, int options) |
Modifier and Type | Method and Description |
---|---|
private void | compile(Token tok) - Compiles a token tree into an operation flow. |
private Op | compile(Token tok, Op next, boolean reverse) - Converts a token to an operation. |
boolean | equals(java.lang.Object obj) - Returns true if the patterns are the same and the options are equivalent. |
(package private) boolean | equals(java.lang.String pattern, int options) |
int | getNumberOfGroups() - Returns the number of regular expression groups. |
java.lang.String | getOptions() - Returns an option string. |
java.lang.String | getPattern() |
private static int | getPreviousWordType(char[] target, int begin, int end, int offset, int opts) |
private static int | getPreviousWordType(java.text.CharacterIterator target, int begin, int end, int offset, int opts) |
private static int | getPreviousWordType(java.lang.String target, int begin, int end, int offset, int opts) |
private static int | getWordType(char[] target, int begin, int end, int offset, int opts) |
private static int | getWordType(java.text.CharacterIterator target, int begin, int end, int offset, int opts) |
private static int | getWordType(java.lang.String target, int begin, int end, int offset, int opts) |
private static int | getWordType0(char ch, int opts) |
int | hashCode() |
private static boolean | isEOLChar(int ch) |
private static boolean | isSet(int options, int flag) |
private static boolean | isWordChar(int ch) |
private int | matchCharacterIterator(RegularExpression.Context con, Op op, int offset, int dx, int opts) |
private int | matchCharArray(RegularExpression.Context con, Op op, int offset, int dx, int opts) |
boolean | matches(char[] target) - Checks whether the target text contains this pattern or not. |
boolean | matches(char[] target, int start, int end) - Checks whether the target text contains this pattern in the specified range or not. |
boolean | matches(char[] target, int start, int end, Match match) - Checks whether the target text contains this pattern in the specified range or not. |
boolean | matches(char[] target, Match match) - Checks whether the target text contains this pattern or not. |
boolean | matches(java.text.CharacterIterator target) - Checks whether the target text contains this pattern or not. |
boolean | matches(java.text.CharacterIterator target, Match match) - Checks whether the target text contains this pattern or not. |
boolean | matches(java.lang.String target) - Checks whether the target text contains this pattern or not. |
boolean | matches(java.lang.String target, int start, int end) - Checks whether the target text contains this pattern in the specified range or not. |
boolean | matches(java.lang.String target, int start, int end, Match match) - Checks whether the target text contains this pattern in the specified range or not. |
boolean | matches(java.lang.String target, Match match) - Checks whether the target text contains this pattern or not. |
private static boolean | matchIgnoreCase(int chardata, int ch) |
private int | matchString(RegularExpression.Context con, Op op, int offset, int dx, int opts) |
(package private) void | prepare() - Prepares for matching. |
private static boolean | regionMatches(char[] target, int offset, int limit, int offset2, int partlen) |
private static boolean | regionMatches(char[] target, int offset, int limit, java.lang.String part, int partlen) |
private static boolean | regionMatches(java.text.CharacterIterator target, int offset, int limit, int offset2, int partlen) |
private static boolean | regionMatches(java.text.CharacterIterator target, int offset, int limit, java.lang.String part, int partlen) |
private static boolean | regionMatches(java.lang.String text, int offset, int limit, int offset2, int partlen) |
private static boolean | regionMatches(java.lang.String text, int offset, int limit, java.lang.String part, int partlen) |
private static boolean | regionMatchesIgnoreCase(char[] target, int offset, int limit, int offset2, int partlen) |
private static boolean | regionMatchesIgnoreCase(char[] target, int offset, int limit, java.lang.String part, int partlen) |
private static boolean | regionMatchesIgnoreCase(java.text.CharacterIterator target, int offset, int limit, int offset2, int partlen) |
private static boolean | regionMatchesIgnoreCase(java.text.CharacterIterator target, int offset, int limit, java.lang.String part, int partlen) |
private static boolean | regionMatchesIgnoreCase(java.lang.String text, int offset, int limit, int offset2, int partlen) |
private static boolean | regionMatchesIgnoreCase(java.lang.String text, int offset, int limit, java.lang.String part, int partlen) |
void | setPattern(java.lang.String newPattern) |
private void | setPattern(java.lang.String newPattern, int options) |
void | setPattern(java.lang.String newPattern, java.lang.String options) |
java.lang.String | toString() - Represents this instance as a String. |
static final boolean DEBUG
java.lang.String regex
int options
int nofparen
Token tokentree
boolean hasBackReferences
transient int minlength
transient Op operations
transient int numberOfClosures
transient RegularExpression.Context context
transient RangeToken firstChar
transient java.lang.String fixedString
transient int fixedStringOptions
transient BMPattern fixedStringTable
transient boolean fixedStringOnly
static final int IGNORE_CASE
static final int SINGLE_LINE
static final int MULTIPLE_LINES
static final int EXTENDED_COMMENT
static final int USE_UNICODE_CATEGORY
static final int UNICODE_WORD_BOUNDARY
By default, the engine considers a position between a word character (\w) and a non-word character to be a word boundary.
With this option, the engine checks word boundaries using the method of 'Unicode Regular Expression Guidelines' Revision 4.
static final int PROHIBIT_HEAD_CHARACTER_OPTIMIZATION
static final int PROHIBIT_FIXED_STRING_OPTIMIZATION
static final int XMLSCHEMA_MODE
static final int SPECIAL_COMMA
private static final int WT_IGNORE
private static final int WT_LETTER
private static final int WT_OTHER
static final int LINE_FEED
static final int CARRIAGE_RETURN
static final int LINE_SEPARATOR
static final int PARAGRAPH_SEPARATOR
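The default word-boundary rule described for UNICODE_WORD_BOUNDARY above (a position between a word character and a non-word character) can be sketched in a few lines. Two assumptions are made here and are not confirmed by this document: a word character is taken to be an ASCII letter, digit or underscore, and the positions before the first and after the last character are treated as if a non-word character were there. WordBoundary and isBoundary are hypothetical names.

```java
class WordBoundary {
    // Assumed definition of a word character: ASCII letters, digits and underscore.
    static boolean isWordChar(int ch) {
        return ch == '_'
            || (ch >= '0' && ch <= '9')
            || (ch >= 'A' && ch <= 'Z')
            || (ch >= 'a' && ch <= 'z');
    }

    // True if the position offset (0..text.length()) has a word character on
    // exactly one side. Text edges count as non-word (an assumption).
    static boolean isBoundary(String text, int offset) {
        boolean before = offset > 0 && isWordChar(text.charAt(offset - 1));
        boolean after = offset < text.length() && isWordChar(text.charAt(offset));
        return before != after;
    }

    public static void main(String[] args) {
        String text = "ab cd";
        for (int i = 0; i <= text.length(); i++) {
            System.out.println(i + " -> " + isBoundary(text, i));
        }
    }
}
```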
public RegularExpression(java.lang.String regex) throws ParseException
regex - A regular expression
ParseException - regex is not conforming to the syntax.
public RegularExpression(java.lang.String regex, java.lang.String options) throws ParseException
regex - A regular expression
options - A String consisting of "i" "m" "s" "u" "w" "," "X"
ParseException - regex is not conforming to the syntax.
RegularExpression(java.lang.String regex, Token tok, int parens, boolean hasBackReferences, int options)
private void compile(Token tok)
public boolean matches(char[] target)
public boolean matches(char[] target, int start, int end)
start - Start offset of the range.
end - End offset + 1 of the range.
public boolean matches(char[] target, Match match)
match - A Match instance for storing the matching result.
public boolean matches(char[] target, int start, int end, Match match)
start - Start offset of the range.
end - End offset + 1 of the range.
match - A Match instance for storing the matching result.
private int matchCharArray(RegularExpression.Context con, Op op, int offset, int dx, int opts)
private static final int getPreviousWordType(char[] target, int begin, int end, int offset, int opts)
private static final int getWordType(char[] target, int begin, int end, int offset, int opts)
private static final boolean regionMatches(char[] target, int offset, int limit, java.lang.String part, int partlen)
private static final boolean regionMatches(char[] target, int offset, int limit, int offset2, int partlen)
private static final boolean regionMatchesIgnoreCase(char[] target, int offset, int limit, java.lang.String part, int partlen)
private static final boolean regionMatchesIgnoreCase(char[] target, int offset, int limit, int offset2, int partlen)
public boolean matches(java.lang.String target)
public boolean matches(java.lang.String target, int start, int end)
start - Start offset of the range.
end - End offset + 1 of the range.
public boolean matches(java.lang.String target, Match match)
match - A Match instance for storing the matching result.
public boolean matches(java.lang.String target, int start, int end, Match match)
start - Start offset of the range.
end - End offset + 1 of the range.
match - A Match instance for storing the matching result.
private int matchString(RegularExpression.Context con, Op op, int offset, int dx, int opts)
private static final int getPreviousWordType(java.lang.String target, int begin, int end, int offset, int opts)
private static final int getWordType(java.lang.String target, int begin, int end, int offset, int opts)
private static final boolean regionMatches(java.lang.String text, int offset, int limit, java.lang.String part, int partlen)
private static final boolean regionMatches(java.lang.String text, int offset, int limit, int offset2, int partlen)
private static final boolean regionMatchesIgnoreCase(java.lang.String text, int offset, int limit, java.lang.String part, int partlen)
private static final boolean regionMatchesIgnoreCase(java.lang.String text, int offset, int limit, int offset2, int partlen)
public boolean matches(java.text.CharacterIterator target)
public boolean matches(java.text.CharacterIterator target, Match match)
match - A Match instance for storing the matching result.
private int matchCharacterIterator(RegularExpression.Context con, Op op, int offset, int dx, int opts)
private static final int getPreviousWordType(java.text.CharacterIterator target, int begin, int end, int offset, int opts)
private static final int getWordType(java.text.CharacterIterator target, int begin, int end, int offset, int opts)
private static final boolean regionMatches(java.text.CharacterIterator target, int offset, int limit, java.lang.String part, int partlen)
private static final boolean regionMatches(java.text.CharacterIterator target, int offset, int limit, int offset2, int partlen)
private static final boolean regionMatchesIgnoreCase(java.text.CharacterIterator target, int offset, int limit, java.lang.String part, int partlen)
private static final boolean regionMatchesIgnoreCase(java.text.CharacterIterator target, int offset, int limit, int offset2, int partlen)
void prepare()
private static final boolean isSet(int options, int flag)
public void setPattern(java.lang.String newPattern) throws ParseException
ParseException
private void setPattern(java.lang.String newPattern, int options) throws ParseException
ParseException
public void setPattern(java.lang.String newPattern, java.lang.String options) throws ParseException
ParseException
public java.lang.String getPattern()
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public java.lang.String getOptions()
setPattern().
public boolean equals(java.lang.Object obj)
Overrides: equals in class java.lang.Object
boolean equals(java.lang.String pattern, int options)
public int hashCode()
Overrides: hashCode in class java.lang.Object
public int getNumberOfGroups()
private static final int getWordType0(char ch, int opts)
private static final boolean isEOLChar(int ch)
private static final boolean isWordChar(int ch)
private static final boolean matchIgnoreCase(int chardata, int ch)
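The start and end parameters of the range-taking matches overloads above are documented as "Start offset" and "End offset + 1", which reads as a half-open range: start inclusive, end exclusive. The sketch below shows that convention with java.util.regex and Matcher.region, used as a stand-in rather than this class. RangeMatch and containsInRange are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeMatch {
    // Substring match restricted to [start, end): start inclusive, end exclusive.
    static boolean containsInRange(String regex, String target, int start, int end) {
        Matcher m = Pattern.compile(regex).matcher(target);
        m.region(start, end);
        return m.find();
    }

    public static void main(String[] args) {
        // "cd" occupies offsets 2 and 3 of "abcdef", so the end offset must be 4.
        System.out.println(containsInRange("cd", "abcdef", 2, 4)); // true
        System.out.println(containsInRange("cd", "abcdef", 2, 3)); // false
    }
}
```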