Package org.apache.lucene.analysis.ar
Class ArabicNormalizer
java.lang.Object
    org.apache.lucene.analysis.ar.ArabicNormalizer
Normalizer for Arabic.
Normalization is done in place for efficiency, operating on a term buffer.
Normalization is defined as:
- Normalization of hamza with alef seat to a bare alef.
- Normalization of teh marbuta to heh.
- Normalization of dotless yeh (alef maksura) to yeh.
- Removal of Arabic diacritics (the harakat).
- Removal of tatweel (stretching character).
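A minimal, string-based sketch of the rules above follows. It is an illustration, not Lucene's implementation, and the class name `ArabicRulesSketch` is hypothetical. It assumes the standard Unicode code points for each character (for example U+0623 for alef with hamza above and U+0640 for tatweel), and it also maps alef with madda (U+0622) to bare alef, which the `ALEF_MADDA` field suggests but the rule list does not state.

```java
import java.util.stream.Collectors;

/** Hypothetical illustration of the normalization rules; not Lucene code. */
final class ArabicRulesSketch {

  /** Applies the replacement and removal rules to a String (assumes standard Unicode code points). */
  static String applyRules(String input) {
    StringBuilder out = new StringBuilder(input.length());
    for (int i = 0; i < input.length(); i++) {
      char c = input.charAt(i);
      switch (c) {
        case '\u0622': // alef with madda (assumed to be normalized as well)
        case '\u0623': // alef with hamza above
        case '\u0625': // alef with hamza below
          out.append('\u0627'); // bare alef
          break;
        case '\u0629': // teh marbuta
          out.append('\u0647'); // heh
          break;
        case '\u0649': // dotless yeh (alef maksura)
          out.append('\u064A'); // yeh
          break;
        case '\u0640': // tatweel: dropped
          break;
        default:
          // The harakat, fathatan (U+064B) through sukun (U+0652), are dropped.
          if (c < '\u064B' || c > '\u0652') {
            out.append(c);
          }
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // "madrasa" written with fatha, sukun, fatha and a final teh marbuta.
    String word = "\u0645\u064E\u062F\u0652\u0631\u0633\u064E\u0629";
    String normalized = applyRules(word);
    // Print code points, since Arabic may not render in every console.
    System.out.println(normalized.codePoints()
        .mapToObj(cp -> String.format("U+%04X", cp))
        .collect(Collectors.joining(" "))); // U+0645 U+062F U+0631 U+0633 U+0647
  }
}
```

The three diacritics disappear and the final teh marbuta becomes heh, so the eight input characters normalize to five.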
Field Summary
Fields
Modifier and Type | Field
(package private) static final char | ALEF
(package private) static final char | ALEF_MADDA
(package private) static final char | ALEF_HAMZA_ABOVE
(package private) static final char | ALEF_HAMZA_BELOW
(package private) static final char | YEH
(package private) static final char | DOTLESS_YEH
(package private) static final char | TEH_MARBUTA
(package private) static final char | HEH
(package private) static final char | TATWEEL
(package private) static final char | FATHATAN
(package private) static final char | DAMMATAN
(package private) static final char | KASRATAN
(package private) static final char | FATHA
(package private) static final char | DAMMA
(package private) static final char | KASRA
(package private) static final char | SHADDA
(package private) static final char | SUKUN
Constructor Summary
Constructors
Constructor
ArabicNormalizer()
Method Summary
Modifier and Type | Method | Description
(package private) int | normalize(char[] s, int len) | Normalize an input buffer of Arabic text.
Field Details
ALEF
static final char ALEF
ALEF_MADDA
static final char ALEF_MADDA
ALEF_HAMZA_ABOVE
static final char ALEF_HAMZA_ABOVE
ALEF_HAMZA_BELOW
static final char ALEF_HAMZA_BELOW
YEH
static final char YEH
DOTLESS_YEH
static final char DOTLESS_YEH
TEH_MARBUTA
static final char TEH_MARBUTA
HEH
static final char HEH
TATWEEL
static final char TATWEEL
FATHATAN
static final char FATHATAN
DAMMATAN
static final char DAMMATAN
KASRATAN
static final char KASRATAN
FATHA
static final char FATHA
DAMMA
static final char DAMMA
KASRA
static final char KASRA
SHADDA
static final char SHADDA
SUKUN
static final char SUKUN
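The field details give only the constant names. The sketch below shows plausible values, assuming each constant holds the standard Unicode code point of the character it names (for example ALEF = U+0627); the values are not listed on this page, and the class `ArabicCharsSketch` is hypothetical. One consequence of those code points is that the eight harakat, FATHATAN through SUKUN, form the contiguous range U+064B to U+0652, so a single range check can recognize them.

```java
/** Hypothetical constants mirroring the field names above; values assume standard Unicode. */
final class ArabicCharsSketch {
  static final char ALEF = '\u0627';
  static final char ALEF_MADDA = '\u0622';
  static final char ALEF_HAMZA_ABOVE = '\u0623';
  static final char ALEF_HAMZA_BELOW = '\u0625';
  static final char YEH = '\u064A';
  static final char DOTLESS_YEH = '\u0649'; // alef maksura
  static final char TEH_MARBUTA = '\u0629';
  static final char HEH = '\u0647';
  static final char TATWEEL = '\u0640';
  static final char FATHATAN = '\u064B';
  static final char DAMMATAN = '\u064C';
  static final char KASRATAN = '\u064D';
  static final char FATHA = '\u064E';
  static final char DAMMA = '\u064F';
  static final char KASRA = '\u0650';
  static final char SHADDA = '\u0651';
  static final char SUKUN = '\u0652';

  /** True for the eight harakat, FATHATAN (U+064B) through SUKUN (U+0652). */
  static boolean isHaraka(char c) {
    return c >= FATHATAN && c <= SUKUN;
  }

  public static void main(String[] args) {
    char[] harakat = {FATHATAN, DAMMATAN, KASRATAN, FATHA, DAMMA, KASRA, SHADDA, SUKUN};
    for (char c : harakat) {
      System.out.printf("U+%04X %b%n", (int) c, isHaraka(c)); // each line ends in "true"
    }
    // Tatweel sits outside the range; it is removed by a separate rule.
    System.out.println(isHaraka(TATWEEL)); // false
  }
}
```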
Constructor Details
ArabicNormalizer
ArabicNormalizer()
Method Details
normalize
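int normalize(char[] s, int len)
Normalize an input buffer of Arabic text.
Parameters:
s - input buffer, normalized in place
len - length of input buffer, that is, the number of characters at the start of s to normalize
Returns:
length of input buffer after normalization; because diacritics and tatweel are removed, this may be less than len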
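The method rewrites the first `len` characters of `s` in place and returns the new logical length, so callers should read only up to the returned index. The sketch below illustrates that contract under stated assumptions: standard Unicode code points, alef with madda treated like the hamza forms, and a single forward pass with separate read and write indices. That compaction approach is one way to delete characters in place; Lucene's own code may differ, and the class `InPlaceNormalizerSketch` is hypothetical.

```java
/** Hypothetical in-place normalizer following the (char[] s, int len) contract; not Lucene code. */
final class InPlaceNormalizerSketch {

  /** Normalizes s[0..len) in place and returns the new length. */
  static int normalize(char[] s, int len) {
    int write = 0; // next slot to fill; never ahead of the read index
    for (int read = 0; read < len; read++) {
      char c = s[read];
      switch (c) {
        case '\u0622': // alef with madda (assumption)
        case '\u0623': // alef with hamza above
        case '\u0625': // alef with hamza below
          s[write++] = '\u0627'; // bare alef
          break;
        case '\u0629': // teh marbuta becomes heh
          s[write++] = '\u0647';
          break;
        case '\u0649': // dotless yeh becomes yeh
          s[write++] = '\u064A';
          break;
        case '\u0640': // tatweel: skipped, i.e. removed
          break;
        default:
          if (c >= '\u064B' && c <= '\u0652') {
            break; // haraka: skipped, i.e. removed
          }
          s[write++] = c; // any other character is kept as-is
      }
    }
    return write;
  }

  public static void main(String[] args) {
    char[] buffer = "\u0645\u064E\u062F\u0652\u0631\u0633\u064E\u0629".toCharArray();
    int newLen = normalize(buffer, buffer.length);
    System.out.println(newLen); // 5
    // Only buffer[0..newLen) is meaningful; the tail still holds leftover characters.
    String term = new String(buffer, 0, newLen);
    System.out.println(term.length()); // 5
  }
}
```

Because removals only ever shift characters to the left, the write index can never overtake the read index, so no temporary buffer is needed; this is what makes the in-place approach safe.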