Package org.apache.lucene.analysis.ckb
Class SoraniNormalizer
java.lang.Object
org.apache.lucene.analysis.ckb.SoraniNormalizer
Normalizes the Unicode representation of Sorani text.
Normalization consists of:
- Alternate forms of 'y' (064A, 0649) are converted to 06CC (FARSI YEH)
- Alternate form of 'k' (0643) is converted to 06A9 (KEHEH)
- Alternate forms of vowel 'e' (0647+200C, word-final 0647, 0629) are converted to 06D5 (AE)
- Alternate (joining) form of 'h' (06BE) is converted to 0647
- Alternate forms of 'rr' (0692, word-initial 0631) are converted to 0695 (REH WITH SMALL V BELOW)
- Harakat, tatweel, and formatting characters such as directional controls are removed.
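The rules above can be sketched as a single in-place pass over a character buffer. This is a minimal illustration and not the Lucene source: the class name `SoraniNormalizationSketch` and the `delete` helper are hypothetical. It assumes the buffer holds exactly one token (so "word-initial" means index 0 and "word-final" means the last index) and that "formatting characters" means the Unicode format category (Cf).

```java
// Hypothetical standalone sketch of the listed rules; not the Lucene implementation.
final class SoraniNormalizationSketch {
  static final char YEH = '\u064A';
  static final char DOTLESS_YEH = '\u0649';
  static final char FARSI_YEH = '\u06CC';
  static final char KAF = '\u0643';
  static final char KEHEH = '\u06A9';
  static final char HEH = '\u0647';
  static final char AE = '\u06D5';
  static final char ZWNJ = '\u200C';
  static final char HEH_DOACHASHMEE = '\u06BE';
  static final char TEH_MARBUTA = '\u0629';
  static final char REH = '\u0631';
  static final char RREH = '\u0695';
  static final char RREH_ABOVE = '\u0692';
  static final char TATWEEL = '\u0640';
  static final char FATHATAN = '\u064B';
  static final char DAMMATAN = '\u064C';
  static final char KASRATAN = '\u064D';
  static final char FATHA = '\u064E';
  static final char DAMMA = '\u064F';
  static final char KASRA = '\u0650';
  static final char SHADDA = '\u0651';
  static final char SUKUN = '\u0652';

  /** Normalizes s[0..len) in place and returns the new logical length. */
  static int normalize(char[] s, int len) {
    for (int i = 0; i < len; i++) {
      switch (s[i]) {
        case YEH:
        case DOTLESS_YEH:
          s[i] = FARSI_YEH;
          break;
        case KAF:
          s[i] = KEHEH;
          break;
        case ZWNJ:
          // HEH followed by ZWNJ is an alternate spelling of the vowel 'e'.
          if (i > 0 && s[i - 1] == HEH) {
            s[i - 1] = AE;
          }
          len = delete(s, i, len);
          i--; // re-examine the character that moved into position i
          break;
        case HEH:
          if (i == len - 1) { // word-final HEH
            s[i] = AE;
          }
          break;
        case TEH_MARBUTA:
          s[i] = AE;
          break;
        case HEH_DOACHASHMEE:
          s[i] = HEH;
          break;
        case REH:
          if (i == 0) { // word-initial REH
            s[i] = RREH;
          }
          break;
        case RREH_ABOVE:
          s[i] = RREH;
          break;
        case TATWEEL:
        case FATHATAN:
        case DAMMATAN:
        case KASRATAN:
        case FATHA:
        case DAMMA:
        case KASRA:
        case SHADDA:
        case SUKUN:
          len = delete(s, i, len);
          i--;
          break;
        default:
          // Assumption: formatting characters are those in Unicode category Cf.
          if (Character.getType(s[i]) == Character.FORMAT) {
            len = delete(s, i, len);
            i--;
          }
      }
    }
    return len;
  }

  /** Removes the char at pos by shifting the tail left; returns the new length. */
  private static int delete(char[] s, int pos, int len) {
    if (pos < len - 1) {
      System.arraycopy(s, pos + 1, s, pos, len - pos - 1);
    }
    return len - 1;
  }
}
```

In this sketch, deleted characters shrink the logical length rather than the array, so a caller should read only the first `normalize(...)` characters of the buffer afterwards.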
Field Summary
Fields (Modifier and Type, Field):
- (package private) static final char YEH
- (package private) static final char DOTLESS_YEH
- (package private) static final char FARSI_YEH
- (package private) static final char KAF
- (package private) static final char KEHEH
- (package private) static final char HEH
- (package private) static final char AE
- (package private) static final char ZWNJ
- (package private) static final char HEH_DOACHASHMEE
- (package private) static final char TEH_MARBUTA
- (package private) static final char REH
- (package private) static final char RREH
- (package private) static final char RREH_ABOVE
- (package private) static final char TATWEEL
- (package private) static final char FATHATAN
- (package private) static final char DAMMATAN
- (package private) static final char KASRATAN
- (package private) static final char FATHA
- (package private) static final char DAMMA
- (package private) static final char KASRA
- (package private) static final char SHADDA
- (package private) static final char SUKUN
Constructor Summary
Constructors:
- SoraniNormalizer()
Method Summary
Methods (Modifier and Type, Method, Description):
- (package private) int normalize(char[] s, int len): Normalize an input buffer of Sorani text
Field Details
YEH
static final char YEH
DOTLESS_YEH
static final char DOTLESS_YEH
FARSI_YEH
static final char FARSI_YEH
KAF
static final char KAF
KEHEH
static final char KEHEH
HEH
static final char HEH
AE
static final char AE
ZWNJ
static final char ZWNJ
HEH_DOACHASHMEE
static final char HEH_DOACHASHMEE
TEH_MARBUTA
static final char TEH_MARBUTA
REH
static final char REH
RREH
static final char RREH
RREH_ABOVE
static final char RREH_ABOVE
TATWEEL
static final char TATWEEL
FATHATAN
static final char FATHATAN
DAMMATAN
static final char DAMMATAN
KASRATAN
static final char KASRATAN
FATHA
static final char FATHA
DAMMA
static final char DAMMA
KASRA
static final char KASRA
SHADDA
static final char SHADDA
SUKUN
static final char SUKUN
Constructor Details
SoraniNormalizer
SoraniNormalizer()
Method Details
normalize
int normalize(char[] s, int len)
Normalize an input buffer of Sorani text.
Parameters:
s - input buffer
len - length of input buffer
Returns:
length of input buffer after normalization
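The `(char[] s, int len)` signature with an `int` return implies an in-place contract: the buffer is rewritten and the return value is the new logical length. The sketch below illustrates how a caller might honor that contract. Because `SoraniNormalizer` is package private, it uses a hypothetical stand-in, `stripTatweel`, that has the same shape but only drops U+0640 (TATWEEL); the class name `InPlaceBufferContractDemo` is likewise an assumption for illustration.

```java
// Hypothetical demo of the buffer-and-length calling convention; not Lucene code.
final class InPlaceBufferContractDemo {
  /** Stand-in normalizer with the same shape: compacts s[0..len) and returns the new length. */
  static int stripTatweel(char[] s, int len) {
    int out = 0;
    for (int i = 0; i < len; i++) {
      if (s[i] != '\u0640') {
        s[out++] = s[i]; // keep the char, compacting toward the front
      }
    }
    return out;
  }

  /** Applies the stand-in to a token and rebuilds a String from the valid prefix only. */
  static String apply(String token) {
    char[] buf = token.toCharArray();
    int newLen = stripTatweel(buf, buf.length);
    // Only buf[0..newLen) is meaningful; anything past newLen is stale.
    return new String(buf, 0, newLen);
  }
}
```

The design point is that the array is never reallocated: a caller that ignores the returned length and reads the whole buffer would see leftover characters at the end.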