Class HMMChineseTokenizerFactory
java.lang.Object
org.apache.lucene.analysis.AbstractAnalysisFactory
org.apache.lucene.analysis.TokenizerFactory
org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory
Factory for
HMMChineseTokenizer
Note: this class will currently emit tokens for punctuation. So you should either add a
WordDelimiterFilter after to remove these (with concatenate off), or use the SmartChinese
stoplist with a StopFilterFactory via:
words="org/apache/lucene/analysis/cn/smart/stopwords.txt"
- Since:
- 4.10.0
-
Field Summary
FieldsFields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
-
Constructor Summary
ConstructorsConstructorDescriptionDefault ctor for compatibility with SPICreates a new HMMChineseTokenizerFactory -
Method Summary
Modifier and TypeMethodDescriptioncreate
(AttributeFactory factory) Creates a TokenStream of the specified input using the given AttributeFactoryMethods inherited from class org.apache.lucene.analysis.TokenizerFactory
availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
-
Field Details
-
NAME
SPI name- See Also:
-
-
Constructor Details
-
HMMChineseTokenizerFactory
Creates a new HMMChineseTokenizerFactory -
HMMChineseTokenizerFactory
public HMMChineseTokenizerFactory()Default ctor for compatibility with SPI
-
-
Method Details
-
create
Description copied from class:TokenizerFactory
Creates a TokenStream of the specified input using the given AttributeFactory- Specified by:
create
in classTokenizerFactory
-