java.lang.Object
org.apache.lucene.analysis.morph.Viterbi<Token,ViterbiNBest.PositionNBest>
org.apache.lucene.analysis.morph.ViterbiNBest<Token,JaMorphData>
org.apache.lucene.analysis.ja.ViterbiNBest
Viterbi subclass for Japanese morphological analysis. This also performs n-best path calculation.
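As a rough illustration of the kind of search a Viterbi-style analyzer performs, the following sketch finds the cheapest segmentation of a string with a forward pass followed by a backtrace. It is a toy under stated assumptions, not this class's implementation: `ToyViterbi`, `segment`, `wordCosts`, and `unknownCost` are hypothetical names, connection costs between adjacent tokens are omitted, and only the single best path is kept (no n-best).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Illustrative only: a toy min-cost segmenter, not Lucene's implementation. */
final class ToyViterbi {

  /**
   * Splits text into the cheapest sequence of words.
   * wordCosts maps each known word to a hypothetical cost; any single
   * character not in the map costs unknownCost, so a path always exists.
   */
  static List<String> segment(String text, Map<String, Integer> wordCosts, int unknownCost) {
    int n = text.length();
    int[] best = new int[n + 1];      // best[i]: cheapest cost found to reach offset i
    int[] backStart = new int[n + 1]; // backStart[i]: start offset of the last word on that path
    Arrays.fill(best, Integer.MAX_VALUE);
    best[0] = 0;

    // Forward pass: extend every reachable offset with each candidate word.
    for (int start = 0; start < n; start++) {
      if (best[start] == Integer.MAX_VALUE) {
        continue; // offset not reachable
      }
      for (int end = start + 1; end <= n; end++) {
        Integer cost = wordCosts.get(text.substring(start, end));
        if (cost == null) {
          if (end - start > 1) {
            continue; // unknown words are limited to one character in this toy
          }
          cost = unknownCost;
        }
        int total = best[start] + cost;
        if (total < best[end]) {
          best[end] = total;
          backStart[end] = start;
        }
      }
    }

    // Backtrace: follow back-pointers from the end, then restore text order.
    List<String> tokens = new ArrayList<>();
    for (int end = n; end > 0; end = backStart[end]) {
      tokens.add(text.substring(backStart[end], end));
    }
    Collections.reverse(tokens);
    return tokens;
  }
}
```

For example, with the costs `ab=1`, `cd=1`, `abc=5` and an unknown-character cost of 10, `segment("abcd", ...)` prefers `ab` + `cd` (total 2) over `abc` + `d` (total 15).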
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.morph.ViterbiNBest
ViterbiNBest.Lattice<U extends MorphData>, ViterbiNBest.PositionNBest
Nested classes/interfaces inherited from class org.apache.lucene.analysis.morph.Viterbi
Viterbi.Position, Viterbi.WrappedPositionArray<U extends Viterbi.Position>
Field Summary
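Fields (modifier and type, then field name)
private final CharacterDefinition characterDefinition
private final boolean discardPunctuation
private GraphvizFormatter<JaMorphData> dotOut
private final boolean extendedMode
private final boolean outputCompounds
private static final int SEARCH_MODE_KANJI_LENGTH
private static final int SEARCH_MODE_KANJI_PENALTY
private static final int SEARCH_MODE_OTHER_LENGTH
private static final int SEARCH_MODE_OTHER_PENALTY
private final boolean searchMode
private final UnknownDictionary unkDictionary
private final UserDictionary userDictionary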
Fields inherited from class org.apache.lucene.analysis.morph.ViterbiNBest
dictionaryMap, lattice
Fields inherited from class org.apache.lucene.analysis.morph.Viterbi
buffer, costs, enableSpacePenaltyFactor, end, lastBackTracePos, MAX_UNKNOWN_WORD_LENGTH, outputLongestUserEntryOnly, outputNBest, pending, pos, positions, VERBOSE, wordIdRef
Constructor Summary
Constructors
ViterbiNBest(TokenInfoFST fst, FST.BytesReader fstReader, TokenInfoDictionary dictionary, TokenInfoFST userFST, FST.BytesReader userFSTReader, UserDictionary userDictionary, ConnectionCosts costs, UnknownDictionary unkDictionary, CharacterDefinition characterDefinition, boolean discardPunctuation, boolean searchMode, boolean extendedMode, boolean outputCompounds)
Method Summary
Modifier and type, method: description
protected void backtrace(Viterbi.Position endPosData, int fromIDX): Backtrace from the provided position, back to the last time we back-traced, accumulating the resulting tokens to the pending list.
protected int computePenalty(int pos, int length): Returns the penalty for a specific input region.
private int computeSecondBestThreshold(int pos, int length)
(package private) Dictionary<? extends JaMorphData> getDict
protected int getNBestCost()
private static boolean isPunctuation(char ch)
protected int processUnknownWord(boolean anyMatches, Viterbi.Position posData): Add unknown words to the position graph.
private void pruneAndRescore(int startPos, int endPos, int bestStartIDX)
protected void registerNode(int node, char[] fragment): Add n-best tokens to the pending list.
(package private) void setGraphvizFormatter
protected void setNBestCost(int value)
protected boolean shouldSkipProcessUnknownWord(int unknownWordEndIndex, Viterbi.Position posData)
Methods inherited from class org.apache.lucene.analysis.morph.ViterbiNBest
backtraceNBest, fixupPendingList, getLatticeRootBase, probeDelta
Methods inherited from class org.apache.lucene.analysis.morph.Viterbi
add, computeSpacePenalty, forward, getPending, getPos, isEnd, isOutputNBest, resetBuffer, resetState
Field Details
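unkDictionary
private final UnknownDictionary unkDictionary
characterDefinition
private final CharacterDefinition characterDefinition
userDictionary
private final UserDictionary userDictionary
discardPunctuation
private final boolean discardPunctuation
searchMode
private final boolean searchMode
extendedMode
private final boolean extendedMode
outputCompounds
private final boolean outputCompounds
dotOut
private GraphvizFormatter<JaMorphData> dotOut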
SEARCH_MODE_KANJI_LENGTH
private static final int SEARCH_MODE_KANJI_LENGTH
SEARCH_MODE_OTHER_LENGTH
private static final int SEARCH_MODE_OTHER_LENGTH
SEARCH_MODE_KANJI_PENALTY
private static final int SEARCH_MODE_KANJI_PENALTY
SEARCH_MODE_OTHER_PENALTY
private static final int SEARCH_MODE_OTHER_PENALTY
Constructor Details
ViterbiNBest
ViterbiNBest(TokenInfoFST fst, FST.BytesReader fstReader, TokenInfoDictionary dictionary, TokenInfoFST userFST, FST.BytesReader userFSTReader, UserDictionary userDictionary, ConnectionCosts costs, UnknownDictionary unkDictionary, CharacterDefinition characterDefinition, boolean discardPunctuation, boolean searchMode, boolean extendedMode, boolean outputCompounds)
Method Details
shouldSkipProcessUnknownWord
protected boolean shouldSkipProcessUnknownWord(int unknownWordEndIndex, Viterbi.Position posData)
Overrides: shouldSkipProcessUnknownWord in class Viterbi<Token,ViterbiNBest.PositionNBest>
computePenalty
protected int computePenalty(int pos, int length) throws IOException
Description copied from class: Viterbi
Returns the penalty for a specific input region.
Overrides: computePenalty in class Viterbi<Token,ViterbiNBest.PositionNBest>
Throws: IOException
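The four SEARCH_MODE_* constants suggest a length-based penalty that treats kanji runs differently from other text. The sketch below shows one plausible shape for such a rule; it is an assumption, not the logic of this class. The constant values are placeholders (this page does not list them), the kanji test uses `Character.UnicodeScript` rather than a `CharacterDefinition`, and the extra `buffer` parameter exists only to make the example self-contained.

```java
/** Illustrative only: a hypothetical length penalty, not Lucene's implementation. */
final class SearchModePenaltySketch {
  // Placeholder values chosen for illustration; the real constants are
  // private to ViterbiNBest and their values are not given on this page.
  static final int KANJI_LENGTH = 2;
  static final int OTHER_LENGTH = 7;
  static final int KANJI_PENALTY = 3000;
  static final int OTHER_PENALTY = 1700;

  /** Returns a penalty for the region buffer[pos, pos + length). */
  static int computePenalty(char[] buffer, int pos, int length) {
    if (length <= KANJI_LENGTH) {
      return 0; // short regions are never penalized
    }
    boolean allKanji = true;
    for (int i = pos; i < pos + length; i++) {
      if (Character.UnicodeScript.of(buffer[i]) != Character.UnicodeScript.HAN) {
        allKanji = false;
        break;
      }
    }
    if (allKanji) {
      // Long all-kanji regions: penalize each character beyond the kanji limit.
      return (length - KANJI_LENGTH) * KANJI_PENALTY;
    }
    if (length > OTHER_LENGTH) {
      // Long regions of other text: penalize each character beyond the other limit.
      return (length - OTHER_LENGTH) * OTHER_PENALTY;
    }
    return 0;
  }
}
```

A penalty of this kind makes long candidate tokens more expensive, which nudges the search toward splitting compounds into shorter tokens.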
computeSecondBestThreshold
private int computeSecondBestThreshold(int pos, int length) throws IOException
Throws: IOException
processUnknownWord
protected int processUnknownWord(boolean anyMatches, Viterbi.Position posData) throws IOException
Description copied from class: Viterbi
Add unknown words to the position graph.
Specified by: processUnknownWord in class Viterbi<Token,ViterbiNBest.PositionNBest>
Returns: word length
Throws: IOException
setGraphvizFormatter
backtrace
protected void backtrace(Viterbi.Position endPosData, int fromIDX) throws IOException
Description copied from class: Viterbi
Backtrace from the provided position, back to the last time we back-traced, accumulating the resulting tokens to the pending list. The pending list is then in reverse order (the last token should be returned first).
Specified by: backtrace in class Viterbi<Token,ViterbiNBest.PositionNBest>
Throws: IOException
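The reversed pending list described above can be pictured with a small sketch. It is a simplified model, not this method's code: `BacktraceSketch`, `Cell`, and `next` are hypothetical names, and each cell stores only a token and a back-pointer. Walking back-pointers naturally yields tokens last-to-first, so the consumer takes from the tail of the list to emit them in text order.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a simplified backtrace, not Lucene's implementation. */
final class BacktraceSketch {

  /** Hypothetical lattice cell: the token ending at this position and where it started. */
  static final class Cell {
    final String token;
    final int backPos;

    Cell(String token, int backPos) {
      this.token = token;
      this.backPos = backPos;
    }
  }

  /**
   * Follows back-pointers from endPos until reaching lastBackTracePos,
   * appending each token as it is met. The result is in reverse text order.
   */
  static List<String> backtrace(Cell[] cells, int endPos, int lastBackTracePos) {
    List<String> pending = new ArrayList<>();
    int pos = endPos;
    while (pos > lastBackTracePos) {
      Cell cell = cells[pos];
      pending.add(cell.token);
      pos = cell.backPos;
    }
    return pending;
  }

  /** Takes the next token in text order by removing from the tail of the list. */
  static String next(List<String> pending) {
    return pending.remove(pending.size() - 1);
  }
}
```

Removing from the tail of an `ArrayList` is a constant-time operation, which is one reason a reversed list is a convenient representation here.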
pruneAndRescore
private void pruneAndRescore(int startPos, int endPos, int bestStartIDX) throws IOException
Throws: IOException
registerNode
protected void registerNode(int node, char[] fragment)
Description copied from class: ViterbiNBest
Add n-best tokens to the pending list.
Specified by: registerNode in class ViterbiNBest<Token,JaMorphData>
getDict
setNBestCost
protected void setNBestCost(int value)
Overrides: setNBestCost in class ViterbiNBest<Token,JaMorphData>
getNBestCost
protected int getNBestCost()
Overrides: getNBestCost in class ViterbiNBest<Token,JaMorphData>
isPunctuation
private static boolean isPunctuation(char ch)
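This page gives no description for isPunctuation. One common way to write such a check is to classify the character by its Unicode general category, as sketched below. Which categories count as punctuation here is an assumption made for illustration (this sketch also treats whitespace, control characters, and symbols as punctuation); `PunctuationSketch` is a hypothetical name.

```java
/** Illustrative only: a category-based punctuation test, not necessarily Lucene's. */
final class PunctuationSketch {

  /** Returns true if ch falls in a separator, control, punctuation, or symbol category. */
  static boolean isPunctuation(char ch) {
    switch (Character.getType(ch)) {
      case Character.SPACE_SEPARATOR:
      case Character.LINE_SEPARATOR:
      case Character.PARAGRAPH_SEPARATOR:
      case Character.CONTROL:
      case Character.FORMAT:
      case Character.DASH_PUNCTUATION:
      case Character.START_PUNCTUATION:
      case Character.END_PUNCTUATION:
      case Character.CONNECTOR_PUNCTUATION:
      case Character.OTHER_PUNCTUATION:
      case Character.MATH_SYMBOL:
      case Character.CURRENCY_SYMBOL:
      case Character.MODIFIER_SYMBOL:
      case Character.OTHER_SYMBOL:
      case Character.INITIAL_QUOTE_PUNCTUATION:
      case Character.FINAL_QUOTE_PUNCTUATION:
        return true;
      default:
        return false;
    }
  }
}
```

Because the test relies on Unicode categories rather than a fixed character list, it covers full-width and ideographic punctuation such as U+3001 as well as ASCII punctuation.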