public class WordBreakSpellChecker
extends java.lang.Object
A spell checker whose sole function is to offer suggestions by combining multiple terms into one word and/or breaking terms into multiple words.
Modifier and Type | Class and Description |
---|---|
static class | WordBreakSpellChecker.BreakSuggestionSortMethod: Determines the order to list word break suggestions |
private static class | WordBreakSpellChecker.CombinationsThenFreqComparator |
private static class | WordBreakSpellChecker.CombineSuggestionWrapper |
private static class | WordBreakSpellChecker.LengthThenMaxFreqComparator |
private static class | WordBreakSpellChecker.LengthThenSumFreqComparator |
private static class | WordBreakSpellChecker.SuggestWordArrayWrapper |
Modifier and Type | Field and Description |
---|---|
private int | maxChanges |
private int | maxCombineWordLength |
private int | maxEvaluations |
private int | minBreakWordLength |
private int | minSuggestionFrequency |
static Term | SEPARATOR_TERM: Term that can be used to prohibit adjacent terms from being combined |
Constructor and Description |
---|
WordBreakSpellChecker(): Creates a new spellchecker with default configuration values |
Modifier and Type | Method and Description |
---|---|
private int | generateBreakUpSuggestions(Term term, IndexReader ir, int numberBreaks, int maxSuggestions, int useMinSuggestionFrequency, SuggestWord[] prefix, java.util.Queue<WordBreakSpellChecker.SuggestWordArrayWrapper> suggestions, int totalEvaluations, WordBreakSpellChecker.BreakSuggestionSortMethod sortMethod) |
private SuggestWord | generateSuggestWord(IndexReader ir, java.lang.String fieldname, java.lang.String text) |
int | getMaxChanges(): Returns the maximum number of changes to perform on the input |
int | getMaxCombineWordLength(): Returns the maximum length of a combined suggestion |
int | getMaxEvaluations(): Returns the maximum number of word combinations to evaluate. |
int | getMinBreakWordLength(): Returns the minimum size of a broken word |
int | getMinSuggestionFrequency(): Returns the minimum frequency a term must have to be part of a suggestion. |
private SuggestWord[] | newPrefix(SuggestWord[] oldPrefix, SuggestWord append) |
private SuggestWord[] | newSuggestion(SuggestWord[] prefix, SuggestWord append1, SuggestWord append2) |
void | setMaxChanges(int maxChanges): The maximum number of changes (word breaks or combinations) to make on the original term(s). |
void | setMaxCombineWordLength(int maxCombineWordLength): The maximum length of a suggestion made by combining 1 or more original terms. |
void | setMaxEvaluations(int maxEvaluations): The maximum number of word combinations to evaluate. |
void | setMinBreakWordLength(int minBreakWordLength): The minimum length to break words down to. |
void | setMinSuggestionFrequency(int minSuggestionFrequency): The minimum frequency a term must have to be included as part of a suggestion. |
SuggestWord[][] | suggestWordBreaks(Term term, int maxSuggestions, IndexReader ir, SuggestMode suggestMode, WordBreakSpellChecker.BreakSuggestionSortMethod sortMethod): Generate suggestions by breaking the passed-in term into multiple words. |
CombineSuggestion[] | suggestWordCombinations(Term[] terms, int maxSuggestions, IndexReader ir, SuggestMode suggestMode): Generate suggestions by combining one or more of the passed-in terms into single words. |
private int minSuggestionFrequency
private int minBreakWordLength
private int maxCombineWordLength
private int maxChanges
private int maxEvaluations
public static final Term SEPARATOR_TERM
public WordBreakSpellChecker()
public SuggestWord[][] suggestWordBreaks(Term term, int maxSuggestions, IndexReader ir, SuggestMode suggestMode, WordBreakSpellChecker.BreakSuggestionSortMethod sortMethod) throws java.io.IOException
Generate suggestions by breaking the passed-in term into multiple words. The scores returned are equal to the number of word breaks needed, so a lower score is generally preferred over a higher one.
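The scoring rule can be illustrated with a small, library-free sketch. It does not use Lucene and is not the class's actual implementation: `WordBreakSketch`, `Suggestion`, and `suggestBreaks` are hypothetical names, and a plain `Map` stands in for the term frequencies that would normally come from the `IndexReader`. The parameters `maxChanges`, `minBreakWordLength`, and `minFrequency` only mimic the roles of the configuration values documented on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: a Map replaces the index, and all names here are hypothetical.
class WordBreakSketch {

    /** One candidate split; the score is the number of breaks that produced it. */
    static final class Suggestion {
        final List<String> words;
        final int score;

        Suggestion(List<String> words) {
            this.words = words;
            this.score = words.size() - 1; // one break yields two words, two breaks yield three
        }
    }

    /**
     * Collects every split of `term` that uses at most `maxChanges` breaks, where each
     * piece has at least `minBreakWordLength` characters (assumed >= 1) and a frequency
     * of at least `minFrequency` in `freqs`. Results are ordered by ascending score.
     */
    static List<Suggestion> suggestBreaks(String term, Map<String, Integer> freqs,
                                          int maxChanges, int minBreakWordLength,
                                          int minFrequency) {
        List<Suggestion> out = new ArrayList<>();
        split(term, freqs, maxChanges, minBreakWordLength, minFrequency,
              new ArrayList<>(), out);
        out.sort((a, b) -> Integer.compare(a.score, b.score)); // fewer breaks first
        return out;
    }

    private static void split(String rest, Map<String, Integer> freqs, int breaksLeft,
                              int minLen, int minFreq, List<String> prefix,
                              List<Suggestion> out) {
        if (breaksLeft == 0) {
            return; // change budget exhausted
        }
        for (int i = minLen; i <= rest.length() - minLen; i++) {
            String left = rest.substring(0, i);
            if (freqs.getOrDefault(left, 0) < minFreq) {
                continue; // the left piece is not a sufficiently frequent term
            }
            String right = rest.substring(i);
            List<String> next = new ArrayList<>(prefix);
            next.add(left);
            if (freqs.getOrDefault(right, 0) >= minFreq) {
                List<String> done = new ArrayList<>(next);
                done.add(right);
                out.add(new Suggestion(done));
            }
            // Try to break the right-hand remainder further with the remaining budget.
            split(right, freqs, breaksLeft - 1, minLen, minFreq, next, out);
        }
    }
}
```

With frequencies for "new", "york", "city", and "newyork", the input "newyorkcity" and a budget of two changes would yield the one-break split before the two-break split, which matches the "lower score is preferred" convention described above.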
suggestMode - default = SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX
sortMethod - default = WordBreakSpellChecker.BreakSuggestionSortMethod.NUM_CHANGES_THEN_MAX_FREQUENCY
java.io.IOException - If there is a low-level I/O error.
public CombineSuggestion[] suggestWordCombinations(Term[] terms, int maxSuggestions, IndexReader ir, SuggestMode suggestMode) throws java.io.IOException
Generate suggestions by combining one or more of the passed-in terms into single words. The returned CombineSuggestion contains both a SuggestWord and an array detailing which passed-in terms were involved in creating this combination. The scores returned are equal to the number of word combinations needed, which is also one less than the length of the array CombineSuggestion.originalTermIndexes. Generally, a suggestion with a lower score is preferred over one with a higher score.
To prevent two adjacent terms from being combined (for instance, if one is mandatory and the other is prohibited), separate the two terms with SEPARATOR_TERM.
When suggestMode equals SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX, each suggestion will include at least one term not in the index.
When suggestMode equals SuggestMode.SUGGEST_MORE_POPULAR, each suggestion will have the same or better frequency than the most-popular included term.
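The relationship between the score and the term-index array, and the effect of a separator, can be sketched without Lucene. Everything in this block is an assumption made for illustration: `WordCombineSketch`, `Combination`, and `SEPARATOR` are hypothetical stand-ins (not the real `CombineSuggestion` or `SEPARATOR_TERM`), a `Map` replaces the index, and the `suggestMode` filtering described above is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: not the Lucene implementation; all names are hypothetical.
class WordCombineSketch {

    /** Hypothetical marker playing the role that SEPARATOR_TERM plays in the real API. */
    static final String SEPARATOR = "\u0000";

    /** A combined word plus the positions of the input terms it was built from. */
    static final class Combination {
        final String word;
        final int[] originalTermIndexes;
        final int score;

        Combination(String word, int[] originalTermIndexes) {
            this.word = word;
            this.originalTermIndexes = originalTermIndexes;
            // Joining n adjacent terms takes n - 1 combinations.
            this.score = originalTermIndexes.length - 1;
        }
    }

    /**
     * Joins runs of adjacent terms (at most `maxChanges` joins per run) and keeps those
     * whose combined form is no longer than `maxCombineWordLength` and occurs at least
     * `minFrequency` times in `freqs`. A run never crosses SEPARATOR.
     */
    static List<Combination> suggestCombinations(String[] terms, Map<String, Integer> freqs,
                                                 int maxChanges, int maxCombineWordLength,
                                                 int minFrequency) {
        List<Combination> out = new ArrayList<>();
        for (int start = 0; start < terms.length; start++) {
            if (terms[start].equals(SEPARATOR)) {
                continue; // a separator cannot begin a run
            }
            StringBuilder joined = new StringBuilder(terms[start]);
            for (int end = start + 1;
                 end < terms.length && end - start <= maxChanges; end++) {
                if (terms[end].equals(SEPARATOR)) {
                    break; // adjacent terms on either side must stay apart
                }
                joined.append(terms[end]);
                if (joined.length() > maxCombineWordLength) {
                    break; // longer runs would only be longer still
                }
                if (freqs.getOrDefault(joined.toString(), 0) >= minFrequency) {
                    int[] indexes = new int[end - start + 1];
                    for (int k = 0; k < indexes.length; k++) {
                        indexes[k] = start + k;
                    }
                    out.add(new Combination(joined.toString(), indexes));
                }
            }
        }
        out.sort((a, b) -> Integer.compare(a.score, b.score)); // fewer joins first
        return out;
    }
}
```

In this sketch, placing `SEPARATOR` between "wire" and "less" suppresses the "wireless" candidate, which is the behavior the separator paragraph above describes for mandatory and prohibited terms.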
java.io.IOException - If there is a low-level I/O error.
private int generateBreakUpSuggestions(Term term, IndexReader ir, int numberBreaks, int maxSuggestions, int useMinSuggestionFrequency, SuggestWord[] prefix, java.util.Queue<WordBreakSpellChecker.SuggestWordArrayWrapper> suggestions, int totalEvaluations, WordBreakSpellChecker.BreakSuggestionSortMethod sortMethod) throws java.io.IOException
java.io.IOException
private SuggestWord[] newPrefix(SuggestWord[] oldPrefix, SuggestWord append)
private SuggestWord[] newSuggestion(SuggestWord[] prefix, SuggestWord append1, SuggestWord append2)
private SuggestWord generateSuggestWord(IndexReader ir, java.lang.String fieldname, java.lang.String text) throws java.io.IOException
java.io.IOException
public int getMinSuggestionFrequency()
See also: setMinSuggestionFrequency(int)
public int getMaxCombineWordLength()
See also: setMaxCombineWordLength(int)
public int getMinBreakWordLength()
See also: setMinBreakWordLength(int)
public int getMaxChanges()
See also: setMaxChanges(int)
public int getMaxEvaluations()
See also: setMaxEvaluations(int)
public void setMinSuggestionFrequency(int minSuggestionFrequency)
The minimum frequency a term must have to be included as part of a suggestion. Default=1. Not applicable when used with SuggestMode.SUGGEST_MORE_POPULAR.
See also: getMinSuggestionFrequency()
public void setMaxCombineWordLength(int maxCombineWordLength)
The maximum length of a suggestion made by combining 1 or more original terms. Default=20.
See also: getMaxCombineWordLength()
public void setMinBreakWordLength(int minBreakWordLength)
The minimum length to break words down to. Default=1.
See also: getMinBreakWordLength()
public void setMaxChanges(int maxChanges)
The maximum number of changes (word breaks or combinations) to make on the original term(s). Default=1.
See also: getMaxChanges()
public void setMaxEvaluations(int maxEvaluations)
The maximum number of word combinations to evaluate. Default=1000. A higher value might improve result quality. A lower value might improve performance.
See also: getMaxEvaluations()