Package org.apache.lucene.analysis.hunspell
A Java implementation of Hunspell stemming and spell-checking algorithms (Hunspell), and a stemming TokenFilter (HunspellStemFilter) based on it.
For dictionaries, see e.g. the LibreOffice repository or Titus Wormer's collection (UTF).
Class Description
Checks the "condition" part of affix definition, as in
An object representing the analysis result of a simple (non-compound) word
An object representing a prefix or a suffix applied to a word stem
ICONV or OCONV replacement table
An object representing homonym dictionary entries.
An object representing *.dic file entry with its word, flags and morphological data.
In-memory structure for the dictionary (.dic) and affix (.aff) data of a hunspell dictionary.
Possible word breaks according to BREAK directives
Used to read flags as UTF-8 even if the rest of the file is in the default (8-bit) encoding
Implementation of Dictionary.FlagParsingStrategy that assumes each flag is encoded as two ASCII characters whose codes must be combined into a single character.
Abstraction of the process of parsing flags taken from the affix and dic files
Implementation of Dictionary.FlagParsingStrategy that assumes each flag is encoded in its numerical form.
Simple implementation of Dictionary.FlagParsingStrategy that treats the chars in each String as individual flags.
Suggestion to add/edit dictionary entries to generate a given list of words created by WordFormGenerator.compress(java.util.List<java.lang.String>, java.util.Set<java.lang.String>, java.lang.Runnable).
A structure similar to BytesRefHash, but specialized for sorted char sequences used for Hunspell flags.
An oracle for quickly checking that a specific part of a word can never be a valid word.
A class that traverses the entire dictionary and applies affix rules to check if those yield correct suggestions similar enough to the given misspelled word
GeneratingSuggester.Weighted<T extends Comparable<T>>
A spell checker based on Hunspell dictionaries.
TokenFilter that uses hunspell affix rules and words to stem tokens.
TokenFilterFactory that creates instances of HunspellStemFilter.
A class that modifies the given misspelled word in various ways to get correct suggestions
A FragmentChecker based on all character n-grams possible in a certain language, keeping them in a relatively memory-efficient, but probabilistic data structure.
A callback for n-gram ranges in words
Root<T extends CharSequence>
The strategy defining how a Hunspell dictionary should be loaded, with different tradeoffs.
Stemmer uses the affix rules declared in the Dictionary to generate one or more stems for a word.
A generator for misspelled word corrections based on Hunspell flags.
A cache allowing for CPU-cache-friendlier iteration over WordStorage entries that can be used for suggestions.
An exception thrown when a Hunspell.suggest(java.lang.String) call takes too long, if TimeoutPolicy.THROW_EXCEPTION is used.
A strategy determining what to do when Hunspell API calls take too much time
An automaton allowing to achieve the same results as non-weighted GeneratingSuggester.ngramScore(int, java.lang.String, java.lang.String, boolean), but faster (in O(s2.length) time).
A utility class used for generating possible word forms by adding affixes to stems (WordFormGenerator.getAllWordForms(String, String, Runnable)), and suggesting stems and flags to generate the given set of words (WordFormGenerator.compress(List, Set, Runnable)).
A data structure for memory-efficient word storage and fast lookup/enumeration.
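
The three flag encodings described for the Dictionary.FlagParsingStrategy implementations can be illustrated with a small standalone sketch. This is not the Lucene code: the class name FlagParsingSketch and its methods are hypothetical, the comma separator for numeric flags follows the usual Hunspell convention, and the order in which the two ASCII codes are combined (first character in the high byte) is an assumption made for the example.

```java
import java.util.Arrays;

/** Hypothetical illustration of three ways to decode Hunspell flags; not the Lucene implementation. */
class FlagParsingSketch {

    /** Simple form: every char of the raw string is one flag. */
    static char[] parseSimple(String rawFlags) {
        return rawFlags.toCharArray();
    }

    /** Numeric form: flags are decimal numbers, assumed here to be comma-separated. */
    static char[] parseNumeric(String rawFlags) {
        String[] parts = rawFlags.split(",");
        char[] flags = new char[parts.length];
        for (int i = 0; i < parts.length; i++) {
            int value = Integer.parseInt(parts[i].trim());
            if (value < 0 || value > Character.MAX_VALUE) {
                throw new IllegalArgumentException("Flag out of char range: " + value);
            }
            flags[i] = (char) value;
        }
        return flags;
    }

    /** Double-ASCII form: each pair of ASCII chars is combined into a single char. */
    static char[] parseDoubleAscii(String rawFlags) {
        if (rawFlags.length() % 2 != 0) {
            throw new IllegalArgumentException("Expected an even number of characters: " + rawFlags);
        }
        char[] flags = new char[rawFlags.length() / 2];
        for (int i = 0; i < flags.length; i++) {
            char first = rawFlags.charAt(2 * i);
            char second = rawFlags.charAt(2 * i + 1);
            if (first >= 128 || second >= 128) {
                throw new IllegalArgumentException("Non-ASCII flag character in: " + rawFlags);
            }
            // Assumption: the first char goes into the high byte, the second into the low byte.
            flags[i] = (char) ((first << 8) | second);
        }
        return flags;
    }

    /** Converts flags to their integer codes so they can be printed unambiguously. */
    static int[] codes(char[] flags) {
        int[] result = new int[flags.length];
        for (int i = 0; i < flags.length; i++) {
            result[i] = flags[i];
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(codes(parseSimple("AB"))));
        System.out.println(Arrays.toString(codes(parseNumeric("65, 1001"))));
        System.out.println(Arrays.toString(codes(parseDoubleAscii("AaBb"))));
    }
}
```

In all three cases the result is an array of single chars, which is why the same raw string such as "AB" yields two flags in the simple form but only one in the double-ASCII form.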