Using stemming with Dovetail Seeker
What is Stemming
In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form—generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root.
A stemmer for English, for example, should identify the string "cats" (and possibly "catlike", "catty" etc.) as based on the root "cat", and "stems", "stemmer", "stemming", "stemmed" as based on "stem". A stemming algorithm reduces the words "fishing", "fished", and "fisher" to the root word, "fish". On the other hand, "argue", "argued", "argues", "arguing", and "argus" reduce to the stem "argu" (illustrating the case where the stem is not itself a word or root) but "argument" and "arguments" reduce to the stem "argument".
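The idea that related words only need to map to the same stem can be illustrated with a deliberately simplified sketch. The suffix list and the function names `toy_stem` and `group_by_stem` below are made up for illustration; this is not the Porter or Snowball algorithm, and real stemmers apply many more rules (for example, handling doubled consonants such as "stemming").

```python
# Toy suffix-stripping stemmer: an illustration only, NOT the Porter or
# Snowball algorithm used by Dovetail Seeker.
SUFFIXES = ("ing", "ed", "er", "s")  # hypothetical rule set, checked in order


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(words):
    """Map each stem to the list of input words that reduce to it."""
    groups = {}
    for w in words:
        groups.setdefault(toy_stem(w), []).append(w)
    return groups


if __name__ == "__main__":
    # "fishing", "fished", "fisher" and "fish" all land under the stem "fish".
    print(group_by_stem(["fishing", "fished", "fisher", "fish", "cats", "cat"]))
```

Because a search index stores the stem rather than the surface form, a query for "fishing" can match a document that only contains "fished".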
Available Stemmers in Dovetail Seeker
- Porter
- Snowball
- None
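Snowball is generally regarded as an improvement over Porter. Its English stemmer (sometimes called Porter2) is a revision of the original Porter algorithm by the same author.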
Examples (using the Snowball stemmer)
Dovetail Seeker Configuration
SearchAnalysis.Stemmer configuration
The stemming algorithm to be used is defined by the SearchAnalysis.Stemmer setting.
This setting should only exist in the seeker.config file.
Important Note 1: If the SearchAnalysis.Stemmer setting also exists in SeekerConsole.exe.config or in SeekerService.exe.config, it should be removed from those files to eliminate any confusion.
Important Note 2: The setting name is case-sensitive. SearchAnalysis.Stemmer is valid; searchAnalysis.stemmer is invalid.
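The two notes above can be checked mechanically. The sketch below assumes the config files store settings as .NET-style `<add key="..." value="..."/>` elements; that layout, the sample value "Snowball", and the helper names `find_stemmer_keys` and `audit` are assumptions for illustration, not something confirmed by the Seeker documentation.

```python
import xml.etree.ElementTree as ET

SETTING = "SearchAnalysis.Stemmer"  # exact, case-sensitive setting name


def find_stemmer_keys(config_xml: str):
    """Return (key, value) pairs whose key matches SETTING ignoring case.

    Assumes settings are stored as <add key="..." value="..."/> elements.
    """
    root = ET.fromstring(config_xml)
    hits = []
    for node in root.iter("add"):
        key = node.get("key", "")
        if key.lower() == SETTING.lower():
            hits.append((key, node.get("value")))
    return hits


def audit(files: dict) -> list:
    """Given {file name: XML text}, list problems with the stemmer setting."""
    problems = []
    for name, xml_text in files.items():
        for key, _value in find_stemmer_keys(xml_text):
            if key != SETTING:
                problems.append(f"{name}: wrong casing {key!r}")
            if name != "seeker.config":
                problems.append(f"{name}: setting should be removed from this file")
    return problems


if __name__ == "__main__":
    # Hypothetical file content, used only to exercise the checks.
    sample = (
        '<configuration><appSettings>'
        '<add key="searchAnalysis.stemmer" value="Snowball"/>'
        '</appSettings></configuration>'
    )
    for problem in audit({"seeker.config": sample}):
        print(problem)
```

In practice the dictionary would be built by reading seeker.config, SeekerConsole.exe.config, and SeekerService.exe.config from disk.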
Restart Seeker Indexer
After any changes to the configured stemming algorithm, it is recommended to restart the Seeker Indexer service.
When the Seeker Indexer starts up, it will log the stemming algorithm in use. Example:
INFO Dovetail.Search.AnalyzerFactory - Using the Snowball stemming algorithm.
This allows for verification of the desired configuration.
Restart Seeker Web Application
After any changes to the configured stemming algorithm, it is recommended to restart the Seeker web application.
When the Seeker web app starts up, it will log the stemming algorithm in use. Example:
INFO Dovetail.Search.AnalyzerFactory - Using the Snowball stemming algorithm.
This allows for verification of the desired configuration.
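The startup message shown for the indexer and the web application can also be checked with a short script. This is a minimal sketch: the regular expression is derived only from the single example line above, the function name `stemmer_from_log` is hypothetical, and the wording logged for other configurations (for example, None) is not known here and may not match.

```python
import re

# Pattern based solely on the example log line in this document.
PATTERN = re.compile(
    r"INFO\s+Dovetail\.Search\.AnalyzerFactory\s+-\s+"
    r"Using the (\w+) stemming algorithm"
)


def stemmer_from_log(lines):
    """Return the stemmer named in the last matching startup line, or None."""
    found = None
    for line in lines:
        match = PATTERN.search(line)
        if match:
            found = match.group(1)  # later restarts override earlier ones
    return found


if __name__ == "__main__":
    sample = [
        "INFO Dovetail.Search.AnalyzerFactory - Using the Snowball stemming algorithm.",
    ]
    print(stemmer_from_log(sample))
```

To use it against a real log, pass an open file object instead of the sample list; the location of the log file depends on the installation.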
Reindex
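After any changes to the configured stemming algorithm, it is recommended to re-index your content. Terms already in the index were stemmed with the previous algorithm, so they may not match search terms stemmed with the new one.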