Understanding analyzers
We have seen an overview of text analysis. Now let's look more closely at the core processes that run behind the scenes. As we saw previously, the analyzer, the tokenizer, and the filter are the three main components Solr uses for text analysis. Let's start with the analyzer.
What is an analyzer?
An analyzer examines the text of fields and generates a token stream. Normally, only fields of type solr.TextField will specify an analyzer. An analyzer is defined as a child element of the <fieldType> element in the managed-schema.xml file. Here is a simple analyzer configuration:
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer class="org.apache.lucene.analysis.core.WhitespaceAnalyzer"/>
</fieldType>
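To make the idea of a token stream concrete, here is a minimal sketch of the kind of splitting a whitespace-based analyzer performs. It uses only the Java standard library; the class WhitespaceSketch and its tokenize method are hypothetical names invented for this illustration, not part of Solr or Lucene, and the real WhitespaceAnalyzer also records positions and offsets that this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; this is NOT Lucene's WhitespaceAnalyzer.
class WhitespaceSketch {

    // Splits text into tokens at runs of whitespace.
    // Case and punctuation are left untouched.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            // An empty input yields one empty string from split(); skip it.
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Print the tokens produced for a sample field value.
        System.out.println(tokenize("Solr is FAST, isn't it?"));
    }
}
```

For the sample input, the tokens are Solr, is, FAST, (with the comma attached), isn't, and it? (with the question mark attached). Nothing is lowercased and no punctuation is stripped, which is why a whitespace-only analyzer is usually a starting point rather than a complete analysis setup.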
The configuration above defines a single <analyzer> element. This is the simplest way to define an analyzer. We have already covered the positionIncrementGap attribute, which adds a space between multi-value...