We have previously seen that an analyzer may be a single class or a combination of tokenizer and filter classes.
The analyzer executes the analysis process in two steps:
- Tokenization (parsing): Using configured tokenizer classes
- Filtering (transformation): Using configured filter classes
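These two steps map directly onto an analyzer definition in the Solr schema. The following is a minimal sketch; `StandardTokenizerFactory` and `LowerCaseFilterFactory` are standard Solr factory classes, while the field-type name `text_example` is illustrative:

```xml
<fieldType name="text_example" class="solr.TextField">
  <analyzer>
    <!-- Step 1: tokenization (parsing) -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Step 2: filtering (transformation) -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

At analysis time, the tokenizer splits the raw text into tokens, and each filter in turn transforms the token stream it receives from the component above it.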
We can also preprocess a character stream before tokenization with the help of CharFilters (covered later in this chapter).

An analyzer knows which field it is configured for, but a tokenizer has no idea about the field. The job of the tokenizer is only to read from a character stream, apply a tokenization mechanism based on its behavior, and produce a token stream.
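As a sketch of such preprocessing, a CharFilter is declared before the tokenizer in the same analyzer chain. Here the standard Solr class `HTMLStripCharFilterFactory` removes HTML markup from the character stream before the tokenizer ever sees it; the field-type name `text_html` is illustrative:

```xml
<fieldType name="text_html" class="solr.TextField">
  <analyzer>
    <!-- Preprocess the raw character stream before tokenization -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this configuration, input such as `<p>Hello World</p>` is stripped to `Hello World` before tokenization, so no tokens are produced for the markup itself.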