The changed default text scoring in Lucene - BM25
Scoring is one of the most important parts of Apache Lucene. It is the process of calculating the score of a document in the context of a given query. The score is a number that describes how well the document matches the query. Lucene supports many scoring algorithms, but since the beginning of Lucene the default has been TF-IDF (term frequency-inverse document frequency). One of the major changes in Apache Lucene 6.0 is a new default scoring algorithm: BM25 (Best Matching 25). In this section, we will first cover two fundamental concepts of search relevancy, precision and recall, and then look at the new default Apache Lucene scoring mechanism and how it differs from TF-IDF.
Precision versus recall
After executing a search query, an obvious question comes to mind: Have I found the most relevant documents or am I missing important documents...
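To make the question concrete, here is a minimal sketch in Python of how the two measures are usually computed. It uses the standard definitions: precision is the fraction of returned documents that are relevant, and recall is the fraction of relevant documents that were returned. The function name precision_recall and the document IDs are hypothetical, chosen only for illustration; they are not part of Lucene.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: IDs of the documents the search returned (hypothetical IDs).
    relevant:  IDs of the documents that are truly relevant to the query.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # relevant documents that were returned

    # Guard against empty sets so an empty result list does not divide by zero.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Illustrative data only: the query returned four documents,
# while three documents in the index are actually relevant.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up example, two of the four returned documents are relevant (precision 0.50), and two of the three relevant documents were found (recall of about 0.67). The missing doc7 is the kind of important document the question above is concerned with.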