Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Solr Cookbook - Third Edition

You're reading from   Solr Cookbook - Third Edition Solve real-time problems related to Apache Solr 4.x and 5.0 effectively with the help of over 100 easy-to-follow recipes

Arrow left icon
Product type Paperback
Published in Jan 2015
Publisher
ISBN-13 9781783553150
Length 356 pages
Edition 3rd Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Rafal Kuc Rafal Kuc
Author Profile Icon Rafal Kuc
Rafal Kuc
Arrow right icon
View More author details
Toc

Table of Contents (12) Chapters Close

Preface 1. Apache Solr Configuration FREE CHAPTER 2. Indexing Your Data 3. Analyzing Your Text Data 4. Querying Solr 5. Faceting 6. Improving Solr Performance 7. In the Cloud 8. Using Additional Functionalities 9. Dealing with Problems 10. Real-life Situations Index

Configuring the Solr spellchecker

If you are used to the way the spellchecker worked in the previous Solr versions, then you might remember that it required its own index to give you spelling corrections. This approach had some disadvantages, such as the need to rebuild the index on each Solr node or replicate the spellchecker index between the nodes. With Solr 4.0, a new spellchecker implementation was introduced, solr.DirectSolrSpellchecker. It allows you to use your main index to provide spelling suggestions and doesn't need to be rebuilt after every commit. Now, let's see how to use this new spellchecker implementation in Solr.

How to do it...

First, let's assume we have a field in the index called title in which we hold the titles of our documents. What's more, we don't want the spellchecker to have its own index, and we would like to use this title field to provide spelling suggestions. In addition, we would like to decide when we want spelling suggestions. In order to do this, we need to do two things:

  1. First, we need to edit our solrconfig.xml file and add the spellchecking component, the definition of which can look like this:
    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
     <str name="queryAnalyzerFieldType">text_general</str>
     <lst name="spellchecker">
      <str name="name">direct</str>
      <str name="field">title</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="distanceMeasure">internal</str>
      <float name="accuracy">0.8</float>
      <int name="maxEdits">1</int>
      <int name="minPrefix">1</int>
      <int name="maxInspections">5</int>
      <int name="minQueryLength">3</int>
      <float name="maxQueryFrequency">0.01</float>
     </lst>
    </searchComponent>
  2. Now, we need to add a proper request handler configuration that will use the preceding search component. To do this, we need to add the following section to the solrconfig.xml file:
    <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
     <lst name="defaults">
      <str name="df">title</str>
      <str name="spellcheck.dictionary">direct</str>
      <str name="spellcheck">on</str>
      <str name="spellcheck.extendedResults">true</str>
      <str name="spellcheck.count">5</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">true</str>
     </lst>
     <arr name="last-components">
      <str>spellcheck</str>
     </arr>
    </requestHandler>
  3. That's all. In order to get spelling suggestions, we need to run the following query:
    /spell?q=disa
  4. In response, we will get something like this:
    <?xml version="1.0" encoding="UTF-8"?>
    <response>
     <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">5</int>
     </lst>
    <result name="response" numFound="0" start="0">
    </result>
    <lst name="spellcheck">
     <lst name="suggestions">
      <lst name="disa">
       <int name="numFound">1</int>
       <int name="startOffset">0</int>
       <int name="endOffset">4</int>
       <int name="origFreq">0</int>
       <arr name="suggestion">
        <lst>
         <str name="word">data</str>
         <int name="freq">1</int>
        </lst>
       </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
      <lst name="collation">
       <str name="collationQuery">data</str>
       <int name="hits">1</int>
       <lst name="misspellingsAndCorrections">
        <str name="disa">data</str>
       </lst>
      </lst>
     </lst>
    </lst>
    </response>

If you check your data folder, you will see that there is no directory responsible for holding the spellchecker index. Now, let's see how this works.

How it works...

Now, let's get into some specifics about how the configuration shown in the preceding example works. We will start from the search component configuration. The queryAnalyzerFieldType property tells Solr which field configuration should be used to analyze the query passed to the spellchecker. The name property sets the name of the spellchecker, which is used in the handler configuration later. The field property specifies which field should be used as the source for the data used to build spelling suggestions. As you probably figured out, the classname property specifies the implementation class, which in our case is solr.DirectSolrSpellChecker, enabling us to omit having a separate spellchecker index; spellchecker will just use the main index. The next parameters visible in the previous configuration specify how the Solr spellchecker should behave; however, this is beyond the scope of this recipe (if you want to read more about the parameters, visit the http://wiki.apache.org/solr/SpellCheckComponent URL).

The last thing is the request handler configuration. Let's concentrate on all the properties that start with the spellcheck prefix. First, we have spellcheck.dictionary, which, in our case, specifies the name of the spellchecking component we want to use (note that the value of the property matches the value of the name property in the search component configuration). We tell Solr that we want spellchecking results to be present (the spellcheck property with the on value), and we also tell Solr that we want to see the extended result format, which allows us to see more with regard to the results (spellcheck.extendedResults set to true). In addition to the previous configuration properties, we also said that we want to have a maximum of five suggestions (the spellcheck.count property), and we want to see the collation and its extended results (spellcheck.collate and spellcheck.collateExtendedResults both set to true).

There's more...

Let's see one more thing—the ability to have more than one spellchecker defined in a request handler.

More than one spellchecker

If you want to have more than one spellchecker handling spelling suggestions, you can configure your handler to use multiple search components. For example, if you want to use search components (spellchecking ones) named word and better (you have to have them configured), you can add multiple spellcheck.dictionary parameters to your request handler. This is what your request handler configuration will look like:

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
 <lst name="defaults">
  <str name="df">title</str>
  <str name="spellcheck.dictionary">direct</str>
  <str name="spellcheck.dictionary">word</str>
  <str name="spellcheck.dictionary">better</str>
  <str name="spellcheck">on</str>
  <str name="spellcheck.extendedResults">true</str>
  <str name="spellcheck.count">5</str>
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.collateExtendedResults">true</str>
 </lst>
 <arr name="last-components">
  <str>spellcheck</str>
 </arr>
</requestHandler>
You have been reading a chapter from
Solr Cookbook - Third Edition - Third Edition
Published in: Jan 2015
Publisher:
ISBN-13: 9781783553150
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image