
Mastering Apache Solr 7.x: An expert guide to advancing, optimizing, and scaling your enterprise search

Product type: Paperback

Published: Feb 2018

Publisher: Packt

ISBN-13: 9781788837385

Length: 308 pages

Edition: 1st Edition

Languages: Java

Tools: Solr

Concepts: Enterprise Search

Authors (3):

Dharmesh Vasoya

Chintan Mehta

Sandeep Nair



Apache Tika and indexing

We have seen how to index data from standard text-based formats such as JSON and XML. But what about binary document formats such as Word and PDF? This is where Solr relies on the Apache Tika project: Tika detects a file's format and extracts its text and metadata, which lets Solr index formats such as Word and PDF.

Internally, Tika uses the Apache PDFBox library to parse PDF files and Apache POI to parse Word documents. Solr provides the ExtractingRequestHandler, which uses Tika to accept uploaded binary files, extract their content and metadata, and index the result.
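As a rough illustration of how a client talks to the ExtractingRequestHandler, the following Java 11 sketch builds an extract request with only the JDK's `java.net.http` client. It assumes a Solr instance at `http://localhost:8983/solr`, a collection named `techproducts`, and the handler registered at `/update/extract` (as in Solr's sample techproducts configuration, with the Solr Cell contrib libraries available); the class and method names are hypothetical. The `literal.id` parameter supplies the value of the document's unique key field, and `commit=true` asks Solr to commit after indexing. The file is posted as the raw request body, with a `Content-Type` header describing it.

```java
import java.io.FileNotFoundException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class; not part of Solr or SolrJ.
public class SolrCellUpload {

    // Builds the /update/extract URL, URL-encoding each parameter in insertion order.
    static URI extractUri(String solrBase, String collection, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return URI.create(solrBase + "/" + collection + "/update/extract?" + query);
    }

    // Builds a POST whose body is the raw file; Content-Type tells Solr what the bytes are.
    static HttpRequest extractRequest(URI uri, Path file, String contentType)
            throws FileNotFoundException {
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", contentType)
                .POST(HttpRequest.BodyPublishers.ofFile(file))
                .build();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.id", "doc1"); // value for the uniqueKey field
        params.put("commit", "true");     // commit once the document is indexed
        // Assumed local Solr URL and collection name.
        URI uri = extractUri("http://localhost:8983/solr", "techproducts", params);
        System.out.println(uri);

        if (args.length == 1) {
            // Assumes the file passed on the command line is a PDF.
            HttpRequest request = extractRequest(uri, Path.of(args[0]), "application/pdf");
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
}
```

Without arguments, the program only prints the request URL; with a file path (for example, `java SolrCellUpload.java report.pdf`) it sends the document. Posting the raw bytes keeps the sketch free of multipart encoding; a multipart form upload, such as the one curl's `-F` option produces, is the other common way to send files to this handler.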

In Solr, this framework is known as Solr Cell, a name derived from "Solr Content Extraction Library", which was the framework's working title while it was under development.

Solr Cell basics

...


Authors (3)

Nair

Nishanth Nair is a Mobile Solutions Architect, currently working as a Consultant for Sears Holdings Corporation. He holds a bachelor's degree in Computer Science and Engineering and has extensive experience with .NET technologies, having worked for companies such as Accenture, McAfee, and Neudesic. He is a Microsoft Certified Application Developer and a Microsoft Certified Technology Specialist. In his free time, he likes to play cricket and tennis and to watch movies.


Dharmesh Vasoya

Dharmesh Vasoya is a Liferay 6.2 certified developer. He has 5.5 years of experience in application development with technologies such as Java, Liferay, Spring, Hibernate, Portlet, and JSF. He has successfully delivered projects in various domains, such as healthcare, collaboration, communication, and enterprise CMS, using Liferay. Dharmesh has a good command of setting up and configuring servers such as Solr, Tomcat, JBoss, and Apache Web Server. He has good experience with clustering, load balancing, and performance tuning. He completed his MCA at Ahmedabad University.


Mehta

Chintan Mehta is a co-founder of KNOWARTH Technologies and heads the cloud/RIMS/DevOps team. He has rich, progressive experience in Linux server administration, AWS Cloud, DevOps, RIMS, and open source technologies. He is also an AWS Certified Solutions Architect. Chintan has authored MySQL 8 for Big Data, Mastering Apache Solr 7.x, MySQL 8 Administrator's Guide, and Hadoop Backup and Recovery Solutions. He has also reviewed Liferay Portal Performance Best Practices and Building Serverless Web Applications.
