Indexing data with meta using Apache Spark
Using a simple map for ingesting data is only suitable for simple jobs. The best practice in Spark is to use a case class, which gives you fast serialization and lets the compiler manage complex type checking. Providing custom IDs during indexing can also be very handy. In this recipe, we will see how to address these issues.
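As a rough sketch of both ideas, the following Scala snippet indexes case class instances and sets the document ID in two ways: through the es.mapping.id setting, and through explicit per-document metadata. It assumes it is pasted into a Spark shell that has the elasticsearch-spark connector on its classpath and can reach your Elasticsearch node. The Person class, its fields, and the spark-persons index name are hypothetical and used only for illustration.

```scala
// Assumes a spark-shell session with the elasticsearch-spark connector available.
import org.elasticsearch.spark.rdd.EsSpark
import org.elasticsearch.spark.rdd.Metadata._

// Hypothetical document type; the field names are illustrative only.
case class Person(username: String, name: String, age: Int)

val persons = Seq(Person("bob", "Bob", 19), Person("susan", "Susan", 21))

// `sc` is the SparkContext that spark-shell creates automatically.
val rdd = sc.makeRDD(persons)

// Option 1: es.mapping.id names the field whose value becomes the document _id.
// "spark-persons" is a placeholder index name.
EsSpark.saveToEs(rdd, "spark-persons", Map("es.mapping.id" -> "username"))

// Option 2: pair each document with an explicit metadata map (here only the ID).
val withMeta = sc.makeRDD(persons.map(p => (Map(ID -> p.username), p)))
EsSpark.saveToEsWithMeta(withMeta, "spark-persons")
```

The first option is convenient when the ID is already a field of the document. The second keeps the metadata separate from the document body, which can help when the ID should not be stored as a field.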
Getting ready
You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.
You also need a working installation of Apache Spark.
How to do it...
To store data in Elasticsearch using Apache Spark, we will perform the following steps:
- In the Spark root directory, start the Spark shell with the Elasticsearch configuration applied by running the following command:
./bin/spark-shell \
    --conf spark.es.index.auto.create=true \
    --conf spark.es.net.http.auth.user=$ES_USER \
    ...