You're reading from Data Engineering with Python: Work with massive datasets to design data models and automate data pipelines using Python

Product type: Paperback

Published: Oct 2020

Publisher: Packt

ISBN-13: 9781839214189

Length: 356 pages

Edition: 1st Edition

Language: Python

Concepts: Data Analysis

Author: Paul Crickard


Table of Contents (21 chapters)

Preface

1. Section 1: Building Data Pipelines – Extract, Transform, and Load

2. Chapter 1: What is Data Engineering?

3. Chapter 2: Building Our Data Engineering Infrastructure

4. Chapter 3: Reading and Writing Files

5. Chapter 4: Working with Databases

6. Chapter 5: Cleaning, Transforming, and Enriching Data

7. Chapter 6: Building a 311 Data Pipeline

8. Section 2: Deploying Data Pipelines in Production

9. Chapter 7: Features of a Production Pipeline

10. Chapter 8: Version Control with the NiFi Registry

11. Chapter 9: Monitoring Data Pipelines

12. Chapter 10: Deploying Data Pipelines

13. Chapter 11: Building a Production Data Pipeline

14. Section 3: Beyond Batch – Building Real-Time Data Pipelines

15. Chapter 12: Building a Kafka Cluster

16. Chapter 13: Streaming Data with Apache Kafka

17. Chapter 14: Data Processing with Apache Spark

18. Chapter 15: Real-Time Edge Data with MiNiFi, Kafka, and Spark



Appendix

Installing and configuring Elasticsearch

Elasticsearch is a search engine. In this book, you will use it as a NoSQL database and move data between Elasticsearch and other locations in both directions. To download and install Elasticsearch, take the following steps:

Use curl to download the archive (this URL is for the macOS build of Elasticsearch 7.6.0), as shown:

```
curl https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.0-darwin-x86_64.tar.gz --output elasticsearch.tar.gz
```
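If you would rather script the download in Python, the following is a minimal sketch using only the standard library. The `artifact_url` and `download` helpers are hypothetical names for illustration; the file-name pattern is generalized from the curl command, and any platform string other than `darwin-x86_64` (for example `linux-x86_64`) is an assumption you should check against Elastic's download page.

```python
import shutil
import urllib.request
from pathlib import Path

# Base path taken from the curl command; the pattern
# "elasticsearch-<version>-<platform>.tar.gz" is an assumption generalized from it.
BASE = "https://artifacts.elastic.co/downloads/elasticsearch"


def artifact_url(version: str = "7.6.0", platform: str = "darwin-x86_64") -> str:
    """Build the archive URL (hypothetical helper; other platform strings are assumptions)."""
    return f"{BASE}/elasticsearch-{version}-{platform}.tar.gz"


def download(url: str, dest: str = "elasticsearch.tar.gz") -> Path:
    """Save the resource at url to dest, like `curl --output`, and return the path."""
    path = Path(dest)
    with urllib.request.urlopen(url) as response, path.open("wb") as out:
        # Copy in chunks so the whole archive is never held in memory at once.
        shutil.copyfileobj(response, out)
    return path
```

Calling `download(artifact_url())` fetches the same file as the curl command and writes it to `elasticsearch.tar.gz` in the current directory.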

Extract the files using the following command:
```
tar xvzf elasticsearch.tar.gz
```
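The same extraction can be done from Python with the standard `tarfile` module. This is a minimal sketch assuming the archive name used in the download step; `extract` is a hypothetical helper that also returns the top-level directory the archive unpacks into, since the `config/` and `bin/` paths in the following steps are relative to that directory.

```python
import tarfile
from pathlib import Path, PurePosixPath


def extract(archive: str = "elasticsearch.tar.gz", dest: str = ".") -> Path:
    """Unpack a .tar.gz archive (like `tar xvzf`) and return its top-level directory."""
    with tarfile.open(archive, "r:gz") as tar:
        # First path component of every member, e.g. "elasticsearch-7.6.0".
        roots = {parts[0] for m in tar.getmembers()
                 if (parts := PurePosixPath(m.name).parts)}
        if hasattr(tarfile, "data_filter"):
            # Newer Python releases offer extraction filters that reject unsafe paths.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    if len(roots) != 1:
        raise ValueError(f"expected one top-level directory, found {sorted(roots)}")
    return Path(dest) / roots.pop()
```

For example, `extract()` run next to the downloaded archive should return a path ending in `elasticsearch-7.6.0`.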
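The archive extracts into a directory named elasticsearch-7.6.0, and the paths that follow are relative to it. You can edit the config/elasticsearch.yml file to name your node and cluster. Later in this book, you will set up an Elasticsearch cluster with multiple nodes; for now, I have changed the following properties: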
```
cluster.name: DataEngineeringWithPython 
node.name: OnlyNode
```
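Editing the file by hand is fine for a single node. If you want to script the change, here is a minimal sketch; `set_settings` is a hypothetical helper, and it assumes, as in the default 7.x file, that each setting sits on one flat `key: value` line, possibly commented out with `#` (for example `#cluster.name: my-application`). It edits lines rather than parsing YAML, so the file's comments are preserved.

```python
from pathlib import Path


def set_settings(path: str, settings: dict) -> None:
    """Set flat `key: value` entries in a YAML config file, uncommenting them if needed.

    Line-based sketch, not a YAML parser: nested or multi-line values are not handled.
    """
    file = Path(path)
    remaining = dict(settings)
    lines = []
    for line in file.read_text().splitlines():
        # Compare against the line with any leading "#", spaces, and tabs removed.
        bare = line.lstrip("# \t")
        key = bare.split(":", 1)[0].strip()
        if ":" in bare and key in remaining:
            lines.append(f"{key}: {remaining.pop(key)}")
        else:
            lines.append(line)
    # Append any keys that did not already appear in the file.
    lines.extend(f"{key}: {value}" for key, value in remaining.items())
    file.write_text("\n".join(lines) + "\n")
```

Running `set_settings("config/elasticsearch.yml", {"cluster.name": "DataEngineeringWithPython", "node.name": "OnlyNode"})` from the Elasticsearch directory applies the two properties shown above.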
Now you can start Elasticsearch. From the Elasticsearch directory, run the following:
```
bin/elasticsearch
```
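Elasticsearch can take several seconds before it answers requests. If a script needs to wait for it, a minimal sketch is to poll the HTTP endpoint until it responds. `wait_for_http` and its default timeout and polling interval are illustrative assumptions; the URL is the local address on port 9200 used in the next step.

```python
import time
import urllib.request


def wait_for_http(url: str = "http://localhost:9200", timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Poll url until it returns a 2xx response (True) or timeout seconds pass (False)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if 200 <= response.status < 300:
                    return True
        except OSError:
            # Connection refused, reset, HTTP error, or timeout: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Polling with a deadline keeps a startup script from hanging forever if the node fails to start, for example because of a typo in `config/elasticsearch.yml`.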
Once Elasticsearch has started, open http://localhost:9200 in a browser (or request it with curl). You should see the following output...
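To check the same endpoint from Python, here is a minimal standard-library sketch. It assumes the root endpoint returns a JSON document with `name`, `cluster_name`, and `version.number` fields, as Elasticsearch 7.x does; `node_summary` is a hypothetical helper. With the properties set in `config/elasticsearch.yml` earlier, the node and cluster names should come back as `OnlyNode` and `DataEngineeringWithPython`.

```python
import json
import urllib.request


def node_summary(url: str = "http://localhost:9200") -> str:
    """Fetch the root endpoint and summarize the node name, cluster name, and version."""
    with urllib.request.urlopen(url, timeout=10) as response:
        info = json.load(response)
    # Field names assume the Elasticsearch 7.x root response layout.
    return (f"node {info['name']} in cluster {info['cluster_name']}, "
            f"Elasticsearch {info['version']['number']}")
```

Printing `node_summary()` is a quick way to confirm that the node picked up the configuration changes before you start loading data into it.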