Apache Spark and Databricks
Databricks provides the most popular integrated platform for learning and using Apache Spark. It builds on Apache Spark in two ways: Databricks reports up to five times the performance of vanilla Apache Spark on the cloud, and it offers integrated Jupyter notebooks in a secure, cloud-enabled platform. The core team that developed Apache Spark at Berkeley is part of Databricks. We will get into the details of the core operations of Spark, namely, transformations and actions, and we will write the code for them in the integrated Jupyter notebooks in Databricks. Databricks enables us to spin up Spark clusters on the cloud and connect to them from these notebooks. So, in the next section, let's set up the Databricks environment and learn to create and use a Jupyter notebook.
Exercise 7.01: Creating Your Databricks Notebook
The best way to learn Spark is by doing exercises and tutorials. You could either set up Spark locally or, even...