Chapter 3. Working with Big Data Frameworks
Learning Objectives
By the end of this chapter, you will be able to:
Explain the HDFS and YARN components of Hadoop
Perform file operations with HDFS
Compare a pandas DataFrame with a Spark DataFrame
Read files from a local filesystem and HDFS using Spark
Write files in Parquet format using Spark
Write partitioned Parquet files for faster analysis
Manipulate unstructured data with Spark
In this chapter, we will explore big data tools such as Hadoop and Spark.