Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Scala and Spark for Big Data Analytics

You're reading from   Scala and Spark for Big Data Analytics Explore the concepts of functional programming, data streaming, and machine learning

Arrow left icon
Product type Paperback
Published in Jul 2017
Publisher Packt
ISBN-13 9781785280849
Length 796 pages
Edition 1st Edition
Languages
Concepts
Arrow right icon
Authors (2):
Arrow left icon
Sridhar Alla Sridhar Alla
Author Profile Icon Sridhar Alla
Sridhar Alla
Md. Rezaul Karim Md. Rezaul Karim
Author Profile Icon Md. Rezaul Karim
Md. Rezaul Karim
Arrow right icon
View More author details
Toc

Table of Contents (19) Chapters Close

Preface 1. Introduction to Scala FREE CHAPTER 2. Object-Oriented Scala 3. Functional Programming Concepts 4. Collection APIs 5. Tackle Big Data – Spark Comes to the Party 6. Start Working with Spark – REPL and RDDs 7. Special RDD Operations 8. Introduce a Little Structure - Spark SQL 9. Stream Me Up, Scotty - Spark Streaming 10. Everything is Connected - GraphX 11. Learning Machine Learning - Spark MLlib and Spark ML 12. My Name is Bayes, Naive Bayes 13. Time to Put Some Order - Cluster Your Data with Spark MLlib 14. Text Analytics Using Spark ML 15. Spark Tuning 16. Time to Go to ClusterLand - Deploying Spark on a Cluster 17. Testing and Debugging Spark 18. PySpark and SparkR

Scala for the beginners

In this part, you will find that we assume that you have a basic understanding of any previous programming language. If Scala is your first entry into the coding world, then you will find a large set of materials and even courses online that explain Scala for beginners. As mentioned, there are lots of tutorials, videos, and courses out there.

There is a whole Specialization, which contains this course, on Coursera: https://www.coursera.org/specializations/scala. Taught by the creator of Scala, Martin Odersky, this online class takes a somewhat academic approach to teaching the fundamentals of functional programming. You will learn a lot about Scala by solving the programming assignments. Moreover, this specialization includes a course on Apache Spark. Furthermore, Kojo (http://www.kogics.net/sf:kojo) is an interactive learning environment that uses Scala programming to explore and play with math, art, music, animations, and games.

Your first line of code

As a first example, we will use the pretty common Hello, world! program in order to show you how to use Scala and its tools without knowing much about it. Let's open your favorite editor (this example runs on Windows 7, but can be run similarly on Ubuntu or macOS), say Notepad++, and type the following lines of code:

object HelloWorld {
def main(args: Array[String]){
println("Hello, world!")
}
}

Now, save the code with a name, say HelloWorld.scala, as shown in the following figure:

Figure 11: Saving your first Scala source code using Notepad++

Let's compile the source file as follows:

C:\>scalac HelloWorld.scala
C:\>scala HelloWorld
Hello, world!
C:\>

I'm the hello world program, explain me well!

The program should be familiar to anyone who has some programming of experience. It has a main method which prints the string Hello, world! to your console. Next, to see how we defined the main function, we used the def main() strange syntax to define it. def is a Scala keyword to declare/define a method, and we will be covering more about methods and different ways of writing them in the next chapter. So, we have an Array[String] as an argument for this method, which is an array of strings that can be used for initial configurations of your program, and omit is valid. Then, we use the common println() method, which takes a string (or formatted one) and prints it to the console. A simple hello world has opened up many topics to learn; three in particular:

● Methods (covered in a later chapter)
● Objects and classes (covered in a later chapter)
● Type inference - the reason why Scala is a statically typed language - explained earlier

Run Scala interactively!

The scala command starts the interactive shell for you, where you can interpret Scala expressions interactively:

> scala
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121).
Type in expressions for evaluation. Or try :help.
scala>
scala> object HelloWorld {
| def main(args: Array[String]){
| println("Hello, world!")
| }
| }
defined object HelloWorld
scala> HelloWorld.main(Array())
Hello, world!
scala>
The shortcut :q stands for the internal shell command :quit, used to exit the interpreter.

Compile it!

The scalac command, which is similar to javac command, compiles one or more Scala source files and generates a bytecode as output, which then can be executed on any Java Virtual Machine. To compile your hello world object, use the following:

> scalac HelloWorld.scala

By default, scalac generates the class files into the current working directory. You may specify a different output directory using the -d option:

> scalac -d classes HelloWorld.scala

However, note that the directory called classes must be created before executing this command.

Execute it with Scala command

The scala command executes the bytecode that is generated by the interpreter:

$ scala HelloWorld

Scala allows us to specify command options, such as the -classpath (alias -cp) option:

$ scala -cp classes HelloWorld

Before using the scala command to execute your source file(s), you should have a main method that acts as an entry point for your application. Otherwise, you should have an Object that extends Trait Scala.App, then all the code inside this object will be executed by the command. The following is the same Hello, world! example, but using the App trait:

#!/usr/bin/env Scala 
object HelloWorld extends App {
println("Hello, world!")
}
HelloWorld.main(args)

The preceding script can be run directly from the command shell:

./script.sh

Note: we assume here that the file script.sh has the execute permission:

$ sudo chmod +x script.sh

Then, the search path for the scala command is specified in the $PATH environment variable.

You have been reading a chapter from
Scala and Spark for Big Data Analytics
Published in: Jul 2017
Publisher: Packt
ISBN-13: 9781785280849
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image