Understanding Various Spark Actions
Spark actions trigger the execution of the transformations defined on an RDD. A transformation creates a new RDD from an existing RDD, whereas an action is an operation performed on an RDD that returns a non-RDD value.
Popular actions include reduce, collect, count, first, and s. When an action is executed, its result is returned to the Spark driver or written to an external storage system.
Let's understand actions in more detail:
reduce(func): This aggregates the elements of a dataset by applying a function to them. reduce works only with commutative and associative functions because it runs in parallel. For example, reduce could take (a, b) as its two inputs and return a+b as one output. If the input data is {1,2,…100}, using the sum function with reduce would result in 5050, which is the sum of all the elements of the dataset.
collect(): This returns all the elements in a dataset. This is the equivalent of select * in SQL. For example, if the dataset...
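The requirement that reduce be given a commutative and associative function can be illustrated without a cluster. The following is a minimal plain-Python sketch, not Spark's API: the helpers split_into_partitions, partitioned_reduce, and collect_all are hypothetical names that only imitate how partial results from each partition might be combined, and the input is assumed to be a non-empty list.

```python
from functools import reduce
from operator import add, sub


def split_into_partitions(data, num_partitions):
    """Split a non-empty list into contiguous chunks (a stand-in for RDD partitions)."""
    size = -(-len(data) // num_partitions)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partitioned_reduce(data, func, num_partitions):
    """Reduce every partition on its own, then reduce the partial results."""
    partitions = split_into_partitions(data, num_partitions)
    partials = [reduce(func, part) for part in partitions]
    return reduce(func, partials)


def collect_all(partitions):
    """Imitate collect(): gather the elements of every partition into one list."""
    return [element for part in partitions for element in part]


data = list(range(1, 101))  # the dataset {1, 2, ..., 100}

# Addition is commutative and associative, so the partitioning does not matter.
total = partitioned_reduce(data, add, 4)

# Subtraction is neither, so the answer depends on how the data is partitioned.
sub_one_partition = partitioned_reduce(data, sub, 1)
sub_four_partitions = partitioned_reduce(data, sub, 4)

# Gathering all partitions returns every element of the dataset.
collected = collect_all(split_into_partitions(data, 4))

print(total)
print(sub_one_partition, sub_four_partitions)
print(len(collected))
```

With addition, the result is 5050 however the data is partitioned. With subtraction, the one-partition and four-partition results differ, which is why such a function is unsuitable for reduce.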