Quickstart: DataFrame
This is a short introduction and quickstart for the PySpark DataFrame API. PySpark DataFrames are lazily evaluated and are implemented on top of RDDs. When Spark transforms data, it does not compute the transformation immediately; instead, it plans how to compute it later. The computation starts only when an action such as collect() is explicitly called.
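The following pure-Python sketch gives a rough illustration of lazy evaluation without requiring Spark. LazyFrame, its map, filter and explain methods, and the double helper are hypothetical names invented for this example. They are not PySpark's API. Transformations only append a step to a recorded plan, and no data is processed until the collect() action runs that plan.

```python
from typing import Any, Callable, Iterable, List, Tuple

# Hypothetical toy class, NOT part of PySpark: it only illustrates the idea
# that transformations are recorded in a plan and run when an action is called.
class LazyFrame:
    def __init__(self, source: Iterable[Any],
                 plan: Tuple[Tuple[str, Callable[[Any], Any]], ...] = ()) -> None:
        self._source = list(source)
        self._plan = plan

    def _with_step(self, kind: str, fn: Callable[[Any], Any]) -> "LazyFrame":
        # A transformation returns a new frame with a longer plan; no data is processed.
        return LazyFrame(self._source, self._plan + ((kind, fn),))

    def map(self, fn: Callable[[Any], Any]) -> "LazyFrame":
        return self._with_step("map", fn)

    def filter(self, pred: Callable[[Any], bool]) -> "LazyFrame":
        return self._with_step("filter", pred)

    def explain(self) -> List[str]:
        # Inspect the recorded plan without executing it.
        return [kind for kind, _ in self._plan]

    def collect(self) -> List[Any]:
        # The action: only now is every recorded step applied to the data.
        rows = list(self._source)
        for kind, fn in self._plan:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:
                rows = [r for r in rows if fn(r)]
        return rows


calls: List[int] = []

def double(x: int) -> int:
    calls.append(x)  # record each time the function actually runs
    return x * 2

lazy = LazyFrame([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
print(calls)           # [] : the transformations have not run yet
print(lazy.explain())  # ['map', 'filter']
print(lazy.collect())  # [6, 8]
print(calls)           # [1, 2, 3, 4]
```

In PySpark itself, DataFrame transformations such as select() and filter() play the role of map and filter here, collect() is a real action, and DataFrame.explain() prints the plan Spark has built. The toy class leaves out Spark's query optimizer and distributed execution entirely.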
This notebook shows the basic usa