Getting the Number of Rows and Columns of a PySpark DataFrame


dishui89


Tags: python, programming languages

Article link: https://blog.youkuaiyun.com/dishui89/article/details/130330480


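This article describes how to use PySpark to get the number of rows and columns of a DataFrame. Calling `df.count()` returns the number of rows; `df.columns` returns the list of column names, and `len(df.columns)` gives the number of columns. The example code creates a DataFrame, displays its contents, and then extracts and prints the row and column counts.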


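In this article, we discuss how to get the number of rows and columns of a PySpark DataFrame. To find the number of rows we use the count() method, and to find the number of columns we apply the len() function to the columns attribute.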

df.count(): This method returns the number of rows in the DataFrame.
df.distinct().count(): This returns the number of distinct rows in the DataFrame, that is, with duplicate rows counted only once.
df.columns: This attribute (a property, not a method, so it takes no parentheses) holds the list of column names in the DataFrame.
len(df.columns): This returns the number of items in that list, which is the number of columns.
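
The difference between count() and distinct().count() only shows up when the data contains duplicate rows. The following is a minimal sketch, assuming a local Spark installation; the helper name `row_counts`, the `demo` function, and the sample data are hypothetical and chosen only for illustration.

```python
def row_counts(df):
    """Return (total rows, distinct rows) for a PySpark DataFrame."""
    total = df.count()              # counts every row, duplicates included
    unique = df.distinct().count()  # counts each distinct row only once
    return total, unique


def demo():
    # Imported here so the helper above can be read without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .master("local") \
        .appName("row-counts-sketch") \
        .getOrCreate()

    # Hypothetical sample data: the first and last rows are identical.
    data = [(1, "T-shirt", 1999),
            (2, "Jeans", 3999),
            (1, "T-shirt", 1999)]
    df = spark.createDataFrame(data, ["Id", "Product Name", "Price"])

    total, unique = row_counts(df)
    print(f"Total rows: {total}, distinct rows: {unique}")
    spark.stop()
```

Calling `demo()` runs the sketch against a local Spark session. With the sample data above, the total is 3 and the distinct count is 2, because the duplicated row is counted once.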

Example 1: Get the number of rows and columns of a DataFrame in PySpark.

# importing necessary libraries
from pyspark.sql import SparkSession

# function to create SparkSession
def create_session():
  spk = SparkSession.builder \
      .master("local") \
      .appName("Products.com") \
      .getOrCreate()
  return spk

# function to create Dataframe
def create_df(spark,data,schema):
  df1 = spark.createDataFrame(data,schema)
  return df1

# main function
if __name__ == "__main__":

  # calling function to create SparkSession
  spark = create_session()

  input_data = [(1,"Direct-Cool Single Door Refrigerator",12499),
          (2,"Full HD Smart LED TV",49999),
          (3,"8.5 kg Washing Machine",69999),
          (4,"T-shirt",1999),
          (5,"Jeans",3999),
          (6,"Men's Running Shoes",1499),
          (7,"Combo Pack Face Mask",999)]

  schm = ["Id","Product Name","Price"]

  # calling function to create dataframe
  df = create_df(spark,input_data,schm)
  df.show()

  # extracting number of rows from the Dataframe
  row = df.count()

  # extracting number of columns from the Dataframe
  col = len(df.columns)

  # printing the dimensions, then rows and columns separately
  print(f'Dimensions of the DataFrame: {(row, col)}')
  print(f'Number of rows: {row}')
  print(f'Number of columns: {col}')
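
Since the example prints the dimensions as a (rows, columns) tuple, the two calls can be wrapped in a small helper. This is a minimal sketch; `df_shape` is a hypothetical name introduced here and is not part of the PySpark API.

```python
def df_shape(df):
    """Return (number of rows, number of columns) of a PySpark DataFrame.

    df.count() is an action and triggers a Spark job, while
    len(df.columns) only reads the column names from the schema.
    """
    return df.count(), len(df.columns)
```

For the `df` built in Example 1, `df_shape(df)` would return `(7, 3)`: seven products and the three columns Id, Product Name, and Price. Because the row count requires a full pass over the data, it may be worth storing the result rather than calling the helper repeatedly on a large DataFrame.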

Copyright: 月萌API www.moonapi.com. Please credit the source when reposting.
