[Data Analysis Project Walkthrough] Washington State Electric Vehicle Statistics: Advanced Visualization with Pyecharts

Article link: https://blog.youkuaiyun.com/SNeutronS/article/details/139212950

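Foreword: "advanced" here only means a few more lines of code and flashier charts. I am still writing this post so that readers can skip the step of looking things up in the help documentation.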

Contents

I. Data cleaning
II. Visualization

Column descriptions:
'VIN (1-10)': The first 10 characters of each vehicle's Vehicle Identification Number (VIN).

'County': The county in which the registered owner resides.

'City': The city in which the registered owner resides.

'State': The state in which the registered owner resides.

'Postal Code': The 5-digit ZIP code in which the registered owner resides.

'Model Year': The model year of the vehicle, determined by decoding the Vehicle Identification Number (VIN).

'Make': The manufacturer of the vehicle, determined by decoding the Vehicle Identification Number (VIN).

'Model': The model of the vehicle, determined by decoding the Vehicle Identification Number (VIN).

'Electric Vehicle Type': This distinguishes the vehicle as all electric or a plug-in hybrid.

'Clean Alternative Fuel Vehicle (CAFV) Eligibility': This categorizes vehicle as Clean Alternative Fuel Vehicles (CAFVs) based on the fuel requirement and electric-only range requirement in House Bill 2042 as passed in the 2019 legislative session.

'Electric Range': Describes how far a vehicle can travel purely on its electric charge.

'Base MSRP': This is the lowest Manufacturer's Suggested Retail Price (MSRP) for any trim level of the model in question.

'Legislative District': The specific section of Washington State that the vehicle's owner resides in, as represented in the state legislature.

'DOL Vehicle ID': Unique number assigned to each vehicle by Department of Licensing for identification purposes.

'Vehicle Location': The center of the ZIP Code for the registered vehicle.

'Electric Utility': This is the electric power retail service territories serving the address of the registered vehicle.

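Preparation:
Define two small helper functions for common counting and aggregation tasks
(once they are defined, editor autocompletion picks them up more easily):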

def vcounts(a):
    return a.value_counts()
def group_mean(a,b,c):
    return a.groupby(b)[c].mean()
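
As a quick sanity check, the two helpers can be tried on a small made-up frame. The values below are invented for illustration and are not taken from the dataset; only the column names mirror it.

```python
import pandas as pd

def vcounts(a):
    return a.value_counts()

def group_mean(a, b, c):
    return a.groupby(b)[c].mean()

# Toy data (hypothetical), just to show what each helper returns
toy = pd.DataFrame({
    'Make': ['TESLA', 'TESLA', 'NISSAN'],
    'Electric Range': [200, 300, 100],
})

make_counts = vcounts(toy['Make'])                          # rows per manufacturer
range_by_make = group_mean(toy, 'Make', 'Electric Range')   # mean range per manufacturer
print(make_counts)
print(range_by_make)
```

`vcounts` takes a single column (a Series), while `group_mean` takes the whole frame plus the grouping column and the column to average.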

I. Data cleaning

1. Simplify the information

(1) Drop unneeded information

# df.State.value_counts() gives WA 181060; vehicles registered out of state are ignored for now
df=df[df['State']=='WA']
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 181060 entries, 0 to 181457
Data columns (total 17 columns):
 #   Column                                             Non-Null Count   Dtype  
---  ------                                             --------------   -----  
 0   VIN (1-10)                                         181060 non-null  object 
 1   County                                             181060 non-null  object 
 2   City                                               181060 non-null  object 
 3   State                                              181060 non-null  object 
 4   Postal Code                                        181060 non-null  float64
 5   Model Year                                         181060 non-null  int64  
 6   Make                                               181060 non-null  object 
 7   Model                                              181060 non-null  object 
 8   Electric Vehicle Type                              181060 non-null  object 
 9   Clean Alternative Fuel Vehicle (CAFV) Eligibility  181060 non-null  object 
 10  Electric Range                                     181060 non-null  int64  
 11  Base MSRP                                          181060 non-null  int64  
 12  Legislative District                               181060 non-null  float64
 13  DOL Vehicle ID                                     181060 non-null  int64  
 14  Vehicle Location                                   181055 non-null  object 
 15  Electric Utility                                   181060 non-null  object 
 16  2020 Census Tract                                  181060 non-null  float64
dtypes: float64(3), int64(4), object(10)
memory usage: 24.9+ MB

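nouse=['VIN (1-10)','Postal Code','2020 Census Tract','Legislative District','Base MSRP']
dta=df.drop(columns=nouse)  # pass the column names directly

(2) Drop missing values

dta=dta.dropna()  # drop rows with missing values from the working frame dta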
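
The two cleaning steps can also be chained into one expression. A minimal sketch on a made-up frame (column names mirror the dataset, the values are invented):

```python
import pandas as pd

# Toy frame (hypothetical values); one row has a missing 'Vehicle Location'
toy = pd.DataFrame({
    'County': ['King', 'Pierce', 'King'],
    'Base MSRP': [0, 0, 69900],
    'Vehicle Location': ['POINT (-122.3 47.6)', None, 'POINT (-122.2 47.5)'],
})

# drop(columns=...) removes the listed columns; dropna() then removes
# every row that still contains a missing value. Both return new frames.
cleaned = toy.drop(columns=['Base MSRP']).dropna()
print(cleaned.shape)
```

Because both methods return a new DataFrame rather than modifying the original, the result has to be assigned to the variable used in the later steps.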

2. Rename columns and simplify values

First, look at what the column with the long name contains:

dta['Clean Alternative Fuel Vehicle (CAFV) Eligibility'].value_counts()
Result:
Clean Alternative Fuel Vehicle (CAFV) Eligibility
Eligibility unknown as battery range has not been researched    94566
Clean Alternative Fuel Vehicle Eligible                         66646
Not eligible due to low battery range                           19843

dta.rename(columns={
    'Clean Alternative Fuel Vehicle (CAFV) Eligibility':'CAFV'},inplace=True) # rename via a dict mapping

dta['isCAFV']=dta['CAFV'].apply(lambda x:'unknown' if x=='Eligibility unknown as battery range has not been researched'
                               else 'CAFV' if x=='Clean Alternative Fuel Vehicle Eligible'
                               else 'NOT')
dta['isCAFV'].value_counts() # verify

The other column, dta['Electric Vehicle Type'], can be handled the same way.
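
For that second column, a dict passed to `Series.map` is one possible alternative to the nested lambda. The sketch below assumes the column holds the two labels shown (check with `value_counts()` first), and the new column name `EVtype` is a hypothetical choice, not the author's.

```python
import pandas as pd

# Assumed labels; confirm with dta['Electric Vehicle Type'].value_counts()
ev_map = {
    'Battery Electric Vehicle (BEV)': 'BEV',
    'Plug-in Hybrid Electric Vehicle (PHEV)': 'PHEV',
}

# Toy stand-in for dta (hypothetical rows)
toy = pd.DataFrame({'Electric Vehicle Type': [
    'Battery Electric Vehicle (BEV)',
    'Plug-in Hybrid Electric Vehicle (PHEV)',
    'Battery Electric Vehicle (BEV)',
]})

# Series.map looks each value up in the dict; values missing from it become NaN
toy['EVtype'] = toy['Electric Vehicle Type'].map(ev_map)
ev_counts = toy['EVtype'].value_counts()
print(ev_counts)
```

Unlike the lambda with a final `else`, `map` leaves unexpected labels as NaN, which makes them easy to spot during verification.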

datetime

dta['year']=pd.to_datetime(dta['Model Year'],format='%Y').dt.year # datetime practice, optional # the column holds only a year, so the format must be declared
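
Why the format has to be declared can be seen on a few integers. This is a small sketch with made-up years: without `format`, pandas reads integers as nanoseconds since the Unix epoch, so every value lands in 1970.

```python
import pandas as pd

years = pd.Series([2013, 2020, 2023])  # made-up model years

# No format: integers are interpreted as nanoseconds since 1970-01-01
naive = pd.to_datetime(years).dt.year

# format='%Y': each integer is parsed as a four-digit calendar year
parsed = pd.to_datetime(years, format='%Y').dt.year

print(naive.tolist())
print(parsed.tolist())
```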

II. Visualization

1. Top ten counties by number of EVs

The plan is to nest a pie chart inside the bar chart (at a later stage).

county_top=vcounts(dta['County'])[0:10];county_top
county_top_pair=[(k,v) for k,v in county_top.items()];county_top_pair
# pack the result into (county, count) pairs with a comprehension
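
The same top-N pairing can be sketched on toy data. The county values here are invented; `head(n)` and `list(s.items())` are used as equivalents of the slice and the comprehension above.

```python
import pandas as pd

# Toy county column (hypothetical rows)
toy = pd.Series(['King', 'King', 'Pierce', 'King', 'Snohomish', 'Pierce'],
                name='County')

# value_counts() sorts by count in descending order, so head(2) is the top 2
top2 = toy.value_counts().head(2)

# items() yields (label, count) tuples, ready to be used as name/value pairs
pairs = list(top2.items())
print(pairs)
```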

from pyecharts.charts import Bar
county_bar=