Big Data: A Banking Industry Use Case
A Portuguese banking institution ran a marketing campaign to convince potential customers to invest in bank term deposits. The data described here relates to the bank's direct marketing campaigns, which were conducted by phone. The same customer was often contacted more than once to determine whether they would subscribe to the term deposit.
The following questions were answered through data analysis with Spark:
- Load the data and create a Spark data frame
- Give the marketing success rate (number of people subscribed / total number of entries)
- Give the marketing failure rate
- Find the maximum, mean, and minimum age of the targeted customers
- Check the quality of customers by looking at their average and median balance
- Check whether age matters for subscription to the deposit
- Check whether marital status matters for subscription to the deposit
- Check whether age and marital status together matter for subscription to the deposit scheme
- Do feature engineering on the age column and examine the effect of the right age group on the campaign
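The rate and summary questions in the list above reduce to simple counts and aggregates over a few columns. The following is a minimal plain-Python sketch on a handful of made-up rows (the values are invented for illustration and are not taken from the dataset). In Spark, the same numbers would typically come from filtering and aggregating the data frame rather than from Python lists.

```python
from statistics import mean, median

# Hypothetical sample rows; values are invented for illustration only.
rows = [
    {"age": 30, "balance": 1200, "y": "yes"},
    {"age": 45, "balance": 300, "y": "no"},
    {"age": 52, "balance": 0, "y": "no"},
    {"age": 28, "balance": 2500, "y": "yes"},
    {"age": 61, "balance": 800, "y": "no"},
]

total = len(rows)
subscribed = sum(1 for r in rows if r["y"] == "yes")

# Success rate = subscribed entries / total entries; failure rate is the rest.
success_rate = subscribed / total
failure_rate = (total - subscribed) / total

# Age summary (maximum, mean, minimum) and balance summary (mean, median).
ages = [r["age"] for r in rows]
balances = [r["balance"] for r in rows]
age_summary = (max(ages), mean(ages), min(ages))
balance_summary = (mean(balances), median(balances))

print(f"success rate: {success_rate:.2f}, failure rate: {failure_rate:.2f}")
print("age (max, mean, min):", age_summary)
print("balance (mean, median):", balance_summary)
```

Because every entry is either a subscription or not, the two rates always sum to 1, which makes a quick sanity check on the real data.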
The dataset comes from the banking sector and has the following attributes:
Feature attributes: age, job, marital, education, default, balance, housing, loan, contact, day, month, duration, campaign, pdays, previous, poutcome.
Target attribute: y
Among these attributes, the column 'y' is the important one. It has two classes, 'yes' and 'no': if the customer subscribed to a term deposit, the value is 'yes'; otherwise it is 'no'.
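Since 'y' is described as a two-class column, it can be worth confirming that only 'yes' and 'no' occur before treating it as a binary target. Below is a small plain-Python sketch of that check on made-up labels (the list `y_values` is hypothetical, not real data), followed by a 1/0 encoding that is a common convention and not necessarily what the original analysis used.

```python
from collections import Counter

# Hypothetical target values; invented for illustration only.
y_values = ["no", "yes", "no", "no", "yes", "no"]

# Count how often each class occurs.
class_counts = Counter(y_values)

# Guard against unexpected labels before treating y as a binary target.
unexpected = set(class_counts) - {"yes", "no"}
if unexpected:
    raise ValueError(f"unexpected labels in y: {sorted(unexpected)}")

# Encode the target as 1 (subscribed) or 0 (not subscribed).
y_encoded = [1 if value == "yes" else 0 for value in y_values]

print(dict(class_counts))
print(y_encoded)
```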
Loading the data and creating a Spark data frame