1
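Introducing the problem
- Product sales revenue versus advertising spend: sales revenue is closely tied to advertising spend, but it also depends on other factors such as product quality and household income.
- Grain yield versus amount of fertilizer: within a certain range, the more fertilizer is applied, the higher the grain yield. Grain yield is also affected by soil quality, rainfall, and other factors.
- Body fat content versus age: within a certain age range, body fat content increases with age, but it also depends on diet and physical exercise, and possibly on innate constitution.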
2
Is body fat content related to age?




3
Solving the problem







4
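Plotting the relationship in Excel
- Prepare the data and start by drawing a scatter plot.
- On the scatter plot, click "Trend forecast" (the trendline option) to display the fitted straight line.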
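The straight line that a linear trendline draws is the ordinary least-squares line through the scatter points. As a minimal sketch of that calculation in plain Scala (no Spark needed), assuming the age and body-fat pairs used in the Spark ML code in this document; the names `TrendLine` and `fit` are hypothetical and chosen only for illustration:

```scala
// Minimal sketch: closed-form simple linear regression (ordinary least squares).
// TrendLine and fit are illustrative names, not part of any library.
object TrendLine {
  // Age (x) and body fat content (y), copied from the Spark ML example in this document.
  val ages: Seq[Double] =
    Seq(23.0, 27.0, 39.0, 41.0, 45.0, 49.0, 50.0, 53.0, 54.0, 56.0, 57.0, 58.0, 60.0, 61.0)
  val fat: Seq[Double] =
    Seq(9.5, 17.8, 21.2, 25.9, 27.5, 26.3, 28.2, 29.6, 30.2, 31.4, 30.8, 33.5, 35.2, 34.6)

  /** Returns (slope, intercept) of the least-squares line y = slope * x + intercept. */
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    require(xs.length == ys.length && xs.nonEmpty, "need equally many x and y values")
    val meanX = xs.sum / xs.length
    val meanY = ys.sum / ys.length
    // Sum of cross-deviations and sum of squared x-deviations.
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    val slope = sxy / sxx
    (slope, meanY - slope * meanX)
  }

  def main(args: Array[String]): Unit = {
    val (slope, intercept) = fit(ages, fat)
    println(s"slope = $slope, intercept = $intercept")
  }
}
```

On this data the slope comes out at roughly 0.576 and the intercept at roughly -0.448, in line with the coefficient and intercept recorded in the Spark ML output.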


5
Spark ML code implementation
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}
import org.apache.spark.sql.{Dataset, Row, SparkSession}

object testFat {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder().master("local[*]").appName("traintestSplitTest").getOrCreate()
    spark.sparkContext.setLogLevel("WARN")
    // label = body fat content, features = age
    val data = spark.createDataFrame(Seq(
      (9.5, Vectors.dense(23)),
      (17.8, Vectors.dense(27)),
      (21.2, Vectors.dense(39)),
      (25.9, Vectors.dense(41)),
      (27.5, Vectors.dense(45)),
      (26.3, Vectors.dense(49)),
      (28.2, Vectors.dense(50)),
      (29.6, Vectors.dense(53)),
      (30.2, Vectors.dense(54)),
      (31.4, Vectors.dense(56)),
      (30.8, Vectors.dense(57)),
      (33.5, Vectors.dense(58)),
      (35.2, Vectors.dense(60)),
      (34.6, Vectors.dense(61))
    )).toDF("label", "features")
    // 1. Split the data (shown for illustration; train and test are not used below)
    val Array(train, test): Array[Dataset[Row]] = data.randomSplit(Array(0.9, 0.1), seed = 120L)
    // 2. Train the model. It is fitted on the full data set, which is what the output recorded below corresponds to
    val lr: LinearRegression = new LinearRegression()
    val lrModel: LinearRegressionModel = lr.fit(data)
    // 3. Print the coefficients and intercept for linear regression
    println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")
    // 4. Summarize the model over the data it was fitted on and print out some metrics
    val trainingSummary = lrModel.summary
    println(s"numIterations: ${trainingSummary.totalIterations}")
    println(s"objectiveHistory: [${trainingSummary.objectiveHistory.mkString(",")}]")
    trainingSummary.residuals.show()
    println(s"RMSE: ${trainingSummary.rootMeanSquaredError}")
    println(s"r2: ${trainingSummary.r2}")
    spark.stop()
    // Recorded output:
    // Coefficients: [0.5764772505370067] Intercept: -0.44779925795753567
    // numIterations: 1
    // objectiveHistory: [0.0]
    // +------------------+
    // |         residuals|
    // +------------------+
    // |-3.311177504393619|
    // | 2.682913493458356|
    // ...
    // RMSE: 1.629890205389517
    // r2: 0.9423379190667397
  }
}
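To make the summary metrics more concrete, the following is a minimal sketch in plain Scala (no Spark needed) that recomputes the residuals, RMSE, and r2 by hand from the coefficient and intercept recorded in the output above. The name `FitMetrics` and its members are hypothetical. The sketch assumes a residual is label minus prediction and that RMSE divides the squared error by the number of rows, which is consistent with the recorded values.

```scala
// Minimal sketch: recompute the regression metrics by hand.
// FitMetrics and its members are illustrative names, not part of Spark.
object FitMetrics {
  // Age (feature) and body fat content (label), copied from the Spark ML example.
  val ages: Seq[Double] =
    Seq(23.0, 27.0, 39.0, 41.0, 45.0, 49.0, 50.0, 53.0, 54.0, 56.0, 57.0, 58.0, 60.0, 61.0)
  val fat: Seq[Double] =
    Seq(9.5, 17.8, 21.2, 25.9, 27.5, 26.3, 28.2, 29.6, 30.2, 31.4, 30.8, 33.5, 35.2, 34.6)

  // Coefficient and intercept as recorded in the example output.
  val slope = 0.5764772505370067
  val intercept = -0.44779925795753567

  // Residual = observed label minus predicted value.
  val residuals: Seq[Double] =
    ages.zip(fat).map { case (x, y) => y - (slope * x + intercept) }

  // Root mean squared error over all rows.
  val rmse: Double = math.sqrt(residuals.map(r => r * r).sum / residuals.length)

  // r2 = 1 - (residual sum of squares / total sum of squares).
  val r2: Double = {
    val meanY = fat.sum / fat.length
    val ssTot = fat.map(y => (y - meanY) * (y - meanY)).sum
    val ssRes = residuals.map(r => r * r).sum
    1.0 - ssRes / ssTot
  }

  def main(args: Array[String]): Unit = {
    println(s"first residuals: ${residuals.take(2).mkString(", ")}")
    println(s"RMSE: $rmse")
    println(s"r2: $r2")
  }
}
```

An r2 of about 0.94 means that, for this sample, a straight line in age accounts for most of the variation in body fat content; the remaining variation reflects the other factors mentioned in the introduction.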
Summary

