The Spark Community May Skip Spark 1.7 and Release Spark 2.x Directly

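Reynold Xin has put forward a plan under which the Spark community may skip Spark 1.7 and move directly to the Spark 2.x line. Under this plan, Spark 2.x would build with Scala 2.11 by default, drop support for Hadoop 1.x, and possibly drop support for Hadoop versions below 2.6. It would also remove interfaces, configs, and modules already marked as deprecated, such as Bagel, and remove streaming's dependency on Akka.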

A recent email from Reynold Xin to Spark developers suggests that the Spark community may well skip the Spark 1.7 release and move directly to Spark 2.x.

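  If Spark 2.x is released, it will:
  (1) Build with Scala 2.11 by default, while still supporting Scala 2.10.
  (2) Drop support for Hadoop 1.x. Support for Hadoop versions below 2.2 may also be dropped, since Hadoop 2.0 and 2.1 were alpha and beta releases, respectively; Spark may even stop supporting every version below Hadoop 2.6.
  (3) Remove the interfaces, configs, and modules (e.g. Bagel) that were marked as deprecated in Spark 1.x.
  (4) Remove the Akka dependency from streaming, so that user applications can use Akka.
  (5) Remove Guava from Spark's public API (the Optional type exposed by the Java RDD API).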
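Item (5) concerns API hygiene rather than dropping Guava altogether: in Spark 1.x, Java API methods such as `JavaPairRDD.leftOuterJoin` represent a missing right-hand value with Guava's `Optional`, so any user code that handles the result depends on a Guava class. The minimal sketch below illustrates the general remedy of exposing a library-owned type instead. `LibOptional` and `JoinApi.lookup` are hypothetical names chosen for illustration; they are not Spark's actual replacement classes.

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Objects;

// Hypothetical library-owned Optional. Because it lives in the library's own
// package, callers of the public API need no third-party jar to use the result.
final class LibOptional<T> {
    private static final LibOptional<?> EMPTY = new LibOptional<>(null);
    private final T value;

    private LibOptional(T value) {
        this.value = value;
    }

    static <T> LibOptional<T> of(T value) {
        return new LibOptional<>(Objects.requireNonNull(value, "value"));
    }

    @SuppressWarnings("unchecked")
    static <T> LibOptional<T> empty() {
        return (LibOptional<T>) EMPTY;
    }

    boolean isPresent() {
        return value != null;
    }

    T get() {
        if (value == null) {
            throw new NoSuchElementException("no value present");
        }
        return value;
    }

    T orElse(T other) {
        return value != null ? value : other;
    }
}

// Hypothetical public API method in the style of an outer-join lookup: its
// signature mentions only JDK types and the library's own LibOptional.
final class JoinApi {
    static LibOptional<String> lookup(Map<Integer, String> right, int key) {
        String match = right.get(key);
        if (match == null) {
            return LibOptional.empty();
        }
        return LibOptional.of(match);
    }
}
```

A caller can write `JoinApi.lookup(map, key).orElse("none")` with only the library and the JDK on its classpath, which is the decoupling the proposal is after.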


For details, see the email below:

I’m starting a new thread since the other one got intermixed with feature requests. Please refrain from making feature request in this thread. Not that we shouldn’t be adding features, but we can always add features in 1.7, 2.1, 2.2, ...

First - I want to propose a premise for how to think about Spark 2.0 and major releases in Spark, based on discussion with several members of the community: a major release should be low overhead and minimally disruptive to the Spark community. A major release should not be very different from a minor release and should not be gated based on new features. The main purpose of a major release is an opportunity to fix things that are broken in the current API and remove certain deprecated APIs (examples follow).

For this reason, I would *not* propose doing major releases to break substantial API's or perform large re-architecting that prevent users from upgrading. Spark has always had a culture of evolving architecture incrementally and making changes - and I don't think we want to change this model. In fact, we’ve released many architectural changes on the 1.X line.

If the community likes the above model, then to me it seems reasonable to do Spark 2.0 either after Spark 1.6 (in lieu of Spark 1.7) or immediately after Spark 1.7. It will be 18 or 21 months since Spark 1.0. A cadence of major releases every 2 years seems doable within the above model.

Under this model, here is a list of example things I would propose doing in Spark 2.0, separated into APIs and Operation/Deployment:

APIs

1. Remove interfaces, configs, and modules (e.g. Bagel) deprecated in Spark 1.x.

2. Remove Akka from Spark’s API dependency (in streaming), so user applications can use Akka (SPARK-5293). We have gotten a lot of complaints about user applications being unable to use Akka due to Spark’s dependency on Akka.

3. Remove Guava from Spark’s public API (JavaRDD Optional).

4. Better class package structure for low level developer API’s. In particular, we have some DeveloperApi (mostly various listener-related classes) added over the years. Some packages include only one or two public classes but a lot of private classes. A better structure is to have public classes isolated to a few public packages, and these public packages should have minimal private classes for low level developer APIs.

5. Consolidate task metric and accumulator API. Although having some subtle differences, these two are very similar but have completely different code path.

6. Possibly making Catalyst, Dataset, and DataFrame more general by moving them to other package(s). They are already used beyond SQL, e.g. in ML pipelines, and will be used by streaming also.

Operation/Deployment

1. Scala 2.11 as the default build. We should still support Scala 2.10, but it has been end-of-life.

2. Remove Hadoop 1 support.

3. Assembly-free distribution of Spark: don’t require building an enormous assembly jar in order to run Spark.
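Item 5 of the email's API list proposes consolidating task metrics with the accumulator API, since the two have similar semantics but separate code paths. The minimal Java sketch below shows what "one code path" could look like, assuming a hypothetical `Accum` interface: a built-in metric such as records read and a user-defined counter are both plain accumulators, copied for each task and merged on the driver by the same method. `Accum`, `LongSumAccum`, and `MetricsDemo` are illustrative names, not Spark classes.

```java
// Hypothetical unified interface: built-in task metrics and user accumulators
// both implement it, so the driver merges them through one code path.
interface Accum<IN, OUT> {
    void add(IN v);                   // called inside a task
    void merge(Accum<IN, OUT> other); // called on the driver with a task's copy
    Accum<IN, OUT> copyAndReset();    // fresh, zeroed copy handed to each task
    OUT value();
}

// A named long counter, usable for a metric like "recordsRead" or a user counter.
final class LongSumAccum implements Accum<Long, Long> {
    private final String name;
    private long sum;

    LongSumAccum(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    @Override
    public void add(Long v) {
        sum += v;
    }

    @Override
    public void merge(Accum<Long, Long> other) {
        sum += other.value();
    }

    @Override
    public Accum<Long, Long> copyAndReset() {
        return new LongSumAccum(name);
    }

    @Override
    public Long value() {
        return sum;
    }
}

final class MetricsDemo {
    // Simulates one task per row of input: each task updates its own zeroed
    // copy, and the driver-side accumulator merges every copy afterwards.
    static long runTasks(LongSumAccum driverSide, long[][] perTaskValues) {
        for (long[] taskValues : perTaskValues) {
            Accum<Long, Long> taskCopy = driverSide.copyAndReset();
            for (long v : taskValues) {
                taskCopy.add(v);
            }
            driverSide.merge(taskCopy);
        }
        return driverSide.value();
    }
}
```

Because a built-in metric and a user counter are instances of the same class, any fix to copy or merge semantics applies to both at once, which is the duplication the email wants to remove.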

