Apache Spark as a Service


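QDS provides a simple, fast, and cost-effective way to deploy Apache Spark on the AWS Cloud and Google Cloud Platform, making it suitable for enterprise applications. By supporting Spark libraries such as MLlib, GraphX, and Spark SQL, QDS not only simplifies the deployment of big data solutions but also offers a range of user interface options, including the Spark Notebook for interactive queries and algorithm development.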


Enterprise Spark on QDS

QDS makes Spark enterprise ready, delivering simple, fast, cost-effective, and secure Spark processing on the AWS Cloud and Google Cloud Platform. QDS provides the flexibility you need to succeed by shortening your time to deployment, making your business users self-sufficient, and accelerating your time to value.

Spark is an open-source analytics engine for fast, large-scale data processing. Apache Spark can process data from a variety of data repositories, including the Hadoop Distributed File System (HDFS) and Amazon Simple Storage Service (S3). Spark supports in-memory processing to boost the performance of big data analytics applications, and it also supports disk-based processing.

Complete Cluster Life Cycle Management

As with all big data solutions supported by QDS, Spark clusters scale up and down automatically during query execution. You also don’t have to worry about starting or stopping Spark clusters: QDS does all the heavy lifting for you.

Quick and Easy Spark Debugging

QDS makes it easy to debug both active and historical jobs with a Spark Application UI. Results and logs are always available, even when no cluster is running.

Instance Selection Options

QDS supports a wide variety of Amazon EC2 instance types for your Spark workload, giving you the freedom to optimize instance selection for your workload requirements and AWS pricing options.

Spot Instance Pricing

QDS lets you automatically incorporate Amazon Spot Instances, which can cost up to 90% less than On-Demand Instances.

Elastic Pricing Model

With QDS’s pay-per-use pricing model, you pay only for what you actually use, billed by the compute hour.
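
As a rough illustration of how spot discounts and per-hour billing combine, the sketch below estimates the compute cost of a cluster that mixes on-demand and spot nodes. Every number in it is hypothetical: the hourly rate, the node counts, and the function name `cluster_cost` are made up for the example, and the 90% figure is only the upper bound quoted above. Actual AWS spot prices fluctuate, and QDS's own usage charges are not modeled here.

```python
def cluster_cost(on_demand_rate, hours, on_demand_nodes, spot_nodes, spot_discount):
    """Estimate the compute cost of a cluster mixing on-demand and spot nodes.

    on_demand_rate: hypothetical on-demand price per node-hour (USD)
    hours:          how long the cluster runs
    spot_discount:  fraction saved on spot nodes, e.g. 0.9 for "90% less"
    """
    if not 0.0 <= spot_discount <= 1.0:
        raise ValueError("spot_discount must be between 0 and 1")
    # Spot nodes are billed at the discounted hourly rate.
    spot_rate = on_demand_rate * (1.0 - spot_discount)
    on_demand_cost = on_demand_nodes * hours * on_demand_rate
    spot_cost = spot_nodes * hours * spot_rate
    return on_demand_cost + spot_cost


# Hypothetical workload: 10 nodes for 4 hours at $0.50 per node-hour.
all_on_demand = cluster_cost(0.50, 4, on_demand_nodes=10, spot_nodes=0, spot_discount=0.9)
mixed = cluster_cost(0.50, 4, on_demand_nodes=2, spot_nodes=8, spot_discount=0.9)

print(f"all on-demand:        ${all_on_demand:.2f}")
print(f"2 on-demand + 8 spot: ${mixed:.2f}")
print(f"savings:              {100 * (1 - mixed / all_on_demand):.0f}%")
```

With these made-up inputs, the mixed cluster costs about $5.60 against $20.00 for the all-on-demand cluster, a saving of roughly 72%. The saving is smaller than the headline 90% because the two on-demand nodes are still billed at the full rate.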

Extensive User Interfaces

QDS gives you user interface options to match your use case. The Spark Notebook and a web-based UI are suited for interactive analysis, and the SDKs and the REST API are ideal for programmatic access.

Amazon Virtual Private Cloud (VPC) Support

QDS supports Amazon VPC, a service that extends your private network into the cloud. Amazon VPC provides fine-grained access control both to and from Amazon EC2 instances in your virtual network. Plus, you can launch dedicated instances within a VPC on single-tenant hardware.

When Should I Use QDS for Spark?

QDS for Spark is ideal for data scientists who combine machine learning, SQL, advanced statistical analysis, and graph processing. QDS supports all Spark libraries, including MLlib (machine learning), GraphX (graph processing), Spark SQL, and Spark Streaming, as well as SparkR for using Spark from R, a widely used statistical programming language. With support for Spark Notebooks, you can develop algorithms quickly and iteratively in an interactive UI.

Machine Learning and Advanced Analytics

By allowing user programs to load data into a cluster's memory and query it repeatedly, Spark is well suited to running machine learning algorithms, which typically make many passes over the same data. In addition, Spark’s MLlib provides common machine learning algorithms for classification, regression, clustering, collaborative filtering, and dimensionality reduction, as well as the underlying optimization primitives. SparkR lets you apply many additional algorithms.

Your data isn’t useful until you use it to drive decision making. Using QDS for Spark, you can deliver machine learning applications that turn your data into actionable predictive intelligence, including recommendation engines, sentiment analysis, fraud detection, customer segmentation and many other applications.

The Spark Notebook for Interactive Queries and Iterative Algorithm Development

Spark’s in-memory capabilities enable faster interactive exploration. This works well with a Spark Notebook, an interactive computational environment in which you can combine code execution, rich text, mathematics, plots, and rich media.

QDS preserves the Spark Notebook even when clusters are not in use.

Define Your Own Project

Spark also supports additional use cases. Spark Streaming is useful for real-time processing of streaming data such as log files. Spark SQL supports relational data processing. And because Spark typically caches recently read data in memory, applications that require fast SQL execution benefit from Spark’s speed advantage over slower-running Hadoop MapReduce jobs.

How Does Spark Fit into the QDS Landscape?

QDS gives you the freedom to work with Spark, Hadoop MapReduce, Presto, and Hive through one unified interface with unified metadata. Choose the right solution for each workload rather than being locked into any single technology. While Spark is great for machine learning and other use cases that benefit from in-memory data and fast response times, Hive and MapReduce are tried and proven for batch workloads where reliability and stability are of the highest importance. Similarly, Presto is a proven, scalable SQL engine for simple, interactive analysis at companies such as Facebook, Netflix, Airbnb, and more.

Get Started Now

To help accelerate adoption of big data tools such as Spark running on the AWS Cloud, Qubole is offering a promotion for commercial AWS users in which AWS will cover two weeks of AWS usage for proofs of concept (POCs), subject to eligibility. Let us fund your POC!