Apache Spark as a Service
Enterprise Spark on QDS
QDS makes Spark enterprise-ready, delivering simple, fast, cost-effective, and secure Spark processing on the AWS Cloud and Google Cloud Platform. QDS provides the flexibility you need to succeed by shortening your time to deployment, making your business users self-sufficient, and accelerating your time to value.
Spark is an open-source analytics engine for fast, large-scale data processing. Apache Spark can process data from a variety of data repositories, including the Hadoop Distributed File System (HDFS) and Amazon Simple Storage Service (S3). Spark supports in-memory processing to boost the performance of big data analytics applications, and it also supports disk-based processing.

Complete Cluster Life Cycle Management
As with all big data engines supported by QDS, Spark clusters scale up and down automatically during query execution. You don’t have to worry about starting or stopping Spark clusters, because QDS does all the heavy lifting for you.

Quick and Easy Spark Debugging
QDS makes it easy to debug both active and historical jobs with a Spark Application UI. Results and logs remain available even when no cluster is running.

Instance Selection Options
QDS supports a wide variety of Amazon EC2 instance types for Spark, giving you the freedom to optimize instance selection for your workload requirements and AWS pricing options.

Spot Instance Pricing
QDS lets you automatically incorporate Amazon EC2 Spot Instances, which can cost up to 90% less than on-demand instances.
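To see how a spot discount translates into cluster cost, the arithmetic can be sketched in a few lines of Python. This is an illustration only: the function name, the node count, and the $0.40 hourly rate are hypothetical, and the 90% discount is the upper bound quoted above rather than a typical figure.

```python
def blended_hourly_cost(nodes, on_demand_rate, spot_fraction, spot_discount):
    """Hourly cluster cost when a share of the nodes runs on spot instances.

    nodes          -- total number of nodes in the cluster
    on_demand_rate -- on-demand price per node-hour (hypothetical figure)
    spot_fraction  -- share of nodes on spot instances, 0.0 to 1.0
    spot_discount  -- spot discount versus on-demand, 0.0 to 1.0
    """
    if not 0.0 <= spot_fraction <= 1.0:
        raise ValueError("spot_fraction must be between 0 and 1")
    if not 0.0 <= spot_discount <= 1.0:
        raise ValueError("spot_discount must be between 0 and 1")
    spot_nodes = nodes * spot_fraction
    on_demand_nodes = nodes - spot_nodes
    spot_rate = on_demand_rate * (1.0 - spot_discount)
    return on_demand_nodes * on_demand_rate + spot_nodes * spot_rate


# Hypothetical 10-node cluster at $0.40 per node-hour on demand.
all_on_demand = blended_hourly_cost(10, 0.40, 0.0, 0.0)
half_spot = blended_hourly_cost(10, 0.40, 0.5, 0.9)
savings = 1.0 - half_spot / all_on_demand

print(f"on-demand only: ${all_on_demand:.2f}/hour")
print(f"50% spot nodes: ${half_spot:.2f}/hour ({savings:.0%} saved)")
```

Under these assumptions, moving half of the nodes to spot instances at the maximum discount cuts the hourly cost by a little under half. Actual savings depend on current spot prices and on how many spot nodes are available.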

Elastic Pricing Model
With QDS’ pay-per-use pricing model, you pay only for the compute hours you actually use.

Extensive User Interfaces
QDS gives you user interface options to match your use case. The Spark Notebook and a web-based UI are suited for interactive analysis, and the SDKs and the REST API are ideal for programmatic access.

Amazon Virtual Private Cloud (VPC) Support
QDS supports Amazon VPC, a service that extends your private network into the cloud. Amazon VPC provides fine-grained access control both to and from Amazon EC2 instances in your virtual network. Plus, you can launch dedicated instances within a VPC on single-tenant hardware.
When Should I Use QDS for Spark?
QDS for Spark is ideal for data scientists who use a combination of machine learning, SQL, advanced statistical analysis, and graph processing. QDS supports all Spark libraries, including MLlib (machine learning), GraphX (graph processing), Spark SQL, and Spark Streaming, as well as SparkR for using Spark from R, a widely used statistical programming language. With support for Spark Notebooks, you can develop algorithms quickly and iteratively in an interactive UI.
Machine Learning and Advanced Analytics
By allowing user programs to load data into a cluster's memory and query it repeatedly, Spark is well suited to running machine learning algorithms. In addition, Spark’s MLlib provides common machine learning algorithms for classification, regression, clustering, collaborative filtering, and dimensionality reduction, as well as the underlying optimization primitives. SparkR lets you apply many additional algorithms.
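The reason in-memory data helps machine learning is that many algorithms scan the same dataset once per iteration. The sketch below shows that access pattern with a tiny one-dimensional k-means written in plain Python. It is not Spark or MLlib code, and the function name and sample data are made up for illustration. In Spark, the dataset would be loaded and cached once on the cluster instead of being held in a local list.

```python
def kmeans_1d(points, centers, iterations=10):
    """Tiny 1-D k-means: every iteration rescans the full in-memory dataset."""
    centers = list(centers)
    for _ in range(iterations):
        # Assign each point to its nearest center (one full pass over the data).
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers


# Loaded once, then read on every iteration.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers = kmeans_1d(points, [0.0, 5.0])
print(centers)
```

With ten iterations the data is read ten times. If each pass had to reload the data from disk, that cost would be paid on every iteration, which is the overhead that in-memory processing avoids.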
Your data isn’t useful until you use it to drive decision making. Using QDS for Spark, you can deliver machine learning applications that turn your data into actionable predictive intelligence, including recommendation engines, sentiment analysis, fraud detection, customer segmentation and many other applications.
The Spark Notebook for Interactive Queries and Iterative Algorithm Development
Spark’s in-memory capabilities enable faster interactive exploration. This works well with a Spark Notebook, an interactive computational environment in which you can combine code execution, rich text, mathematics, plots, and rich media.
QDS preserves the Spark Notebook even when clusters are not in use.

Define Your Own Project
Spark provides support for additional use cases. Spark Streaming is useful for real-time processing of streaming data such as log files. Spark SQL supports relational data processing. And because Spark typically caches recently read data in memory, applications requiring fast SQL execution benefit from Spark’s speed advantage over slower-running Hadoop MapReduce jobs.
How Does Spark Fit into the QDS Landscape?
QDS gives you the freedom to work with Spark, Hadoop MapReduce, Presto, and Hive as part of one unified interface with unified metadata. Choose the right solution for the right workload rather than being locked into any single technology. While Spark is great for machine learning and other use cases that benefit from in-memory data and fast response times, Hive and MapReduce are tried and proven for batch workloads where reliability and stability are of the highest importance. Similarly, Presto is a proven, scalable SQL engine for simple, interactive analysis at companies such as Facebook, Netflix, and Airbnb.
Get Started Now
To help accelerate adoption of big data tools such as Spark running on the AWS Cloud, Qubole is offering a promotion for commercial AWS users in which AWS will cover two weeks of AWS usage for proofs of concept, subject to eligibility. Let us fund your POC!