Real SQL Programming

目录

SQL in Real Programs

Options

Stored Procedures

Advantages of Stored Procedures

Parameters in PSM


SQL in Real Programs

  • We have seen only how SQL is used at the generic query interface --- an environment where we sit at a terminal and ask queries of a database.

  • Reality is almost always different:

conventional programs interacting with SQL.

(我们只是看见了SQL是如何在常规的查询接口中的使用,也就是直接通过终端 对数据库进行交互,但实际上通常是常规程序与SQL语言的交互)

Options

  • Code in a specialized language is stored in the database itself (e.g.PSM, PL/SQL).
  • SQL statements are embedded in a host language (e.g., C).
  • Connection tools are used to allow a conventional language to access a database (e.g., CLI, JDBC, PHP/DB).

(常规程序与SQL语言的交互模式: 数据库储存代码、SQL内嵌、连接工具)

Stored Procedures

  • PSM, or “persistent stored modules, ” allows us to store procedures as database schema elements.(持久储存模块允许我们将储存模式作为数据库模式元素)

  • PSM = a mixture of conventional statements (if, while, etc.) and SQL.

  • Lets us do things we cannot do in SQL alone.

数据库中的储存过程和函数的区别在于,函数可以直接通过函数名的引用得到函数值,但是储存过程不行

Advantages of Stored Procedures

  • Share Application Logic(共享应用逻辑)
  • Shield Database Schema Details(屏蔽数据库细节)
  • Provide Security Mechanisms(提供安全机制 )
  • Improve Performance(提升性能)
  • Reduce Network Traffic(减少网络流量)

Parameters in PSM

Unlike the usual name-type pairs in languages like C, PSM uses mode- name-type triples, where the mode can be:

  • IN = procedure uses value, does not change value.(传入参数,只能调用不能修改)

  • OUT = procedure changes, does not use.(传出参数,可以在储存过程被修改)

  • INOUT = both

想得到储存过程的返回值,虽然不能直接通过调用储存过程名字,但是可以调用OUT参数

### HiveSQL vs SparkSQL: Differences and Similarities HiveSQL and SparkSQL are both SQL-like query languages used for processing and analyzing large datasets stored in distributed systems. However, they differ in terms of performance, execution engine, integration, and optimization techniques. #### Similarities Both HiveSQL and SparkSQL are built to work seamlessly with Hadoop ecosystems and support querying data stored in HDFS or other compatible storage systems. They also provide support for structured data processing and allow users to write SQL-like queries to interact with data[^1]. Additionally, both systems can integrate with Hive Metastore to access metadata and tables stored in Hive. HiveSQL was originally designed to run on top of MapReduce, while SparkSQL can also work with Hive by enabling Hive support during the build process using the `-Phive` and `-Phive-thriftserver` flags[^1]. #### Differences One of the key differences between HiveSQL and SparkSQL lies in their execution engines. HiveSQL traditionally uses MapReduce for query execution, which can be slower due to disk-based processing. In contrast, SparkSQL uses an in-memory computing engine, which significantly improves performance by reducing disk I/O operations[^1]. Another major difference is in how they handle optimization. HiveSQL relies on the Hive optimizer, which may not be as advanced as the Catalyst optimizer used by SparkSQL. The Catalyst optimizer applies a series of rule-based and cost-based optimizations to improve query execution plans[^1]. SparkSQL also provides better integration with programming languages like Scala, Java, Python, and R, allowing developers to combine SQL queries with procedural code more easily. HiveSQL primarily focuses on batch processing, while SparkSQL supports both batch and streaming data processing. In terms of deployment, HiveSQL requires a separate Hive Metastore service to manage metadata, while SparkSQL can work with or without Hive Metastore depending on the configuration. When using Hive Metastore, SparkSQL needs the `hive-site.xml` file in the `conf/` directory along with the necessary JAR files for YARN clusters. #### Query Execution Example Here’s a simple example of executing a query in both HiveSQL and SparkSQL: **HiveSQL Example:** ```sql SELECT * FROM employees WHERE department = 'HR'; ``` **SparkSQL Example (in Scala):** ```scala val df = spark.sql("SELECT * FROM employees WHERE department = 'HR'") df.show() ``` Both examples demonstrate how to filter records based on a department condition, but the underlying execution and performance characteristics differ significantly. #### Performance Comparison SparkSQL generally outperforms HiveSQL in terms of query execution speed, especially for iterative algorithms and complex transformations. This is due to Spark's in-memory caching capabilities and efficient DAG execution model[^1]. HiveSQL, on the other hand, may require multiple MapReduce jobs to complete the same task, leading to higher latency[^1]. #### Use Cases HiveSQL is commonly used for data warehousing tasks where latency is acceptable, such as ETL processes and historical data analysis. SparkSQL is preferred for real-time analytics, machine learning pipelines, and applications requiring fast iterative processing. #### Integration with External Tools Both HiveSQL and SparkSQL can be integrated with BI tools like Tableau and Power BI for data visualization. However, SparkSQL offers better support for modern data processing frameworks and libraries, making it easier to build end-to-end data pipelines. #### Conclusion While HiveSQL and SparkSQL share similarities in their SQL-like syntax and ability to process structured data, their differences in execution engine, optimization, and integration make them suitable for different use cases. Choosing between the two depends on specific requirements such as performance needs, data processing complexity, and existing infrastructure.
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值