Some Theory About NoSQL

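This article discusses the three defining characteristics of big data (volume, variety, and velocity) and describes the background of NoSQL databases and their four main types: key-value stores, column family stores, document stores, and graph databases. It also compares the ACID and BASE data storage models and explains how the CAP theorem applies to distributed systems.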

1.     The three Vs of big data

Volume: High volumes of data, ranging from dozens of terabytes to petabytes.

Variety: Data that's organized in multiple structures, ranging from raw text (which, from a computer's perspective, has little or no discernible structure — many people call this unstructured data) to log files (commonly referred to as being semistructured) to data ordered in strongly typed rows and columns (structured data). To make things even more confusing, some data sets even include portions of all three kinds of data. (This is known as multistructured data.)

Velocity: Data that enters your organization and has some kind of value for a limited window of time — a window that usually shuts well before the data has been transformed and loaded into a data warehouse for deeper analysis (for example, financial securities ticker data, which may reveal a buying opportunity, but only for a short while). The higher the volumes of data entering your organization per second, the bigger your velocity challenge.

2.     NoSQL Theories

NoSQL Data Stores

NoSQL data stores originally subscribed to the notion "Just Say No to SQL" (to paraphrase from an anti-drug advertising campaign in the 1980s), and they were a reaction to the perceived limitations of (SQL-based) relational databases. It's not that these folks hated SQL, but they were tired of forcing square pegs into round holes by solving problems that relational databases weren't designed for. A relational database is a powerful tool, but for some kinds of data (like key-value pairs, or graphs) and some usage patterns (like extremely large scale storage) a relational database just isn't practical. And when it comes to high-volume storage, relational databases can be expensive, both in terms of database license costs and hardware costs. (Relational databases are designed to work with enterprise-grade hardware.) So, with the NoSQL movement, creative programmers developed dozens of solutions for different kinds of thorny data storage and processing problems. These NoSQL databases typically provide massive scalability by way of clustering, and are often designed to enable high throughput and low latency.

 REMEMBER  The name NoSQL is somewhat misleading because many databases that fit the category do have SQL support (rather than "NoSQL" support). Think of its name instead as "Not Only SQL."

The NoSQL offerings available today can be broken down into four distinct categories, based on their design and purpose:

·         Key-value stores: This offering provides a way to store any kind of data without having to use a schema. This is in contrast to relational databases, where you need to define the schema (the table structure) before any data is inserted. Since key-value stores don't require a schema, you have great flexibility to store data in many formats. In a key-value store, a row simply consists of a key (an identifier) and a value, which can be anything from an integer value to a large binary data string. Many implementations of key-value stores are based on Amazon's Dynamo paper.

·         Column family stores: Here you have databases in which columns are grouped into column families and stored together on disk.

 TECHNICAL STUFF  Strictly speaking, many of these databases aren't column-oriented, because they're based on Google's BigTable paper, which stores data as a multidimensional sorted map. (For more on the role of Google's BigTable paper on database design, see Chapter 12.)

·         Document stores: This offering relies on collections of similarly encoded and formatted documents to improve efficiencies. Document stores enable individual documents in a collection to include only a subset of fields, so only the data that's needed is stored. For sparse data sets, where many fields are often not populated, this can translate into significant space savings. By contrast, empty columns in relational database tables do take up space. Document stores also enable schema flexibility, because only the fields that are needed are stored, and new fields can be added at any time. This again contrasts with relational databases, where table structures are defined up front before data is stored, and changing columns is a tedious task that impacts the entire data set.

·         Graph databases: Here you have databases that store graph structures — representations that show collections of entities (vertices or nodes) and their relationships (edges) with each other. These structures enable graph databases to be extremely well suited for storing complex structures, like the linking relationships between all known web pages. (For example, individual web pages are nodes, and the edges connecting them are links from one page to another.) Google, of course, is all over graph technology, and invented a graph processing engine called Pregel to power its PageRank algorithm. (And yes, there's a white paper on Pregel.) In the Hadoop community, there's an Apache project called Giraph (based on the Pregel paper), which is a graph processing engine designed to process graphs stored in HDFS.
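To make the four categories above more concrete, here is a minimal sketch that models each one with plain in-memory Python structures. It is not how any real product stores data; every name and sample value (kv_store, column_families, documents, links, and the helper functions) is hypothetical and chosen only for illustration.

```python
# Toy, in-memory illustrations of the four NoSQL data models.
# All names and sample values here are hypothetical.

# Key-value store: an opaque value looked up by key; no schema.
kv_store = {}
kv_store["user:42"] = b"\x00\x01binary-blob"
kv_store["counter:visits"] = 1017

# Column-family store: row key -> column family -> column -> value,
# loosely following the sorted-map idea (rows are scanned in key order).
column_families = {
    "row-002": {"profile": {"name": "Linus", "email": "linus@example.com"}},
    "row-001": {"profile": {"name": "Ada"}, "activity": {"last_login": "2024-01-01"}},
}

def scan_rows(table):
    """Return row keys in sorted order, as a sorted map would."""
    return sorted(table)

# Document store: documents in one collection may carry different fields.
documents = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Linus"},  # no email field is stored at all
]

def find(collection, **criteria):
    """Return documents whose fields match every given criterion."""
    return [d for d in collection
            if all(d.get(k) == v for k, v in criteria.items())]

# Graph: web pages as nodes, links as directed edges (adjacency list).
links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": [],
}

def inbound(graph, page):
    """Return the pages that link to the given page."""
    return sorted(src for src, targets in graph.items() if page in targets)
```

For example, find(documents, name="Linus") returns the one document that has no email field, and inbound(links, "c.html") lists the pages that link to c.html.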

 REMEMBER  The data storage and processing options available in Hadoop are in many cases implementations of the NoSQL categories listed here. This will help you better evaluate solutions that are available to you and see how Hadoop can complement traditional data warehouses.

ACID versus BASE Data Stores

One hallmark of relational database systems is something known as ACID compliance. As you might have guessed, ACID is an acronym — the individual letters, meant to describe a characteristic of individual database transactions, can be expanded as described in this list:

·         Atomicity: The database transaction must completely succeed or completely fail. Partial success is not allowed.

·         Consistency: During the database transaction, the RDBMS progresses from one valid state to another. The state is never invalid.

·         Isolation: The client's database transaction must occur in isolation from other clients attempting to transact with the RDBMS.

·         Durability: The data operation that was part of the transaction must be reflected in nonvolatile storage (computer memory that can retrieve stored information even when not powered – like a hard disk) and persist after the transaction successfully completes. Transaction failures cannot leave the data in a partially committed state.

Certain use cases for RDBMSs, like online transaction processing, depend on ACID-compliant transactions between the client and the RDBMS for the system to function properly. A great example of an ACID-compliant transaction is a transfer of funds from one bank account to another. This breaks down into two database operations, where the originating account shows a withdrawal and the destination account shows a deposit. Obviously, these two operations have to be tied together in a single transaction in order to be valid, so that if either of them fails, the whole operation fails and both balances remain valid.
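The funds transfer can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the accounts table, the account names, the starting balances, and the helper functions make_bank, transfer, and balances are all hypothetical. The deposit is deliberately applied before the withdrawal so that a failed withdrawal shows the rollback undoing the deposit as well.

```python
import sqlite3

def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move funds atomically; return True on commit, False on rollback."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint if src would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return a {name: balance} dictionary for all accounts."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 500) on the sample data returns False and leaves both balances untouched, because the deposit to bob is rolled back along with the failed withdrawal.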

Hadoop itself has no concept of transactions (or even records, for that matter), so it clearly isn't an ACID-compliant system. Thinking more specifically about data storage and processing projects in the entire Hadoop ecosystem (we tell you more about these projects later in this chapter), none of them is fully ACID-compliant, either. However, they do reflect properties that you often see in NoSQL data stores, so there is some precedent to the Hadoop approach.

One key concept behind NoSQL data stores is that not every application truly needs ACID-compliant transactions. Relaxing on certain ACID properties (and moving away from the relational model) has opened up a wealth of possibilities, which have enabled some NoSQL data stores to achieve massive scalability and performance for their niche applications. Whereas ACID defines the key characteristics required for reliable transaction processing, the NoSQL world requires different characteristics to enable flexibility and scalability. These opposing characteristics are cleverly captured in the acronym BASE:

·         Basically Available: The system is guaranteed to be available for querying by all users. (No isolation here.)

·         Soft State: The values stored in the system may change because of the eventual consistency model, as described in the next bullet.

·         Eventually Consistent: As data is added to the system, the system's state is gradually replicated across all nodes. For example, in Hadoop, when a file is written to the HDFS, the replicas of the data blocks are created in different data nodes after the original data blocks have been written. For the short period before the blocks are replicated, the state of the file system isn't consistent.

The acronym BASE is a bit contrived, as most NoSQL data stores don't completely abandon all the ACID characteristics — it's not really the polar opposite concept that the name implies, in other words. Also, the Soft State and Eventually Consistent characteristics amount to the same thing, but the point is that by relaxing consistency, the system can horizontally scale (many nodes) and ensure availability.
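The soft-state and eventually-consistent behavior described above can be sketched as a toy replicated store. This is not how HDFS or any specific NoSQL product replicates data; the class EventuallyConsistentStore and its methods are hypothetical, and replication is reduced to draining a queue on demand.

```python
from collections import deque

class EventuallyConsistentStore:
    """Toy replicated store: writes hit one replica, others catch up later."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # (replica_index, key, value) not yet applied

    def write(self, key, value):
        """Apply the write to replica 0 and queue it for the other replicas."""
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica=0):
        """Read from one replica; the result may be stale (soft state)."""
        return self.replicas[replica].get(key)

    def replicate(self):
        """Drain the queue in order, bringing every replica up to date."""
        while self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index][key] = value

    def is_consistent(self):
        """True when every replica holds the same data as replica 0."""
        return all(r == self.replicas[0] for r in self.replicas)
```

Between write() and replicate(), a read against replica 2 returns None while replica 0 already has the new value; once replicate() runs, all replicas agree. The store stays available for reads the whole time, which is the trade being made.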

CAP Theory

TECHNICAL STUFF  No discussion of NoSQL would be complete without mentioning the CAP theorem, which represents the three kinds of guarantees that architects aim to provide in their systems:

·         Consistency: All nodes in the system have the same view of the data at any time. (This differs from the C in ACID, which is about moving the database from one valid state to another.)

·         Availability: The system always responds to requests.

·         Partition tolerance: The system remains online if network problems occur between system nodes.

The CAP theorem states that in distributed networked systems, architects have to choose two of these three guarantees — you can't promise your users all three. That leaves you with the three possibilities shown in Figure 11-1:

·         Systems using traditional relational technologies normally aren't partition tolerant, so they can guarantee consistency and availability. In short, if one part of these traditional relational technologies systems is offline, the whole system is offline.

·         Systems where partition tolerance and availability are of primary importance can't guarantee consistency, because updates (that destroyer of consistency) can be made on either side of the partition. The key-value store Dynamo, the document store CouchDB, and the column-family store Cassandra are popular examples of partition tolerant/availability (PA) systems.

·         Systems where partition tolerance and consistency are of primary importance can't guarantee availability because the systems return errors until the partitioned state is resolved.
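The difference between the last two choices can be sketched with a toy two-node system. This is a simplified illustration, not a model of any real database: the Node class, the "CP" and "AP" mode labels, and the make_pair and partition helpers are all hypothetical.

```python
class Unavailable(Exception):
    """Raised when a CP node refuses to serve during a partition."""

class Node:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (hypothetical labels)
        self.data = {}
        self.partitioned = False  # True when the peer is unreachable
        self.peer = None

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Give up availability to keep both sides identical.
                raise Unavailable("cannot reach peer; refusing write")
            # AP: accept the write locally and let the sides diverge.
            self.data[key] = value
            return
        # Healthy network: replicate synchronously to the peer.
        self.data[key] = value
        self.peer.data[key] = value

def make_pair(mode):
    """Create two connected nodes that share the same mode."""
    a, b = Node(mode), Node(mode)
    a.peer, b.peer = b, a
    return a, b

def partition(a, b):
    """Simulate a network failure between the two nodes."""
    a.partitioned = b.partitioned = True
```

After partition(a, b), an AP pair keeps accepting writes on both sides and ends up with different data, while a CP pair raises Unavailable and keeps the two copies identical.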

 REMEMBER  Hadoop-based data stores are considered CP systems (consistent and partition tolerant). With data stored redundantly across many slave nodes, outages to large portions (partitions) of a Hadoop cluster can be tolerated. Hadoop is considered to be consistent because it has a central metadata store (the NameNode), which maintains a single, consistent view of data stored in the cluster. We can't say that Hadoop guarantees availability, because if the NameNode fails, applications cannot access data in the cluster.

 
