关于NoSQL的一些理论

本文探讨了大数据的三大特征:容量、种类和速度,并详细介绍了NoSQL数据库的发展背景及四种主要类型:键值存储、列族存储、文档存储和图数据库。此外,还对比了ACID与BASE数据存储模型的特点,并解释了CAP理论在分布式系统中的应用。

1.     The three Vs of big data

Volume: High volumes of data ranging from dozens of terabytes, and even petabytes.

Variety: Data that's organized in multiple structures, ranging from raw text (which, from a computer's perspective, has little or no discernible structure — many people call this unstructured data) to log files (commonly referred to as being semistructured) to data ordered in strongly typed rows and columns (structured data). To make things even more confusing, some data sets even include portions of all three kinds of data. (This is known as multistructured data.)

Velocity: Data that enters your organization and has some kind of value for a limited window of time — a window that usually shuts well before the data has been transformed and loaded into a data warehouse for deeper analysis (for example, financial securities ticker data, which may reveal a buying opportunity, but only for a short while). The higher the volumes of data entering your organization per second, the bigger your velocity challenge.

2.     NoSQL Theories

NoSQL Data Stores

NoSQL data stores originally subscribed to the notion "Just Say No to SQL" (to paraphrase from an anti-drug advertising campaign in the 1980s), and they were a reaction to the perceived limitations of (SQL-based) relational databases. It's not that these folks hated SQL, but they were tired of forcing square pegs into round holes by solving problems that relational databases weren't designed for. A relational database is a powerful tool, but for some kinds of data (like key-value pairs, or graphs) and some usage patterns (like extremely large scale storage) a relational database just isn't practical. And when it comes to high-volume storage, relational database can be expensive, both in terms of database license costs and hardware costs. (Relational databases are designed to work with enterprise-grade hardware.) So, with the NoSQL movement, creative programmers developed dozens of solutions for different kinds of thorny data storage and processing problems. These NoSQL databases typically provide massive scalability by way of clustering, and are often designed to enable high throughput and low latency.

 REMEMBER  The name NoSQL is somewhat misleading because many databases that fit the category do have SQL support (rather than "NoSQL" support). Think of its name instead as "Not Only SQL."

The NoSQL offerings available today can be broken down into four distinct categories, based on their design and purpose:

·         Key-value stores: This offering provides a way to store any kind of data without having to use a schema. This is in contrast to relational databases, where you need to define the schema (the table structure) before any data is inserted. Since key-value stores don't require a schema, you have great flexibility to store data in many formats. In a key-value store, a row simply consists of a key (an identifier) and a value, which can be anything from an integer value to a large binary data string. Many implementations of key-value stores are based on Amazon's Dynamo paper.

·         Column family stores: Here you have databases in which columns are grouped into column families and stored together on disk.

 TECHNICAL STUFF  Strictly speaking, many of these databases aren't column-oriented, because they're based on Google's BigTable paper, which stores data as a multidimensional sorted map. (For more on the role of Google's BigTable paper on database design, see Chapter 12.)

·         Document stores: This offering relies on collections of similarly encoded and formatted documents to improve efficiencies. Document stores enable individual documents in a collection to include only a subset of fields, so only the data that's needed is stored. For sparse data sets, where many fields are often not populated, this can translate into significant space savings. By contrast, empty columns in relational database tables do take up space. Document stores also enables schema flexibility, because only the fields that are needed are stored, and new fields can be added. Again, in contrast to relational databases, table structures are defined up front before data is stored, and changing columns is a tedious task that impacts the entire data set.

·         Graph databases: Here you have databases that store graph structures — representations that show collections of entities (vertices or nodes) and their relationships (edges) with each other. These structures enable graph databases to be extremely well suited for storing complex structures, like the linking relationships between all known web pages. (For example, individual web pages are nodes, and the edges connecting them are links from one page to another.) Google, of course, is all over graph technology, and invented a graph processing engine called Pregel to power its PageRank algorithm. (And yes, there's a white paper on Pregel.) In the Hadoop community, there's an Apache project called Giraph (based on the Pregel paper), which is a graph processing engine designed to process graphs stored in HDFS.

 REMEMBER  The data storage and processing options available in Hadoop are in many cases implementations of the NoSQL categories listed here. This will help you better evaluate solutions that are available to you and see how Hadoop can complement traditional data warehouses.

ACID versus BASE Data Stores

One hallmark of relational database systems is something known as ACID compliance. As you might have guessed, ACID is an acronym — the individual letters, meant to describe a characteristic of individual database transactions, can be expanded as described in this list:

·         Atomicity: The database transaction must completely succeed or completely fail. Partial success is not allowed.

·         Consistency: During the database transaction, the RDBMS progresses from one valid state to another. The state is never invalid.

·         Isolation: The client's database transaction must occur in isolation from other clients attempting to transact with the RDBMS.

·         Durability: The data operation that was part of the transaction must be reflected in nonvolatile storage (computer memory that can retrieve stored information even when not powered – like a hard disk) and persist after the transaction successfully completes. Transaction failures cannot leave the data in a partially committed state.

Certain use cases for RDBMSs, like online transaction processing, depend on ACID-compliant transactions between the client and the RDBMS for the system to function properly. A great example of an ACID-compliant transaction is a transfer of funds from one bank account to another. This breaks down into two database transactions, where the originating account shows a withdrawal, and the destination account shows a deposit. Obviously, these two transactions have to be tied together in order to be valid so that if either of them fail, the whole operation must fail to ensure both balances remain valid.

Hadoop itself has no concept of transactions (or even records, for that matter), so it clearly isn't an ACID-compliant system. Thinking more specifically about data storage and processing projects in the entire Hadoop ecosystem (we tell you more about these projects later in this chapter), none of them is fully ACID-compliant, either. However, they do reflect properties that you often see in NoSQL data stores, so there is some precedent to the Hadoop approach.

One key concept behind NoSQL data stores is that not every application truly needs ACID-compliant transactions. Relaxing on certain ACID properties (and moving away from the relational model) has opened up a wealth of possibilities, which have enabled some NoSQL data stores to achieve massive scalability and performance for their niche applications. Whereas ACID defines the key characteristics required for reliable transaction processing, the NoSQL world requires different characteristics to enable flexibility and scalability. These opposing characteristics are cleverly captured in the acronym BASE:

·         Basically Available: The system is guaranteed to be available for querying by all users. (No isolation here.)

·         Soft State: The values stored in the system may change because of the eventual consistency model, as described in the next bullet.

·         Eventually Consistent: As data is added to the system, the system's state is gradually replicated across all nodes. For example, in Hadoop, when a file is written to the HDFS, the replicas of the data blocks are created in different data nodes after the original data blocks have been written. For the short period before the blocks are replicated, the state of the file system isn't consistent.

The acronym BASE is a bit contrived, as most NoSQL data stores don't completely abandon all the ACID characteristics — it's not really the polar opposite concept that the name implies, in other words. Also, the Soft State and Eventually Consistent characteristics amount to the same thing, but the point is that by relaxing consistency, the system can horizontally scale (many nodes) and ensure availability.

CAP Theory

TECHNICAL STUFF  No discussion of NoSQL would be complete without mentioning the CAP theorem, which represents the three kinds of guarantees that architects aim to provide in their systems:

·         Consistency: Similar to the C in ACID, all nodes in the system would have the same view of the data at any time.

·         Availability: The system always responds to requests.

·         Partition tolerance: The system remains online if network problems occur between system nodes.

The CAP theorem states that in distributed networked systems, architects have to choose two of these three guarantees — you can't promise your users all three. That leaves you with the three possibilities shown in Figure 11-1:

·         Systems using traditional relational technologies normally aren't partition tolerant, so they can guarantee consistency and availability. In short, if one part of these traditional relational technologies systems is offline, the whole system is offline.

·         Systems where partition tolerance and availability are of primary importance can't guarantee consistency, because updates (that destroyer of consistency) can be made on either side of the partition. The key-value stores Dynamo and CouchDB and the column-family store Cassandra are popular examples of partition tolerant/availability (PA) systems.

·         Systems where partition tolerance and consistency are of primary importance can't guarantee availability because the systems return errors until the partitioned state is resolved.

 REMEMBER  Hadoop-based data stores are considered CP systems (consistent and partition tolerant). With data stored redundantly across many slave nodes, outages to large portions (partitions) of a Hadoop cluster can be tolerated. Hadoop is considered to be consistent because it has a central metadata store (the NameNode) which maintains a single, consistent view of data stored in the cluster. We can't say that Hadoop guarantees availability, because if the NameNode fails applications cannot access data in the cluster.

 

内容概要:本文介绍了一个基于MATLAB实现的无人机三维路径规划项目,采用蚁群算法(ACO)与多层感知机(MLP)相结合的混合模型(ACO-MLP)。该模型通过三维环境离散化建模,利用ACO进行全局路径搜索,并引入MLP对环境特征进行自适应学习与启发因子优化,实现路径的动态调整与多目标优化。项目解决了高维空间建模、动态障碍规避、局部最优陷阱、算法实时性及多目标权衡等关键技术难题,结合并行计算与参数自适应机制,提升了路径规划的智能性、安全性和工程适用性。文中提供了详细的模型架构、核心算法流程及MATLAB代码示例,涵盖空间建模、信息素更新、MLP训练与融合优化等关键步骤。; 适合人群:具备一定MATLAB编程基础,熟悉智能优化算法与神经网络的高校学生、科研人员及从事无人机路径规划相关工作的工程师;适合从事智能无人系统、自动驾驶、机器人导航等领域的研究人员; 使用场景及目标:①应用于复杂三维环境下的无人机路径规划,如城市物流、灾害救援、军事侦察等场景;②实现飞行安全、能耗优化、路径平滑与实时避障等多目标协同优化;③为智能无人系统的自主决策与环境适应能力提供算法支持; 阅读建议:此资源结合理论模型与MATLAB实践,建议读者在理解ACO与MLP基本原理的基础上,结合代码示例进行仿真调试,重点关注ACO-MLP融合机制、多目标优化函数设计及参数自适应策略的实现,以深入掌握混合智能算法在工程中的应用方法。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值