Design Goals
ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace.
ZooKeeper data is kept in memory, which means ZooKeeper can achieve high throughput and low latency.
ZooKeeper is replicated. Like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a set of hosts called an ensemble.
[Figure: ZooKeeper Service]
Clients connect over TCP to a single ZooKeeper server. If the TCP connection to that server breaks, the client will connect to a different server.
ZooKeeper is ordered. ZooKeeper stamps each update with a number that reflects the order of all ZooKeeper transactions.
ZooKeeper is fast. It is especially fast in "read-dominant" workloads. ZooKeeper applications run on thousands of machines, and ZooKeeper performs best where reads are more common than writes, at ratios of around 10:1.
Data model and the hierarchical namespace
[Figure: ZooKeeper's Hierarchical Namespace]
Nodes and ephemeral nodes
Unlike standard file systems, each node in a ZooKeeper namespace can have data associated with it as well as children.
Znodes maintain a stat structure that includes version numbers for data changes, ACL changes, and timestamps, to allow cache validations and coordinated updates. Each time a znode's data changes, the version number increases.
For instance, whenever a client retrieves data it also receives the version of the data.
The data stored at each znode in a namespace is read and written atomically.
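The version returned with every read is what makes a coordinated update possible: a writer can ask for its change to be applied only if the data is still at the version it read. The following is a minimal sketch of that idea using a toy in-memory store. `VersionedStore`, `Znode`, and `BadVersionError` are hypothetical names invented for illustration and are not the ZooKeeper client API; a lock stands in for the atomic read and write guarantee.

```python
import threading
from dataclasses import dataclass


class BadVersionError(Exception):
    """Raised when the caller's expected version is stale (hypothetical name)."""


@dataclass
class Znode:
    data: bytes = b""
    version: int = 0  # incremented on every data change


class VersionedStore:
    """Toy in-memory store for illustration; NOT the ZooKeeper client API."""

    def __init__(self):
        self._nodes = {}
        self._lock = threading.Lock()  # makes each read and write atomic

    def create(self, path, data=b""):
        with self._lock:
            if path in self._nodes:
                raise KeyError(f"{path} already exists")
            self._nodes[path] = Znode(data)

    def get(self, path):
        # A read returns the data together with its version.
        with self._lock:
            node = self._nodes[path]
            return node.data, node.version

    def set(self, path, data, expected_version):
        # Conditional update: succeeds only if the data has not changed
        # since the caller read it at expected_version.
        with self._lock:
            node = self._nodes[path]
            if node.version != expected_version:
                raise BadVersionError(
                    f"expected {expected_version}, found {node.version}"
                )
            node.data = data
            node.version += 1
            return node.version
```

A client would read with `get`, compute its new value, and pass the version it read to `set`; if another client wrote in between, the call raises `BadVersionError` and the client can re-read and retry.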
ZooKeeper also has the notion of ephemeral nodes. These znodes exist as long as the session that created them is active. When the session ends, the znode is deleted.
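The lifetime rule for ephemeral nodes can be illustrated with a small toy model. The sketch below assumes a hypothetical `EphemeralStore` class that records which session created each ephemeral node; it does not model real session timeouts or networking.

```python
class EphemeralStore:
    """Toy model of ephemeral nodes; names are hypothetical, not ZooKeeper's API."""

    def __init__(self):
        self.nodes = {}   # path -> data
        self.owners = {}  # path -> session id, recorded only for ephemeral nodes

    def create(self, path, data, session_id=None):
        # Passing a session_id marks the node as ephemeral.
        self.nodes[path] = data
        if session_id is not None:
            self.owners[path] = session_id

    def close_session(self, session_id):
        # Ending a session deletes every ephemeral node it created.
        expired = [p for p, owner in self.owners.items() if owner == session_id]
        for path in expired:
            del self.nodes[path]
            del self.owners[path]
        return sorted(expired)
```

For example, a worker process might register itself under a path such as `/workers/w1` as an ephemeral node; when its session ends, the entry disappears without any explicit cleanup.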
Conditional updates and watches
Clients can set a watch on a znode. A watch is triggered and removed when the znode changes. When a watch is triggered, the client receives a packet saying that the znode has changed. If the connection between the client and one of the ZooKeeper servers is broken, the client receives a local notification. These can be used to [tbd].
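Because a watch is removed when it fires, it delivers at most one notification per registration. A minimal sketch of this one-shot behavior, assuming a hypothetical `WatchedStore` class with plain callbacks in place of network packets:

```python
class WatchedStore:
    """Toy model of one-shot watches; not the ZooKeeper client API."""

    def __init__(self):
        self.data = {}
        self.watches = {}  # path -> list of callbacks waiting for a change

    def get(self, path, watch=None):
        # Reading with a watch registers the callback for the next change.
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.data.get(path)

    def set(self, path, value):
        self.data[path] = value
        # pop() removes the watches before they fire, so each fires once.
        for callback in self.watches.pop(path, []):
            callback(path)
```

A client that wants to keep following a znode has to set a new watch each time it is notified, typically by re-reading the data with a watch attached.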
Simple API
ZooKeeper supports only these operations:
- create: creates a node at a location in the tree
- delete: deletes a node
- exists: tests if a node exists at a location
- get data: reads the data from a node
- set data: writes data to a node
- get children: retrieves a list of children of a node
- sync: waits for data to be propagated
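To make the shape of these operations concrete, here is a minimal sketch of a single-process tree that offers them over slash-separated paths. `ToyTree` and its method names are hypothetical and only loosely mirror the list above; `sync` is left out because the toy has a single copy of the data and nothing to propagate. Requiring a parent to exist and refusing to delete a node that has children are assumptions of this sketch.

```python
class ToyTree:
    """Toy hierarchical namespace; an illustration, not the ZooKeeper API."""

    def __init__(self):
        self._data = {"/": b""}  # path -> data; the root always exists

    @staticmethod
    def _parent(path):
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path, data=b""):
        if path in self._data:
            raise KeyError(f"node exists: {path}")
        if self._parent(path) not in self._data:
            raise KeyError(f"no parent for: {path}")
        self._data[path] = data

    def exists(self, path):
        return path in self._data

    def get_data(self, path):
        return self._data[path]

    def set_data(self, path, data):
        if path not in self._data:
            raise KeyError(f"no such node: {path}")
        self._data[path] = data

    def get_children(self, path):
        # Children are the nodes whose parent is exactly `path`.
        return sorted(
            p.rsplit("/", 1)[1]
            for p in self._data
            if p != "/" and self._parent(p) == path
        )

    def delete(self, path):
        # Assumption of this sketch: only leaf nodes can be deleted.
        if self.get_children(path):
            raise ValueError(f"node has children: {path}")
        del self._data[path]
```

Note that every node, not just leaves, can carry data, which is the main difference from a file system's split between directories and files.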
Implementation
The ZooKeeper Components figure shows the high-level components of the ZooKeeper service.
[Figure: ZooKeeper Components]
The replicated database is an in-memory database containing the entire data tree. Updates are logged to disk for recoverability, and writes are serialized to disk before they are applied to the in-memory database.
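The ordering described here (log to disk first, then apply in memory) is the usual write-ahead pattern. The sketch below illustrates it under simplifying assumptions: `LoggedTree` is a hypothetical class, the log is a JSON-lines file, and the "tree" is a flat dictionary. It says nothing about ZooKeeper's actual on-disk format.

```python
import json
import os
import tempfile


class LoggedTree:
    """Toy write-ahead log: persist each update, then apply it in memory."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.tree = {}  # in-memory copy of all data
        self._replay()

    def _replay(self):
        # On startup, rebuild the in-memory state from the log.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.tree[record["path"]] = record["data"]

    def write(self, path, data):
        record = json.dumps({"path": path, "data": data})
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(record + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the record to disk first
        self.tree[path] = data      # only then update memory


def recovery_demo():
    with tempfile.TemporaryDirectory() as directory:
        log_path = os.path.join(directory, "txn.log")
        first = LoggedTree(log_path)
        first.write("/app", "v1")
        first.write("/app", "v2")
        # Simulate a restart: a new instance rebuilds memory from the log.
        return LoggedTree(log_path).tree
```

Because the record reaches disk before memory changes, a crash between the two steps loses nothing that a reader could have observed; replaying the log restores the same state.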
As part of the agreement protocol all write requests from clients are forwarded to a single server, called the leader. The rest of the ZooKeeper servers, called followers, receive message proposals from the leader and agree upon message delivery. The messaging layer takes care of replacing leaders on failures and syncing followers with leaders.
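The flow of a write through the leader can be sketched very roughly as follows. This is a heavily simplified illustration, not the real agreement protocol: `Leader` and `Follower` are hypothetical classes, the rule that a write is delivered once a simple majority of the ensemble acknowledges it is an assumption of this sketch, and leader replacement and follower syncing are not modeled at all.

```python
class Follower:
    def __init__(self, up=True):
        self.up = up          # whether this follower is reachable
        self.delivered = []   # (txn_id, update) pairs in delivery order

    def propose(self, txn_id, update):
        # Acknowledge the proposal only if this follower is reachable.
        return self.up

    def commit(self, txn_id, update):
        if self.up:
            self.delivered.append((txn_id, update))


class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.next_txn_id = 1  # every write gets the next number in sequence
        self.delivered = []

    def write(self, update):
        txn_id = self.next_txn_id
        # The leader counts its own acknowledgement plus the followers'.
        acks = 1 + sum(f.propose(txn_id, update) for f in self.followers)
        ensemble_size = 1 + len(self.followers)
        if acks <= ensemble_size // 2:
            raise RuntimeError("no majority; write not delivered")
        self.next_txn_id += 1
        self.delivered.append((txn_id, update))
        for follower in self.followers:
            follower.commit(txn_id, update)
        return txn_id
```

Funnelling all writes through one server is what lets each update carry a single, totally ordered number: every reachable follower delivers the same updates in the same order as the leader.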
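ZooKeeper is a high-performance distributed coordination service that achieves data consistency through a replicated in-memory database. It supports a simple API, including operations such as creating and deleting nodes, and gives distributed applications ordered, fast data access. Its data model is based on a hierarchical namespace and supports ephemeral nodes and conditional updates.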






