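The leveldb library provides a persistent key value store. Keys and values are arbitrary byte arrays. The keys are ordered within the key value store according to a user-specified comparator function.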
Opening A Database
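A leveldb database has a name which corresponds to a file system directory. All of the contents of the database are stored in this directory. The following example shows how to open a database, creating it if necessary: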
#include <cassert>
#include "leveldb/include/db.h"
leveldb::DB* db;
leveldb::Options options;
options.create_if_missing = true;
leveldb::Status status = leveldb::DB::Open(options, "/tmp/testdb", &db);
assert(status.ok());
...
If you want to raise an error if the database already exists, add the following line before the leveldb::DB::Open call:
options.error_if_exists = true;
Status
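You may have noticed the leveldb::Status type above. Values of this type are returned by most functions in leveldb that may encounter an error. You can check if such a result is ok, and also print an associated error message: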
leveldb::Status s = ...;
if (!s.ok()) std::cerr << s.ToString() << std::endl;
Closing A Database
When you are done with a database, just delete the database object. Example:
... open the db as described above ...
... do something with db ...
delete db;
Reads And Writes
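The database provides Put, Delete, and Get methods to modify and query the database. For example, the following code moves the value stored under key1 to key2: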
std::string value;
leveldb::Status s = db->Get(leveldb::ReadOptions(), key1, &value);
if (s.ok()) s = db->Put(leveldb::WriteOptions(), key2, value);
if (s.ok()) s = db->Delete(leveldb::WriteOptions(), key1);
Atomic Updates
Note that if the process dies after the Put of key2 but before the delete of key1, the same value may be left stored under multiple keys. Such problems can be avoided by using the WriteBatch class to atomically apply a set of updates:
#include "leveldb/include/write_batch.h"
...
std::string value;
leveldb::Status s = db->Get(leveldb::ReadOptions(), key1, &value);
if (s.ok()) {
leveldb::WriteBatch batch;
batch.Delete(key1);
batch.Put(key2, value);
s = db->Write(leveldb::WriteOptions(), &batch);
}
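The WriteBatch holds a sequence of edits to be made to the database, and these edits within the batch are applied in order. Note that we called Delete before Put so that if key1 is identical to key2, we do not end up erroneously dropping the value entirely.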
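To see why the order of the two edits matters, the toy model below records edits and replays them in order against a std::map. It is plain C++, not the leveldb::WriteBatch class. ToyBatch and Move are hypothetical names, and the model ignores persistence and atomicity entirely.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy batch: NOT leveldb::WriteBatch, only an illustration
// of "edits within the batch are applied in order".
struct ToyBatch {
  struct Op {
    bool is_put;        // true for Put, false for Delete
    std::string key;
    std::string value;  // unused for Delete
  };
  std::vector<Op> ops;

  void Put(const std::string& k, const std::string& v) { ops.push_back({true, k, v}); }
  void Delete(const std::string& k) { ops.push_back({false, k, ""}); }

  // Replays the recorded edits in insertion order.
  void Apply(std::map<std::string, std::string>* db) const {
    for (const Op& op : ops) {
      if (op.is_put) {
        (*db)[op.key] = op.value;
      } else {
        db->erase(op.key);
      }
    }
  }
};

// Moves the value under key1 to key2, choosing the order of the two edits.
std::map<std::string, std::string> Move(std::map<std::string, std::string> db,
                                        const std::string& key1,
                                        const std::string& key2,
                                        bool delete_first) {
  auto it = db.find(key1);
  if (it == db.end()) return db;  // nothing to move
  const std::string value = it->second;
  ToyBatch batch;
  if (delete_first) {
    batch.Delete(key1);
    batch.Put(key2, value);
  } else {
    batch.Put(key2, value);
    batch.Delete(key1);
  }
  batch.Apply(&db);
  return db;
}
```

With key1 == key2, the Delete-then-Put order leaves the value in place. Put-then-Delete removes it.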
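Apart from its atomicity benefits, WriteBatch may also be used to speed up bulk updates by placing lots of individual mutations into the same batch.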
Synchronous Writes
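By default, each write to leveldb is asynchronous: it returns after pushing the write from the process into the operating system. The transfer from operating system memory to the underlying persistent storage happens asynchronously. The sync flag can be turned on for a particular write to make the write operation not return until the data being written has been pushed all the way to persistent storage. (On Posix systems, this is implemented by calling either fsync(...), fdatasync(...), or msync(..., MS_SYNC) before the write operation returns.)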
leveldb::WriteOptions write_options;
write_options.sync = true;
db->Put(write_options, ...);
Asynchronous writes are often more than a thousand times as fast as synchronous writes. The downside of asynchronous writes is that a crash of the machine may cause the last few updates to be lost. Note that a crash of just the writing process (i.e., not a reboot) will not cause any loss, since even when sync is false, an update is pushed from the process memory into the operating system before it is considered done.
Asynchronous writes can often be used safely. For example, when loading a large amount of data into the database you can handle lost updates by restarting the bulk load after a crash. A hybrid scheme is also possible where every Nth write is synchronous, and in the event of a crash, the bulk load is restarted just after the last synchronous write finished by the previous run. (The synchronous write can update a marker that describes where to restart on a crash.)
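The hybrid scheme can be simulated without leveldb. The sketch below is a simplified model. Unsynced writes sit in a volatile buffer that a machine crash discards. Every Nth write (and, as an extra assumption, the final one) is synchronous: it flushes the buffer and durably records a restart marker. SimStore, BulkLoad, and the crash model are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: "pending" stands in for OS memory, "durable" for disk.
struct SimStore {
  std::vector<std::string> durable;   // survives a machine crash
  std::vector<std::string> pending;   // lost on a machine crash
  std::size_t restart_marker = 0;     // durable index of the next item to load

  void Write(const std::string& item, bool sync, std::size_t next_index) {
    pending.push_back(item);
    if (sync) {
      // A synchronous write makes everything buffered so far durable,
      // and the marker is persisted along with it.
      durable.insert(durable.end(), pending.begin(), pending.end());
      pending.clear();
      restart_marker = next_index;
    }
  }

  void MachineCrash() { pending.clear(); }
};

// Resumes loading at the durable marker; every Nth write is synchronous.
// Simulates a machine crash just before writing items[crash_at].
// Returns the index at which this run stopped.
std::size_t BulkLoad(SimStore* store, const std::vector<std::string>& items,
                     std::size_t n, std::size_t crash_at) {
  for (std::size_t i = store->restart_marker; i < items.size(); ++i) {
    if (i == crash_at) {
      store->MachineCrash();
      return i;
    }
    const bool sync = ((i + 1) % n == 0) || (i + 1 == items.size());
    store->Write(items[i], sync, i + 1);
  }
  return items.size();
}
```

After a crash, calling BulkLoad again resumes at the last durable marker. At most N - 1 writes are redone.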
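WriteBatch provides an alternative to asynchronous writes. Multiple updates may be placed in the same WriteBatch and applied together using a synchronous write (i.e., write_options.sync is set to true). The extra cost of the synchronous write will be amortized across all of the writes in the batch.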
Concurrency
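A database may only be opened by one process at a time. The leveldb implementation acquires a lock from the operating system to prevent misuse. Within a single process, the same leveldb::DB object may be safely used by multiple concurrent threads.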
Iteration
The following example demonstrates how to print all key,value pairs in a database.
leveldb::Iterator* it = db->NewIterator(leveldb::ReadOptions());
for (it->SeekToFirst(); it->Valid(); it->Next()) {
std::cout << it->key().ToString() << ": " << it->value().ToString() << std::endl;
}
assert(it->status().ok()); // Check for any errors found during the scan
delete it;
The following variation shows how to process just the keys in the range [start,limit):
for (it->Seek(start);
it->Valid() && it->key().ToString() < limit;
it->Next()) {
...
}
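The [start,limit) loop has a direct in-memory analogue over a sorted std::map, which can help when checking the boundary semantics. std::map::lower_bound plays the role of Seek (first key >= start), and the loop stops before the first key that is not less than limit. This is an illustration, not leveldb code. ScanRange is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Returns the keys in the half-open range [start, limit), mirroring
// it->Seek(start) followed by Next() while key < limit.
std::vector<std::string> ScanRange(const std::map<std::string, std::string>& m,
                                   const std::string& start,
                                   const std::string& limit) {
  std::vector<std::string> keys;
  // lower_bound finds the first key >= start, like Iterator::Seek.
  for (auto it = m.lower_bound(start); it != m.end() && it->first < limit; ++it) {
    keys.push_back(it->first);
  }
  return keys;
}
```

Because the range is half-open, limit itself is never returned. A start key that falls between stored keys simply begins at the next stored key.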
You can also process entries in reverse order. (Caveat: reverse iteration may be somewhat slower than forward iteration.)
for (it->SeekToLast(); it->Valid(); it->Prev()) {
...
}
Snapshots
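Snapshots provide consistent read-only views over the entire state of the key-value store. ReadOptions::snapshot may be non-NULL to indicate that a read should operate on a particular version of the DB state. If ReadOptions::snapshot is NULL, the read will operate on an implicit snapshot of the current state.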
Snapshots typically are created by the DB::GetSnapshot() method:
leveldb::ReadOptions options;
options.snapshot = db->GetSnapshot();
... apply some updates to db ...
leveldb::Iterator* iter = db->NewIterator(options);
... read using iter to view the state when the snapshot was created ...
delete iter;
db->ReleaseSnapshot(options.snapshot);
Note that when a snapshot is no longer needed, it should be released using the DB::ReleaseSnapshot interface. This allows the implementation to get rid of state that was being maintained just to support reading as of that snapshot.
A Write operation can also return a snapshot that represents the state of the database just after applying a particular set of updates:
leveldb::Snapshot* snapshot;
leveldb::WriteOptions write_options;
write_options.post_write_snapshot = &snapshot;
leveldb::Status status = db->Write(write_options, ...);
... perform other mutations to db ...
leveldb::ReadOptions read_options;
read_options.snapshot = snapshot;
leveldb::Iterator* iter = db->NewIterator(read_options);
... read as of the state just after the Write call returned ...
delete iter;
db->ReleaseSnapshot(snapshot);
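One way to reason about snapshots is that leveldb tags each update internally with a sequence number, and a snapshot remembers a sequence number so that reads ignore later updates. The sketch below is a heavily simplified in-memory model of that idea. It has no deletions, no compaction, and no release step. VersionedStore and its methods are hypothetical, and it requires C++17 for std::optional.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <string>

// Hypothetical multi-version store: every Put gets a new sequence number,
// and a snapshot is just the sequence number current when it was taken.
class VersionedStore {
 public:
  using Snapshot = uint64_t;

  void Put(const std::string& key, const std::string& value) {
    versions_[key][++last_sequence_] = value;
  }

  Snapshot GetSnapshot() const { return last_sequence_; }

  // Returns the newest value whose sequence number is <= snapshot.
  std::optional<std::string> Get(const std::string& key, Snapshot snapshot) const {
    auto k = versions_.find(key);
    if (k == versions_.end()) return std::nullopt;
    // upper_bound finds the first version newer than the snapshot.
    auto v = k->second.upper_bound(snapshot);
    if (v == k->second.begin()) return std::nullopt;  // every version is newer
    return std::prev(v)->second;
  }

  // Reads the current state (an implicit snapshot of "now").
  std::optional<std::string> Get(const std::string& key) const {
    return Get(key, last_sequence_);
  }

 private:
  uint64_t last_sequence_ = 0;
  std::map<std::string, std::map<uint64_t, std::string>> versions_;
};
```

In this model, old versions are never discarded. Releasing snapshots is what would let a real implementation drop versions that no live snapshot can see.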
Slice
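The return values of the it->key() and it->value() calls above are instances of the leveldb::Slice type. Slice is a simple structure that contains a length and a pointer to an external byte array. Returning a Slice is a cheaper alternative to returning a std::string since we do not need to copy potentially large keys and values. In addition, leveldb methods do not return null-terminated C-style strings since leveldb keys and values are allowed to contain '\0' bytes.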
C++ strings and null-terminated C-style strings can be easily converted to a Slice:
leveldb::Slice s1 = "hello";
std::string str("world");
leveldb::Slice s2 = str;
A Slice can be easily converted back to a C++ string:
std::string str = s1.ToString();
assert(str == std::string("hello"));
Be careful when using Slices since it is up to the caller to ensure that the external byte array into which the Slice points remains live while the Slice is in use. For example, the following is buggy:
leveldb::Slice slice;
if (...) {
std::string str = ...;
slice = str;
}
Use(slice);
When the if statement goes out of scope, str will be destroyed and the backing storage for slice will disappear.
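The standard library's std::string_view (C++17) uses the same pointer-plus-length design as Slice, so it can illustrate both points above. An explicit length lets a key contain '\0' bytes, and the viewed bytes must outlive the view. std::string_view is only an analogue here, not leveldb::Slice. CStrLength, KeyView, and kKey are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// A key containing an embedded '\0' byte: 5 bytes in total.
// Constructing with an explicit length keeps the bytes after '\0'.
const std::string kKey("ab\0cd", 5);

// Length as a null-terminated C string would see it: stops at the '\0'.
std::size_t CStrLength(const std::string& s) { return std::strlen(s.c_str()); }

// Pointer-plus-length view: sees all 5 bytes and copies none of them.
// Safe here only because kKey has static storage duration and outlives the view.
std::string_view KeyView() { return std::string_view(kKey.data(), kKey.size()); }
```

Returning a view of a local std::string instead of kKey would reproduce the dangling-pointer bug described above.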
Comparators
The preceding examples used the default ordering function for keys, which orders bytes lexicographically. You can, however, supply a custom comparator when opening a database. For example, suppose each database key consists of two numbers and we should sort by the first number, breaking ties by the second number. First, define a proper subclass of leveldb::Comparator that expresses these rules:
class TwoPartComparator : public leveldb::Comparator {
public:
// Three-way comparison function:
// if a < b: negative result
// if a > b: positive result
// else: zero result
int Compare(const leveldb::Slice& a, const leveldb::Slice& b) const {
int a1, a2, b1, b2;
ParseKey(a, &a1, &a2);
ParseKey(b, &b1, &b2);
if (a1 < b1) return -1;
if (a1 > b1) return +1;
if (a2 < b2) return -1;
if (a2 > b2) return +1;
return 0;
}
// Ignore the following methods for now:
const char* Name() const { return "TwoPartComparator"; }
void FindShortestSeparator(std::string*, const leveldb::Slice&) const { }
void FindShortSuccessor(std::string*) const { }
};
Now create a database using this custom comparator:
TwoPartComparator cmp;
leveldb::DB* db;
leveldb::Options options;
options.create_if_missing = true;
options.comparator = &cmp;
leveldb::Status status = leveldb::DB::Open(options, "/tmp/testdb", &db);
...
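The comparator above relies on a ParseKey helper that the text does not define. Purely for illustration, the sketch below assumes one possible key encoding: the two numbers are stored as fixed-width 4-byte little-endian values. It uses std::string instead of leveldb::Slice so that it compiles on its own. EncodeKey, ParseKey, and CompareTwoPart are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical key format: two int32 values, each stored as 4 bytes
// in little-endian order (8 bytes total).
std::string EncodeKey(int32_t first, int32_t second) {
  const int32_t parts[2] = {first, second};
  std::string key;
  for (int32_t v : parts) {
    const uint32_t u = static_cast<uint32_t>(v);
    for (int i = 0; i < 4; ++i) {
      key.push_back(static_cast<char>((u >> (8 * i)) & 0xff));
    }
  }
  return key;
}

// Inverse of EncodeKey; assumes key.size() == 8.
void ParseKey(const std::string& key, int* a1, int* a2) {
  int32_t out[2];
  for (int part = 0; part < 2; ++part) {
    uint32_t u = 0;
    for (int i = 0; i < 4; ++i) {
      u |= static_cast<uint32_t>(static_cast<unsigned char>(key[4 * part + i])) << (8 * i);
    }
    out[part] = static_cast<int32_t>(u);
  }
  *a1 = out[0];
  *a2 = out[1];
}

// Same three-way logic as TwoPartComparator::Compare.
int CompareTwoPart(const std::string& a, const std::string& b) {
  int a1, a2, b1, b2;
  ParseKey(a, &a1, &a2);
  ParseKey(b, &b1, &b2);
  if (a1 < b1) return -1;
  if (a1 > b1) return +1;
  if (a2 < b2) return -1;
  if (a2 > b2) return +1;
  return 0;
}
```

With this encoding, bytewise order disagrees with numeric order. For example, 256 encodes as 00 01 00 00 and 1 as 01 00 00 00. This is exactly the situation in which the default comparator would sort keys incorrectly and a custom one is needed.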
Backwards compatibility
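The result of the comparator's Name method is attached to the database when it is created, and is checked on every subsequent database open. If the name changes, the leveldb::DB::Open call will fail. Therefore, change the name if and only if the new key format and comparison function are incompatible with existing databases, and it is ok to discard the contents of all existing databases.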
You can, however, still gradually evolve your key format over time with a little bit of pre-planning. For example, you could store a version number at the end of each key (one byte should suffice for most uses). When you wish to switch to a new key format (e.g., adding an optional third part to the keys processed by TwoPartComparator), (a) keep the same comparator name, (b) increment the version number for new keys, and (c) change the comparator function so it uses the version numbers found in the keys to decide how to interpret them.
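The version-byte idea can be sketched as follows. The last byte of each key records its format, and the comparison decodes each key according to that byte. The concrete formats are assumptions for illustration. Version 1 keys hold two big-endian uint32 parts. Version 2 keys hold three parts, and old keys are read as having a third part of 0. EncodeVersioned, PartAt, Decode, and CompareVersioned are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical key layout: N big-endian uint32 parts followed by one
// version byte. Version 1 keys have 2 parts, version 2 keys have 3.
std::string EncodeVersioned(const std::vector<uint32_t>& parts, uint8_t version) {
  std::string key;
  for (uint32_t p : parts) {
    for (int shift = 24; shift >= 0; shift -= 8) {
      key.push_back(static_cast<char>((p >> shift) & 0xff));
    }
  }
  key.push_back(static_cast<char>(version));
  return key;
}

// Reads part i (0-based) of a key.
uint32_t PartAt(const std::string& key, std::size_t i) {
  uint32_t v = 0;
  for (std::size_t j = 0; j < 4; ++j) {
    v = (v << 8) | static_cast<unsigned char>(key[4 * i + j]);
  }
  return v;
}

// Decodes either version into a common (first, second, third) form;
// version-1 keys are treated as having third == 0.
std::tuple<uint32_t, uint32_t, uint32_t> Decode(const std::string& key) {
  const uint8_t version = static_cast<unsigned char>(key.back());
  const uint32_t third = (version >= 2) ? PartAt(key, 2) : 0;
  return std::make_tuple(PartAt(key, 0), PartAt(key, 1), third);
}

// Three-way comparison that interprets each key by its own version byte.
int CompareVersioned(const std::string& a, const std::string& b) {
  const auto da = Decode(a);
  const auto db = Decode(b);
  if (da < db) return -1;
  if (db < da) return +1;
  return 0;
}
```

Old and new keys can then coexist in one database under the same comparator name. A real design would also need to decide whether a version-1 key and a version-2 key with a zero third part should compare equal, as they do here.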
Performance
Performance can be tuned by changing the default values of the types defined in leveldb/include/options.h.
Block size
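leveldb groups adjacent keys together into the same block, and such a block is the unit of transfer to and from persistent storage. The default block size is approximately 4096 uncompressed bytes. Applications that mostly do bulk scans over the contents of the database may wish to increase this size. Applications that do a lot of point reads of small values may wish to switch to a smaller block size if performance measurements indicate an improvement. There isn't much benefit in using blocks smaller than one kilobyte, or larger than a few megabytes. Also note that compression will be more effective with larger block sizes.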
Compression
Each block is individually compressed before being written to persistent storage. Compression is on by default since the default compression method is very fast, and is automatically disabled for uncompressible data. In rare cases, applications may want to disable compression entirely, but should only do so if benchmarks show a performance improvement:
leveldb::Options options;
options.compression = leveldb::kNoCompression;
... leveldb::DB::Open(options, name, ...) ....
Cache
The contents of the database are stored in a set of files in the filesystem, and each file stores a sequence of compressed blocks. If options.cache is non-NULL, it is used to cache frequently used uncompressed block contents.
#include "leveldb/include/cache.h"
leveldb::Options options;
options.cache = leveldb::NewLRUCache(100 * 1048576); // 100MB cache
leveldb::DB* db;
leveldb::DB::Open(options, name, &db);
... use the db ...
delete db;
delete options.cache;
Note that the cache holds uncompressed data, and therefore it should be sized according to application level data sizes, without any reduction from compression. (Caching of compressed blocks is left to the operating system buffer cache, or any custom Env implementation provided by the client.)
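To illustrate what a byte capacity such as 100 * 1048576 means, here is a minimal LRU cache. It charges each entry by the size of its (uncompressed) contents and evicts the least recently used entries once the total exceeds the capacity. This is a sketch of the general LRU technique, not leveldb's NewLRUCache implementation. BlockCache and its methods are hypothetical, and std::optional requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Minimal byte-capacity LRU cache keyed by a hypothetical block id.
class BlockCache {
 public:
  explicit BlockCache(std::size_t capacity_bytes) : capacity_(capacity_bytes) {}

  void Insert(uint64_t block_id, std::string contents) {
    Erase(block_id);  // replace any existing entry for this id
    usage_ += contents.size();
    lru_.emplace_front(block_id, std::move(contents));
    index_[block_id] = lru_.begin();
    // Evict from the least recently used end until usage fits the capacity.
    while (usage_ > capacity_ && !lru_.empty()) {
      usage_ -= lru_.back().second.size();
      index_.erase(lru_.back().first);
      lru_.pop_back();
    }
  }

  // Returns a copy of the block and marks it most recently used.
  std::optional<std::string> Lookup(uint64_t block_id) {
    auto it = index_.find(block_id);
    if (it == index_.end()) return std::nullopt;
    lru_.splice(lru_.begin(), lru_, it->second);  // move to front
    return it->second->second;
  }

  std::size_t usage() const { return usage_; }

 private:
  void Erase(uint64_t block_id) {
    auto it = index_.find(block_id);
    if (it == index_.end()) return;
    usage_ -= it->second->second.size();
    lru_.erase(it->second);
    index_.erase(it);
  }

  using Entry = std::pair<uint64_t, std::string>;
  std::size_t capacity_;
  std::size_t usage_ = 0;
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<uint64_t, std::list<Entry>::iterator> index_;
};
```

Because the charge is the uncompressed size, the capacity bounds memory in terms of application data, which matches the sizing advice above.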
When performing a bulk read, the application may wish to disable caching so that the data processed by the bulk read does not end up displacing most of the cached contents. A per-iterator option can be used to achieve this:
leveldb::ReadOptions options;
options.fill_cache = false;
leveldb::Iterator* it = db->NewIterator(options);
for (it->SeekToFirst(); it->Valid(); it->Next()) {
...
}
delete it;
Key Layout
Note that the unit of disk transfer and caching is a block. Adjacent keys (according to the database sort order) will usually be placed in the same block. Therefore the application can improve its performance by placing keys that are accessed together near each other and placing infrequently used keys in a separate region of the key space.
For example, suppose we are implementing a simple file system on top of leveldb. The types of entries we might wish to store are:
filename -> permission-bits, length, list of file_block_ids
file_block_id -> data
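We might want to prefix filename keys with one letter and the file_block_id keys with a different letter, so that scans over just the metadata do not force us to fetch and cache bulky file contents.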
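The prefixing idea can be sketched with a sorted std::map standing in for the database. Metadata keys and block keys get different one-byte prefixes ('M' and 'B' are arbitrary choices for this example), so all metadata is contiguous in key order and a metadata scan never visits bulky block data. MetaKey, BlockKey, and ListFiles are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Arbitrary one-byte prefixes chosen for this example.
std::string MetaKey(const std::string& filename) { return "M" + filename; }
std::string BlockKey(const std::string& block_id) { return "B" + block_id; }

// Scans only the metadata region: seek to the prefix, then stop at the
// first key that no longer starts with it. Returns the filenames found
// and reports how many entries the scan touched.
std::vector<std::string> ListFiles(const std::map<std::string, std::string>& db,
                                   std::size_t* entries_visited) {
  std::vector<std::string> names;
  *entries_visited = 0;
  const std::string prefix = "M";
  for (auto it = db.lower_bound(prefix);
       it != db.end() && it->first.compare(0, prefix.size(), prefix) == 0; ++it) {
    ++*entries_visited;
    names.push_back(it->first.substr(prefix.size()));
  }
  return names;
}
```

Only the metadata entries are visited, no matter how many data blocks are stored.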
Checksums
leveldb associates checksums with all data it stores in the file system. There are two separate controls provided over how aggressively these checksums are verified:

ReadOptions::verify_checksums may be set to true to force checksum verification of all data that is read from the file system on behalf of a particular read. By default, no such verification is done.

Options::paranoid_checks may be set to true before opening a database to make the database implementation raise an error as soon as it detects an internal corruption. Depending on which portion of the database has been corrupted, the error may be raised when the database is opened, or later by another database operation. By default, paranoid checking is off so that the database can be used even if parts of its persistent storage have been corrupted.

If a database is corrupted (perhaps it cannot be opened when paranoid checking is turned on), the leveldb::RepairDB function may be used to recover as much of the data as possible.
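leveldb's files use CRC32C (Castagnoli) checksums. The exact framing, such as which bytes are covered and how a stored value is masked, is outside the scope of this document. The sketch below shows only the underlying verify-on-read idea. It uses a simple bitwise CRC32C and a hypothetical BlockIsIntact check that rejects a corrupted block. Crc32c and BlockIsIntact are illustrative names, not leveldb functions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bitwise CRC32C (Castagnoli polynomial, reflected form 0x82F63B78).
// Table-driven or hardware-accelerated versions are much faster;
// this version favors clarity.
uint32_t Crc32c(const std::string& data) {
  uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc & 1u) ? (crc >> 1) ^ 0x82F63B78u : (crc >> 1);
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Hypothetical verify-on-read: recompute and compare with the stored value.
bool BlockIsIntact(const std::string& contents, uint32_t stored_crc) {
  return Crc32c(contents) == stored_crc;
}
```

The string "123456789" is the standard check input for CRC32C and yields 0xE3069283. Any single flipped bit in a block changes the checksum.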
Approximate Sizes
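The GetApproximateSizes method can be used to get the approximate number of bytes of file system space used by one or more key ranges.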
leveldb::Range ranges[2];
ranges[0] = leveldb::Range("a", "c");
ranges[1] = leveldb::Range("x", "z");
uint64_t sizes[2];
leveldb::Status s = db->GetApproximateSizes(ranges, 2, sizes);
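The preceding call will set sizes[0] to the approximate number of bytes of file system space used by the key range [a..c) and sizes[1] to the approximate number of bytes used by the key range [x..z).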
Environment
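All file operations (and other operating system calls) issued by the leveldb implementation are routed through a leveldb::Env object. Sophisticated clients may wish to provide their own Env implementation to get better control. For example, an application may introduce artificial delays in the file IO paths to limit the impact of leveldb on other activities in the system.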
class SlowEnv : public leveldb::Env {
... implementation of the Env interface ...
};
SlowEnv env;
leveldb::Options options;
options.env = &env;
leveldb::Status s = leveldb::DB::Open(options, ...);
Porting
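leveldb may be ported to a new platform by providing platform-specific implementations of the types, methods, and functions exported by leveldb/port/port.h. See leveldb/port/port_example.h for more details.

In addition, the new platform may need a new default leveldb::Env implementation. See leveldb/util/env_posix.h for an example.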
Other Information
Details about the leveldb