Translated from: Ways to implement data versioning in MongoDB
Can you share your thoughts on how you would implement data versioning in MongoDB? (I've asked a similar question regarding Cassandra. If you have any thoughts on which database is better suited for this, please share.)
Suppose that I need to version records in a simple address book. (Address book records are stored as flat JSON objects.) I expect that the history:
- will be used infrequently
- will be used all at once to present it in a "time machine" fashion
- will hold no more than a few hundred versions per record, and will never expire.
I'm considering the following approaches:
1. Create a new object collection to store the history of records, or the changes to records. It would store one object per version, with a reference to the address book entry. Such records would look as follows:
{
  '_id': 'new id',
  'user': user_id,
  'timestamp': timestamp,
  'address_book_id': 'id of the address book record',
  'old_record': {'first_name': 'Jon', 'last_name': 'Doe' ...}
}
This approach can be modified to store an array of versions per document, but that seems to be a slower approach without any advantages.
2. Store versions as serialized (JSON) objects attached to the address book entries. I'm not sure how to attach such objects to MongoDB documents; perhaps as an array of strings. (Modelled after Simple Document Versioning with CouchDB.)
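To make the first approach concrete, here is a minimal sketch of how a history entry could be built just before an address book record is overwritten. It is plain JavaScript that only builds the object; the helper name `buildHistoryEntry` and the sample ids are made up for illustration, and the actual insert into a history collection is left out.

```javascript
// Hypothetical helper (not part of any MongoDB API): builds the history entry
// to store in a separate history collection before a record is overwritten.
function buildHistoryEntry(currentRecord, userId, now = new Date()) {
  // Records are flat, so a shallow copy is enough for an independent snapshot.
  const { _id, ...fields } = currentRecord;
  return {
    // No _id here: MongoDB generates one on insert when it is missing.
    user: userId,
    timestamp: now,
    address_book_id: _id,
    old_record: { ...fields },
  };
}

// Illustrative data only.
const record = { _id: "ab-1", first_name: "Jon", last_name: "Doe" };
const entry = buildHistoryEntry(record, "user-42", new Date(0));
```

Since the history is read all at once and rarely, fetching every entry with the same `address_book_id` and sorting by `timestamp` would be enough for the "time machine" view.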
Reply #1
Reference: https://stackoom.com/question/hyJH/在MongoDB中实现数据版本控制的方法
Reply #2
If you're looking for a ready-to-roll solution:
Mongoid has simple versioning built in
http://mongoid.org/en/mongoid/docs/extras.html#versioning
mongoid-history is a Ruby plugin that provides a significantly more complicated solution with auditing, undo and redo
https://github.com/aq1018/mongoid-history
Reply #3
There is a versioning scheme called "Vermongo" which addresses some aspects that haven't been dealt with in the other replies.
One of these issues is concurrent updates; another is deleting documents.
Vermongo stores complete document copies in a shadow collection. For some use cases this might cause too much overhead, but I think it also simplifies many things.
https://github.com/thiloplanz/v7files/wiki/Vermongo
Reply #4
I worked out a solution that accommodates published, draft and historical versions of the data:
{
  published: {},
  draft: {},
  history: {
    "1" : {
      metadata: <value>,
      document: {}
    },
    ...
  }
}
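The structure above does not say what happens when a draft is published, so the following is only one possible reading, sketched on plain JavaScript objects rather than against MongoDB. It assumes that publishing archives the previously published version under the next numeric key in `history`, and that `metadata` describes the archived version. The helper names `saveDraft` and `publish` are hypothetical.

```javascript
// Hypothetical helpers working on a plain object shaped like the document above.

// Merge edits into the draft without touching the published version.
function saveDraft(doc, changes) {
  return { ...doc, draft: { ...doc.draft, ...changes } };
}

// Assumption: publishing moves the old published version into history
// and then promotes a copy of the draft to published.
function publish(doc, metadata) {
  const history = { ...doc.history };
  if (Object.keys(doc.published).length > 0) {
    const nextKey = String(Object.keys(history).length + 1);
    history[nextKey] = { metadata, document: doc.published };
  }
  return { published: { ...doc.draft }, draft: { ...doc.draft }, history };
}

// Illustrative run: publish twice, so that the first version ends up in history.
let page = { published: {}, draft: {}, history: {} };
page = publish(saveDraft(page, { title: "v1" }), { by: "alice" });
page = publish(saveDraft(page, { title: "v2" }), { by: "bob" });
```

With this reading, ordinary reads only ever need `published`, and the history stays inside the same document.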
I explain the model further here: http://software.danielwatrous.com/representing-revision-data-in-mongodb/
For those who may implement something like this in Java, here's an example:
http://software.danielwatrous.com/using-java-to-work-with-versioned-data/
All the code is available for you to fork, if you like:
https://github.com/dwatrous/mongodb-revision-objects
Reply #5
Here's another solution using a single document for the current version and all old versions:
{
  _id: ObjectId("..."),
  data: [
    { vid: 1, content: "foo" },
    { vid: 2, content: "bar" }
  ]
}
data contains all versions. The data array is ordered; new versions only ever get $pushed to the end of the array. data.vid is the version id, which is an incrementing number.
Get the most recent version:
find(
  { "_id": ObjectId("...") },
  { "data": { $slice: -1 } }
)
Get a specific version by vid:
find(
  { "_id": ObjectId("...") },
  { "data": { $elemMatch: { "vid": 1 } } }
)
Return only specified fields:
find(
  { "_id": ObjectId("...") },
  { "data": { $elemMatch: { "vid": 1 } }, "data.content": 1 }
)
Insert a new version (and prevent concurrent insert/update):
update(
  {
    "_id": ObjectId("..."),
    $and: [
      { "data.vid": { $not: { $gt: 2 } } },
      { "data.vid": 2 }
    ]
  },
  { $push: { "data": { "vid": 3, "content": "baz" } } }
)
2 is the vid of the current most recent version and 3 is the new version getting inserted. Because you need the most recent version's vid anyway, it's easy to get the next version's vid: nextVID = oldVID + 1.
The $and condition ensures that 2 is the latest vid. If another writer has already pushed vid 3, the first clause no longer matches and the update changes nothing.
This way there's no need for a unique index, but the application logic has to take care of incrementing the vid on insert.
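The effect of that conditional insert can be illustrated without a database. The sketch below is an in-memory stand-in, not a MongoDB call: the hypothetical `pushVersion` mirrors the two filter clauses on a plain object and only appends when both hold, and `latest` plays the role of the `$slice: -1` projection.

```javascript
// In-memory stand-in for the conditional $push (hypothetical helper names).
function pushVersion(doc, expectedVid, content) {
  const vids = doc.data.map((v) => v.vid);
  // Mirrors { "data.vid": { $not: { $gt: expectedVid } } }: no newer version exists.
  const noneNewer = vids.every((vid) => !(vid > expectedVid));
  // Mirrors { "data.vid": expectedVid }: the version we read is still there.
  const hasExpected = vids.includes(expectedVid);
  if (!noneNewer || !hasExpected) return false; // filter matched nothing
  doc.data.push({ vid: expectedVid + 1, content });
  return true;
}

// Same idea as projecting with $slice: -1.
function latest(doc) {
  return doc.data[doc.data.length - 1];
}

const doc = { data: [{ vid: 1, content: "foo" }, { vid: 2, content: "bar" }] };
const firstWrite = pushVersion(doc, 2, "baz");  // writer that read vid 2 first
const secondWrite = pushVersion(doc, 2, "qux"); // stale writer that also read vid 2
```

A writer whose update is rejected would re-read the latest version, recompute the next vid, and try again.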
Remove a specific version:
update(
  { "_id": ObjectId("...") },
  { $pull: { "data": { "vid": 2 } } }
)
That's it!
(Remember the 16MB per-document limit.)
Reply #6
The first big question when diving into this is "how do you want to store changesets"?
- Diffs?
- Whole record copies?
My personal approach would be to store diffs. Because the display of these diffs is really a special action, I would put the diffs in a different "history" collection.
I would use a separate collection to save memory. You generally don't want the full history for a simple query, so by keeping the history out of the object you also keep it out of the commonly accessed memory when that data is queried.
To make my life easy, I would make a history document contain a dictionary of time-stamped diffs. Something like this:
{
  _id : "id of address book record",
  changes : {
    1234567 : { "city" : "Omaha", "state" : "Nebraska" },
    1234568 : { "city" : "Kansas City", "state" : "Missouri" }
  }
}
To make my life really easy, I would make this part of the DataObjects (EntityWrapper, whatever) that I use to access my data. Generally these objects have some form of history, so you can easily override the save() method to make this change at the same time.
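As a rough sketch of that idea, the wrapper below records a time-stamped diff whenever `save()` is called. Everything here is an assumption for illustration: the class name `HistoryTrackingRecord`, the choice to store the new values of changed fields, and the use of `null` to mark a removed field. The database writes themselves are left out.

```javascript
// Diff of two flat objects: changed or added fields map to their new value,
// removed fields map to null (an assumption made for this sketch).
function diffFlat(before, after) {
  const changes = {};
  for (const key of new Set([...Object.keys(before), ...Object.keys(after)])) {
    if (before[key] !== after[key]) {
      changes[key] = key in after ? after[key] : null;
    }
  }
  return changes;
}

// Hypothetical data object that keeps its own history document up to date.
class HistoryTrackingRecord {
  constructor(id, fields) {
    this.saved = { ...fields };   // state as of the last save
    this.fields = { ...fields };  // working copy the application edits
    this.history = { _id: id, changes: {} };
  }

  // Records the diff under the timestamp; a real wrapper would also persist
  // this.fields and this.history to their collections at this point.
  save(timestamp = Date.now()) {
    const changes = diffFlat(this.saved, this.fields);
    if (Object.keys(changes).length > 0) {
      this.history.changes[timestamp] = changes;
      this.saved = { ...this.fields };
    }
    return changes;
  }
}

const rec = new HistoryTrackingRecord("ab-1", { city: "Omaha", state: "Nebraska" });
rec.fields.city = "Kansas City";
rec.fields.state = "Missouri";
const recorded = rec.save(1234568);
const nothing = rec.save(1234569); // no edits since the last save
```

Skipping empty diffs keeps the history document from growing when a record is saved without changes.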
UPDATE: 2015-10
It looks like there is now a spec for handling JSON diffs. This seems like a more robust way to store the diffs / changes.
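The spec is not named in the text as it appears here. Assuming it is JSON Patch (RFC 6902), a diff between two flat address book records could be expressed as a list of add / remove / replace operations. The sketch below handles flat objects only, and the function names are made up.

```javascript
// Escape a key for a JSON Pointer path (RFC 6901): "~" becomes "~0", "/" becomes "~1".
function pointer(key) {
  return "/" + key.replace(/~/g, "~0").replace(/\//g, "~1");
}

// Build JSON Patch style operations that turn `before` into `after` (flat objects only).
function flatPatch(before, after) {
  const ops = [];
  for (const key of Object.keys(before)) {
    if (!(key in after)) {
      ops.push({ op: "remove", path: pointer(key) });
    } else if (before[key] !== after[key]) {
      ops.push({ op: "replace", path: pointer(key), value: after[key] });
    }
  }
  for (const key of Object.keys(after)) {
    if (!(key in before)) ops.push({ op: "add", path: pointer(key), value: after[key] });
  }
  return ops;
}

// Apply such operations to a copy of a flat record.
function applyFlatPatch(record, ops) {
  const result = { ...record };
  for (const { op, path, value } of ops) {
    const key = path.slice(1).replace(/~1/g, "/").replace(/~0/g, "~");
    if (op === "remove") delete result[key];
    else result[key] = value;
  }
  return result;
}

const before = { city: "Omaha", state: "Nebraska", zip: "68102" };
const after = { city: "Kansas City", state: "Missouri", country: "US" };
const ops = flatPatch(before, after);
const rebuilt = applyFlatPatch(before, ops);
```

Storing one such list per timestamp would let the history be replayed forward from the oldest full record.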