Ways to implement data versioning in MongoDB

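This article discusses how to implement data versioning in MongoDB and presents several approaches: storing history in a new object collection, attaching versions to entries as serialized JSON objects, using Mongoid's built-in versioning, adopting the Vermongo scheme, and keeping all versions in a single document. It also covers handling concurrent updates and document deletion, and the choice between storing diffs and full record copies.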

This article is translated from: Ways to implement data versioning in MongoDB

Can you share your thoughts on how you would implement data versioning in MongoDB? (I've asked a similar question regarding Cassandra. If you have any thoughts on which database is better suited for this, please share.)

Suppose that I need to version records in a simple address book. (Address book records are stored as flat JSON objects.) I expect that the history:

  • will be used infrequently
  • will be used all at once to present it in a "time machine" fashion
  • will not exceed a few hundred versions per record, and will not expire

I'm considering the following approaches:

  • Create a new object collection to store the history of records or changes to the records. It would store one object per version, with a reference to the address book entry. Such records would look as follows:

    {
        '_id': 'new id',
        'user': user_id,
        'timestamp': timestamp,
        'address_book_id': 'id of the address book record',
        'old_record': {'first_name': 'Jon', 'last_name': 'Doe' ...}
    }

    This approach can be modified to store an array of versions per document, but that seems to be a slower approach without any advantages.

  • Store versions as serialized (JSON) objects attached to address book entries. I'm not sure how to attach such objects to MongoDB documents. Perhaps as an array of strings. (Modelled after Simple Document Versioning with CouchDB.)
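The first approach (a separate history collection with one entry per superseded version) could be sketched roughly as follows. This is only an in-memory illustration: the `Map` and array stand in for the two MongoDB collections, and the names `saveWithHistory`, `historyOf` and `nextHistoryId` are hypothetical, not part of any MongoDB API.

```javascript
// In-memory stand-ins for the two collections (assumption: real code would
// read and write MongoDB collections instead).
const addressBook = new Map(); // _id -> current flat record
const historyLog = [];         // one entry per superseded version
let nextHistoryId = 1;         // hypothetical id generator

// Save a new version of a record, archiving the previous version first.
function saveWithHistory(id, newRecord, userId, timestamp) {
  const old = addressBook.get(id);
  if (old !== undefined) {
    historyLog.push({
      _id: nextHistoryId++,
      user: userId,
      timestamp: timestamp,
      address_book_id: id,
      old_record: { ...old }, // a shallow copy is enough for flat records
    });
  }
  addressBook.set(id, { ...newRecord });
}

// All archived versions of one record, oldest first ("time machine" view).
function historyOf(id) {
  return historyLog
    .filter((h) => h.address_book_id === id)
    .sort((a, b) => a.timestamp - b.timestamp);
}

saveWithHistory("ab1", { first_name: "Jon", last_name: "Doe" }, "u1", 100);
saveWithHistory("ab1", { first_name: "Jon", last_name: "Dough" }, "u2", 200);
```

After the two calls, the address book holds the "Dough" record and the history holds a single entry whose `old_record` is the original "Doe" record. Normal reads never touch the history, which fits the "used infrequently" requirement.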


Reply #1

Reference: https://stackoom.com/question/hyJH/在MongoDB中实现数据版本控制的方法


Reply #2

If you're looking for a ready-to-roll solution:

Mongoid has simple versioning built in:

http://mongoid.org/en/mongoid/docs/extras.html#versioning

mongoid-history is a Ruby plugin that provides a significantly more complicated solution with auditing, undo and redo:

https://github.com/aq1018/mongoid-history


Reply #3

There is a versioning scheme called "Vermongo" which addresses some aspects that haven't been dealt with in the other replies.

One of these issues is concurrent updates; another is deleting documents.

Vermongo stores complete document copies in a shadow collection. For some use cases this might cause too much overhead, but I think it also simplifies many things.

https://github.com/thiloplanz/v7files/wiki/Vermongo


Reply #4

I worked through this solution, which accommodates published, draft and historical versions of the data:

{
  published: {},
  draft: {},
  history: {
    "1" : {
      metadata: <value>,
      document: {}
    },
    ...
  }
}

I explain the model further here: http://software.danielwatrous.com/representing-revision-data-in-mongodb/
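As a rough illustration of how a document of this shape might evolve, here is a minimal in-memory sketch. The function names (`newVersionedDoc`, `publishDraft`) and the rule that the previously published data moves into `history` on each publish are my assumptions for illustration; they are not taken from the linked article.

```javascript
// Create an empty versioned document in the published/draft/history shape.
function newVersionedDoc() {
  return { published: {}, draft: {}, history: {} };
}

// Promote the draft to published. Assumption: the previously published data
// (if any) is archived in `history` under the next numeric string key,
// together with free-form metadata supplied by the caller.
function publishDraft(doc, archiveMetadata) {
  const history = { ...doc.history };
  if (Object.keys(doc.published).length > 0) {
    const nextKey = String(Object.keys(history).length + 1);
    history[nextKey] = { metadata: archiveMetadata, document: doc.published };
  }
  return { published: { ...doc.draft }, draft: doc.draft, history };
}

let versioned = newVersionedDoc();
versioned.draft = { city: "Omaha" };
versioned = publishDraft(versioned, { archivedBy: "alice" });
versioned.draft = { city: "Kansas City" };
versioned = publishDraft(versioned, { archivedBy: "bob" });
```

After the second publish, `versioned.published` holds the Kansas City data and `versioned.history["1"].document` holds the Omaha data. The first publish archives nothing because there was no earlier published version.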

For those who may implement something like this in Java, here's an example:

http://software.danielwatrous.com/using-java-to-work-with-versioned-data/

It includes all the code, which you can fork if you like:

https://github.com/dwatrous/mongodb-revision-objects


Reply #5

Here's another solution using a single document for the current version and all old versions:

{
    _id: ObjectId("..."),
    data: [
        { vid: 1, content: "foo" },
        { vid: 2, content: "bar" }
    ]
}

data contains all versions. The data array is ordered; new versions only ever get $push-ed to the end of the array. data.vid is the version id, which is an incrementing number.

Get the most recent version:

find(
    { "_id":ObjectId("...") },
    { "data":{ $slice:-1 } }
)

Get a specific version by vid:

find(
    { "_id":ObjectId("...") },
    { "data":{ $elemMatch:{ "vid":1 } } }
)

Return only specified fields:

find(
    { "_id":ObjectId("...") },
    { "data":{ $elemMatch:{ "vid":1 } }, "data.content":1 }
)

Insert a new version (and prevent concurrent inserts/updates):

update(
    {
        "_id":ObjectId("..."),
        $and:[
            { "data.vid":{ $not:{ $gt:2 } } },
            { "data.vid":2 }
        ]
    },
    { $push:{ "data":{ "vid":3, "content":"baz" } } }
)

2 is the vid of the current most recent version and 3 is the vid of the new version being inserted. Because you need the most recent version's vid anyway, it's easy to get the next version's vid: nextVID = oldVID + 1.

The $and condition ensures that 2 is the latest vid: the update matches only if an element with vid 2 exists and no element has a greater vid.

This way there's no need for a unique index, but the application logic has to take care of incrementing the vid on insert.
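The effect of that filter can be illustrated with a small in-memory sketch. The helper `pushVersion` is hypothetical; it mimics the two conditions of the filter on a plain object. In MongoDB the check and the $push happen atomically within a single update on one document, which is what makes the guard effective against concurrent writers.

```javascript
// Mimics the update filter: push only if `expectedVid` is present in
// doc.data and no element has a larger vid. Returns true if a version
// was added, false if the caller's view of the document was stale.
function pushVersion(doc, expectedVid, content) {
  const hasExpected = doc.data.some((v) => v.vid === expectedVid);
  const hasNewer = doc.data.some((v) => v.vid > expectedVid);
  if (!hasExpected || hasNewer) {
    return false; // another writer got there first: re-read and retry
  }
  doc.data.push({ vid: expectedVid + 1, content: content });
  return true;
}

const record = {
  _id: "...",
  data: [
    { vid: 1, content: "foo" },
    { vid: 2, content: "bar" },
  ],
};

// Two writers both read the document while 2 was the latest vid.
const firstWriter = pushVersion(record, 2, "baz");  // succeeds, adds vid 3
const secondWriter = pushVersion(record, 2, "qux"); // rejected, vid 3 exists
```

The second writer's update matches nothing, so it knows to re-read the document and retry with the new latest vid.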

Remove a specific version:

update(
    { "_id":ObjectId("...") },
    { $pull:{ "data":{ "vid":2 } } }
)

That's it!

(Remember the 16MB per-document limit.)


Reply #6

The first big question when diving into this is "how do you want to store changesets?"

  1. Diffs?
  2. Whole record copies?

My personal approach would be to store diffs. Because the display of these diffs is really a special action, I would put the diffs in a different "history" collection.

I would use a separate collection to save memory. You generally don't want the full history for a simple query, so by keeping the history out of the object you also keep it out of the commonly accessed memory when that data is queried.

To make my life easy, I would make a history document contain a dictionary of time-stamped diffs. Something like this:

{
    _id : "id of address book record",
    changes : { 
                1234567 : { "city" : "Omaha", "state" : "Nebraska" },
                1234568 : { "city" : "Kansas City", "state" : "Missouri" }
               }
}

To make my life really easy, I would make this part of the DataObjects (EntityWrapper, whatever) that I use to access my data. Generally these objects have some form of history, so you can easily override the save() method to record this change at the same time.
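A minimal sketch of the diff idea for flat records follows. It assumes each diff stores the new values of the changed fields, that the original version of the record is kept somewhere as a base, and that removed fields are not tracked; the answer does not specify any of these. The names `diffFlat` and `recordAt` are hypothetical.

```javascript
// Fields of `next` whose values differ from `prev` (flat records only).
function diffFlat(prev, next) {
  const changes = {};
  for (const [key, value] of Object.entries(next)) {
    if (prev[key] !== value) changes[key] = value;
  }
  return changes;
}

// Rebuild the record as of `timestamp` by replaying, in time order,
// every diff whose timestamp is not later than `timestamp`.
function recordAt(base, changes, timestamp) {
  const result = { ...base };
  const times = Object.keys(changes).map(Number).sort((a, b) => a - b);
  for (const t of times) {
    if (t > timestamp) break;
    Object.assign(result, changes[t]);
  }
  return result;
}

const v0 = { name: "Jon", city: "Lincoln", state: "Nebraska" };
const v1 = { ...v0, city: "Omaha" };
const v2 = { ...v1, city: "Kansas City", state: "Missouri" };

// History document in the shape shown above, keyed by timestamp.
const historyDoc = { _id: "id of address book record", changes: {} };
historyDoc.changes[1234567] = diffFlat(v0, v1);
historyDoc.changes[1234568] = diffFlat(v1, v2);
```

Calling `recordAt(v0, historyDoc.changes, 1234567)` replays only the first diff and yields the Omaha version; a later timestamp yields the Kansas City version. An overridden save() could call something like `diffFlat` on the loaded and modified states before writing.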

UPDATE: 2015-10

It looks like there is now a spec for handling JSON diffs. This seems like a more robust way to store the diffs/changes.
