版权声明:HanseyLee 欢迎转载,请注明原址: https://blog.youkuaiyun.com/fzlulee/article/details/85944967 </div>
<div id="content_views" class="markdown_views prism-github-gist">
<!-- flowchart 箭头图标 勿删 -->
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;"><path stroke-linecap="round" d="M5,0 0,2.5 5,5z" id="raphael-marker-block" style="-webkit-tap-highlight-color: rgba(0, 0, 0, 0);"></path></svg>
<p>最近,需要使用 Python 对 MongodB 做一些简单的操作,不想使用各种繁重的框架。出于可重用性的考虑,想对 MongoDB Python 官方驱动 PyMongo 做下简单封装,百度一如既往的未能给我一个满意的结果,于是有了下文。</p>
【正文】
PyMongo,MongoDB Python官方驱动
PyMongo 驱动几乎支持 MongoDB 的全部特性,可以连接单个的 MongoDB 数据库、副本集和分片集群。从提供的API角度来看,pymongo package是其核心,包含对数据库的各种操作。本文将介绍一个简单封装类 DBManager。主要特性:对数据库和集合的操作确保其存在性;支持PyMongo的原生操作,包括基本的CRUD操作、批量操作、MapReduce、多线程和多进程等;支持因果一致性会话和事务的流水线操作,并给出简单示例。
MongoClient
mongo_client 提供了连接 MongoDB 的MongoClient类:
class pymongo.mongo_client.MongoClient(host='localhost', port=27017, document_class=dict, tz_aware=False, connect=True, **kwargs)
每个 MongoClient 实例 client (下文简称 client)都维护一个内建的连接池,默认 maxPoolsize 大小100。对于多线程的操作,连接池会给每一个线程一个 socket 连接,直到达到最大的连接数,后续的线程会阻塞以等待有可用的连接被释放。client 对 MongoDB 拓扑结构中的每个server 还维护一个额外的连接来监听 server 的状态。
下面的 new_mongo_client
函数用于获取一个数据库连接的 client。其中,client.admin.command('ismaster')
用来检查 server 的可用状态,简单省事不需要认证。
def new_mongo_client(uri, **kwargs): """Create new pymongo.mongo_client.MongoClient instance. DO NOT USE IT DIRECTLY."""
<span class="token keyword">try</span><span class="token punctuation">:</span> client <span class="token operator">=</span> MongoClient<span class="token punctuation">(</span>uri<span class="token punctuation">,</span> maxPoolSize<span class="token operator">=</span><span class="token number">1024</span><span class="token punctuation">,</span> <span class="token operator">**</span>kwargs<span class="token punctuation">)</span> client<span class="token punctuation">.</span>admin<span class="token punctuation">.</span>command<span class="token punctuation">(</span><span class="token string">'ismaster'</span><span class="token punctuation">)</span> <span class="token comment"># The ismaster command is cheap and does not require auth.</span> <span class="token keyword">except</span> ConnectionFailure<span class="token punctuation">:</span> logging<span class="token punctuation">.</span>error<span class="token punctuation">(</span><span class="token string">"new_mongo_client(): Server not available, Please check you uri: {}"</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span>uri<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token keyword">return</span> <span class="token boolean">None</span> <span class="token keyword">else</span><span class="token punctuation">:</span> <span class="token keyword">return</span> client
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
PyMongo 不是进程(fork-safe)安全的,但在一个进程中是线程安全(thread-safe)的。因此常见的场景是,对于一个MongoDB 环境,为每一个进程中创建一个 client ,后面所有的数据库操作都使用这一个实例,包括多线程操作。永远不要为每一次操作都创建一个 MongoClient 实例,使用完后调用 MongoClient.close() 方法,这样没有必要而且会非常浪费性能。
鉴于以上原因,一般不宜直接使用new_mongo_client
函数获取 client,而是进一步封装为get_mongo_client
方法。 其中全局常量 URI_CLIENT_DICT
保持着数据库 URI 字符串与对应 clinet 的字典,一个 URI 对应一个 client 。代码如下:
MONGO_URI_DEFAULT = 'mongodb://localhost:27017/admin' URI_CLIENT_DICT = {} # a dictionary hold all client with uri as key def get_mongo_client(uri=MONGO_URI_DEFAULT, fork=False, **kwargs): """Get pymongo.mongo_client.MongoClient instance. One mongodb uri, one client.
@:param uri: mongodb uri @:param fork: for fork-safe in multiprocess case, if fork=True, return a new MongoClient instance, default False. @:param kwargs: refer to pymongo.mongo_client.MongoClient kwargs """</span> <span class="token keyword">if</span> fork<span class="token punctuation">:</span> <span class="token keyword">return</span> new_mongo_client<span class="token punctuation">(</span>uri<span class="token punctuation">,</span> <span class="token operator">**</span>kwargs<span class="token punctuation">)</span> <span class="token keyword">global</span> URI_CLIENT_DICT matched_client <span class="token operator">=</span> URI_CLIENT_DICT<span class="token punctuation">.</span>get<span class="token punctuation">(</span>uri<span class="token punctuation">)</span> <span class="token keyword">if</span> matched_client <span class="token keyword">is</span> <span class="token boolean">None</span><span class="token punctuation">:</span> <span class="token comment"># no matched client</span> new_client <span class="token operator">=</span> new_mongo_client<span class="token punctuation">(</span>uri<span class="token punctuation">,</span> <span class="token operator">**</span>kwargs<span class="token punctuation">)</span> <span class="token keyword">if</span> new_client <span class="token keyword">is</span> <span class="token operator">not</span> <span class="token boolean">None</span><span class="token punctuation">:</span> URI_CLIENT_DICT<span class="token punctuation">[</span>uri<span class="token punctuation">]</span> <span class="token operator">=</span> new_client <span class="token keyword">return</span> new_client <span class="token keyword">return</span> matched_client
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
确保 Database 和 Collection 的存在
PyMongo 有个特性:对于不存在的数据库、集合上的查询不会报错。如下,Ipython中演示在不存在xxDB 数据库和 xxCollection 集合上的操作:
In [1]: from pymongo import MongoClient
In [2]: client = MongoClient() # default uri is 'mongodb://localhost:27017/admin'
In [3]: db = client.get_database('xxDB') # Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'xxDB')
In [4]: coll = db.get_collection('XXCollection') # Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'xxDB'), u'XXCollection')
In [5]: coll.find_one() # note: no tip, no error, no exception, return None
In [6]: coll.insert_one({'hello' : 'what a fucking feature'})
Out[6]: <pymongo.results.InsertOneResult at 0x524ccc8>
In [7]: coll.find_one()
Out[7]: {u'_id': ObjectId('5c31c807bb048515b814d719'), u'hello': u'what a fucking feature'}
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
这对于手误写错数据库或集合名字后进行的后续操作,简直就是灾难。鉴于此因,有必要对获取数据库或集合时加上确认保护。
下面对于获取数据库,使用 MongoClient.list_database_names() 获取所有的数据库名字,如果数据库名称不在其中,则返回None。同样的道理,对于集合使用 Database.list_collection_names()。注:由于用户权限问题造成的获取数据库或集合列表的操作报错的情况,默认不加确认保护。
def get_existing_db(client, db_name): """Get existing pymongo.database.Database instance.
@:param client: pymongo.mongo_client.MongoClient instance @:param db_name: database name wanted """</span> <span class="token keyword">if</span> client <span class="token keyword">is</span> <span class="token boolean">None</span><span class="token punctuation">:</span> logging<span class="token punctuation">.</span>error<span class="token punctuation">(</span><span class="token string">'client {} is None'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span>client<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token keyword">return</span> <span class="token boolean">None</span> <span class="token keyword">try</span><span class="token punctuation">:</span> db_available_list <span class="token operator">=</span> client<span class="token punctuation">.</span>list_database_names<span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">except</span> PyMongoError <span class="token keyword">as</span> e<span class="token punctuation">:</span> logging<span class="token punctuation">.</span>error<span class="token punctuation">(</span><span class="token string">'client: {}, db_name: {}, client.list_database_names() error: {}'</span><span class="token punctuation">.</span> <span class="token builtin">format</span><span class="token punctuation">(</span>client<span class="token punctuation">,</span> db_name<span class="token punctuation">,</span> <span class="token builtin">repr</span><span class="token punctuation">(</span>e<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token keyword">else</span><span class="token punctuation">:</span> <span class="token keyword">if</span> db_name <span class="token operator">not</span> <span class="token keyword">in</span> db_available_list<span class="token punctuation">:</span> logging<span class="token punctuation">.</span>error<span class="token punctuation">(</span><span class="token string">'client {} has no db named {}'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span>client<span class="token punctuation">,</span> db_name<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token keyword">return</span> <span class="token boolean">None</span> db <span class="token operator">=</span> client<span class="token punctuation">.</span>get_database<span class="token punctuation">(</span>db_name<span class="token punctuation">)</span> <span class="token keyword">return</span> db
def get_existing_coll(db, coll_name):
“”"Get existing pymongo.collection.Collection instance.
@:param client: pymongo.mongo_client.MongoClient instance
@:param coll_name: collection name wanted
"""</span>
<span class="token keyword">if</span> db <span class="token keyword">is</span> <span class="token boolean">None</span><span class="token punctuation">:</span>
logging<span class="token punctuation">.</span>error<span class="token punctuation">(</span><span class="token string">'db {} is None'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span>db<span class="token punctuation">)</span><span class="token punctuation">)</span>
<span class="token keyword">return</span> <span class="token boolean">None</span>
<span class="token keyword">try</span><span class="token punctuation">:</span>
coll_available_list <span class="token operator">=</span> db<span class="token punctuation">.</span>list_collection_names<span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token keyword">except</span> PyMongoError <span class="token keyword">as</span> e<span class="token punctuation">:</span>
logging<span class="token punctuation">.</span>error<span class="token punctuation">(</span><span class="token string">'db: {}, coll_name: {}, db.list_collection_names() error: {}'</span><span class="token punctuation">.</span>
<span class="token builtin">format</span><span class="token punctuation">(</span>db<span class="token punctuation">,</span> coll_name<span class="token punctuation">,</span> <span class="token builtin">repr</span><span class="token punctuation">(</span>e<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
<span class="token keyword">else</span><span class="token punctuation">:</span>
<span class="token keyword">if</span> coll_name <span class="token operator">not</span> <span class="token keyword">in</span> coll_available_list<span class="token punctuation">:</span>
logging<span class="token punctuation">.</span>error<span class="token punctuation">(</span><span class="token string">'db {} has no collection named {}'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span>db<span class="token punctuation">,</span> coll_name<span class="token punctuation">)</span><span class="token punctuation">)</span>
<span class="token keyword">return</span> <span class="token boolean">None</span>
coll <span class="token operator">=</span> db<span class="token punctuation">.</span>get_collection<span class="token punctuation">(</span>coll_name<span class="token punctuation">)</span>
<span class="token keyword">return</span> coll
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
- 34
- 35
- 36
- 37
- 38
- 39
- 40
- 41
- 42
- 43
PyMongo 封装类 DBManger
前文的冗长铺垫主要是为了引入这个 PyMongo 驱动封装类 DBManger。
DBManger 类的实例保持的状态有MongoClient实例 self.client
, 数据库self.db
和 集合self.coll
,并通过属性(property)对外开放。PyMongo 原生的方法对这里的 client, db 和 coll 同样适用。client 由类的构造器调用上文的get_mongo_client
方法获取,db 和 coll 即可通过类的构造器获取也可通过 self.db_name
和 self.coll_name
这些 setter 来切换。
DBManger 类的实例持有的方法 self.create_coll(self, db_name, coll_name)
, session_pipeline(self, pipeline)
和 transaction_pipeline(self, pipeline)
。后两种方法在下一节再具体解释。
class DBManager: """A safe and simple pymongo packaging class ensuring existing database and collection.
Operations: MongoClient level operations: https://api.mongodb.com/python/current/api/pymongo/mongo_client.html Database level operations: https://api.mongodb.com/python/current/api/pymongo/database.html Collection level operations: https://api.mongodb.com/python/current/api/pymongo/collection.html """</span> __default_uri <span class="token operator">=</span> <span class="token string">'mongodb://localhost:27017/admin'</span> __default_db_name <span class="token operator">=</span> <span class="token string">'test'</span> __default_coll_name <span class="token operator">=</span> <span class="token string">'test'</span> <span class="token keyword">def</span> <span class="token function">__init__</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> uri<span class="token operator">=</span>__default_uri<span class="token punctuation">,</span> db_name<span class="token operator">=</span>__default_db_name<span class="token punctuation">,</span> coll_name<span class="token operator">=</span>__default_coll_name<span class="token punctuation">,</span> <span class="token operator">**</span>kwargs<span class="token punctuation">)</span><span class="token punctuation">:</span> self<span class="token punctuation">.</span>__uri <span class="token operator">=</span> uri self<span class="token punctuation">.</span>__db_name <span class="token operator">=</span> db_name self<span class="token punctuation">.</span>__coll_name <span class="token operator">=</span> coll_name self<span class="token punctuation">.</span>__client <span class="token operator">=</span> get_mongo_client<span class="token punctuation">(</span>uri<span class="token punctuation">,</span> <span class="token operator">**</span>kwargs<span class="token punctuation">)</span> self<span class="token punctuation">.</span>__db <span class="token operator">=</span> get_existing_db<span class="token punctuation">(</span>self<span class="token punctuation">.</span>__client<span class="token punctuation">,</span> db_name<span class="token punctuation">)</span> self<span class="token punctuation">.</span>__coll <span class="token operator">=</span> get_existing_coll<span class="token punctuation">(</span>self<span class="token punctuation">.</span>__db<span class="token punctuation">,</span> coll_name<span class="token punctuation">)</span> <span class="token keyword">def</span> <span class="token function">__str__</span><span class="token punctuation">(</span>self<span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token keyword">return</span> u<span class="token string">'uri: {}, db_name: {}, coll_name: {}, id_client: {}, client: {}, db: {}, coll: {}'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span> self<span class="token punctuation">.</span>uri<span class="token punctuation">,</span> self<span class="token punctuation">.</span>db_name<span class="token punctuation">,</span> self<span class="token punctuation">.</span>coll_name<span class="token punctuation">,</span> <span class="token builtin">id</span><span class="token punctuation">(</span>self<span class="token punctuation">.</span>client<span class="token punctuation">)</span><span class="token punctuation">,</span> self<span class="token punctuation">.</span>client<span class="token punctuation">,</span> self<span class="token punctuation">.</span>db<span class="token punctuation">,</span> self<span class="token punctuation">.</span>coll<span class="token punctuation">)</span> @<span class="token builtin">property</span> <span class="token keyword">def</span> <span class="token function">uri</span><span class="token punctuation">(</span>self<span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token keyword">return</span> self<span class="token punctuation">.</span>__uri @<span class="token builtin">property</span> <span class="token keyword">def</span> <span class="token function">db_name</span><span class="token punctuation">(</span>self<span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token keyword">return</span> self<span class="token punctuation">.</span>__db_name @<span class="token builtin">property</span> <span class="token keyword">def</span> <span class="token function">coll_name</span><span class="token punctuation">(</span>self<span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token keyword">return</span> self<span class="token punctuation">.</span>__coll_name @db_name<span class="token punctuation">.</span>setter <span class="token keyword">def</span> <span class="token function">db_name</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> db_name<span class="token punctuation">)</span><span class="token punctuation">:</span> self<span class="token punctuation">.</span>__db_name <span class="token operator">=</span> db_name self<span class="token punctuation">.</span>__db <span class="token operator">=</span> get_existing_db<span class="token punctuation">(</span>self<span class="token punctuation">.</span>__client<span class="token punctuation">,</span> db_name<span class="token punctuation">)</span> @coll_name<span class="token punctuation">.</span>setter <span class="token keyword">def</span> <span class="token function">coll_name</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> coll_name<span class="token punctuation">)</span><span class="token punctuation">:</span> self<span class="token punctuation">.</span>__coll_name <span class="token operator">=</span> coll_name self<span class="token punctuation">.</span>__coll <span class="token operator">=</span> get_existing_coll<span class="token punctuation">(</span>self<span class="token punctuation">.</span>__db<span class="token punctuation">,</span> coll_name<span class="token punctuation">)</span> @<span class="token builtin">property</span> <span class="token keyword">def</span> <span class="token function">client</span><span class="token punctuation">(</span>self<span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token keyword">return</span> self<span class="token punctuation">.</span>__client @<span class="token builtin">property</span> <span class="token keyword">def</span> <span class="token function">db</span><span class="token punctuation">(</span>self<span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token keyword">return</span> self<span class="token punctuation">.</span>__db @<span class="token builtin">property</span> <span class="token keyword">def</span> <span class="token function">coll</span><span class="token punctuation">(</span>self<span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token comment"># always use the current instance self.__db</span> self<span class="token punctuation">.</span>__coll <span class="token operator">=</span> get_existing_coll<span class="token punctuation">(</span>self<span class="token punctuation">.</span>__db<span class="token punctuation">,</span> self<span class="token punctuation">.</span>__coll_name<span class="token punctuation">)</span> <span class="token keyword">return</span> self<span class="token punctuation">.</span>__coll <span class="token keyword">def</span> <span class="token function">create_coll</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> db_name<span class="token punctuation">,</span> coll_name<span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token triple-quoted-string string">"""Create new collection with new or existing database"""</span> <span class="token keyword">if</span> self<span class="token punctuation">.</span>__client <span class="token keyword">is</span> <span class="token boolean">None</span><span class="token punctuation">:</span> <span class="token keyword">return</span> <span class="token boolean">None</span> <span class="token keyword">try</span><span class="token punctuation">:</span> <span class="token keyword">return</span> self<span class="token punctuation">.</span>__client<span class="token punctuation">.</span>get_database<span class="token punctuation">(</span>db_name<span class="token punctuation">)</span><span class="token punctuation">.</span>create_collection<span class="token punctuation">(</span>coll_name<span class="token punctuation">)</span> <span class="token keyword">except</span> CollectionInvalid<span class="token punctuation">:</span> logging<span class="token punctuation">.</span>error<span class="token punctuation">(</span><span class="token string">'collection {} already exists in database {}'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span>coll_name<span class="token punctuation">,</span> db_name<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token keyword">return</span> <span class="token boolean">None</span> <span class="token keyword">def</span> <span class="token function">session_pipeline</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> pipeline<span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token keyword">if</span> self<span class="token punctuation">.</span>__client <span class="token keyword">is</span> <span class="token boolean">None</span><span class="token punctuation">:</span> logging<span class="token punctuation">.</span>error<span class="token punctuation">(</span><span class="token string">'client is None in session_pipeline: {}'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span>self<span class="token punctuation">.</span>__client<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token keyword">return</span> <span class="token boolean">None</span> <span class="token keyword">with</span> self<span class="token punctuation">.</span>__client<span class="token punctuation">.</span>start_session<span class="token punctuation">(</span>causal_consistency<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">)</span> <span class="token keyword">as</span> session<span class="token punctuation">:</span> result <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token punctuation">]</span> <span class="token keyword">for</span> operation <span class="token keyword">in</span> pipeline<span class="token punctuation">:</span> <span class="token keyword">try</span><span class="token punctuation">:</span> <span class="token keyword">if</span> operation<span class="token punctuation">.</span>level <span class="token operator">==</span> <span class="token string">'client'</span><span class="token punctuation">:</span> target <span class="token operator">=</span> self<span class="token punctuation">.</span>__client <span class="token keyword">elif</span> operation<span class="token punctuation">.</span>level <span class="token operator">==</span> <span class="token string">'db'</span><span class="token punctuation">:</span> target <span class="token operator">=</span> self<span class="token punctuation">.</span>__db <span class="token keyword">elif</span> operation<span class="token punctuation">.</span>level <span class="token operator">==</span> <span class="token string">'coll'</span><span class="token punctuation">:</span> target <span class="token operator">=</span> self<span class="token punctuation">.</span>__coll operation_name <span class="token operator">=</span> operation<span class="token punctuation">.</span>operation_name args <span class="token operator">=</span> operation<span class="token punctuation">.</span>args kwargs <span class="token operator">=</span> operation<span class="token punctuation">.</span>kwargs operator <span class="token operator">=</span> <span class="token builtin">getattr</span><span class="token punctuation">(</span>target<span class="token punctuation">,</span> operation_name<span class="token punctuation">)</span> <span class="token keyword">if</span> <span class="token builtin">type</span><span class="token punctuation">(</span>args<span class="token punctuation">)</span> <span class="token operator">==</span> <span class="token builtin">tuple</span><span class="token punctuation">:</span> ops_rst <span class="token operator">=</span> operator<span class="token punctuation">(</span><span class="token operator">*</span>args<span class="token punctuation">,</span> session<span class="token operator">=</span>session<span class="token punctuation">,</span> <span class="token operator">**</span>kwargs<span class="token punctuation">)</span> <span class="token keyword">else</span><span class="token punctuation">:</span> ops_rst <span class="token operator">=</span> operator<span class="token punctuation">(</span>args<span class="token punctuation">,</span> session<span class="token operator">=</span>session<span class="token punctuation">,</span> <span class="token operator">**</span>kwargs<span class="token punctuation">)</span> <span class="token keyword">if</span> operation<span class="token punctuation">.</span>callback <span class="token keyword">is</span> <span class="token operator">not</span> <span class="token boolean">None</span><span class="token punctuation">:</span> operation<span class="token punctuation">.</span>out <span class="token operator">=</span> operation<span class="token punctuation">.</span>callback<span class="token punctuation">(</span>ops_rst<span class="token punctuation">)</span> <span class="token keyword">else</span><span class="token punctuation">:</span> operation<span class="token punctuation">.</span>out <span class="token operator">=</span> ops_rst <span class="token keyword">except</span> Exception <span class="token keyword">as</span> e<span class="token punctuation">:</span> logging<span class="token punctuation">.</span>error<span class="token punctuation">(</span><span class="token string">'{} {} Exception, session_pipeline args: {}, kwargs: {}'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span> target<span class="token punctuation">,</span> operation<span class="token punctuation">,</span> args<span class="token punctuation">,</span> kwargs<span class="token punctuation">)</span><span class="token punctuation">)</span> logging<span class="token punctuation">.</span>error<span class="token punctuation">(</span><span class="token string">'session_pipeline Exception: {}'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span><span class="token builtin">repr</span><span class="token punctuation">(</span>e<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">)</span> result<span class="token punctuation">.</span>append<span class="token punctuation">(</span>operation<span class="token punctuation">)</span> <span class="token keyword">return</span> result <span class="token comment"># https://api.mongodb.com/python/current/api/pymongo/client_session.html#transactions</span> <span class="token keyword">def</span> <span class="token function">transaction_pipeline</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> pipeline<span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token keyword">if</span> self<span class="token punctuation">.</span>__client <span class="token keyword">is</span> <span class="token boolean">None</span><span class="token punctuation">:</span> logging<span class="token punctuation">.</span>error<span class="token punctuation">(</span><span class="token string">'client is None in transaction_pipeline: {}'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span>self<span class="token punctuation">.</span>__client<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token keyword">return</span> <span class="token boolean">None</span> <span class="token keyword">with</span> self<span class="token punctuation">.</span>__client<span class="token punctuation">.</span>start_session<span class="token punctuation">(</span>causal_consistency<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">)</span> <span class="token keyword">as</span> session<span class="token punctuation">:</span> <span class="token keyword">with</span> session<span class="token punctuation">.</span>start_transaction<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">:</span> result <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token punctuation">]</span> <span class="token keyword">for</span> operation <span class="token keyword">in</span> pipeline<span class="token punctuation">:</span> <span class="token keyword">try</span><span class="token punctuation">:</span> <span class="token keyword">if</span> operation<span class="token punctuation">.</span>level <span class="token operator">==</span> <span class="token string">'client'</span><span class="token punctuation">:</span> target <span class="token operator">=</span> self<span class="token punctuation">.</span>__client <span class="token keyword">elif</span> operation<span class="token punctuation">.</span>level <span class="token operator">==</span> <span class="token string">'db'</span><span class="token punctuation">:</span> target <span class="token operator">=</span> self<span class="token punctuation">.</span>__db <span class="token keyword">elif</span> operation<span class="token punctuation">.</span>level <span class="token operator">==</span> <span class="token string">'coll'</span><span class="token punctuation">:</span> target <span class="token operator">=</span> self<span class="token punctuation">.</span>__coll operation_name <span class="token operator">=</span> operation<span class="token punctuation">.</span>operation_name args <span class="token operator">=</span> operation<span class="token punctuation">.</span>args kwargs <span class="token operator">=</span> operation<span class="token punctuation">.</span>kwargs operator <span class="token operator">=</span> <span class="token builtin">getattr</span><span class="token punctuation">(</span>target<span class="token punctuation">,</span> operation_name<span class="token punctuation">)</span> <span class="token keyword">if</span> <span class="token builtin">type</span><span class="token punctuation">(</span>args<span class="token punctuation">)</span> <span class="token operator">==</span> <span class="token builtin">tuple</span><span class="token punctuation">:</span> ops_rst <span class="token operator">=</span> operator<span class="token punctuation">(</span><span class="token operator">*</span>args<span class="token punctuation">,</span> session<span class="token operator">=</span>session<span class="token punctuation">,</span> <span class="token operator">**</span>kwargs<span class="token punctuation">)</span> <span class="token keyword">else</span><span class="token punctuation">:</span> ops_rst <span class="token operator">=</span> operator<span class="token punctuation">(</span>args<span class="token punctuation">,</span> session<span class="token operator">=</span>session<span class="token punctuation">,</span> <span class="token operator">**</span>kwargs<span class="token punctuation">)</span> <span class="token keyword">if</span> operation<span class="token punctuation">.</span>callback <span class="token keyword">is</span> <span class="token operator">not</span> <span class="token boolean">None</span><span class="token punctuation">:</span> operation<span class="token punctuation">.</span>out <span class="token operator">=</span> operation<span class="token punctuation">.</span>callback<span class="token punctuation">(</span>ops_rst<span class="token punctuation">)</span> <span class="token keyword">else</span><span class="token punctuation">:</span> operation<span class="token punctuation">.</span>out <span class="token operator">=</span> ops_rst <span class="token keyword">except</span> Exception <span class="token keyword">as</span> e<span class="token punctuation">:</span> logging<span class="token punctuation">.</span>error<span class="token punctuation">(</span><span class="token string">'{} {} Exception, transaction_pipeline args: {}, kwargs: {}'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span> target<span class="token punctuation">,</span> operation<span class="token punctuation">,</span> args<span class="token punctuation">,</span> kwargs<span class="token punctuation">)</span><span class="token punctuation">)</span> logging<span class="token punctuation">.</span>error<span class="token punctuation">(</span><span class="token string">'transaction_pipeline Exception: {}'</span><span class="token punctuation">.</span><span class="token builtin">format</span><span class="token punctuation">(</span><span class="token builtin">repr</span><span class="token punctuation">(</span>e<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token keyword">raise</span> Exception<span class="token punctuation">(</span><span class="token builtin">repr</span><span class="token punctuation">(</span>e<span class="token punctuation">)</span><span class="token punctuation">)</span> result<span class="token punctuation">.</span>append<span class="token punctuation">(</span>operation<span class="token punctuation">)</span> <span class="token keyword">return</span> result
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
- 34
- 35
- 36
- 37
- 38
- 39
- 40
- 41
- 42
- 43
- 44
- 45
- 46
- 47
- 48
- 49
- 50
- 51
- 52
- 53
- 54
- 55
- 56
- 57
- 58
- 59
- 60
- 61
- 62
- 63
- 64
- 65
- 66
- 67
- 68
- 69
- 70
- 71
- 72
- 73
- 74
- 75
- 76
- 77
- 78
- 79
- 80
- 81
- 82
- 83
- 84
- 85
- 86
- 87
- 88
- 89
- 90
- 91
- 92
- 93
- 94
- 95
- 96
- 97
- 98
- 99
- 100
- 101
- 102
- 103
- 104
- 105
- 106
- 107
- 108
- 109
- 110
- 111
- 112
- 113
- 114
- 115
- 116
- 117
- 118
- 119
- 120
- 121
- 122
- 123
- 124
- 125
- 126
- 127
- 128
- 129
- 130
- 131
- 132
- 133
- 134
- 135
- 136
- 137
- 138
- 139
- 140
- 141
- 142
这里给出一些例子来说明 DBManager的使用方法。
- 创建集合、切换数据库或集合:
# get DBManger instance
var dbm = DBManager('mongodb://localhost:27017/admin') # db_name, coll_name default 'test'
dbm.create_coll('testDB', 'testCollection')
# change db or coll
dbm.db_name = 'testDB' # dbm.db (test -> testDB) and dbm.coll (test.testCollection-> testDB.testCollection) will be changed at the same time
dbm.coll_nmae = 'testCollection' # dbm.coll (test.test-> test.testCollection) will be change at the same time
- 1
- 2
- 3
- 4
- 5
- 6
- 基本的操作,CRUD:
# simple manipulation operation
dbm.coll.insert_one({'hello': 'world'})
print(dbm.coll.find_one()) # {'_id': ObjectId('...'), 'hello': 'world'}
dbm.coll.update_one({'hello': 'world'}, {'hello': 'hell'})
# bulk operation
from pymongo import InsertOne, DeleteOne, ReplaceOne, ReplaceOne
dbm.coll.bulk_write([InsertOne({‘y’:1}), DeleteOne({‘x’:1}), ReplaceOne({{‘w’:1}, {‘z’:1}, upsert=True})])
# simple managing operation
import pymongo
dbm.coll.create_index([(‘hello’, pymongo.DESCENDING)], background=True)
dbm.client.list_database_names()
dbm.db.list_collection_names()
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 线程并发,进程并行:
# thread concurrent
import threading
def fun(uri, db_name, coll_name):
# new DBManager instance avoid variable competition
dbm = DBManager(uri, db_name, coll_name)
pass
t = threading.Thread(target=func, args=(uri, db_name, coll_name))
t.start()
# multiprocess parallel
import multiprocessing
def func(uri, db_name, coll_name):
# new process, new client with fork=True parameter, and new DBManager instance.
dbm = DBManager(uri, db_name, coll_name, fork=True)
# Do something with db.
pass
proc = multiprocessing.Process(target=func, args=(uri, db_name, coll_name))
proc.start()
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- MapReduce :
# MapReduce
from bson.code import Code
mapper = Code('''
function () {...}
''')
reducer = Code('''
function (key, value) {...}
''')
rst = dbm.coll.inline_map_reduce(mapper, reducer)
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
对 MongoDB 一致性会话(session)和 事务(transaction)的支持
MongoDB Reference
会话(session),是对数据库连接的一种逻辑表示。从MongoDB 3.6开始,MongoDB引入了客户端会话(client session),并在其中加入了对操作的因果一致性(causal-consistency)的支持。因此,更准确地说,这里 DBManger 类封装的其实是因果一致性的会话,即client.start_session(causal_consistency=True)
。不过,一致性能够保证的前提是客户端的应用应保证在一个会话中只有一个线程(thread)在做这些操作。在一个客户端会话中,多个顺序的读写操作得到的结果与它们的执行顺序将是因果一致的,读写的设置都自动设为 “majority”。应用场景:先写后读,先读后写,一致性的写,一致性的读(Read your writes,Writes follow reads,Monotonic writes, Monotonic reads)。客户端会话与服务端会话(server session)进行交互。从3.6版本开始,MongoDB驱动将所有的操作都关联到服务端会话。服务端会话是客户端会话顺序操作因果一致性和重试写操作的得以支持的底层框架。
MongoDB 对单个文档的操作时是原子性的(atomic)。原子性是指一个操作的结果要么有要么没有,不可再切割,换句话说叫 “all or nothing”。从MongoDB 4.0开始,副本集(Replica set)开始支持多个文档级别的原子性,即多文档事务(muti-document transaction)。在同一个事务中,对跨越不同数据库或集合下的多个文档操作,如果全部操作成功,则该事务被成功提交(commit);如果某些操作出现失败,则整个事务会终止(abort),操作中对数据库的改动会被丢弃。只有在事务被成功提交之后,操作的结果才能被事务外看到,事务正在进行或者事务失败,其中的操作对外都不可见。单个mongod服务和分片集群(sharded cluster)暂不支持事务。MongoDB官方预计在4.2版本左右对分片集群加入对事务的支持。另外,需要注意的是,多文档事务会引入更大的性能开销,在场景允许的情况下,尽可能考虑用嵌套文档或数组的单文档操作方式来解决问题。
会话和事务的主要应用场景其实都是多个的时序性操作,即流水线形式。因此 DBManager 加入了session_pipeline(self, pipeline)
和 transaction_pipeline(self, pipeline)
的操作方法。首先引入表征操作的类Operation,描述一个操作作用的层次(client, db或coll)、操作方法、参数和操作结果需要调用的回调函数,见名知意,不再赘解。多个操作 Operation 类的实例构成的list 为pipeline, 作为session_pipeline(self, pipeline)
和 transaction_pipeline(self, pipeline)
的输入参数。pipeline 操作的每一步的输出会写入到对应Operation 类的实例的out属性中。
class Operation: """Operation for constructing sequential pipeline. Only used in DBManager.session_pipeline() or transaction_pipeline().
Constructor parameters: level: <'client' | 'db' | 'coll'> indicating different operation level, MongoClient, Database, Collection operation_name: Literally, the name of operation on specific level args: position arguments the operation need. Require the first parameter or a tuple of parameters of the operation. kwargs: key word arguments the operation need. callback: callback function for operation result Examples: # pymongo.collection.Collection.find(filter, projection, skip=None, limit=None,...) Operation('coll', 'find', {'x': 5}) only filter parameter, equivalent to: Operation('coll', 'find', args={'x': 5}) or Operation('coll', 'find', kwargs={filter: {'x': 5}}) Operation('coll', 'find', ({'x': 5},{'_id': 0}) {'limit':100}), equivalent to: Operation('coll', 'find', args=({'x': 5},{'_id': 0}, None, {'limit':100}) ), OR Operation('coll', 'find', kwargs={'filter':{'x': 5}, 'projection': {'_id': 0},'limit':100}) def cursor_callback(cursor): return cursor.distinct('hello') Operation('coll', 'find', kwargs={'limit': 2}, callback=cursor_callback) """</span> <span class="token keyword">def</span> <span class="token function">__init__</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> level<span class="token punctuation">,</span> operation_name<span class="token punctuation">,</span> args<span class="token operator">=</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> kwargs<span class="token operator">=</span><span class="token punctuation">{</span><span class="token punctuation">}</span><span class="token punctuation">,</span> callback<span class="token operator">=</span><span class="token boolean">None</span><span class="token punctuation">)</span><span class="token punctuation">:</span> self<span class="token punctuation">.</span>level <span class="token operator">=</span> level self<span class="token punctuation">.</span>operation_name <span class="token operator">=</span> operation_name self<span class="token punctuation">.</span>args <span class="token operator">=</span> args <span class="token keyword">if</span> kwargs <span class="token keyword">is</span> <span class="token boolean">None</span><span class="token punctuation">:</span> self<span class="token punctuation">.</span>kwargs <span class="token operator">=</span> <span class="token boolean">None</span> <span class="token keyword">else</span><span class="token punctuation">:</span> self<span class="token punctuation">.</span>kwargs <span class="token operator">=</span> kwargs self<span class="token punctuation">.</span>callback <span class="token operator">=</span> callback self<span class="token punctuation">.</span>out <span class="token operator">=</span> <span class="token boolean">None</span>
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
基于 DBManager 和 Operation 的因果一致性的会话和事务的简单示例如下:
# causal-consistency session or transaction pipeline operation
def cursor_callback(cursor):
return cursor.distinct('hello')
op_1 = Operation('coll', 'insert_one', {'hello': 'heaven'})
op_2 = Operation('coll', 'insert_one', {'hello': 'hell'})
op_3 = Operation('coll', 'insert_one', {'hello': 'god'})
op_4 = Operation('coll', 'find', kwargs={'limit': 2}, callback=cursor_callback)
op_5 = Operation('coll', 'find_one', {'hello': 'god'})
pipeline = [op_1, op_2, op_3, op_4, op_5]
ops = dbm.transaction_pipeline(pipeline) # only on replica set deployment
# ops = dbm.session_pipeline(pipeline) # can be standalone, replica set or sharded cluster.
for op in ops:
print(op.out)
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
【正文完】
</div>
<link href="https://csdnimg.cn/release/phoenix/mdeditor/markdown_views-df60374684.css" rel="stylesheet">
</div>