GridFS

GridFS Specification

When to use GridFS

This page is under construction

  • Lots of files. GridFS tends to handle large numbers (many thousands) of files better than many file systems.
  • User-uploaded files. When users upload files, you tend to end up with a lot of them, and you want them replicated and backed up. GridFS is a good place to store these because you can then manage them the same way you manage your data. You can also query by user, upload date, and so on directly in the file store, without a layer of indirection.
  • Files that change often. If certain files change a lot, it makes sense to store them in GridFS so that you can modify them in one place and all clients get the updates. This can also be better than keeping them in the source tree, because you don't have to deploy the app to update them.

When not to use GridFS

  • Few small static files. If you just have a few small files for a website (JS, CSS, images), it's probably easier to use the file system.
  • Note that if you need to update a binary object atomically, and the object is under the document size limit for your version of MongoDB (16MB for 1.8), then you might consider storing the object manually within a single document. This can be accomplished using the BSON bindata type. Check your driver's docs for details on using this type.

File Tools

mongofiles is a tool for manipulating GridFS from the command line.



Introduction

GridFS works by splitting a large object into small chunks, usually 256k in size. (A file is cut into small pieces that are stored in a MongoDB collection.)
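
As a rough illustration of the splitting step, the sketch below cuts a byte string into fixed-size pieces in plain Python. The function name `split_into_chunks` is hypothetical, and the constant assumes that "256k" means 256 * 1024 bytes; this is not driver code.

```python
CHUNK_SIZE = 256 * 1024  # assumption: "256k" read as 256 KiB


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split data into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


payload = b"x" * (600 * 1024)        # a 600 KiB stand-in for a file
chunks = split_into_chunks(payload)
print(len(chunks))                   # 3
print([len(c) for c in chunks])      # [262144, 262144, 90112]
```

Only the last piece can be shorter than the chunk size, and joining the pieces in order gives back the original bytes.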

Specification

Storage Collections

GridFS uses two collections to store data:

  • files contains the object metadata
  • chunks contains the binary chunks with some additional accounting information

The files and chunks collections are named with a prefix. (The prefix acts as a logical file system.) By default the prefix is fs.
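
The naming rule can be sketched in a few lines of Python. The helper name `gridfs_collection_names` is hypothetical; it only shows how the two collection names follow from the prefix.

```python
def gridfs_collection_names(prefix: str = "fs") -> tuple:
    """Return the (files, chunks) collection names for a GridFS prefix."""
    return f"{prefix}.files", f"{prefix}.chunks"


print(gridfs_collection_names())             # ('fs.files', 'fs.chunks')
print(gridfs_collection_names("contracts"))  # ('contracts.files', 'contracts.chunks')
```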

Here's an example of the standard GridFS interface in Java:

/*
 * default root collection usage - must be supported
 */
GridFS myFS = new GridFS(myDatabase);              // returns a default GridFS (e.g. "fs" root collection)
myFS.storeFile(new File("/tmp/largething.mpg"));   // saves the file into the "fs" GridFS store

/*
 * specified root collection usage - optional
 */

GridFS myContracts = new GridFS(myDatabase, "contracts");                     // returns a GridFS where "contracts" is root
myContracts.retrieveFile("smithco", new File("/tmp/smithco_20090105.pdf"));   // retrieves object whose filename is "smithco"

files

Documents in the files collection require the following fields (the metadata for one file):

{
  "_id" : <unspecified>,                  // unique ID for this file
  "length" : data_number,                 // size of the file in bytes
  "chunkSize" : data_number,              // size of each of the chunks.  Default is 256k
  "uploadDate" : data_date,               // date when object first stored
  "md5" : data_string                     // result of running the "filemd5" command on this file's chunks
}
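
To make the field list concrete, here is a minimal sketch that builds such a document as a plain Python dict. The helper name `make_files_document` is hypothetical. Two assumptions are made: a random hex string stands in for the `_id` (drivers typically generate their own unique ID), and the `filemd5` result is taken to equal the MD5 of the file's bytes in chunk order.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def make_files_document(data: bytes, chunk_size: int = 256 * 1024) -> dict:
    """Build the required metadata fields for one stored file."""
    return {
        "_id": uuid.uuid4().hex,                    # assumption: any unique value works here
        "length": len(data),                        # size of the file in bytes
        "chunkSize": chunk_size,                    # size of each chunk
        "uploadDate": datetime.now(timezone.utc),   # when the object was first stored
        "md5": hashlib.md5(data).hexdigest(),       # assumption: matches the "filemd5" result
    }


doc = make_files_document(b"hello world")
print(doc["length"], doc["chunkSize"])
```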

chunks

The structure of documents from the chunks collection is as follows:

{
  "_id" : <unspecified>,         // object id of the chunk in the _chunks collection
  "files_id" : <unspecified>,    // _id of the corresponding files collection entry
  "n" : chunk_number,            // chunks are numbered in order, starting with 0
  "data" : data_binary,          // the chunk's payload as a BSON binary type
}
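
A sketch of how chunk documents with this structure could be produced for one file, using plain Python dicts. The helper name `make_chunk_documents` and the tiny chunk size are for illustration only, and the random hex `_id` is an assumption standing in for whatever unique ID a driver would generate.

```python
import uuid


def make_chunk_documents(files_id, data: bytes, chunk_size: int) -> list:
    """Build one chunk document per chunk_size slice of data, numbered from 0."""
    docs = []
    for n, start in enumerate(range(0, len(data), chunk_size)):
        docs.append({
            "_id": uuid.uuid4().hex,                   # assumption: any unique value
            "files_id": files_id,                      # _id of the files collection entry
            "n": n,                                    # chunk number, starting with 0
            "data": data[start:start + chunk_size],    # this chunk's payload
        })
    return docs


chunk_docs = make_chunk_documents("file-1", b"abcdefghij", chunk_size=4)
print([(d["n"], d["data"]) for d in chunk_docs])
# [(0, b'abcd'), (1, b'efgh'), (2, b'ij')]
```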


Indexes

GridFS implementations should create a unique, compound index in the chunks collection for files_id and n. Here's how you'd do that from the shell:

db.fs.chunks.ensureIndex({files_id:1, n:1}, {unique: true});

This way, a chunk can be retrieved efficiently using its files_id and n values. Note that GridFS implementations should use findOne operations to get chunks individually, and should not leave a cursor open to query for all chunks. So to get the first chunk, we could do:

db.fs.chunks.findOne({files_id: myFileID, n: 0});
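
The chunk-by-chunk read pattern can be sketched without a database. In the Python below, a dict keyed by (files_id, n) stands in for the chunks collection and its unique compound index, and `find_one_chunk` is a hypothetical stand-in for the findOne call; the sample data is made up for illustration.

```python
import math

# In-memory stand-in for fs.chunks, keyed the way the unique (files_id, n) index would be.
chunk_store = {
    ("file-1", 0): b"abcd",
    ("file-1", 1): b"efgh",
    ("file-1", 2): b"ij",
}
file_doc = {"_id": "file-1", "length": 10, "chunkSize": 4}


def find_one_chunk(files_id, n):
    """Hypothetical stand-in for db.fs.chunks.findOne({files_id: ..., n: ...})."""
    return chunk_store.get((files_id, n))


def read_file(file_doc: dict) -> bytes:
    """Fetch each chunk individually, in order, and join the payloads."""
    total_chunks = math.ceil(file_doc["length"] / file_doc["chunkSize"])
    parts = []
    for n in range(total_chunks):
        data = find_one_chunk(file_doc["_id"], n)
        if data is None:
            raise LookupError(f"missing chunk {n}")
        parts.append(data)
    return b"".join(parts)


print(read_file(file_doc))  # b'abcdefghij'
```

The number of chunks is derived from length and chunkSize in the files document, so the reader knows when to stop without scanning the collection.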

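### Using GridFS with Python and MongoDB

GridFS is a standard way to store and retrieve files that exceed the BSON document size limit (16MB). It splits a large file into smaller parts and stores them as separate chunks in the `fs.chunks` collection, while the metadata is kept in `fs.files`.

The basic operations with Python and the PyMongo library are shown below.

#### Installing the dependency

To use GridFS, first install the pymongo library:

```bash
pip install pymongo
```

#### Connecting to MongoDB and initializing GridFS

GridFS is accessed through the `gridfs.GridFS` class that ships with PyMongo.

```python
from pymongo import MongoClient
import gridfs

# Create a MongoDB client instance
client = MongoClient('mongodb://localhost:27017/')
db = client['mydatabase']  # replace with the name of the database you actually use

# Initialize a GridFS instance (uses the default "fs" prefix)
fs = gridfs.GridFS(db)
```

#### Storing a file in GridFS

A file of any type can be uploaded to GridFS as a binary stream.

```python
with open('/path/to/file', 'rb') as f:
    # Extra keyword arguments are stored as metadata on the file document
    file_id = fs.put(f, filename="example_file", content_type="text/plain")

print(file_id)  # the _id (an ObjectId) of the newly stored file
```

#### Retrieving a file from GridFS

A stored file can be looked up by filename or by other conditions.

```python
file_from_gridfs = fs.find_one({"filename": "example_file"})
if file_from_gridfs is not None:
    with open("/path/to/save/example_file", "wb") as output_file:
        output_file.write(file_from_gridfs.read())  # write the contents to local disk
else:
    print("File not found.")
```

#### Deleting a file

Delete the file with the given ID together with its associated chunks.

```python
fs.delete(file_id)          # delete by the ID returned from put(); returns None
print(fs.exists(file_id))   # False once the file is gone
```

#### Listing all files

List the names of all files currently stored in GridFS.

```python
for filename in fs.list():
    print(filename)
```

These examples show how a Python script can use GridFS to store large files in a MongoDB database.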