MongoDB Schema Design

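This article discusses MongoDB's schema design philosophy, compares embedding with referencing, and offers practical advice on choosing collections and designing indexes.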

Reposted from the official MongoDB site: http://www.mongodb.org/display/DOCS/Schema+Design

Schema Design

  • Introduction
  • Embed vs. Reference
  • Use Cases
  • Index Selection
  • How Many Collections?
  • See Also

Introduction

With Mongo, you do less "normalization" than you would perform designing a relational schema because there are no server-side "joins". Generally, you will want one database collection for each of your top level objects.

You do not want a collection for every "class" - instead, embed objects. For example, in the diagram below, we have two collections, students and courses. The student documents embed address documents and the "score" documents, which have references to the courses.

[Diagram: the students and courses collections]

Compare this with a relational schema, where you would almost certainly put the scores in a separate table, and have a foreign-key relationship back to the students.
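As a rough sketch of the layout described above, the two collections might hold documents shaped like the plain JavaScript objects below. All values, the `_id` strings and the `grade` field are illustrative assumptions. The sketch also stores a bare course `_id` in `for_course`, which is a simplification of the reference the article describes.

```javascript
// Hypothetical sample data: every value below is made up for illustration.
const courses = [
  { _id: "course-1", name: "Biology" },
  { _id: "course-2", name: "English" },
];

const students = [
  {
    _id: "student-1",
    name: "Jane",
    // Embedded document: lives inside the student, no separate collection.
    address: { street: "123 Main St", city: "New York" },
    // Embedded array of score documents; each one refers to a course by _id.
    scores: [
      { for_course: "course-1", grade: 4.0 },
      { for_course: "course-2", grade: 3.7 },
    ],
  },
];

// Reading embedded data needs no extra lookup.
console.log(students[0].address.city);
```

In a relational design, the `scores` array would instead be rows in a separate table keyed back to the student.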

Embed vs. Reference

The key question in Mongo schema design is "does this object merit its own collection, or rather should it embed in objects in other collections?" In relational databases, each sub-item of interest typically becomes a separate table (unless denormalizing for performance). In Mongo, this is not recommended - embedding objects is much more efficient. Data is then colocated on disk; client-server turnarounds to the database are eliminated. So in general the question to ask is, "why would I not want to embed this object?"

So why are references slow? Let's consider our students example. If we have a student object and perform:

print( student.address.city );

This operation will always be fast, as address is an embedded object and is always in RAM if student is in RAM. However, for

print( student.scores[0].for_course.name );

if this is the first access to scores[0], the shell or your driver must execute the query

// pseudocode for driver or framework, not user code
student.scores[0].for_course = db.courses.findOne({_id:_course_id_to_find_});

Thus, each reference traversal is a query to the database. Typically, the collection in question is indexed on _id, so the query will be reasonably fast. However, even if all data is in RAM, there is a certain latency from the client/server communication between the app server and the database. In general, expect about 1 ms for such a query on a RAM cache hit. Thus, if we were iterating over 1,000 students, looking up one reference per student would be quite slow: over 1 second in total, even if everything is cached. However, if we only need to look up a single item, the time is on the order of 1 ms, which is completely acceptable for a web page load. (Note that if the students are already in the database cache, pulling the 1,000 student documents themselves might take much less than 1 second, as results are returned from the database in large batches.)
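The arithmetic above can be sketched with a toy client that only counts round trips. `FakeCourses`, `findOne` and `findByIds` are hypothetical names rather than a real driver API, and the 1 ms figure is simply the article's rough estimate. The batched variant is shown for contrast only; the article does not prescribe it.

```javascript
// Toy stand-in for a database client: it only counts round trips.
// FakeCourses, findOne and findByIds are hypothetical, not a driver API.
class FakeCourses {
  constructor(docs) {
    this.byId = new Map(docs.map((d) => [d._id, d]));
    this.roundTrips = 0;
  }
  findOne(id) {
    this.roundTrips += 1; // one client/server exchange per call
    return this.byId.get(id) ?? null;
  }
  findByIds(ids) {
    this.roundTrips += 1; // one exchange for the whole batch
    return ids.map((id) => this.byId.get(id)).filter(Boolean);
  }
}

const courseDocs = Array.from({ length: 10 }, (_, i) => ({ _id: i, name: `Course ${i}` }));
const studentDocs = Array.from({ length: 1000 }, (_, i) => ({ _id: i, for_course: i % 10 }));

const ASSUMED_MS_PER_ROUND_TRIP = 1; // the article's rough figure for a cache hit

// One lookup per student: 1,000 round trips.
const perStudent = new FakeCourses(courseDocs);
for (const s of studentDocs) perStudent.findOne(s.for_course);

// Collect the distinct ids first, then fetch them together: 1 round trip.
const batched = new FakeCourses(courseDocs);
const distinctIds = [...new Set(studentDocs.map((s) => s.for_course))];
batched.findByIds(distinctIds);

const estimatedPerStudentMs = perStudent.roundTrips * ASSUMED_MS_PER_ROUND_TRIP;
const estimatedBatchedMs = batched.roundTrips * ASSUMED_MS_PER_ROUND_TRIP;
console.log(estimatedPerStudentMs, estimatedBatchedMs); // prints "1000 1"
```

The point is the count of round trips, not the exact timing: the per-student loop pays the latency 1,000 times, while embedding (or a single batched fetch) pays it once.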

Some general rules on when to embed, and when to reference:

  • "First class" objects, that are at top level, typically have their own collection.
  • Line item detail objects typically are embedded.
  • Objects which follow an object modelling "contains" relationship should generally be embedded.
  • Many to many relationships are generally by reference.
  • Collections with only a few objects may safely exist as separate collections, as the whole collection is quickly cached in application server memory.
  • Embedded objects are harder to reference than "top level" objects in collections, as you cannot have a DBRef to an embedded object (at least not yet).
  • It is more difficult to get a system-level view for embedded objects. For example, it would be easier to query the top 100 scores across all students if Scores were not embedded.
  • If the amount of data to embed is huge (many megabytes), you may reach the limit on size of a single object.
  • If performance is an issue, embed.
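To illustrate the rule about system-level views, here is a small sketch of ranking scores across all students when the scores are embedded. The data and the `topScores` helper are hypothetical, and the code works on in-memory objects rather than a live collection.

```javascript
// Hypothetical data: three students with embedded scores.
const roster = [
  { name: "Ann", scores: [{ course: "Math", grade: 91 }, { course: "Art", grade: 78 }] },
  { name: "Bob", scores: [{ course: "Math", grade: 85 }] },
  { name: "Cy", scores: [{ course: "Art", grade: 97 }, { course: "Math", grade: 66 }] },
];

// With embedding, a cross-student ranking has to unwind every
// student's array before it can sort.
function topScores(studentList, limit) {
  return studentList
    .flatMap((s) => s.scores.map((sc) => ({ student: s.name, ...sc })))
    .sort((a, b) => b.grade - a.grade)
    .slice(0, limit);
}

const top2 = topScores(roster, 2);
console.log(top2.map((t) => `${t.student}:${t.grade}`).join(", ")); // prints "Cy:97, Ann:91"
```

If scores lived in their own collection, the same ranking would be a plain sort over that collection, at the cost of the extra lookups discussed earlier.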

Use Cases

Let's consider a few use cases now.

  1. Customer / Order / Order Line-Item
     • orders should be a collection, and customers should be a collection. Line items should be an array embedded in the order object.
  2. Blogging system
     • posts should be a collection. The post author might be a separate collection, or simply a field within posts if it is only an email address. comments should be embedded objects within a post, for performance.
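The two use cases might translate into documents shaped roughly like the following. Every field name and value here is an illustrative assumption, not something the article specifies.

```javascript
// Hypothetical documents; all field names and values are illustrative.
const customer = { _id: "cust-1", name: "Acme Corp" };

const order = {
  _id: "order-1",
  customer_id: customer._id, // reference: customers is its own collection
  line_items: [
    // embedded: each line item belongs to exactly one order
    { sku: "A-100", qty: 2, unit_price: 9.5 },
    { sku: "B-200", qty: 1, unit_price: 20 },
  ],
};

const post = {
  _id: "post-1",
  title: "Schema design notes",
  author_email: "author@example.com", // simple field instead of an authors collection
  comments: [
    // embedded so a post and its comments load together
    { by: "reader1", text: "Nice overview." },
  ],
};

// The order total comes from the embedded array with no further lookups.
const orderTotal = order.line_items.reduce((sum, li) => sum + li.qty * li.unit_price, 0);
console.log(orderTotal); // prints 39
```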

Index Selection

A second aspect of schema design is index selection. As a general rule, where you want an index in a relational database, you want an index in Mongo.

  • The _id field is automatically indexed.
  • Fields upon which keys are looked up should be indexed.
  • Sort fields generally should be indexed.

The MongoDB profiling facility provides useful information on where a missing index should be added.

Note that adding an index slows writes to a collection, but not reads. Use lots of indexes for collections with a high read : write ratio (assuming one does not mind the storage overhead). For collections with more writes than reads, indexes are very expensive.
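The read/write trade-off can be made concrete with a toy in-memory model. `ToyCollection` is a hypothetical class written for this sketch and says nothing about how MongoDB actually stores documents or indexes. It only counts how much work each operation does.

```javascript
// Toy in-memory collection; not how MongoDB stores data or indexes.
class ToyCollection {
  constructor(indexedFields) {
    this.docs = [];
    // One Map per indexed field: value -> array of matching documents.
    this.indexes = new Map(indexedFields.map((f) => [f, new Map()]));
    this.indexWrites = 0;
    this.docsExamined = 0;
  }
  insert(doc) {
    this.docs.push(doc);
    for (const [field, index] of this.indexes) {
      if (!index.has(doc[field])) index.set(doc[field], []);
      index.get(doc[field]).push(doc);
      this.indexWrites += 1; // every index adds work to every insert
    }
  }
  findBy(field, value) {
    const index = this.indexes.get(field);
    if (index) {
      const hits = index.get(value) ?? [];
      this.docsExamined += hits.length; // indexed: touch only the matches
      return hits;
    }
    this.docsExamined += this.docs.length; // unindexed: scan everything
    return this.docs.filter((d) => d[field] === value);
  }
}

const indexed = new ToyCollection(["city", "name"]);
const unindexed = new ToyCollection([]);
for (let i = 0; i < 100; i++) {
  const doc = { name: `user${i}`, city: i % 2 === 0 ? "Oslo" : "Lima" };
  indexed.insert(doc);
  unindexed.insert(doc);
}
indexed.findBy("name", "user42");
unindexed.findBy("name", "user42");
console.log(indexed.indexWrites, indexed.docsExamined, unindexed.docsExamined); // prints "200 1 100"
```

With two indexes, 100 inserts cost 200 extra index updates, while a lookup examines 1 document instead of 100. That is the shape of the trade-off: a read-heavy collection gains from the indexes, and a write-heavy one pays for them on every insert.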

How Many Collections?

As Mongo collections are polymorphic, one could have a single collection, say objects, and put everything in it! This approach is taken by some object databases. For performance reasons, we do not recommend this approach. Data within a Mongo collection tends to be contiguous on disk. Thus, table scans of the collection are possible, and efficient. Collections are very important for high-throughput batch processing.

See Also

The Little MongoDB Schema Design Book covers the fundamentals of schema design with MongoDB, as well as several useful schema design patterns for applications. Its author describes it as a helpful and concise guide to MongoDB schema design and as a repository for looking up specific schema patterns. The book grew out of the author's experience teaching people to use MongoDB for application development, and it tries to cover essential information that you can apply to your own applications. It aims to give developers a deep but concise understanding of how to work efficiently with MongoDB.

Topics covered include:

  • Schema basics, including one-to-one, one-to-many and many-to-many relationships
  • Embedding versus linking
  • Bucketing strategy
  • The MongoDB MMAP and WiredTiger storage engines
  • MongoDB indexes
  • The Metadata schema pattern
  • The Time Series schema pattern
  • The Queues schema pattern
  • The Nested Categories schema pattern
  • The Account Transactions schema pattern
  • The Shopping Cart schema pattern, with and without product reservation
  • A Theater Ticket Reservation schema pattern
  • An Embedded Array Cache schema pattern
  • An Internationalization schema pattern
  • Sharding

Table of contents: Introduction; Schema Basics; One-To-One (1:1); One-To-Many (1:N); Many-To-Many (N:M); MMAP Storage Engine; WiredTiger Storage Engine; Indexes; Sharding; Schema Design; Queue; Topics; Metadata; Materialized Path Category Hierarchy; Shopping Cart with Product Reservation; Shopping Cart with No Product Reservation; Theater Reservation; Account Transactions; Time Series; Array Slice Cache; Internationalization.