Practical Steps for Designing a Dimensional Model

本文介绍了一个从需求收集到最终审批的实用步骤,用于设计维度模型的过程。通过团队合作的方式,可以确保模型既符合业务需求又能反映源数据的真实情况。

Kimball University: Practical Steps for Designing a Dimensional Model

 

What does it take to develop a robust dimensional model? Here's how to get from requirements-gathering to final approval in a process that will ferret out the good, bad and ugly realities of your source data and help you avoid surprises, delays and cost overruns.


By

Margy Ross
September 29, 2008

 

Kimball Group has written more than 150 Intelligent Enterprise columns and one hundred Kimball Design Tips about dimensional modeling techniques, but we haven't written much about the dimensional modeling process. What are the tasks and deliverables required to create a robust design? This article presents best practices for designing a dimensional model.

Before embarking on a dimensional design project, you need a solid understanding of the business's requirements along with a reasonable assessment of the underlying source data. It's tempting to skip the requirements review, but resist this urge as doing so increases the risk of developing a source-driven model that falls short of business needs in numerous small but significant ways. Optimally, the requirements have been thoroughly researched and documented in a user-approved requirements finding document with the top priority needs clearly identified. This deliverable often also includes a preliminary data warehouse bus matrix. (More details about the bus matrix, a critical data architecture planning tool, are available in an Intelligent Enterprise column, "The Matrix: Revisited." )

In addition to the business requirements and bus matrix, you'll also want to review any source data profiling insights uncovered by the project team to date. The final item on your required-reading list is your organization's naming convention standards document. If you don't already have naming conventions to adapt for data warehousing and business intelligence, you'll need to establish them as you develop the dimensional model. For example, you may opt for a three-part column naming standard consisting of a prime word, zero or more qualifiers, and class word to yield a column name such as sales_dollar_amount.

 

Join the Party

Everyone knows that the key to a successful party is inviting the right people. The same holds true for the dimensional modeling festivities. Unfortunately, too many data modelers view the development of a dimensional model as a solo, independent activity; they retreat to their ivory cubicle to ponder the pros and cons of modeling alternatives on their own, emerging weeks later to unfurl their masterpiece for others to review. While the data modeler should be leading the charge and retain primary responsibility for the deliverables, the best way to arrive at a sound dimensional design is through a collaborative team effort since no one person likely has detailed knowledge of both the business requirements and source system idiosyncrasies.

The modeling team should also include people who can accurately represent the business users' analytic needs, such as the business analyst, power user, BI application developer, or all of the above. In particular, power user involvement is valuable as they've likely already identified busines rules to convert source data into more meaningful information for decision making; their insights often result in a richer, more analytically complete design. The designated data stewards should be involved in the modeling process to drive to organizational agreement on names, definitions, and business rules (If you haven't already established a data stewardship program, there's no time like the present to do so).

It's also beneficial to have some source experts at least intermittently involved in the process to quickly answer questions and resolve issues regarding data timing and content nuances. Finally, you should invite members of the ETL team to the party to gain early insights about the model and its source-to-target mapping. ETL team members typically have more to gain than contribute to the modeling process, but getting their buy-in to the design saves valuable time and avoids tire-spinning down the road. For the same reason, you should also involve the DBAs who will implement the physical design.

 

diagram
With the team assembled and prerequisite reading completed, it may be appropriate to briefly introduce dimensional modeling so everyone on the team understands its core principles (and appreciates that denormalization is not always evil). The team's first objective is to reach agreement on a high-level model diagram, or bubble chart, as illustrated (POSITION). The bubble chart represents a fact table corresponding to a single business process; it includes a clearly articulated grain declaration and identification of core dimensions. The bubble chart can often be derived largely from the preliminary bus matrix. It serves to get everyone on the same page regarding the scope of the model without getting unnecessarily mired in the details. Because it's easy-to-understand, the high-level diagram also facilitates discussion between the design team and business partners.

Dive Into the Details

With consensus on the bubble chart, the team then launches into table-by-table and column-by-column discussion, drilling into more detail about the requisite attributes and metrics, including definitions, transformation rules and data-quality concerns. The dimensional model will unfold through a series of design sessions, with each pass producing a more detailed design that's been repeatedly tested against your understanding of the business needs.

Don't attempt to schedule all-day design meetings; structure several one-hour morning and afternoon sessions so the lead modeler has time to update the documentation before the next meeting (and so team members can deal with the demands of their day jobs). Your initial sessions should focus on a more straightforward dimension table so the team experiences a quick win before dealing with the more controversial dimensions.

As you're iteratively fleshing out the design, team members will need to do more interrogation of the data. Thus, ongoing access to a profiling tool (or less sophisticated method) will be critical. Ferreting out the good, bad and ugly realities of your source data during the design process will minimize the surprises and corresponding overruns during the ETL design and development activity.

Throughout the design sessions, the data modeler or designated scribe should be filling in a detailed worksheet for each table with information such as the attribute or fact name, column description, sample values and decodes, change tracking rules for dimension attributes, and preliminary transformation business rules. These worksheets form the basis for the source-to-target mapping, which is further embellished by the physical designer and then ultimately handed off to the ETL team. In addition to the detailed worksheets, a member of the design team should also be logging open issues so they're captured in a single document to facilitate review and assignment at the end of every session.

Review the Results

The last phase of the modeling process involves reviewing and validating the model with interested parties, starting with the project team, extending to those in IT with intimate knowledge of the source systems, and concluding with the broader business community. Approval from this last group is critical before time and money is invested in the data implementation. Plan to devote much of this review to illustrating how the model will address sample questions from the requirements findings.

A typical design effort usually takes three to four weeks for a single business process dimensional model, but the time required will vary depending on the complexity of the business process, availability of preexisting conformed dimensions, experience of the modeling team, existence of well-documented business requirements, and the difficulty reaching consensus.

Designing a dimensional model with interested parties representing diverse skills requires commitment and cooperation, but the end result is a robust model that has been rigorously tested against both the business needs and data realities. That's exactly what you want before you move ahead with the implementation.

For more information about the dimensional modeling process, attend a Kimball University class such as the Dimensional Modeling in Depth or Data Warehouse Lifecycle in Depth courses. You can read more about the dimensional modeling process discussed above in The Data Warehouse Lifecycle Toolkit 2nd Edition , co-authored by Ralph Kimball, Margy Ross, Warren Thornthwaite, Joy Mundy and Bob Becker of the Kimball Group .


在使用函数式方法设计 CIM(Computer Integrated Manufacturing)架构时,需要考虑多个方面,包括系统模块化、可扩展性、数据流处理和交互逻辑等。CIM 架构的核心目标是实现制造过程中信息流的高效整合,涵盖从设计、生产、管理到服务的全生命周期。函数式编程的特性,如不可变数据、纯函数和高阶函数,能够为架构设计提供良好的抽象和模块化能力。 ### 函数式方法的优势 函数式方法在设计 CIM 架构时具有以下优势: 1. **模块化与解耦**:通过将系统划分为多个独立的纯函数模块,可以降低模块间的耦合度,使系统更易于维护和扩展。 2. **数据流处理**:函数式编程擅长处理数据流,特别适用于 CIM 架构中涉及的大量实时数据采集、处理和分析任务。 3. **并发性与可扩展性**:由于函数式编程避免了副作用,可以更容易地实现并行计算和分布式处理,提高系统的吞吐量和响应能力。 ### 核心组件设计 在设计 CIM 架构时,可以采用以下函数式组件: 1. **数据采集层**:使用纯函数处理传感器数据的采集和初步清洗,确保数据的准确性和一致性。 2. **业务逻辑层**:通过高阶函数实现业务规则的动态配置,支持灵活的流程调整。 3. **决策支持层**:利用函数式编程的不可变特性,确保分析结果的可靠性和可追溯性。 ### 示例代码 以下是一个简单的函数式代码示例,展示了如何使用 Python 的函数式编程特性实现 CIM 架构中的数据处理模块: ```python from functools import reduce # 数据采集层:模拟传感器数据采集 def collect_sensor_data(): # 模拟从传感器获取数据 return [10, 20, 30, 40, 50] # 数据处理层:清洗和转换数据 def clean_data(data): # 移除异常值 return [x for x in data if x > 0] def transform_data(data): # 将数据转换为平方值 return list(map(lambda x: x ** 2, data)) # 业务逻辑层:计算数据总和 def calculate_total(data): return reduce(lambda a, b: a + b, data) # 决策支持层:基于总和做出决策 def make_decision(total): if total > 1000: return "High" else: return "Low" # 主函数:组合所有模块 def main(): raw_data = collect_sensor_data() cleaned_data = clean_data(raw_data) transformed_data = transform_data(cleaned_data) total = calculate_total(transformed_data) decision = make_decision(total) print(f"Decision: {decision}") if __name__ == "__main__": main() ``` ### 架构设计注意事项 1. **状态管理**:在函数式编程中,尽量避免使用可变状态,而是通过不可变数据结构和递归函数实现状态传递。 2. **错误处理**:使用函数式编程中的 `Either` 或 `Option` 类型来处理可能的错误或缺失值。 3. **性能优化**:对于需要高性能的模块,可以结合函数式编程和命令式编程的优势,采用混合编程方式。 ### 相关问题 1. 如何在函数式编程中实现 CIM 架构的实时数据流处理? 2. 函数式方法如何提高 CIM 架构的可扩展性和维护性? 3. 在 CIM 架构中,如何结合函数式编程和传统命令式编程的优势? 4. 函数式编程在 CIM 架构中的状态管理有哪些最佳实践?
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值