Practical Steps for Designing a Dimensional Model

本文介绍了一个从需求收集到最终审批的实用步骤,用于设计维度模型的过程。通过团队合作的方式,可以确保模型既符合业务需求又能反映源数据的真实情况。

Kimball University: Practical Steps for Designing a Dimensional Model

 

What does it take to develop a robust dimensional model? Here's how to get from requirements-gathering to final approval in a process that will ferret out the good, bad and ugly realities of your source data and help you avoid surprises, delays and cost overruns.


By

Margy Ross
September 29, 2008

 

Kimball Group has written more than 150 Intelligent Enterprise columns and one hundred Kimball Design Tips about dimensional modeling techniques, but we haven't written much about the dimensional modeling process. What are the tasks and deliverables required to create a robust design? This article presents best practices for designing a dimensional model.

Before embarking on a dimensional design project, you need a solid understanding of the business's requirements along with a reasonable assessment of the underlying source data. It's tempting to skip the requirements review, but resist this urge as doing so increases the risk of developing a source-driven model that falls short of business needs in numerous small but significant ways. Optimally, the requirements have been thoroughly researched and documented in a user-approved requirements finding document with the top priority needs clearly identified. This deliverable often also includes a preliminary data warehouse bus matrix. (More details about the bus matrix, a critical data architecture planning tool, are available in an Intelligent Enterprise column, "The Matrix: Revisited." )

In addition to the business requirements and bus matrix, you'll also want to review any source data profiling insights uncovered by the project team to date. The final item on your required-reading list is your organization's naming convention standards document. If you don't already have naming conventions to adapt for data warehousing and business intelligence, you'll need to establish them as you develop the dimensional model. For example, you may opt for a three-part column naming standard consisting of a prime word, zero or more qualifiers, and class word to yield a column name such as sales_dollar_amount.

 

Join the Party

Everyone knows that the key to a successful party is inviting the right people. The same holds true for the dimensional modeling festivities. Unfortunately, too many data modelers view the development of a dimensional model as a solo, independent activity; they retreat to their ivory cubicle to ponder the pros and cons of modeling alternatives on their own, emerging weeks later to unfurl their masterpiece for others to review. While the data modeler should be leading the charge and retain primary responsibility for the deliverables, the best way to arrive at a sound dimensional design is through a collaborative team effort since no one person likely has detailed knowledge of both the business requirements and source system idiosyncrasies.

The modeling team should also include people who can accurately represent the business users' analytic needs, such as the business analyst, power user, BI application developer, or all of the above. In particular, power user involvement is valuable as they've likely already identified busines rules to convert source data into more meaningful information for decision making; their insights often result in a richer, more analytically complete design. The designated data stewards should be involved in the modeling process to drive to organizational agreement on names, definitions, and business rules (If you haven't already established a data stewardship program, there's no time like the present to do so).

It's also beneficial to have some source experts at least intermittently involved in the process to quickly answer questions and resolve issues regarding data timing and content nuances. Finally, you should invite members of the ETL team to the party to gain early insights about the model and its source-to-target mapping. ETL team members typically have more to gain than contribute to the modeling process, but getting their buy-in to the design saves valuable time and avoids tire-spinning down the road. For the same reason, you should also involve the DBAs who will implement the physical design.

 

diagram
With the team assembled and prerequisite reading completed, it may be appropriate to briefly introduce dimensional modeling so everyone on the team understands its core principles (and appreciates that denormalization is not always evil). The team's first objective is to reach agreement on a high-level model diagram, or bubble chart, as illustrated (POSITION). The bubble chart represents a fact table corresponding to a single business process; it includes a clearly articulated grain declaration and identification of core dimensions. The bubble chart can often be derived largely from the preliminary bus matrix. It serves to get everyone on the same page regarding the scope of the model without getting unnecessarily mired in the details. Because it's easy-to-understand, the high-level diagram also facilitates discussion between the design team and business partners.

Dive Into the Details

With consensus on the bubble chart, the team then launches into table-by-table and column-by-column discussion, drilling into more detail about the requisite attributes and metrics, including definitions, transformation rules and data-quality concerns. The dimensional model will unfold through a series of design sessions, with each pass producing a more detailed design that's been repeatedly tested against your understanding of the business needs.

Don't attempt to schedule all-day design meetings; structure several one-hour morning and afternoon sessions so the lead modeler has time to update the documentation before the next meeting (and so team members can deal with the demands of their day jobs). Your initial sessions should focus on a more straightforward dimension table so the team experiences a quick win before dealing with the more controversial dimensions.

As you're iteratively fleshing out the design, team members will need to do more interrogation of the data. Thus, ongoing access to a profiling tool (or less sophisticated method) will be critical. Ferreting out the good, bad and ugly realities of your source data during the design process will minimize the surprises and corresponding overruns during the ETL design and development activity.

Throughout the design sessions, the data modeler or designated scribe should be filling in a detailed worksheet for each table with information such as the attribute or fact name, column description, sample values and decodes, change tracking rules for dimension attributes, and preliminary transformation business rules. These worksheets form the basis for the source-to-target mapping, which is further embellished by the physical designer and then ultimately handed off to the ETL team. In addition to the detailed worksheets, a member of the design team should also be logging open issues so they're captured in a single document to facilitate review and assignment at the end of every session.

Review the Results

The last phase of the modeling process involves reviewing and validating the model with interested parties, starting with the project team, extending to those in IT with intimate knowledge of the source systems, and concluding with the broader business community. Approval from this last group is critical before time and money is invested in the data implementation. Plan to devote much of this review to illustrating how the model will address sample questions from the requirements findings.

A typical design effort usually takes three to four weeks for a single business process dimensional model, but the time required will vary depending on the complexity of the business process, availability of preexisting conformed dimensions, experience of the modeling team, existence of well-documented business requirements, and the difficulty reaching consensus.

Designing a dimensional model with interested parties representing diverse skills requires commitment and cooperation, but the end result is a robust model that has been rigorously tested against both the business needs and data realities. That's exactly what you want before you move ahead with the implementation.

For more information about the dimensional modeling process, attend a Kimball University class such as the Dimensional Modeling in Depth or Data Warehouse Lifecycle in Depth courses. You can read more about the dimensional modeling process discussed above in The Data Warehouse Lifecycle Toolkit 2nd Edition , co-authored by Ralph Kimball, Margy Ross, Warren Thornthwaite, Joy Mundy and Bob Becker of the Kimball Group .


内容概要:本文详细介绍了一种基于Simulink的表贴式永磁同步电机(SPMSM)有限控制集模型预测电流控制(FCS-MPCC)仿真系统。通过构建PMSM数学模型、坐标变换、MPC控制器、SVPWM调制等模块,实现了对电机定子电流的高精度跟踪控制,具备快速动态响应和低稳态误差的特点。文中提供了完整的仿真建模步骤、关键参数设置、核心MATLAB函数代码及仿真结果分析,涵盖转速、电流、转矩和三相电流波形,验证了MPC控制策略在动态性能、稳态精度和抗负载扰动方面的优越性,并提出了参数自整定、加权代价函数、模型预测转矩控制和弱磁扩速等优化方向。; 适合人群:自动化、电气工程及其相关专业本科生、研究生,以及从事电机控制算法研究与仿真的工程技术人员;具备一定的电机原理、自动控制理论和Simulink仿真基础者更佳; 使用场景及目标:①用于永磁同步电机模型预测控制的教学演示、课程设计或毕业设计项目;②作为电机先进控制算法(如MPC、MPTC)的仿真验证平台;③支撑科研中对控制性能优化(如动态响应、抗干扰能力)的研究需求; 阅读建议:建议读者结合Simulink环境动手搭建模型,深入理解各模块间的信号流向与控制逻辑,重点掌握预测模型构建、代价函数设计与开关状态选择机制,并可通过修改电机参数或控制策略进行拓展实验,以增强实践与创新能力。
根据原作 https://pan.quark.cn/s/23d6270309e5 的源码改编 湖北省黄石市2021年中考数学试卷所包含的知识点广泛涉及了中学数学的基础领域,涵盖了实数、科学记数法、分式方程、几何体的三视图、立体几何、概率统计以及代数方程等多个方面。 接下来将对每道试题所关联的知识点进行深入剖析:1. 实数与倒数的定义:该题目旨在检验学生对倒数概念的掌握程度,即一个数a的倒数表达为1/a,因此-7的倒数可表示为-1/7。 2. 科学记数法的运用:科学记数法是一种表示极大或极小数字的方法,其形式为a×10^n,其中1≤|a|<10,n为整数。 此题要求学生运用科学记数法表示一个天文单位的距离,将1.4960亿千米转换为1.4960×10^8千米。 3. 分式方程的求解方法:考察学生解决包含分母的方程的能力,题目要求找出满足方程3/(2x-1)=1的x值,需通过消除分母的方式转化为整式方程进行解答。 4. 三视图的辨认:该题目测试学生对于几何体三视图(主视图、左视图、俯视图)的认识,需要识别出具有两个相同视图而另一个不同的几何体。 5. 立体几何与表面积的计算:题目要求学生计算由直角三角形旋转形成的圆锥的表面积,要求学生对圆锥的底面积和侧面积公式有所了解并加以运用。 6. 统计学的基础概念:题目涉及众数、平均数、极差和中位数的定义,要求学生根据提供的数据信息选择恰当的统计量。 7. 方程的整数解求解:考察学生在实际问题中进行数学建模的能力,通过建立方程来计算在特定条件下帐篷的搭建方案数量。 8. 三角学的实际应用:题目通过在直角三角形中运用三角函数来求解特定线段的长度。 利用正弦定理求解AD的长度是解答该问题的关键。 9. 几何变换的应用:题目要求学生运用三角板的旋转来求解特定点的...
在使用函数式方法设计 CIM(Computer Integrated Manufacturing)架构时,需要考虑多个方面,包括系统模块化、可扩展性、数据流处理和交互逻辑等。CIM 架构的核心目标是实现制造过程中信息流的高效整合,涵盖从设计、生产、管理到服务的全生命周期。函数式编程的特性,如不可变数据、纯函数和高阶函数,能够为架构设计提供良好的抽象和模块化能力。 ### 函数式方法的优势 函数式方法在设计 CIM 架构时具有以下优势: 1. **模块化与解耦**:通过将系统划分为多个独立的纯函数模块,可以降低模块间的耦合度,使系统更易于维护和扩展。 2. **数据流处理**:函数式编程擅长处理数据流,特别适用于 CIM 架构中涉及的大量实时数据采集、处理和分析任务。 3. **并发性与可扩展性**:由于函数式编程避免了副作用,可以更容易地实现并行计算和分布式处理,提高系统的吞吐量和响应能力。 ### 核心组件设计 在设计 CIM 架构时,可以采用以下函数式组件: 1. **数据采集层**:使用纯函数处理传感器数据的采集和初步清洗,确保数据的准确性和一致性。 2. **业务逻辑层**:通过高阶函数实现业务规则的动态配置,支持灵活的流程调整。 3. **决策支持层**:利用函数式编程的不可变特性,确保分析结果的可靠性和可追溯性。 ### 示例代码 以下是一个简单的函数式代码示例,展示了如何使用 Python 的函数式编程特性实现 CIM 架构中的数据处理模块: ```python from functools import reduce # 数据采集层:模拟传感器数据采集 def collect_sensor_data(): # 模拟从传感器获取数据 return [10, 20, 30, 40, 50] # 数据处理层:清洗和转换数据 def clean_data(data): # 移除异常值 return [x for x in data if x > 0] def transform_data(data): # 将数据转换为平方值 return list(map(lambda x: x ** 2, data)) # 业务逻辑层:计算数据总和 def calculate_total(data): return reduce(lambda a, b: a + b, data) # 决策支持层:基于总和做出决策 def make_decision(total): if total > 1000: return "High" else: return "Low" # 主函数:组合所有模块 def main(): raw_data = collect_sensor_data() cleaned_data = clean_data(raw_data) transformed_data = transform_data(cleaned_data) total = calculate_total(transformed_data) decision = make_decision(total) print(f"Decision: {decision}") if __name__ == "__main__": main() ``` ### 架构设计注意事项 1. **状态管理**:在函数式编程中,尽量避免使用可变状态,而是通过不可变数据结构和递归函数实现状态传递。 2. **错误处理**:使用函数式编程中的 `Either` 或 `Option` 类型来处理可能的错误或缺失值。 3. **性能优化**:对于需要高性能的模块,可以结合函数式编程和命令式编程的优势,采用混合编程方式。 ### 相关问题 1. 如何在函数式编程中实现 CIM 架构的实时数据流处理? 2. 函数式方法如何提高 CIM 架构的可扩展性和维护性? 3. 在 CIM 架构中,如何结合函数式编程和传统命令式编程的优势? 4. 函数式编程在 CIM 架构中的状态管理有哪些最佳实践?
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值