How to handle Slowly Changing Dimensions (SCDs) in data model design?

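This article introduces three main methods for handling slowly changing dimensions (SCDs) in a data warehouse: overwriting the old data, adding a new record, and tracking changes in a separate column. It explains the characteristics of each method and the scenarios it suits.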

There are multiple methods to handle slowly changing dimensions. Which technique to use depends on your business requirements. The choice among these three methods is not a technical design decision, since their behaviors are different: each one gives different answers in reports.

Type One: Overwrite the old data with new data

Using this method, you do not store the history. For example, say each customer can have one salesrep at any given point in time. When the salesrep of ABC Inc. changes from Sandy to Laura, the fact that Sandy was once the salesrep of ABC is not kept anywhere. Any report by salesrep will assume that Laura has always been the salesrep of ABC Inc. and will count all the sales made by Sandy as Laura's.

The above example may not seem to make business sense. However, if you only report the sales of the current period, and the salesrep does not change during the period, this method is acceptable.

Many OLTP tables do not need to track the history of changes, and thus this method may be used by the source application. However, if you want to report on historical data, the data warehouse can still use other methods to track the history, even if your OLTP system does not.
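The salesrep example can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table names (customer_dim, sales_fact), column names, and sales amounts are hypothetical and do not come from the original text.

```python
import sqlite3

# Hypothetical schema: one row per customer, no history columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_dim (customer_id TEXT PRIMARY KEY, salesrep TEXT)")
conn.execute("CREATE TABLE sales_fact (customer_id TEXT, amount REAL)")

conn.execute("INSERT INTO customer_dim VALUES ('ABC', 'Sandy')")
conn.execute("INSERT INTO sales_fact VALUES ('ABC', 100.0)")  # sold while Sandy was the rep

# Type One change: overwrite the attribute in place; Sandy is no longer recorded anywhere.
conn.execute("UPDATE customer_dim SET salesrep = 'Laura' WHERE customer_id = 'ABC'")
conn.execute("INSERT INTO sales_fact VALUES ('ABC', 50.0)")  # sold after the change

# A report by salesrep now credits every sale, old and new, to Laura.
type1_report = conn.execute(
    """
    SELECT d.salesrep, SUM(f.amount)
    FROM sales_fact f
    JOIN customer_dim d ON d.customer_id = f.customer_id
    GROUP BY d.salesrep
    """
).fetchall()
print(type1_report)
```

The report contains a single row for Laura, which is exactly the loss of history described above.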

Type Two: Add a new record at the time of the change

Using this method, all prior history is saved. There are three alternative methods to model the key of this table.

Method A – No surrogate key – Use timestamp

When a change happens, a new record is added to the table. All the attributes are copied from the previous record except the changed values. The natural key is copied as well, so the timestamp is used to differentiate the records.

When a fact table is joined with the dimension, and you are interested in the historical data, the timestamp is used as part of the join condition. To ease the join, the record typically uses two date columns: the effective start date and the effective end date.
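A minimal sketch of the date-range join, again using sqlite3. The table and column names, the dates, and the amounts are assumptions made for illustration. The sketch also assumes a half-open interval (start inclusive, end exclusive) and a far-future end date for the current record; the original text does not prescribe either convention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- The natural key repeats; the effective start date distinguishes versions.
    CREATE TABLE customer_dim (
        customer_id     TEXT,
        salesrep        TEXT,
        effective_start TEXT,
        effective_end   TEXT,
        PRIMARY KEY (customer_id, effective_start)
    );
    CREATE TABLE sales_fact (customer_id TEXT, sale_date TEXT, amount REAL);

    INSERT INTO customer_dim VALUES ('ABC', 'Sandy', '2006-01-01', '2006-07-01');
    INSERT INTO customer_dim VALUES ('ABC', 'Laura', '2006-07-01', '9999-12-31');
    INSERT INTO sales_fact VALUES ('ABC', '2006-03-15', 100.0);
    INSERT INTO sales_fact VALUES ('ABC', '2006-09-20', 50.0);
    """
)

# ISO-8601 date strings compare correctly as text, so the range test works on TEXT columns.
type2a_report = conn.execute(
    """
    SELECT d.salesrep, SUM(f.amount)
    FROM sales_fact f
    JOIN customer_dim d
      ON d.customer_id = f.customer_id
     AND f.sale_date >= d.effective_start
     AND f.sale_date <  d.effective_end
    GROUP BY d.salesrep
    ORDER BY d.salesrep
    """
).fetchall()
print(type2a_report)
```

Each sale is matched to the dimension record that was in effect on the sale date, so Sandy and Laura are each credited with their own sales.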

Method B – No surrogate key – Use version number

Instead of using the date column, a version number is used to differentiate the different versions of the records.

This technique requires the fact table to store both the natural key and the version number in order to retrieve a given version of the dimension data.
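The version-number variant might look like the following sketch. The schema, the column name customer_version, and the data are hypothetical; the point is only that the fact row carries the natural key plus the version it was recorded against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Composite key: natural key plus version number.
    CREATE TABLE customer_dim (
        customer_id TEXT,
        version     INTEGER,
        salesrep    TEXT,
        PRIMARY KEY (customer_id, version)
    );
    -- The fact table stores both parts of the dimension key.
    CREATE TABLE sales_fact (customer_id TEXT, customer_version INTEGER, amount REAL);

    INSERT INTO customer_dim VALUES ('ABC', 1, 'Sandy');
    INSERT INTO customer_dim VALUES ('ABC', 2, 'Laura');
    INSERT INTO sales_fact VALUES ('ABC', 1, 100.0);
    INSERT INTO sales_fact VALUES ('ABC', 2, 50.0);
    """
)

# A plain equi-join on both columns retrieves the right version; no date range is needed.
type2b_report = conn.execute(
    """
    SELECT d.salesrep, SUM(f.amount)
    FROM sales_fact f
    JOIN customer_dim d
      ON d.customer_id = f.customer_id
     AND d.version     = f.customer_version
    GROUP BY d.salesrep
    ORDER BY d.salesrep
    """
).fetchall()
print(type2b_report)
```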

Method C – Use a surrogate key

When an attribute changes, a new record is added with a new sequence-generated key. The fact table uses this key column as its foreign key.
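A sketch of the surrogate-key approach, assuming SQLite's AUTOINCREMENT as the sequence generator. The helper add_version and all table and column names are hypothetical and are used only to illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer_dim (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id  TEXT,                               -- natural key, repeated per version
        salesrep     TEXT
    );
    CREATE TABLE sales_fact (
        customer_key INTEGER REFERENCES customer_dim (customer_key),
        amount       REAL
    );
    """
)

def add_version(customer_id, salesrep):
    """Insert a new dimension record and return its generated surrogate key."""
    cur = conn.execute(
        "INSERT INTO customer_dim (customer_id, salesrep) VALUES (?, ?)",
        (customer_id, salesrep),
    )
    return cur.lastrowid

sandy_key = add_version("ABC", "Sandy")
conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (sandy_key, 100.0))

# The salesrep changes: a new record with a new key; later facts point at the new key.
laura_key = add_version("ABC", "Laura")
conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (laura_key, 50.0))

type2c_report = conn.execute(
    """
    SELECT d.salesrep, SUM(f.amount)
    FROM sales_fact f
    JOIN customer_dim d ON d.customer_key = f.customer_key
    GROUP BY d.salesrep
    ORDER BY d.salesrep
    """
).fetchall()
print(type2c_report)
```

Compared with Methods A and B, the join uses a single column, and the fact table does not need to carry the natural key at all.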

Type Three: Track changes using a separate column

Using this method, you use a separate column of the dimension table to store the value from the previous year, in addition to the current year's value.

This method does not track all the history, but just one prior version.

If the data changes, the old value needs to be moved from the current-value column to the prior-value column, and the new value overwrites the current-value column.

This method is used when the changes are not random but occur at a predefined interval, such as annually.
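The move-then-overwrite step can be sketched as a single UPDATE. The column names current_salesrep and prior_salesrep, the helper change_salesrep, and the third salesrep "Kim" are assumptions introduced for illustration. The sketch relies on the standard SQL behavior that the right-hand sides of SET are evaluated against the row's values before the update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_dim (
        customer_id      TEXT PRIMARY KEY,
        current_salesrep TEXT,
        prior_salesrep   TEXT
    )
    """
)
conn.execute("INSERT INTO customer_dim VALUES ('ABC', 'Sandy', NULL)")

def change_salesrep(customer_id, new_rep):
    """Shift the current value into the prior column, then overwrite the current value."""
    conn.execute(
        """
        UPDATE customer_dim
        SET prior_salesrep   = current_salesrep,  -- old current value, read before the update
            current_salesrep = ?
        WHERE customer_id = ?
        """,
        (new_rep, customer_id),
    )

def read_row(customer_id):
    return conn.execute(
        "SELECT current_salesrep, prior_salesrep FROM customer_dim WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()

change_salesrep("ABC", "Laura")
after_first_change = read_row("ABC")   # Sandy is kept as the prior value

change_salesrep("ABC", "Kim")          # hypothetical second change
after_second_change = read_row("ABC")  # Sandy is now lost; only one prior version survives
print(after_first_change, after_second_change)
```

After the second change only Laura remains as the prior value, which shows why this method keeps just one prior version.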

Source: http://dylanwan.wordpress.com/2007/01/13/how-to-handle-slowly-changing-dimensions-scds-in-data-model-design/