Data Warehousing数据仓库

本文探讨了企业级数据仓库的概念、发展变化,强调其在商业智能中的作用,以及与OLTP系统的区别。介绍了数据集市、数据可视化的多维模型、建模方法和主要问题,包括ETL挑战。同时,讨论了大数据对数据仓库的影响以及如何确保OLTP系统有效支持数据仓库建设。

What Is Data Warehousing? • Enterprise level data management — used in business intelligence (BI) • Includes copied transactional data • Used to monitor and make strategic decisions • Aggregated or summarised data • Documentation about metadata 企业级数据管理 用于汇总数据做出决策

How Data Warehousing Has Changed? • Since 2001 the amount of data held in data warehouses has tended to triple every two years • Terabytes-sized databases now commonplace • Size, prevalence, scope and complex of such systems expanding • Available to internal and external users可供内外部用户使用

What Are The Benefits? • Potential high returns on investment • Competitive advantage • Increased productivity of corporate decision makers高回报 竞争优势 提高决策者效率

What Are The Differences Between OLTP Systems And Data Warehousing?OLTP系统和数据仓库的区别552aa68f310f449c9c7c1bab3dee3920.png

1aa68e965ea54876a70976fd9a34620c.png

What Are Data Marts? “A database that contains a subset of corporate data to support the analytical requirements of a particular business unit (such as the Sales department), or to support users who share the same requirements to analyse a particular business process (such as property sales).”数据集市 包含企业数据子集的数据库 支持特定部门或用户分析特定业务流程

How Can Data Be Visualised Conceptually?  • Multidimensional data model • Data can then be:  — Used with standard reporting tools  ◦ Who... ?  ◦ What... ?  — Queried using OLAP (online analytical  processing) techniques: slice, dice,  roll-up and drill-down  ◦ What if... ?  ◦ Why... ?  — Subject to data mining  ◦ Unknown patterns概念上数据可视化 多维数据模型 使用OLAP

How To Model Schema In A Data Warehouse:  Dimensional Model (DM)  • Typical relational schema (also known as star schema)在数据仓库中建立唯独模型DM典型的关系模式(也称为星形模式)51cde558d0144e0e9cd2a0e2aef2ec8f.png

• Snowflake schema雪花模式

d967c49ae145407092309123e2d829c1.png

• Galaxy (starflake) schema 银河模式4a9c91db3f5b41aaaddcf3a738bb4ba6.pngWhat Are The Main Differences Between DMs And  ERMs?  ERM  • Designed for OLTP  • Identifying entities and their  relationship  • Predictive data retrieval  • Transactions made very simple and  deterministic  DM  • Designed for data warehousing  • Identifying facts  • Intuitive, high-performance retrieval  • Supports ad hoc end-user queries ERM 为OLTP设计 识别实体及关系 预测性数据检索 简单准确 DM 为数据仓库设计 识别事实 直观高效率检索 支持临时用户查询

How To Create A Dimensional Model? • Step 1: Select business process • Step 2: Declare grain • Step 3: Choose dimensions • Step 4: Identify facts • Step 5: Identify all dimensional attributes for the dimensional model 创建维度模型DM 1选择业务流程 - 2 确定粒度 - 3 选择维度 - 4 确定事实 - 5确定维度模式的所有维度属性

What Are Some Of The Problems With Data Warehousing? • Underestimation of resources for data ETL • Hidden problems with source systems • Required data not captured • Increased end-user demands • Data homogenization • High demand for resources • Data ownership • High maintenance • Log-duration projects • Complexity of integration数据仓库的问题 低估DLT数据占用资源 源系统隐患 所需数据不能获取 终端客户需求增加 数据同质化 资源需求量大 数据所有权 维护量大 日志持续时间长 集成复杂

What Are Some Specific ETL Problems? • Batch-oriented (i.e., not timely) • Scan for changes (i.e., performance hit) • Date columns (i.e., intrusive)ELT问题 批处理导向(不及时) 更改扫描(影响性能) 日期列(侵入性)

Summary • Many databases today support management decision making • Important to ensure OLTP systems effective first • Seek to build data warehousing system on top of OLTP system What impact can big data have on data warehousing? 确保OLTP系统 OLTP系统上建立数据仓库系统

 

 

 

评论 1
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值