Big Data: Data Warehouse (Principles + Practice)

weixin_43952924

Published 2024-07-24 12:24:03


Category: Big Data. Tags: big data, data warehouse

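Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.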

Article link: https://blog.youkuaiyun.com/weixin_43952924/article/details/140561830

1. Introduction

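Why it emerged: accumulated historical data plus the enterprise need for data analysis (a unified platform removes the need to build multiple data extraction systems and keeps the data consistent).
Data warehouse, abbreviated DW.
A collection of data that is subject-oriented, integrated, non-volatile, and time-variant.
Data that has been loaded may not be modified.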
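
To make "non-volatile" and "time-variant" concrete, here is a minimal Python sketch. The `CustomerSnapshot` and `AppendOnlyTable` names are hypothetical and exist only for illustration; they are not part of any warehouse product. Rows can be appended but never changed, and every row carries its load date so past states stay queryable.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a loaded row cannot be modified in place
class CustomerSnapshot:
    customer_id: int
    city: str
    load_date: date  # time-variant: every row records when it was loaded


class AppendOnlyTable:
    """Hypothetical warehouse table: accepts inserts, offers no update or delete."""

    def __init__(self) -> None:
        self._rows: list[CustomerSnapshot] = []

    def load(self, row: CustomerSnapshot) -> None:
        # New facts are appended; history is never overwritten.
        self._rows.append(row)

    def as_of(self, customer_id: int, day: date) -> CustomerSnapshot | None:
        # Return the latest snapshot loaded on or before `day`, or None.
        candidates = [
            r for r in self._rows
            if r.customer_id == customer_id and r.load_date <= day
        ]
        return max(candidates, key=lambda r: r.load_date, default=None)


table = AppendOnlyTable()
table.load(CustomerSnapshot(1, "Beijing", date(2024, 1, 1)))
table.load(CustomerSnapshot(1, "Shanghai", date(2024, 6, 1)))  # customer moved

print(table.as_of(1, date(2024, 3, 1)).city)  # state as of March
print(table.as_of(1, date(2024, 7, 1)).city)  # state as of July
```

Because the move is stored as a second row instead of an update, both the March and the July view of the customer remain available.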

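	Database	Data warehouse
Processing type	OLTP (online transaction processing)	OLAP (online analytical processing)
Access pattern	Random reads	Batch reads and writes
Design	Controls redundancy; follows normal forms	Focuses on data integration; introduces redundancy; denormalized
Model	Based on the ER model; application-oriented	Based on star/snowflake schemas; subject-oriented
Data volume	GB to TB	TB and above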
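
The contrast between a keyed lookup and an analytical query over a star schema can be sketched with Python's built-in `sqlite3` module. The table names (`dim_product`, `dim_date`, `fact_sales`) and the sample rows are made up for illustration, and SQLite stands in for a real warehouse engine only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: one fact table surrounded by dimension tables.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales  (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'toys');
INSERT INTO dim_date    VALUES (10, 2023), (11, 2024);
INSERT INTO fact_sales  VALUES (1, 10, 20.0), (1, 11, 30.0),
                               (2, 11, 5.0),  (2, 11, 15.0);
""")

# OLTP-style access: fetch a single row by its key.
row = conn.execute(
    "SELECT category FROM dim_product WHERE product_id = ?", (2,)
).fetchone()

# OLAP-style access: scan the fact table, join the dimensions, aggregate by subject.
totals = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date    d ON d.date_id    = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()

print(row)
for category, year, total in totals:
    print(category, year, total)
```

The analytical query reads every fact row and groups by subject attributes (category, year), which is why warehouses favor batch reads and denormalized, subject-oriented layouts.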

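	Traditional data warehouse	Big data data warehouse
Definition	An MPP (massively parallel processing) cluster made up of multiple relational databases. One dataset is spread across multiple nodes, and the final result is the aggregation of the nodes' partial results.	A distributed SQL engine (translates SQL into big data jobs), on top of a big data compute engine, on top of a distributed file system.
Pros and cons	Limited scalability: nodes must exchange data, which requires a high-speed network and caps the number of nodes. Sharding databases and tables also has an upper limit, and the finer the granularity, the worse the performance. Hotspot problem: if frequently accessed data lives on only one node, that node easily runs into trouble.	Scalable: the file system stores structured data as files, coarse-grained and not finely partitioned, which helps scalability. Solves hotspots: data is replicated three times, so a task can be dispatched to an idle data node that holds a copy. Drawbacks: relatively low SQL coverage, no transaction support, and slow on small data volumes.
Architecture differences	A cluster built from standalone database nodes. Shared-nothing: each node has its own disk storage and memory and is unaware of the other nodes, yet the cluster can only serve requests as a whole. Nodes are connected by a dedicated network, which is fast. The architecture prioritizes consistency (C) and transactions first, then availability (A), then partition tolerance (P), so it emphasizes locks and transactions. This fine-grained control makes it suitable only for medium-sized data. Drawbacks: data placement is opaque, since rows are distributed by hash but every node takes part in every query; scalability is limited, since a single node inevitably becomes the bottleneck of the whole system; and as the cluster grows, the node failure rate keeps rising.	Also called the batch-processing architecture (Hadoop). Sites are autonomous and can run local applications on their own. Data is shared: a computation accesses the common storage system to locate the data. Nodes communicate over a LAN or WAN, so computation should minimize data movement. Priorities are partition tolerance (P) first, then availability (A), then consistency (C); data is stored on multiple nodes as replicas. Combining the two architectures: store data in the shared storage of the distributed architecture to improve partition tolerance, and run MPP on top to reduce compute latency.
Common products	Oracle: a single cluster supports only about 100 nodes; suited to scenarios where the data volume is not large. DB2: itself an MPP architecture, with no particular advantage. Teradata: commercial database sold as an appliance, with its own data engine and query layer. Greenplum: open source, with plenty of learning material; falls behind Teradata in stability, ease of use, and performance.	Hive: translates SQL into MapReduce jobs and can also target Spark; built for massive data; queried with HQL. Spark SQL: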
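
Two mechanisms from the table can be sketched in a few lines of Python: hash distribution with a scatter-gather query on the MPP side, and choosing the least-loaded replica on the Hadoop side. This is a toy model under stated assumptions, not how any listed product is implemented. The cluster size `NUM_NODES`, the function names, and the `load` numbers are all invented for illustration.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # assumed cluster size, for illustration only


def node_for(key: str) -> int:
    # Hash distribution: the key alone decides which node stores the row.
    # crc32 is used because Python's built-in hash() of str varies per process.
    return zlib.crc32(key.encode("utf-8")) % NUM_NODES


def distribute(rows):
    # Place each (key, amount) row on the node chosen by its hash.
    nodes = defaultdict(list)
    for key, amount in rows:
        nodes[node_for(key)].append((key, amount))
    return nodes


def scatter_gather_sum(nodes):
    # MPP-style query: every node computes a partial sum over its local rows,
    # then the coordinator adds the partial results together.
    partials = [sum(a for _, a in nodes.get(n, [])) for n in range(NUM_NODES)]
    return sum(partials)


def pick_replica(replica_nodes, load):
    # Hadoop-style hotspot relief: among the nodes holding a copy of the
    # block, send the task to the one with the least running work.
    return min(replica_nodes, key=lambda n: load[n])


rows = [("order-1", 10), ("order-2", 20), ("order-3", 30), ("order-4", 40)]
nodes = distribute(rows)
total = scatter_gather_sum(nodes)

load = {0: 5, 1: 0, 2: 3, 3: 7}           # running tasks per node (made up)
chosen = pick_replica([0, 2, 3], load)    # block is replicated on nodes 0, 2, 3

print(total, chosen)
```

Note that `scatter_gather_sum` touches all nodes even though the data was placed by hash, which mirrors the drawback listed above, while `pick_replica` only works because three copies of the block exist.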
