Job Test List

本文包含DataWarehousing的基础概念与应用介绍,并提供了一系列关于ETL过程的问题及解答,涵盖从数据抽取到加载的各个环节,是了解数据仓库建设和ETL流程的重要资源。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

data warehousing 小测试

入门小测试 看你能答上多少?

Data Warehousing concept
1. What is Data Warehousing? Why do we need it?
2. What is ad hoc analysis?
3. Describe OLAP and OLTP systems? OLAP vs. OTLP: list five main differences.
4. What is a fact table in a dimensional model? What is a dimension table?
5. Characterize STAR and SNOWFLAKE schemas.
6. What is a Slow Changing Dimension? List all known types; describe implementation techniques.
7. What is a lookup table?
A lookup table is a finite set of ordered pairs {(x, f(x))}, that is, it is a function on a finite domain (the first column) to some range (the second column).
8. Why do we need Dimension Normalization?
9. List and describe storage models supporting multidimensional analysis (list pros and cons for each).  
10. What is
ETL?
11. What is a Data Mart?
12.  Describe Slice and Dice techniques.
13.  What is the main difference between Kimball
and Inmon approaches?...PLEASE STUDY ALOT ABOUT THESE TWO MODELS
14. What is a
data mining? List and describe data mining tasks.

 

ETL 面试问题 (会做的请给答案)

1. 分析 a. What is a logical data mapping and what does it mean to ETL team? b. What are the primary goals of the data discovery phase of the data warehouse project? c. How is the system-of-record determined?

2. 结构 a. What are the four basic data flow steps of an ETL process? b. What are the permissible data structures for the data staging area? Firefly describe the pros and cons of each c. When should data be set to disk for safekeeping during ETL?

3. 抽取 a. Describe techniques for extracting from heterogeneous data source b. What is the best approach for handling ERP source data? c. Explain the pros and cons of communication with databases natively versus ODBC d. Describe three change data capture(CDC) practices and the pros and cons of each

4. 数据质量 a. What are the four broad categories of data quality checks? Provide an implementation technique for each. b. At which stage of the ETL should data be profiled? c. What are the essential deliverables of the data quality portion of ETL? d. How can data quality be quantified in the data warehouse?

 5. 建立对应 a. What are surrogate keys? Explain how the surrogate key pipeline works. b. Why do dates require special treatment during process. c. Explain the three basic delivery steps for conformed dimensions. d. Name the three fundamental fact grains and describe an ETL approach for each. e. How are bridge tables delivered to classify groups of dimension records associated to a single fact? f. How does late arriving data affect dimension facts? Share techniques for handling each

6. metadata a. Describe the different type of ETL metadata and provide examples of each. b. Share acceptable mechanisms for capture operational metadata. c. Offer techniques for sharing business and technical metadata.

7. 优化 a. state the primary types of tables found in a data warehourse and the order which they mush be loaded to enforce referential integrity. b. What are the characteristics of the four levels of the ETL support model? c. What steps do you take for determine the bottleneck of a slow running ETL process? Describe how to estimate the load time of a large ETL job?

 

ETL架构师面试题

 

Analysis

1. What is a logical data mapping and what does it mean to the ETL team?

2. What are the primary goals of the data discovery phase of the data warehouse project?

3. How is the system-of-record determined?

Architecture

4. What are the four basic Data Flow steps of an ETL process?

5. What are the permissible data structures for the data staging area? Briefly describe the pros and cons of each.

6. When should data be set to disk for safekeeping during the ETL?

Extract

7. Describe techniques for extracting from heterogeneous data sources.

8. What is the best approach for handling ERP source data?

9. Explain the pros and cons of communicating with databases natively versus ODBC.

10. Describe three change data capture (CDC) practices and the pros and cons of each.

Data Quality

11. What are the four broad categories of data quality checks? Provide an implementation technique for each.

12. At which stage of the ETL should data be profiled?

13. What are the essential deliverables of the data quality portion of ETL?

14. How can data quality be quantified in the data warehouse?

Building mappings

15. What are surrogate keys? Explain how the surrogate key pipeline works.

16. Why do dates require special treatment during the ETL process?

17. Explain the three basic delivery steps for conformed dimensions.

18. Name the three fundamental fact grains and describe an ETL approach for each.

19. How are bridge tables delivered to classify groups of dimension records associated to a single fact?

20. How does late arriving data affect dimensions and facts? Share techniques for handling each.

Metadata

21. Describe the different types of ETL metadata and provide examples of each.

22. Share acceptable mechanisms for capturing operational metadata.

23. Offer techniques for sharing business and technical metadata.

Optimization/Operations

24. State the primary types of tables found in a data warehouse and the order which they must be loaded to enforce referential integrity.

25. What are the characteristics of the four levels of the ETL support model?

26. What steps do you take to determine the bottleneck of a slow running ETL process?

27. Describe how to estimate the load time of a large ETL job.

Real Time ETL

28. Describe the architecture options for implementing real-time ETL.

29. Explain the different real-time approaches and how they can be applied in different business scenarios.

30. Outline some challenges faced by real-time ETL and describe how to overcome them.

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值