data warehousing 小测试
入门小测试 看你能答上多少?
Data Warehousing concept
1. What is Data Warehousing? Why do we need it?
2. What is ad hoc analysis?
3. Describe OLAP and OLTP systems? OLAP vs. OTLP: list five main differences.
4. What is a fact table in a dimensional model? What is a dimension table?
5. Characterize STAR and SNOWFLAKE schemas.
6. What is a Slow Changing Dimension? List all known types; describe implementation techniques.
7. What is a lookup table?
A lookup table is a finite set of ordered pairs {(x, f(x))}, that is, it is a function on a finite domain (the first column) to some range (the second column).
8. Why do we need Dimension Normalization?
9. List and describe storage models supporting multidimensional analysis (list pros and cons for each).
10. What is ETL?
11. What is a Data Mart?
12. Describe Slice and Dice techniques.
13. What is the main difference between Kimball抯 and Inmon抯 approaches?...PLEASE STUDY ALOT ABOUT THESE TWO MODELS
14. What is a data mining? List and describe data mining tasks.
ETL 面试问题 (会做的请给答案)
1. 分析 a. What is a logical data mapping and what does it mean to ETL team? b. What are the primary goals of the data discovery phase of the data warehouse project? c. How is the system-of-record determined?
2. 结构 a. What are the four basic data flow steps of an ETL process? b. What are the permissible data structures for the data staging area? Firefly describe the pros and cons of each c. When should data be set to disk for safekeeping during ETL?
3. 抽取 a. Describe techniques for extracting from heterogeneous data source b. What is the best approach for handling ERP source data? c. Explain the pros and cons of communication with databases natively versus ODBC d. Describe three change data capture(CDC) practices and the pros and cons of each
4. 数据质量 a. What are the four broad categories of data quality checks? Provide an implementation technique for each. b. At which stage of the ETL should data be profiled? c. What are the essential deliverables of the data quality portion of ETL? d. How can data quality be quantified in the data warehouse?
5. 建立对应 a. What are surrogate keys? Explain how the surrogate key pipeline works. b. Why do dates require special treatment during process. c. Explain the three basic delivery steps for conformed dimensions. d. Name the three fundamental fact grains and describe an ETL approach for each. e. How are bridge tables delivered to classify groups of dimension records associated to a single fact? f. How does late arriving data affect dimension facts? Share techniques for handling each
6. metadata a. Describe the different type of ETL metadata and provide examples of each. b. Share acceptable mechanisms for capture operational metadata. c. Offer techniques for sharing business and technical metadata.
7. 优化 a. state the primary types of tables found in a data warehourse and the order which they mush be loaded to enforce referential integrity. b. What are the characteristics of the four levels of the ETL support model? c. What steps do you take for determine the bottleneck of a slow running ETL process? Describe how to estimate the load time of a large ETL job?
|
Analysis
1. What is a logical data mapping and what does it mean to the ETL team?
2. What are the primary goals of the data discovery phase of the data warehouse project?
3. How is the system-of-record determined?
Architecture
4. What are the four basic Data Flow steps of an ETL process?
6. When should data be set to disk for safekeeping during the ETL?
Extract
7. Describe techniques for extracting from heterogeneous data sources.
8. What is the best approach for handling ERP source data?
9. Explain the pros and cons of communicating with databases natively versus ODBC.
10. Describe three change data capture (CDC) practices and the pros and cons of each.
Data Quality
12. At which stage of the ETL should data be profiled?
13. What are the essential deliverables of the data quality portion of ETL?
14. How can data quality be quantified in the data warehouse?
Building mappings
15. What are surrogate keys? Explain how the surrogate key pipeline works.
16. Why do dates require special treatment during the ETL process?
17. Explain the three basic delivery steps for conformed dimensions.
18. Name the three fundamental fact grains and describe an ETL approach for each.
19. How are bridge tables delivered to classify groups of dimension records associated to a single fact?
20. How does late arriving data affect dimensions and facts? Share techniques for handling each.
Metadata
21. Describe the different types of ETL metadata and provide examples of each.
22. Share acceptable mechanisms for capturing operational metadata.
23. Offer techniques for sharing business and technical metadata.
Optimization/Operations
25. What are the characteristics of the four levels of the ETL support model?
26. What steps do you take to determine the bottleneck of a slow running ETL process?
27. Describe how to estimate the load time of a large ETL job.
Real Time ETL
28. Describe the architecture options for implementing real-time ETL.
30. Outline some challenges faced by real-time ETL and describe how to overcome them.