DataCamp课程:Database Design

1. Processing, Storing, and Organizing Data

</> OLAP vs. OLTP

在这里插入图片描述
Categorize the cards into the approach that they describe best.

OLAP OLTP
Queries a larger amount of data Most likely to have data from the past hour
Helps businesses with decision making and problem solving Typically uses an operational database
Typically uses a data warehouse Data is inserted and updated more often

</> Which is better?

Explore the dataset. What data processing approach is this larger repository most likely using?

  • OLTP because this table could not be used for any analysis.
  • OLAP because each record has a unique service request number.
  • OLTP because this table’s structure appears to require frequent updates.
  • OLAP because this table focuses on pothole requests only.

</> Name that data type!

Each of these cards hold a type of data. Place them in the correct category.

Unstructured Semi-Structured Structured
To-do notes in a text editor CSVs of open data downloaded from your local government websites A relational database with latest withdrawals and deposits made by clients
Images in your photo library JSON object of tweets outputted in real-time by the Twitter API
Zip file of all text messages ever received 在这里插入图片描述

</> Ordering ETL Tasks

In the ETL flow you design, different steps will take place. Place the steps in the most appropriate order.

eCommerce API outputs real time data of transactions

Python script drops null rows and clean data into pre-determined columns

Resulting dataframe is written into an AWS Redshift Warehouse

</> Recommend a storage solution

When should you choose a data warehouse over a data lake?

  • To train a machine learning model with a 150 GB of raw image data.
  • To store real-time social media posts that may be used for future analysis
  • To store customer data that needs to be updated regularly
  • To create accessible and isolated data repositories for other analysts

</> Classifying data models

Each of these cards hold a tool or concept that fits into a certain type of data model. Place the cards in the correct category.

Conceptual Data Model Logical Data Model Physical Data Model
Gathers business requirements Relational model File structure of data storage
Entities, attributes, and relationships Determining tables and columns

</> Deciding fact and dimension tables

Out of these possible answers, what would be the best way to organize the fact table and dimensional tables?

  • A fact table holding duration_mins and foreign keys to dimension tables holding route details and week details, respectively.
  • A fact table holding week,month, year and foreign keys to dimension tables holding route details and duration details, respectively.
  • A fact table holding route_name,park_name, distance_km,city_name, and foreign keys to dimension tables holding week details and duration details, respectively.

Create a dimension table called route that will hold the route information.

Create a dimension table called week that will hold the week information.

CREATE TABLE route(
	route_id INTEGER PRIMARY KEY,
    park_name VARCHAR(160) NOT NULL,
    city_name VARCHAR(160) NOT NULL,
    distance_km FLOAT NOT NULL,
    route_name VARCHAR(160) NOT NULL
);

CREATE TABLE week(
	week_id INTEGER PRIMARY KEY,
    week INTEGER NOT NULL,
    month VARCHAR(160) NOT NULL,
    year INTEGER NOT NULL
);

</> Querying the dimensional model

Calculate the sum of the duration_mins column.

SELECT 
	SUM(duration_mins)
FROM 
	runs_fact;

sum
1172.16

Join week_dim and runs_fact.

Get all the week_id’s from July, 2019.

SELECT 
	SUM(duration_mins)
FROM 
	runs_fact
INNER JOIN week_dim ON runs_fact.week_id = week_dim.week_id
WHERE month = 'July' and year = '2019';

sum
381.46

2. Database Schemas and Normalization

</> Running from star to snowflake

After learning about the snowflake schema, you convert the current star schema into a snowflake schema. To do this, you normalize route_dim and week_dim. Which option best describes the resulting new tables after doing this?

The tables runs_fact, route_dim, and week_dim have been loaded.

  • week_dim is extended two dimensions with new tables for month and year. route_dim is extended one dimension with a new table for city.
  • week_dim is extended two dimensions with new tables for month and year. route_dim is extended two dimensions with new tables for city and park.
  • week_dim is extended three dimensions with new tables for week, month and year. route_dim is extended one dimension with new tables for city and park.

</> Adding foreign keys

In the constraint called sales_book, set book_id as a foreign key.

In the constraint called sales_time, set time_id as a foreign key.

In the constraint called sales_store, set store_id as a foreign key.

-- Add the book_id foreign key
ALTER TABLE fact_booksales ADD CONSTRAINT sales_book
    FOREIGN KEY (book_id) REFERENCES dim_book_star (book_id);
-- Add the time_id foreign key
ALTER TABLE fact_booksales ADD CONSTRAINT sales_time
    FOREIGN KEY (time_id) REFERENCES dim_time_star (time_id);
-- Add the store_id foreign key
ALTER TABLE fact_booksales ADD CONSTRAINT sales_store
    FOREIGN KEY (store_id) REFERENCES dim_store_star (store_id);

</> Extending the book dimension

Create dim_author with a column for author.

Insert all the distinct authors from dim_book_star into dim_

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值