DataCamp课程：Database Design

Daisy Lee

于 2020-07-26 18:09:51 发布

阅读量3.8k

点赞数

分类专栏： DataCamp课程文章标签： sql

本文链接：https://blog.youkuaiyun.com/weixin_42871941/article/details/107407270

版权

1. Processing, Storing, and Organizing Data

</> OLAP vs. OLTP

在这里插入图片描述
Categorize the cards into the approach that they describe best.

OLAP	OLTP
Queries a larger amount of data	Most likely to have data from the past hour
Helps businesses with decision making and problem solving	Typically uses an operational database
Typically uses a data warehouse	Data is inserted and updated more often

</> Which is better?

Explore the dataset. What data processing approach is this larger repository most likely using?

OLTP because this table could not be used for any analysis.
OLAP because each record has a unique service request number.
OLTP because this table’s structure appears to require frequent updates.
OLAP because this table focuses on pothole requests only.

</> Name that data type!

Each of these cards hold a type of data. Place them in the correct category.

Unstructured	Semi-Structured	Structured
To-do notes in a text editor	CSVs of open data downloaded from your local government websites	A relational database with latest withdrawals and deposits made by clients
Images in your photo library	JSON object of tweets outputted in real-time by the Twitter API
Zip file of all text messages ever received

</> Ordering ETL Tasks

In the ETL flow you design, different steps will take place. Place the steps in the most appropriate order.

eCommerce API outputs real time data of transactions

Python script drops null rows and clean data into pre-determined columns

Resulting dataframe is written into an AWS Redshift Warehouse

</> Recommend a storage solution

When should you choose a data warehouse over a data lake?

To train a machine learning model with a 150 GB of raw image data.
To store real-time social media posts that may be used for future analysis
To store customer data that needs to be updated regularly
To create accessible and isolated data repositories for other analysts

</> Classifying data models

Each of these cards hold a tool or concept that fits into a certain type of data model. Place the cards in the correct category.

Conceptual Data Model	Logical Data Model	Physical Data Model
Gathers business requirements	Relational model	File structure of data storage
Entities, attributes, and relationships	Determining tables and columns

</> Deciding fact and dimension tables

Out of these possible answers, what would be the best way to organize the fact table and dimensional tables?

A fact table holding duration_mins and foreign keys to dimension tables holding route details and week details, respectively.
A fact table holding week,month, year and foreign keys to dimension tables holding route details and duration details, respectively.
A fact table holding route_name,park_name, distance_km,city_name, and foreign keys to dimension tables holding week details and duration details, respectively.

Create a dimension table called route that will hold the route information.

Create a dimension table called week that will hold the week information.

CREATE TABLE route(
	route_id INTEGER PRIMARY KEY,
    park_name VARCHAR(160) NOT NULL,
    city_name VARCHAR(160) NOT NULL,
    distance_km FLOAT NOT NULL,
    route_name VARCHAR(160) NOT NULL
);

CREATE TABLE week(
	week_id INTEGER PRIMARY KEY,
    week INTEGER NOT NULL,
    month VARCHAR(160) NOT NULL,
    year INTEGER NOT NULL
);

</> Querying the dimensional model

Calculate the sum of the duration_mins column.

SELECT 
	SUM(duration_mins)
FROM 
	runs_fact;

sum
1172.16

Join week_dim and runs_fact.

Get all the week_id’s from July, 2019.

SELECT 
	SUM(duration_mins)
FROM 
	runs_fact
INNER JOIN week_dim ON runs_fact.week_id = week_dim.week_id
WHERE month = 'July' and year = '2019';

sum
381.46

2. Database Schemas and Normalization

</> Running from star to snowflake

After learning about the snowflake schema, you convert the current star schema into a snowflake schema. To do this, you normalize route_dim and week_dim. Which option best describes the resulting new tables after doing this?

The tables runs_fact, route_dim, and week_dim have been loaded.

week_dim is extended two dimensions with new tables for month and year. route_dim is extended one dimension with a new table for city.
week_dim is extended two dimensions with new tables for month and year. route_dim is extended two dimensions with new tables for city and park.
week_dim is extended three dimensions with new tables for week, month and year. route_dim is extended one dimension with new tables for city and park.

</> Adding foreign keys

In the constraint called sales_book, set book_id as a foreign key.

In the constraint called sales_time, set time_id as a foreign key.

In the constraint called sales_store, set store_id as a foreign key.

-- Add the book_id foreign key
ALTER TABLE fact_booksales ADD CONSTRAINT sales_book
    FOREIGN KEY (book_id) REFERENCES dim_book_star (book_id);
-- Add the time_id foreign key
ALTER TABLE fact_booksales ADD CONSTRAINT sales_time
    FOREIGN KEY (time_id) REFERENCES dim_time_star (time_id);
-- Add the store_id foreign key
ALTER TABLE fact_booksales ADD CONSTRAINT sales_store
    FOREIGN KEY (store_id) REFERENCES dim_store_star (store_id);