1. Processing, Storing, and Organizing Data
</> OLAP vs. OLTP
Categorize the cards into the approach that they describe best.
OLAP | OLTP |
---|---|
Queries a larger amount of data | Most likely to have data from the past hour |
Helps businesses with decision making and problem solving | Typically uses an operational database |
Typically uses a data warehouse | Data is inserted and updated more often |
</> Which is better?
Explore the dataset. What data processing approach is this larger repository most likely using?
- OLTP because this table could not be used for any analysis.
- OLAP because each record has a unique service request number.
- OLTP because this table’s structure appears to require frequent updates.
- OLAP because this table focuses on pothole requests only.
</> Name that data type!
Each of these cards hold a type of data. Place them in the correct category.
Unstructured | Semi-Structured | Structured |
---|---|---|
To-do notes in a text editor | CSVs of open data downloaded from your local government websites | A relational database with latest withdrawals and deposits made by clients |
Images in your photo library | JSON object of tweets outputted in real-time by the Twitter API | |
Zip file of all text messages ever received | ![]() |
</> Ordering ETL Tasks
In the ETL flow you design, different steps will take place. Place the steps in the most appropriate order.
eCommerce API outputs real time data of transactions
Python script drops null rows and clean data into pre-determined columns
Resulting dataframe is written into an AWS Redshift Warehouse
</> Recommend a storage solution
When should you choose a data warehouse over a data lake?
- To train a machine learning model with a 150 GB of raw image data.
- To store real-time social media posts that may be used for future analysis
- To store customer data that needs to be updated regularly
- To create accessible and isolated data repositories for other analysts
</> Classifying data models
Each of these cards hold a tool or concept that fits into a certain type of data model. Place the cards in the correct category.
Conceptual Data Model | Logical Data Model | Physical Data Model |
---|---|---|
Gathers business requirements | Relational model | File structure of data storage |
Entities, attributes, and relationships | Determining tables and columns |
</> Deciding fact and dimension tables
Out of these possible answers, what would be the best way to organize the fact table and dimensional tables?
- A fact table holding duration_mins and foreign keys to dimension tables holding route details and week details, respectively.
- A fact table holding week,month, year and foreign keys to dimension tables holding route details and duration details, respectively.
- A fact table holding route_name,park_name, distance_km,city_name, and foreign keys to dimension tables holding week details and duration details, respectively.
Create a dimension table called route that will hold the route information.
Create a dimension table called week that will hold the week information.
CREATE TABLE route(
route_id INTEGER PRIMARY KEY,
park_name VARCHAR(160) NOT NULL,
city_name VARCHAR(160) NOT NULL,
distance_km FLOAT NOT NULL,
route_name VARCHAR(160) NOT NULL
);
CREATE TABLE week(
week_id INTEGER PRIMARY KEY,
week INTEGER NOT NULL,
month VARCHAR(160) NOT NULL,
year INTEGER NOT NULL
);
</> Querying the dimensional model
Calculate the sum of the duration_mins column.
SELECT
SUM(duration_mins)
FROM
runs_fact;
sum
1172.16
Join week_dim and runs_fact.
Get all the week_id’s from July, 2019.
SELECT
SUM(duration_mins)
FROM
runs_fact
INNER JOIN week_dim ON runs_fact.week_id = week_dim.week_id
WHERE month = 'July' and year = '2019';
sum
381.46
2. Database Schemas and Normalization
</> Running from star to snowflake
After learning about the snowflake schema, you convert the current star schema into a snowflake schema. To do this, you normalize route_dim and week_dim. Which option best describes the resulting new tables after doing this?
The tables runs_fact, route_dim, and week_dim have been loaded.
- week_dim is extended two dimensions with new tables for month and year. route_dim is extended one dimension with a new table for city.
- week_dim is extended two dimensions with new tables for month and year. route_dim is extended two dimensions with new tables for city and park.
- week_dim is extended three dimensions with new tables for week, month and year. route_dim is extended one dimension with new tables for city and park.
</> Adding foreign keys
In the constraint called sales_book, set book_id as a foreign key.
In the constraint called sales_time, set time_id as a foreign key.
In the constraint called sales_store, set store_id as a foreign key.
-- Add the book_id foreign key
ALTER TABLE fact_booksales ADD CONSTRAINT sales_book
FOREIGN KEY (book_id) REFERENCES dim_book_star (book_id);
-- Add the time_id foreign key
ALTER TABLE fact_booksales ADD CONSTRAINT sales_time
FOREIGN KEY (time_id) REFERENCES dim_time_star (time_id);
-- Add the store_id foreign key
ALTER TABLE fact_booksales ADD CONSTRAINT sales_store
FOREIGN KEY (store_id) REFERENCES dim_store_star (store_id);
</> Extending the book dimension
Create dim_author with a column for author.
Insert all the distinct authors from dim_book_star into dim_