This is a solution to a problem I encountered in one of my projects: how to store incoming API data in an SQLite database without having to write schema manually every time a source changes.
Imagine we have user data as follows:
For this tutorial, we use dummy data from dummyjson.com.
This data is a nested structure and has 30 first level columns and a number of second and even third level columns. How do we store this data in an SQL database without manually writing the schema, which is tedious and error-prone?
There are 3 approaches to schema generation:
- keep the first level and store other levels as JSON. Many databases actually allow storing JSON and query it like a regular column:
INSERT INTO products (name, details)
VALUES
('iPhone 13', '{"category": "Electronics", "price": 999, "colors": ["Black", "Blue", "White"]}'),
('Samsung Galaxy S21', '{"category": "Electronics", "price": 899, "colors": ["Phantom Black", "Phantom Silver"]}'),
('Nike Air Force 1', '{"category": "Shoes", "price": 100, "colors": ["White", "Black"]}'),
('Adidas Ultraboost', '{"category": "Shoes", "price": 180, "colors": ["Core Black", "Cloud White"]}'),
('MacBook Pro', '{"category": "Electronics", "price": 1299, "colors": ["Silver", "Space Gray"]}'),
('Amazon Kindle', '{"category": "Electronics", "price": 79, "colors": ["Black"]}'),
('Sony PlayStation 5', '{"category": "Electronics", "price": 499, "colors": ["White"]}'),
('Cuisinart Coffee Maker', '{"category": "Home & Kitchen", "price": 99, "colors": ["Stainless Steel", "Black"]}'),
('Dyson V11 Vacuum Cleaner', '{"category": "Home & Kitchen", "price": 599, "colors": ["Iron", "Nickel"]}');
SELECT
name,
json_extract (details, '$.price') AS price
FROM
products;
Examples are taken from https://www.sqlitetutorial.net/sqlite-json/
- Flatten the data, so that every second and third level nested column becomes the first level column:
{
"bank":
{
'cardExpire': '03/26',
'cardNumber': '9289760655481815',
'cardType': 'Elo',
'currency': 'CNY'
}
}
{
'bank_cardExpire': '03/26',
'bank_cardNumber': '9289760655481815',
'bank_cardType': 'Elo',
'bank_currency': 'CNY'}
- The third way is to separate nested levels into individual tables and connect them with foreign keys. However, this approach is much more complex and doesn’t apply to every form of data. For example, it doesn’t make sense to store
hair
in a separate table like in the dummy data above.
Given that flat data is preferred for data analysis, we will use the second approach - flatten the structure so that second and third nested columns become first level columns.
Project imports
import pprint
import sqlite3
from collections import defaultdict
from collections.abc import MutableMapping
import requests # install with `pip install --upgrade requests`