在Python中从API数据自动创建数据库模式

This is a solution to a problem I encountered in one of my projects: how to store incoming API data in an SQLite database without having to write schema manually every time a source changes.

Imagine we have user data as follows:

在这里插入图片描述
For this tutorial, we use dummy data from dummyjson.com.

This data is a nested structure and has 30 first level columns and a number of second and even third level columns. How do we store this data in an SQL database without manually writing the schema, which is tedious and error-prone?

There are 3 approaches to schema generation:

  1. keep the first level and store other levels as JSON. Many databases actually allow storing JSON and query it like a regular column:
INSERT INTO products (name, details)
VALUES 
    ('iPhone 13', '{"category": "Electronics", "price": 999, "colors": ["Black", "Blue", "White"]}'),
    ('Samsung Galaxy S21', '{"category": "Electronics", "price": 899, "colors": ["Phantom Black", "Phantom Silver"]}'),
    ('Nike Air Force 1', '{"category": "Shoes", "price": 100, "colors": ["White", "Black"]}'),
    ('Adidas Ultraboost', '{"category": "Shoes", "price": 180, "colors": ["Core Black", "Cloud White"]}'),
    ('MacBook Pro', '{"category": "Electronics", "price": 1299, "colors": ["Silver", "Space Gray"]}'),
    ('Amazon Kindle', '{"category": "Electronics", "price": 79, "colors": ["Black"]}'),
    ('Sony PlayStation 5', '{"category": "Electronics", "price": 499, "colors": ["White"]}'),
    ('Cuisinart Coffee Maker', '{"category": "Home & Kitchen", "price": 99, "colors": ["Stainless Steel", "Black"]}'),
    ('Dyson V11 Vacuum Cleaner', '{"category": "Home & Kitchen", "price": 599, "colors": ["Iron", "Nickel"]}');
SELECT
  name,
  json_extract (details, '$.price') AS price
FROM
  products;

Examples are taken from https://www.sqlitetutorial.net/sqlite-json/

  1. Flatten the data, so that every second and third level nested column becomes the first level column:
{
   
	"bank":
		{
   
			'cardExpire': '03/26',
			'cardNumber': '9289760655481815',
			'cardType': 'Elo',
			'currency': 'CNY'
		}
}
{
   'bank_cardExpire': '03/26',
'bank_cardNumber': '9289760655481815',
'bank_cardType': 'Elo',
'bank_currency': 'CNY'}
  1. The third way is to separate nested levels into individual tables and connect them with foreign keys. However, this approach is much more complex and doesn’t apply to every form of data. For example, it doesn’t make sense to store hair in a separate table like in the dummy data above.

在这里插入图片描述
Given that flat data is preferred for data analysis, we will use the second approach - flatten the structure so that second and third nested columns become first level columns.

Project imports

import pprint
import sqlite3
from collections import defaultdict
from collections.abc import MutableMapping

import requests  # install with `pip install --upgrade requests`

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值