DB4AI-Query: Model Training and Inference
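The current version of openGauss supports native DB4AI capabilities. By introducing native AI operators, it simplifies the workflow and draws on the optimization and execution capabilities of the database optimizer and executor to deliver high-performance in-database model training. With a simpler training and prediction workflow and better performance, developers can spend more of their time on model tuning and data analysis, and avoid a fragmented technology stack and redundant code.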
Keyword Reference
Table 1 DB4AI syntax and keywords
Usage Guide
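- Overview of the algorithms supported in this version.
  The current version of DB4AI supports logistic regression (binary classification only, for now), linear regression, and support vector machines (classification tasks), all based on the SGD operator, as well as Kmeans clustering based on the K-Means operator.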
- Model training syntax.
- CREATE MODEL
  Use the CREATE MODEL statement to create and train a model. The training examples below use a dataset stored in the table kmeans_2d, whose contents are as follows:
openGauss=# select * from kmeans_2d;
 id |              position
----+-------------------------------------
  1 | {74.5268815685995,88.2141939294524}
  2 | {70.9565760521218,98.8114827475511}
  3 | {76.2756086327136,23.8387574302033}
  4 | {17.8495847294107,81.8449544720352}
  5 | {81.2175785354339,57.1677675866522}
  6 | {53.97752255667,49.3158342130482}
  7 | {93.2475341879763,86.934042100329}
  8 | {72.7659293473698,19.7020415100269}
  9 | {16.5800288529135,75.7475957670249}
 10 | {81.8520747194998,40.3476078575477}
 11 | {76.796671198681,86.3827232690528}
 12 | {59.9231450678781,90.9907738864422}
 13 | {70.161884885747,19.7427458665334}
 14 | {11.1269539105706,70.9988166182302}
 15 | {80.5005071521737,65.2822235273197}
 16 | {54.7030725912191,52.151339428965}
 17 | {103.059707058128,80.8419883321039}
 18 | {85.3574452036992,14.9910179991275}
 19 | {28.6501615960151,76.6922890325077}
 20 | {69.7285806713626,49.5416352967732}
(20 rows)
The position column of this table has the data type double precision[].
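The original does not show how kmeans_2d was created. A minimal sketch for reproducing it follows. The table name, the column names, the double precision[] type, and all 20 values come from the listing above; the DDL itself and the integer type chosen for id are assumptions.

```sql
-- Assumed DDL: only the column names and the double precision[] type
-- are given by the document; "integer" for id is a guess.
CREATE TABLE kmeans_2d (
    id       integer,
    position double precision[]
);

-- The 20 two-dimensional points, copied from the query output shown above.
INSERT INTO kmeans_2d (id, position) VALUES
    (1,  '{74.5268815685995,88.2141939294524}'),
    (2,  '{70.9565760521218,98.8114827475511}'),
    (3,  '{76.2756086327136,23.8387574302033}'),
    (4,  '{17.8495847294107,81.8449544720352}'),
    (5,  '{81.2175785354339,57.1677675866522}'),
    (6,  '{53.97752255667,49.3158342130482}'),
    (7,  '{93.2475341879763,86.934042100329}'),
    (8,  '{72.7659293473698,19.7020415100269}'),
    (9,  '{16.5800288529135,75.7475957670249}'),
    (10, '{81.8520747194998,40.3476078575477}'),
    (11, '{76.796671198681,86.3827232690528}'),
    (12, '{59.9231450678781,90.9907738864422}'),
    (13, '{70.161884885747,19.7427458665334}'),
    (14, '{11.1269539105706,70.9988166182302}'),
    (15, '{80.5005071521737,65.2822235273197}'),
    (16, '{54.7030725912191,52.151339428965}'),
    (17, '{103.059707058128,80.8419883321039}'),
    (18, '{85.3574452036992,14.9910179991275}'),
    (19, '{28.6501615960151,76.6922890325077}'),
    (20, '{69.7285806713626,49.5416352967732}');

-- Sanity check: the row count should match the 20 rows listed above.
SELECT count(*) FROM kmeans_2d;
```

Each feature vector is stored as one array value per row, so a single column (position) can be passed as the feature column during training.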
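- Train a model, taking Kmeans as an example. Using kmeans_2d as the training set and position as the feature column, apply the kmeans algorithm to create and save a model named point_kmeans.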
openGauss=# CREATE MODEL point_kmeans USING kmeans FEATURES position FROM kmeans_2d WITH num_centroids=3;
NOTICE: Hyperparameter max_iterations takes value DEFAULT (10)
NOTICE: Hyperparameter num_centroids takes value 3
NOTICE: Hyperparameter tolerance takes value DEFAULT (0.000010)
NOTICE: Hyperparameter batch_size takes value DEFAULT (10)
NOTICE: Hyperparameter num_features takes value DEFAULT (2)
NOTICE: Hyperparameter distance_function takes value DEFAULT (L2_Squared)
NOTICE: Hyperparameter seeding_function takes value DEFAULT (Random++)
NOTICE: Hyperparameter verbose takes value DEFAULT (0)
NOTICE: Hyperparameter seed takes value DEFAULT (0)
MODEL CREATED. PROCESSED 1
In the command above:
-