I learned the kNN algorithm from this Bilibili video: https://www.bilibili.com/video/av52220223?t=1413
A few days ago I spent some time learning PyCuda and wanted to put it to use, so I picked kNN as the target for acceleration. I ran into a few pitfalls along the way, so this post is a summary.
kNN Algorithm
-
kNN is a very simple machine learning algorithm. It is a classification algorithm, also known as the k-nearest-neighbors algorithm. The rough idea: find the k points closest to the query point, see which class accounts for the largest share of those k neighbors, and assign the query point to that class.
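To make the idea concrete, here is a minimal NumPy sketch of that nearest-neighbor vote. The function name `knn_predict_one` and the default `k=3` are just illustrative choices and are not part of the knn.py module used below.

```python
import numpy as np

def knn_predict_one(x, x_train, y_train, k=3):
    # Euclidean distance from the query point to every training point
    dists = np.sqrt(np.sum((x_train - x) ** 2, axis=1))
    # indices of the k nearest training points
    nearest = np.argsort(dists)[:k]
    # majority vote: return the label that appears most often among the k neighbors
    labels, counts = np.unique(np.asarray(y_train)[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```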
The code is also the code from the video above. It is split into two parts: one part generates the data and calls kNN to test it, and the other part is the kNN algorithm itself.
train.py generates the data and runs the test; its source is as follows:
```python
import numpy as np
import matplotlib.pyplot as plt
from knn import *

# data generation
np.random.seed(314)
data_size_1 = 300
x1_1 = np.random.normal(loc=5.0, scale=1.0, size=data_size_1)
x2_1 = np.random.normal(loc=4.0, scale=1.0, size=data_size_1)
y_1 = [0 for _ in range(data_size_1)]

data_size_2 = 400
x1_2 = np.random.normal(loc=10.0, scale=2.0, size=data_size_2)
x2_2 = np.random.normal(loc=8.0, scale=2.0, size=data_size_2)
y_2 = [1 for _ in range(data_size_2)]

x1 = np.concatenate((x1_1, x1_2), axis=0)
x2 = np.concatenate((x2_1, x2_2), axis=0)
x = np.hstack((x1.reshape(-1, 1), x2.reshape(-1, 1)))
y = np.concatenate((y_1, y_2), axis=0)

# shuffle, then split into 70% train / 30% test
data_size_all = data_size_1 + data_size_2
shuffled_index = np.random.permutation(data_size_all)
x = x[shuffled_index]
y = y[shuffled_index]

split_index = int(data_size_all * 0.7)
x_train = x[:split_index]
y_train = y[:split_index]
x_test = x[split_index:]
y_test = y[split_index:]

# visualize data
plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train, marker='.')
plt.show()
plt.scatter(x_test[:, 0], x_test[:, 1], c=y_test, marker='.')
plt.show()

# data preprocessing: min-max normalization
x_train = (x_train - np.min(x_train, axis=0)) / (np.max(x_train, axis=0) - np.min(x_train, axis=0))
```