Understanding the csr_matrix parameters

License: CC 4.0 BY-SA

Original link: https://blog.youkuaiyun.com/leiting_imecas/article/details/52240622

>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])

(Note: in this article, row and column indices both start from 0.)
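To see where the three arrays come from, it can help to go the other way: build the sparse matrix from the dense array and read the arrays back. This is a minimal sketch; the names `dense` and `m` are chosen here for illustration and do not appear in the original post.

```python
import numpy as np
from scipy.sparse import csr_matrix

# The dense matrix produced by the example above.
dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 6]])

# Converting a dense array keeps only the nonzero entries.
m = csr_matrix(dense)

print(m.data)     # stored values, row by row: [1 2 3 4 5 6]
print(m.indices)  # column index of each stored value: [0 2 2 0 1 2]
print(m.indptr)   # row boundaries into data/indices: [0 2 3 6]
```

The three printed arrays match the `data`, `indices` and `indptr` passed to the constructor in the example.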

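data holds the stored values themselves, here 1, 2, 3, 4, 5, 6.

shape gives the shape of the matrix, here 3 * 3.

indices gives the column index of each value within its row. From it we can tell that value 1 sits at column 0 of some row, value 2 at column 2 of some row, and so on up to value 6 at column 2 of some row.

Which row each value belongs to is determined by the indptr parameter.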

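indptr marks where each row begins and ends inside data and indices. In [0 2 3 6], the leading 0 is the starting offset, and each number after it closes one row, so three numbers after the 0 means three rows. Row 0 holds 2 - 0 = 2 values, so values 1 and 2 are in row 0; row 1 holds 3 - 2 = 1 value, so value 3 is in row 1; row 2 holds 6 - 3 = 3 values, so values 4, 5 and 6 are in row 2. In general, row i owns the entries at positions indptr[i] through indptr[i+1] - 1, which gives the row number of every value.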
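The rule above can be checked by rebuilding the dense matrix by hand, without scipy. This is only an illustrative sketch: the loop and the names `n_rows`, `n_cols` and `dense` are assumptions made for the example, not how scipy implements `toarray()`.

```python
import numpy as np

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

n_rows = len(indptr) - 1   # one row per number after the leading 0
n_cols = 3                 # taken from shape=(3, 3)
dense = np.zeros((n_rows, n_cols), dtype=data.dtype)

for i in range(n_rows):
    # Row i owns the slice indptr[i]:indptr[i + 1] of data and indices.
    start, end = indptr[i], indptr[i + 1]
    dense[i, indices[start:end]] = data[start:end]

print(dense)
```

The result equals the array returned by `csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()`.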

Example: for the value 6, indptr places it in row 2 and indices places it in column 2.
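The same lookup can be done for every value at once. The sketch below expands indptr into one row number per stored value with `np.diff` and `np.repeat`, and also finds the row of a single position with `np.searchsorted`. The variable names (`counts`, `rows`, `k`, `row_of_k`) are hypothetical and chosen for this example.

```python
import numpy as np

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

# Number of stored values in each row: differences of consecutive offsets.
counts = np.diff(indptr)                           # [2 1 3]

# Repeat each row number once per stored value in that row.
rows = np.repeat(np.arange(len(counts)), counts)   # [0 0 1 2 2 2]

for value, r, c in zip(data, rows, indices):
    print(f"value {int(value)} -> row {int(r)}, column {int(c)}")

# Row of a single stored position k, found without expanding everything:
k = 5                                              # position of value 6 in data
row_of_k = int(np.searchsorted(indptr, k, side="right")) - 1
print(row_of_k, int(indices[k]))                   # 2 2
```

The last line reproduces the example: value 6 is in row 2, column 2.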
