Difference between np.nanmean and mean in Python when invalid values are present

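This post shows how to handle invalid values in a NumPy array, computing statistics either with a mask or with NaN filling, and compares when each function applies.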


I've used this many times, so I'm writing it down.

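To compute a statistic such as the mean of an array that contains invalid values, NumPy provides nanmean, which skips any np.nan entries in the data. Sometimes, however, the invalid value has been masked with numpy.ma instead (e.g. numpy.ma.masked_where, or numpy.ma.masked_equal as below). In that case mean and nanmean both give the same, intended result. Here is the code: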

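import numpy

a = numpy.array([0.0, 1, 5, 999])
b = numpy.ma.masked_equal(a, 999)   # mask the invalid value 999
c = b.filled(numpy.nan)             # replace the masked entry with NaN

numpy.mean(a)     # (0+1+5+999)/4, the invalid value is included
251.25

numpy.mean(b)     # (0+1+5)/3, the masked entry is ignored
2.0
numpy.nanmean(b)  # (0+1+5)/3
2.0

numpy.mean(c)     # NaN takes part in the sum, so without ignoring it the result is nan
nan

numpy.nanmean(c)  # (0+1+5)/3, the NaN is skipped
2.0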
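For a quick check that can be run as one script, here is a minimal sketch of the same example with the results collected in a dict. It also masks the value with numpy.ma.masked_where (the function named in the text above) to show it behaves like masked_equal here; the names `b_where` and `results` are only illustrative.

```python
import numpy as np

# Same data as above: 999 stands for an invalid measurement.
a = np.array([0.0, 1, 5, 999])

# Two equivalent ways to mask it: masked_equal compares against a value,
# masked_where takes an arbitrary boolean condition.
b = np.ma.masked_equal(a, 999)
b_where = np.ma.masked_where(a == 999, a)

# Replace the masked entry with NaN to get a plain ndarray.
c = b.filled(np.nan)

results = {
    "mean(a)": float(np.mean(a)),              # invalid value included
    "mean(b)": float(np.mean(b)),              # mask respected
    "mean(b_where)": float(np.mean(b_where)),  # same mask, built differently
    "mean(c)": float(np.mean(c)),              # NaN propagates to the result
    "nanmean(c)": float(np.nanmean(c)),        # NaN skipped
}

for name, value in results.items():
    print(f"{name:>14}: {value}")
```

The point of the comparison is that the masked array carries the "ignore this" information with it, while the NaN-filled array relies on the caller choosing a NaN-aware function.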

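As the output shows, only when the invalid values are filled with np.nan do you need the NaN-aware functions (numpy.nanmean and the like) to compute statistics. If the values are masked instead, either function can be used and both give the correct result.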
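A caveat worth keeping in mind, shown as a small sketch using the same data: the mask only helps while the data stays a masked array. np.asarray returns the underlying data with the mask dropped, so the invalid value is counted again. Going the other way, np.ma.masked_invalid turns a NaN-filled array back into a masked one. The same pairing holds for other statistics, e.g. the masked array's .std() method versus np.nanstd on the NaN-filled array. Variable names here are illustrative only.

```python
import numpy as np

a = np.array([0.0, 1, 5, 999])
b = np.ma.masked_equal(a, 999)   # masked version
c = b.filled(np.nan)             # NaN-filled version

# np.asarray drops the mask and exposes the raw data,
# so the invalid value 999 is included in the mean again.
raw_mean = np.mean(np.asarray(b))

# masked_invalid masks NaN (and inf) entries, recovering a masked array from c.
b_again = np.ma.masked_invalid(c)
masked_mean = b_again.mean()

# Standard deviation: masked-array method vs. NaN-aware function.
masked_std = b.std()
nan_std = np.nanstd(c)

print(raw_mean, masked_mean, masked_std, nan_std)
```

In practice this means: if the data may be passed to code that converts it with np.asarray (many third-party functions do), NaN filling plus the nan* functions is the safer choice; if it stays inside numpy.ma, masking works with the ordinary functions.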
