A brief explanation of np.mean(y_test == y_predict)

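This post looks at array comparison in Python. It shows how to compare arrays element by element with the numpy library and then take the mean of the comparison result, which gives the fraction of positions at which the two arrays agree.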


# Example 1: plain Python lists are compared as a whole
x = [1, 0, 1, 1, 1, 1]
y = [0, 0, 0, 0, 0, 1]
print(x == y)
Result:
False
# Example 2: NumPy arrays are compared element by element
import numpy as np
x = np.array([1, 0, 1, 1, 1, 1])
y = np.array([0, 0, 0, 0, 0, 1])
print(x == y)
Result:
[False  True False False False  True]
# Example 3: the mean of the element-wise comparison
import numpy as np
x = np.array([1, 0, 1, 1, 1, 1])
y = np.array([0, 0, 0, 0, 0, 1])
print("{:.2f}".format(np.mean(x == y)))
Result:
0.33

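Notes:
1. For NumPy arrays, x == y compares the two arrays element by element: each position yields True where the values are equal and False otherwise. For plain Python lists (Example 1), x == y compares the lists as a whole and returns a single bool.
2. Example 3 takes the mean of the result from Example 2, with True counted as 1 and False as 0. Two of the six positions match, so the mean is 2/6 ≈ 0.33. Applied to y_test and y_predict, the same expression gives the fraction of correct predictions, i.e. the accuracy.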
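The same idea can be sketched for the expression in the title. This is a minimal illustration only: the label arrays below are made-up values, not data from the post, and the names matches and manual are introduced here for clarity.

```python
import numpy as np

# Hypothetical true and predicted labels, invented for illustration
y_test = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_predict = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Element-wise comparison gives a boolean array
matches = y_test == y_predict

# Mean of booleans: True counts as 1, False as 0
accuracy = np.mean(matches)

# Same value computed by counting the matches by hand
manual = np.sum(matches) / len(y_test)

print(matches)
print("{:.2f}".format(accuracy))
```

Here 6 of the 8 predictions match the true labels, so the printed accuracy is 0.75, and manual holds the same value.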