Line of wines

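This post discusses the wine-selling problem: how to use dynamic programming to decide which wine to sell each year so that the total profit is maximized. It presents three solutions (bottom-up dynamic programming, top-down dynamic programming, and bottom-up dynamic programming with prefix sums), each with implementation code.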

There are N wines in a row. Each year you sell exactly one wine, either the leftmost or the rightmost remaining one. The i-th wine has initial price p[i] and sells for k * p[i] in the k-th year (years are numbered from 1). What is the maximum possible total profit?
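A direct way to check any solution is to try every sequence of left/right choices. The sketch below does exactly that in O(2^N) time, so it is only usable for small N, for example to cross-check the DP solutions. The class and method names (WinesBruteForce, best, maxProfit) are hypothetical, and the prices {2, 3, 5, 1, 4} are an illustrative input chosen here, not taken from the original post.

```java
// Brute-force reference for the "line of wines" problem (a sketch, not the author's code).
// It tries both choices (leftmost or rightmost) every year, so it runs in O(2^N) time.
class WinesBruteForce {

    // Best total profit obtainable from wines p[l..r] when the next sale happens in `year`.
    static int best(int[] p, int l, int r, int year) {
        if (l > r) {
            return 0; // nothing left to sell
        }
        int sellLeft = p[l] * year + best(p, l + 1, r, year + 1);
        int sellRight = p[r] * year + best(p, l, r - 1, year + 1);
        return Math.max(sellLeft, sellRight);
    }

    static int maxProfit(int[] p) {
        return best(p, 0, p.length - 1, 1); // years are numbered from 1
    }

    public static void main(String[] args) {
        // Selling the wines priced 2, 4, 1, 3, 5 in years 1..5 gives 2*1 + 4*2 + 1*3 + 3*4 + 5*5.
        System.out.println(maxProfit(new int[] {2, 3, 5, 1, 4})); // prints 50
    }
}
```

For comparison, always selling the cheaper end first gives 2*1 + 3*2 + 4*3 + 1*4 + 5*5 = 49 on the same prices, so a simple greedy rule is not enough and the choices have to be optimized over intervals.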


Solution I. Bottom-up dynamic programming

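State:
dp[i][j]: the max possible profit from selling wines[i, j], where the first of these wines is sold in year N + i - j. By that point, i wines have been sold from the left and N - 1 - j from the right, so the current year is i + (N - 1 - j) + 1 = N + i - j.
dp[i][j] = max of {p[i] * (N + i - j) + dp[i + 1][j], p[j] * (N + i - j) + dp[i][j - 1]}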

Initialization:
dp[i][i] = N * p[i]

Ans:
dp[0][N - 1]
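The state and recurrence above can also be evaluated recursively with memoization. This removes the need to reason about loop order, because the recursion computes dp[i + 1][j] and dp[i][j - 1] before it needs them. This is a sketch, not one of the three solutions in this post: the class WinesMemo is hypothetical, and it assumes non-negative prices because it uses -1 as the "not computed yet" marker.

```java
import java.util.Arrays;

// Memoized recursion over the Solution I state: solve(i, j) = best profit for wines[i..j]
// when selling starts in year N + i - j. (Sketch; class and method names are hypothetical.)
class WinesMemo {

    private final int[] p;
    private final int n;
    private final int[][] memo;

    WinesMemo(int[] p) {
        this.p = p;
        this.n = p.length;
        this.memo = new int[n][n];
        for (int[] row : memo) {
            Arrays.fill(row, -1); // -1 marks "not computed yet" (assumes non-negative prices)
        }
    }

    int solve(int i, int j) {
        if (i > j) {
            return 0; // empty interval
        }
        if (memo[i][j] >= 0) {
            return memo[i][j];
        }
        int year = n + i - j; // i wines sold from the left, N - 1 - j from the right
        int result = Math.max(p[i] * year + solve(i + 1, j),
                              p[j] * year + solve(i, j - 1));
        memo[i][j] = result;
        return result;
    }

    static int maxProfit(int[] p) {
        return new WinesMemo(p).solve(0, p.length - 1);
    }

    public static void main(String[] args) {
        System.out.println(maxProfit(new int[] {2, 3, 5, 1, 4})); // prints 50
    }
}
```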


Loop over i and j. In this case, the loop order matters. In order to compute dp[i][j], we need to have dp[i + 1][j] and dp[i][j - 1] first. This means we need to loop i in decreasing order from N - 1 to 0 and j in increasing order from i + 1 to N - 1.

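    public int maxProfitBottomUp(int[] p) {
        int N = p.length;
        if(N == 0) {
            return 0; // no wines, no profit
        }
        int[][] dp = new int[N][N];
        // Only wine i is left: it is sold in the last year, N.
        for(int i = 0; i < N; i++) {
            dp[i][i] = N * p[i];
        }
        // dp[i][j] needs dp[i + 1][j] (larger i) and dp[i][j - 1] (smaller j).
        for(int i = N - 1; i >= 0; i--) {
            for(int j = i + 1; j < N; j++) {
                int year = N + i - j; // year in which wines[i, j] start being sold
                dp[i][j] = Math.max(p[i] * year + dp[i + 1][j], p[j] * year + dp[i][j - 1]);
            }
        }
        return dp[0][N - 1];
    }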
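The filled table also tells us which wine to sell each year: starting from dp[0][N - 1], check whether the "sell left" candidate attains dp[i][j] and move inward accordingly. This reconstruction is an addition for illustration (WinesOrder and sellOrder are hypothetical names). It rebuilds the same table as maxProfitBottomUp so that the block runs on its own, and on ties it prefers the leftmost wine.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: fill the Solution I table, then walk from dp[0][N - 1] and re-check which
// choice attained the maximum to recover one optimal selling order (as wine indices).
class WinesOrder {

    static List<Integer> sellOrder(int[] p) {
        int n = p.length;
        List<Integer> order = new ArrayList<>();
        if (n == 0) {
            return order;
        }
        int[][] dp = new int[n][n];
        for (int i = 0; i < n; i++) {
            dp[i][i] = n * p[i];
        }
        for (int i = n - 1; i >= 0; i--) {
            for (int j = i + 1; j < n; j++) {
                int year = n + i - j;
                dp[i][j] = Math.max(p[i] * year + dp[i + 1][j], p[j] * year + dp[i][j - 1]);
            }
        }
        int i = 0, j = n - 1;
        while (i < j) {
            int year = n + i - j;
            if (dp[i][j] == p[i] * year + dp[i + 1][j]) {
                order.add(i++); // selling the leftmost wine is optimal here
            } else {
                order.add(j--); // otherwise selling the rightmost wine is
            }
        }
        order.add(i); // the last remaining wine is sold in year N
        return order;
    }

    public static void main(String[] args) {
        System.out.println(sellOrder(new int[] {2, 3, 5, 1, 4})); // prints [0, 4, 3, 1, 2]
    }
}
```

For {2, 3, 5, 1, 4} the indices 0, 4, 3, 1, 2 correspond to selling the wines priced 2, 4, 1, 3, 5 in years 1 to 5.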


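Alternatively, loop over the interval length first, then over every possible left endpoint. An interval of length len only depends on intervals of length len - 1, which have all been computed in the previous pass.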

    public int maxProfitBottomUp(int[] p) {
        int N = p.length;
        if(N == 0) {
            return 0;
        }
        int[][] dp = new int[N][N];
        for(int i = 0; i < N; i++) {
            dp[i][i] = N * p[i];
        }
        // Intervals of length len only depend on intervals of length len - 1.
        for(int len = 2; len <= N; len++) {
            for(int l = 0; l <= N - len; l++) {
                int r = l + len - 1;
                int y = N + l - r; // year in which wines[l, r] start being sold
                dp[l][r] = Math.max(p[l] * y + dp[l + 1][r], p[r] * y + dp[l][r - 1]);
            }
        }
        return dp[0][N - 1];
    }


Solution II. Top-down dynamic programming

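State:

dp[l][r]: the max possible profit earned from the wines already sold at the moment the remaining interval is [l, r], i.e. before any wine in [l, r] is sold.

dp[l][r] = max of {dp[l][r + 1] + p[r + 1] * (currY - 1), dp[l - 1][r] + p[l - 1] * (currY - 1)}, where currY = N + l - r is the year in which the first wine of [l, r] is sold. The first term applies only when r + 1 < N, the second only when l > 0.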


Initialization:
dp[0][N - 1] = 0


Answer:
max of {dp[i][i] + p[i] * N}


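    public int maxProfitTopDown(int[] p) {
        int N = p.length;
        if(N == 0) {
            return 0;
        }
        int[][] dp = new int[N][N];
        dp[0][N - 1] = 0; // nothing has been sold yet

        // Process remaining intervals from longest to shortest; each state takes the better
        // of its two possible predecessors. Starting cells at 0 assumes non-negative prices.
        for(int len = N - 1; len >= 1; len--) {
            for(int l = 0; l + len <= N; l++) {
                int r = l + len - 1;
                int currY = N + l - r;
                // Predecessor [l, r + 1]: wine r + 1 was sold in year currY - 1.
                if(r < N - 1) {
                    dp[l][r] = Math.max(dp[l][r], dp[l][r + 1] + p[r + 1] * (currY - 1));
                }
                // Predecessor [l - 1, r]: wine l - 1 was sold in year currY - 1.
                if(l > 0) {
                    dp[l][r] = Math.max(dp[l][r], dp[l - 1][r] + p[l - 1] * (currY - 1));
                }
            }
        }

        // The last remaining wine i is sold in year N.
        int res = 0;
        for(int i = 0; i < N; i++) {
            res = Math.max(res, dp[i][i] + p[i] * N);
        }
        return res;
    }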
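Because Solution II works forward from the full row of wines rather than backward from single wines, it is easy to get an index or a year off by one. One way to gain confidence is to compare it with an exhaustive search on random small inputs. The sketch below copies maxProfitTopDown as a static method (with an empty-array guard); the brute-force search, the random test setup, and the names WinesCrossCheck and agree are assumptions added for illustration.

```java
import java.util.Random;

// Sketch: compare the Solution II DP with an exhaustive search on random small inputs.
class WinesCrossCheck {

    // Solution II: dp[l][r] = best profit already earned when [l, r] remains.
    static int maxProfitTopDown(int[] p) {
        int n = p.length;
        if (n == 0) {
            return 0;
        }
        int[][] dp = new int[n][n];
        for (int len = n - 1; len >= 1; len--) {
            for (int l = 0; l + len <= n; l++) {
                int r = l + len - 1;
                int currY = n + l - r;
                if (r < n - 1) {
                    dp[l][r] = Math.max(dp[l][r], dp[l][r + 1] + p[r + 1] * (currY - 1));
                }
                if (l > 0) {
                    dp[l][r] = Math.max(dp[l][r], dp[l - 1][r] + p[l - 1] * (currY - 1));
                }
            }
        }
        int res = 0;
        for (int i = 0; i < n; i++) {
            res = Math.max(res, dp[i][i] + p[i] * n);
        }
        return res;
    }

    // Exhaustive search over every left/right choice sequence, O(2^N).
    static int bruteForce(int[] p, int l, int r, int year) {
        if (l > r) {
            return 0;
        }
        return Math.max(p[l] * year + bruteForce(p, l + 1, r, year + 1),
                        p[r] * year + bruteForce(p, l, r - 1, year + 1));
    }

    // Returns true if both methods agree on `trials` random arrays of length 1..10.
    static boolean agree(long seed, int trials) {
        Random rnd = new Random(seed);
        for (int t = 0; t < trials; t++) {
            int[] p = new int[1 + rnd.nextInt(10)];
            for (int k = 0; k < p.length; k++) {
                p[k] = rnd.nextInt(20); // non-negative prices, as the DP assumes
            }
            if (maxProfitTopDown(p) != bruteForce(p, 0, p.length - 1, 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agree(42L, 500)); // prints true if the two methods agree
    }
}
```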


Solution III. Bottom-up dynamic programming with prefix sums

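Key idea: define every subproblem as if its selling starts in year 1. Suppose we already know the max profits for intervals [l, r - 1] and [l + 1, r]. To solve [l, r], we sell p[l] or p[r] in year 1 and then sell the remaining interval in years 2, 3, and so on, i.e. with all of its years shifted by 1. Shifting every year by 1 adds each remaining price once more, a constant that does not depend on the selling order, so the optimal order for the smaller interval stays optimal. Together with the wine sold in year 1 (1 * p[l] or 1 * p[r]), the extra amount is exactly p[l] + ... + p[r], the sum of interval [l, r], which we get in O(1) from prefix sums.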


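dp[l][r]: the max possible profit of wines[l, r] when selling starts in year 1.
dp[l][r] = max(dp[l][r - 1], dp[l + 1][r]) + sum(l, r), where sum(l, r) = p[l] + ... + p[r] = prefixSum[r] - prefixSum[l - 1] (taking prefixSum[-1] = 0)

Initialization: dp[i][i] = 1 * p[i]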

Ans: dp[0][N - 1]
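The shift argument behind this recurrence can be written out as a short derivation. Here y_k is the year (counting from 1) in which wine k is sold in an optimal plan for the smaller interval, and S(l, r) is the interval sum computed from prefix sums, with prefixSum[-1] taken as 0.

```latex
\begin{aligned}
S(l, r) &= \sum_{k=l}^{r} p[k] = \mathrm{prefixSum}[r] - \mathrm{prefixSum}[l-1] \\[4pt]
\text{sell } l \text{ in year 1:}\quad
  & 1 \cdot p[l] + \sum_{k=l+1}^{r} (y_k + 1)\, p[k]
  = p[l] + \mathrm{dp}[l+1][r] + S(l+1, r)
  = \mathrm{dp}[l+1][r] + S(l, r) \\
\text{sell } r \text{ in year 1:}\quad
  & 1 \cdot p[r] + \sum_{k=l}^{r-1} (y_k + 1)\, p[k]
  = p[r] + \mathrm{dp}[l][r-1] + S(l, r-1)
  = \mathrm{dp}[l][r-1] + S(l, r) \\[4pt]
\mathrm{dp}[l][r] &= \max\bigl(\mathrm{dp}[l][r-1],\ \mathrm{dp}[l+1][r]\bigr) + S(l, r)
\end{aligned}
```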

    public int maxProfitPrefixSum(int[] p) {
        int N = p.length;
        if(N == 0) {
            return 0;
        }
        int[][] dp = new int[N][N];
        // A single wine sold in year 1 earns 1 * p[i].
        for(int i = 0; i < N; i++) {
            dp[i][i] = p[i];
        }
        // prefixSum[i] = p[0] + p[1] + ... + p[i]
        int[] prefixSum = new int[N];
        prefixSum[0] = p[0];
        for(int i = 1; i < N; i++) {
            prefixSum[i] = prefixSum[i - 1] + p[i];
        }
        for(int i = N - 1; i >= 0; i--) {
            for(int j = i + 1; j < N; j++) {
                // Shifting the better sub-interval by one year adds the sum of p[i..j].
                int intervalSum = prefixSum[j] - (i > 0 ? prefixSum[i - 1] : 0);
                dp[i][j] = Math.max(dp[i][j - 1], dp[i + 1][j]) + intervalSum;
            }
        }
        return dp[0][N - 1];
    }
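Each length-len interval in Solution III only reads intervals of length len - 1, so the N x N table can be replaced by one array indexed by the left end and updated in place from left to right: when best[l] is overwritten, best[l + 1] still holds the previous length's value. The sketch below is such a variant, not part of the original post (WinesLinearSpace is a hypothetical name). It also switches to long, an assumption on our side, so that large prices or N are less likely to overflow; the int versions above do not guard against that.

```java
// Sketch: Solution III with O(N) extra space instead of an N x N table.
class WinesLinearSpace {

    static long maxProfit(int[] p) {
        int n = p.length;
        long[] prefix = new long[n + 1]; // prefix[k] = p[0] + ... + p[k - 1]
        for (int k = 0; k < n; k++) {
            prefix[k + 1] = prefix[k] + p[k];
        }
        long[] best = new long[n]; // best[l] = dp[l][l + len - 1] for the current len
        for (int l = 0; l < n; l++) {
            best[l] = p[l]; // len = 1: a single wine sold in year 1
        }
        for (int len = 2; len <= n; len++) {
            for (int l = 0; l + len <= n; l++) {
                int r = l + len - 1;
                // best[l] still holds dp[l][r - 1]; best[l + 1] still holds dp[l + 1][r]
                // because it is only overwritten later in this pass.
                best[l] = Math.max(best[l], best[l + 1]) + (prefix[r + 1] - prefix[l]);
            }
        }
        return n == 0 ? 0 : best[0];
    }

    public static void main(String[] args) {
        System.out.println(maxProfit(new int[] {2, 3, 5, 1, 4})); // prints 50
    }
}
```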


Reposted from: https://www.cnblogs.com/lz87/p/11518228.html
