Bayes Game

Using a guessing game with three upturned cups, this post shows how Bayes' theorem updates a prior probability to give a better chance of winning.

Game Show about Bayes Theorem

This post tells a short, classic story about Bayes' theorem that shows the surprising power of the posterior probability.

Question

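Three cups, $g_1, g_2, g_3$, sit upside down on a table, and a candy is hidden under one of them. After we choose a cup, one empty cup among the other two is removed. Should we now change our choice to have a better chance of getting the candy?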

Intuition

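In short, our first choice relies on the prior probability alone and gets the candy with probability $\frac{1}{3}$. If we stick with that choice after an empty cup has been removed, the probability is still $\frac{1}{3}$. If we instead switch to the cup that was left in place, the probability becomes $1 - \frac{1}{3} = \frac{2}{3}$, exactly double.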

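A seemingly unrelated event has a large effect on the final outcome. The removal is coupled to our goal: because the removed cup is always an empty one, the removal carries extra information. If one of the two remaining cups were instead removed at random, switching would offer no advantage. Clinging to prior experience without adapting to how events unfold can lead to unexpected misjudgments and missed opportunities.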
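The intuition above can be checked empirically. The following is a minimal Monte Carlo sketch, not part of the original post: the function names `play` and `win_rate` and the `informed` flag are illustrative assumptions. With `informed=True` the removed cup is always empty, as in the game; with `informed=False` one of the two other cups is removed at random, even if it hides the candy.

```python
import random


def play(switch, informed=True, rng=random):
    """Play one round and return True if the final pick hides the candy."""
    cups = [0, 1, 2]
    candy = rng.choice(cups)
    pick = rng.choice(cups)
    others = [c for c in cups if c != pick]
    if informed:
        # Rule from the text: only an empty cup among the other two is removed.
        removable = [c for c in others if c != candy]
    else:
        # Hypothetical variant: the removal ignores where the candy is.
        removable = others
    removed = rng.choice(removable)
    if switch:
        # Move to the single cup that is neither picked nor removed.
        pick = next(c for c in cups if c not in (pick, removed))
    return pick == candy


def win_rate(switch, informed=True, trials=100_000, seed=0):
    """Estimate the winning probability over many simulated rounds."""
    rng = random.Random(seed)
    wins = sum(play(switch, informed, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print("stick, informed removal :", win_rate(switch=False))
    print("switch, informed removal:", win_rate(switch=True))
    print("switch, random removal  :", win_rate(switch=True, informed=False))
```

Under these assumptions, sticking should come out near 1/3 and switching near 2/3, while switching under random removal should fall back to about 1/3.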

Mathematics

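We first state the rules of the game quantitatively:

For the three cups $g_1, g_2, g_3$:
The candy is equally likely a priori to be under each cup. Let the variable $C$ denote its location: $p(C=1) = p(C=2) = p(C=3) = \frac{1}{3}$
Assume we have already chosen $g_1$ (the cases $g_2$ and $g_3$ are analogous)
Let the variable $R$ (remove) denote which cup is removed: $p(R=1)$, $p(R=2)$, $p(R=3)$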

This gives the removal rule, the conditional distribution $P(R \mid C)$:

        R=1    R=2    R=3
C=1     0      1/2    1/2
C=2     0      0      1
C=3     0      1      0

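Since $g_1$ has been chosen, it can never be removed. When the candy is under one of the other two cups, the removed cup must be the empty one and the cup with the candy must stay, so that removal has probability 1. The distribution $P(R \mid C)$ is the removal rule. If the removal were independent of where the candy is, every row of the table above would be $0, 0.5, 0.5$. Because the removal is instead coupled to the candy's location, observing it revises our initial prior distribution over the candy.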
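To illustrate why independence would leave the prior untouched, here is a small sketch using exact fractions. It assumes the hypothetical rule in which every row of $P(R \mid C)$ is $0, 0.5, 0.5$; the names `prior`, `independent`, and `posterior` are illustrative and do not come from the original post.

```python
from fractions import Fraction

# Uniform prior over the candy location C in {1, 2, 3}.
prior = {c: Fraction(1, 3) for c in (1, 2, 3)}

# Hypothetical removal rule that ignores the candy:
# every row of P(R | C) is (0, 1/2, 1/2), with g1 being our pick.
independent = {
    c: {1: Fraction(0), 2: Fraction(1, 2), 3: Fraction(1, 2)}
    for c in (1, 2, 3)
}


def posterior(likelihood, prior, r):
    """Return P(C | R = r) from P(R | C) and P(C) via Bayes' theorem."""
    evidence = sum(likelihood[c][r] * prior[c] for c in prior)
    return {c: likelihood[c][r] * prior[c] / evidence for c in prior}


post = posterior(independent, prior, r=2)
print({c: str(p) for c, p in post.items()})
```

Because the likelihood is the same in every row, it cancels in Bayes' theorem and the posterior equals the prior: observing such a removal tells us nothing about the candy.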

Given the removal result, we revise our estimate for the target and compute the posterior distribution $P(C \mid R)$.

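The marginal probability of the removal itself follows from the law of total probability (the case $R=3$ is analogous to $R=2$):

$$p(R=2) = \sum_{i=1}^{3} p(R=2 \mid C=i)\, p(C=i) = \frac{1}{2} \cdot \frac{1}{3} + 0 \cdot \frac{1}{3} + 1 \cdot \frac{1}{3} = \frac{1}{2}$$

$$p(C=1 \mid R=2) = \frac{p(R=2 \mid C=1)\, p(C=1)}{p(R=2)} = \frac{\frac{1}{2} \cdot \frac{1}{3}}{\frac{1}{2}} = \frac{1}{3}$$

$$p(C=3 \mid R=2) = \frac{p(R=2 \mid C=3)\, p(C=3)}{p(R=2)} = \frac{1 \cdot \frac{1}{3}}{\frac{1}{2}} = \frac{2}{3}$$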

So whether $R=2$ or $R=3$, the cup left in place after the removal is twice as likely to hide the candy as the cup we currently hold.
We should switch to the remaining cup!
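The derivation can be reproduced with exact arithmetic. The sketch below encodes the removal rule for the case where $g_1$ was chosen and applies Bayes' theorem for both possible removals; the names `likelihood` and `bayes_update` are illustrative assumptions, not from the original post.

```python
from fractions import Fraction

# Uniform prior P(C) over the candy location.
prior = {c: Fraction(1, 3) for c in (1, 2, 3)}

# Removal rule P(R | C) when g1 is our pick: likelihood[c][r].
likelihood = {
    1: {1: Fraction(0), 2: Fraction(1, 2), 3: Fraction(1, 2)},
    2: {1: Fraction(0), 2: Fraction(0), 3: Fraction(1)},
    3: {1: Fraction(0), 2: Fraction(1), 3: Fraction(0)},
}


def bayes_update(likelihood, prior, r):
    """Return the posterior P(C | R = r) as a dict keyed by cup index."""
    # Marginal p(R = r) by the law of total probability.
    evidence = sum(likelihood[c][r] * prior[c] for c in prior)
    return {c: likelihood[c][r] * prior[c] / evidence for c in prior}


for r in (2, 3):
    post = bayes_update(likelihood, prior, r)
    print(f"R={r}:", {c: str(p) for c, p in post.items()})
```

For either removal, the chosen cup keeps probability 1/3 and the cup left in place gets 2/3, matching the hand calculation.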
