Hello, GPT-4o: Thoughts on GPT-4o's Data Analysis Feature


Hi everyone, I'm 哪吒 (Nezha).

The debut of OpenAI's GPT-4o has once again cemented the company's position as an industry disruptor. The release of GPT-4o is more than a product unveiling; it announces to the world that AI has entered a new era. Even OpenAI's CEO, Sam Altman, remarked that it feels like a science-fiction movie plot stepping straight into reality.

A quick summary of GPT-4o and the ChatGPT product line: GPT-4o (the "o" stands for "omni") accepts any combination of text, audio, and images as input, and generates any combination of text, audio, and images as output.

The arrival of GPT-4o marks a major leap in AI: it is no longer confined to single-medium interaction, but pioneers seamless integration of text, speech, and images.

👉 GPT features:

  1. GPT-4o knowledge Q&A: supports 1,000+ tokens of contextual memory
  2. Code Copilot, a top-tier code model: code auto-completion, optimization suggestions, code refactoring, and more
  3. DALL-E AI art: AI art + video editing = a new era for self-media creators
  4. DM 哪吒 to use GPT-4o directly

A new model brings new ways to use it. Below is a brief introduction to GPT-4o's Data Analysis plugin.

Data Analysis provides a rich set of tools and libraries for tasks such as data loading, cleaning, exploration, and visualization, making it a powerful environment for data analysis.

Whether you are a data science novice or an expert, you can follow along step by step to see how Data Analysis carries out each of these tasks.
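Behind the scenes, Data Analysis writes and runs Python (as discussed later in this article). Here is a minimal sketch of what such generated code might look like for the loading, exploration, and cleaning steps; the file name `books.xlsx` and the `price` column are hypothetical:

```python
import pandas as pd

# Load the uploaded spreadsheet (file and column names are hypothetical)
df = pd.read_excel("books.xlsx")

# Explore: dimensions, column types, and summary statistics
print(df.shape)
print(df.dtypes)
print(df.describe(include="all"))

# Clean: drop exact duplicates and rows with a missing price
df = df.drop_duplicates().dropna(subset=["price"])
```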

Upload an Excel file to Data Analysis.

Prompt: What is this dataset about?

Prompt: How are the books priced? Please use an appropriate chart to show the price distribution.
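A chart of the kind this prompt asks for could be produced with matplotlib. This minimal sketch reuses the hypothetical `df` and `price` column from the sketch above:

```python
import matplotlib.pyplot as plt

# Histogram of the hypothetical "price" column
plt.figure(figsize=(8, 4))
plt.hist(df["price"].dropna(), bins=30, edgecolor="black")
plt.xlabel("Price")
plt.ylabel("Number of books")
plt.title("Distribution of book prices")
plt.tight_layout()
plt.show()
```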

Prompt: Count the most frequent names in the title column, then visualize them as a word cloud.
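One way such a word cloud might be generated is with the wordcloud package; this sketch assumes a hypothetical `title` column, and for Chinese titles a CJK-capable font must be supplied via `font_path`:

```python
from collections import Counter

import matplotlib.pyplot as plt
from wordcloud import WordCloud  # pip install wordcloud

# Frequency of each book title (column name is hypothetical)
title_counts = Counter(df["title"].dropna())

# For Chinese text, pass font_path="<path to a CJK font>" to WordCloud
wc = WordCloud(width=800, height=400, background_color="white")
wc.generate_from_frequencies(title_counts)

plt.figure(figsize=(8, 4))
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```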

Prompt: Use five charts to analyze this dataset from different angles.

Drawing Fractals

Fractals are complex structures built from parts that resemble the whole: each part is a scaled-down copy of the entire figure. What makes fractals unique is that this structure repeats at every scale, so the same basic shapes appear no matter how far you zoom in or out. This property is called self-similarity.

Fractals are common in both mathematics and nature, and they can be described by mathematical equations. Here are some classic examples:

(1) Mandelbrot set: one of the most famous fractals, defined by an intricate mathematical equation and displayed on the complex plane.

(2) Julia set: related to the Mandelbrot set, but the behavior at each point differs.

(3) Sierpinski triangle: a simple yet striking fractal composed of ever-shrinking triangles.

(4) Koch curve: an infinitely intricate continuous fractal formed by repeatedly adding new geometric shapes.

Fractals are not only mathematically interesting; they are also widespread in nature. Fractal structure can be observed in clouds, mountains, rivers, trees, and more.

Fractals are also widely used in art and computer graphics. They can be used to create complex and beautiful patterns, and they are common in the modeling of natural landscapes in films and video games.

The study of fractals has not only deepened our understanding of complexity in mathematics and nature, but also driven technical progress in many fields.

Now let's use Data Analysis to draw a fractal.

Prompt: Can you draw a Julia set? Please display it in an appropriate way.
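For reference, a Julia set can be rendered directly with NumPy and matplotlib using the standard escape-time algorithm. This sketch uses the commonly cited constant c = -0.8 + 0.156i; it is illustrative, not the exact code Data Analysis generates:

```python
import numpy as np
import matplotlib.pyplot as plt

# Escape-time rendering of the Julia set for f(z) = z**2 + c.
# c = -0.8 + 0.156i is one commonly used example constant.
c = complex(-0.8, 0.156)
n, max_iter = 800, 200

x = np.linspace(-1.6, 1.6, n)
y = np.linspace(-1.2, 1.2, n)
z = x[np.newaxis, :] + 1j * y[:, np.newaxis]
counts = np.zeros(z.shape, dtype=int)

for i in range(max_iter):
    mask = np.abs(z) <= 2          # points that have not escaped yet
    counts[mask] = i               # record the last bounded iteration
    z[mask] = z[mask] ** 2 + c     # iterate only the surviving points

plt.figure(figsize=(8, 6))
plt.imshow(counts, extent=(-1.6, 1.6, -1.2, 1.2), cmap="twilight")
plt.title("Julia set for c = -0.8 + 0.156i")
plt.axis("off")
plt.show()
```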

Computing Complex Mathematical Problems

Under the hood, Data Analysis performs mathematical computation by running Python programs that call specialized tools such as the SymPy library. SymPy supports not only symbolic computation and high-precision arithmetic, but also pattern matching, plotting, equation solving, calculus, combinatorics, discrete mathematics, geometry, probability and statistics, physics, and more. With Data Analysis, users can solve a wide range of foundational math problems, demonstrating its power and flexibility as a computational tool.
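A few self-contained SymPy calls illustrating the capabilities listed above (symbolic calculus, equation solving, and high-precision arithmetic); this is illustrative, not the exact code Data Analysis generates:

```python
import sympy as sp

x = sp.symbols("x")

# Symbolic calculus: a definite integral and a limit
print(sp.integrate(sp.sin(x) ** 2, (x, 0, sp.pi)))  # pi/2
print(sp.limit(sp.sin(x) / x, x, 0))                # 1

# Equation solving
print(sp.solve(x ** 2 - 3 * x + 2, x))              # [1, 2]

# High-precision numerics: pi to 30 significant digits
print(sp.N(sp.pi, 30))
```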

Prompt: Please plot the graph of this equation: x^3 + y^3 = 3axy, for values of a from -10 to 10.
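This curve (the folium of Descartes) can be plotted implicitly by tracing the zero contour of F(x, y) = x^3 + y^3 - 3axy for several sample values of a. A minimal matplotlib sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

# Implicit plot of x**3 + y**3 = 3*a*x*y: the curve is the zero contour
# of F(x, y) = x**3 + y**3 - 3*a*x*y, drawn for sample values of a.
xs = np.linspace(-15, 15, 600)
ys = np.linspace(-15, 15, 600)
X, Y = np.meshgrid(xs, ys)

plt.figure(figsize=(7, 7))
for a in (-10, -5, -1, 1, 5, 10):
    F = X ** 3 + Y ** 3 - 3 * a * X * Y
    plt.contour(X, Y, F, levels=[0])
plt.title("x^3 + y^3 = 3axy for several values of a")
plt.xlabel("x")
plt.ylabel("y")
plt.gca().set_aspect("equal")
plt.show()
```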

Using Data Analysis on a Calculus Problem

Prompt: Compute the volume of the solid of revolution obtained by rotating about the x-axis the region bounded by one arch of the cycloid x = a(t - sin t), y = a(1 - cos t), 0 ≤ t ≤ 2π, and the line y = 0. Make sure both the computation process and the result are correct.
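The expected answer is V = 5π²a³, which follows from the disk method, V = π ∫ y(t)² x′(t) dt over 0 ≤ t ≤ 2π. A SymPy sketch that verifies it (illustrative, not the exact code Data Analysis produces):

```python
import sympy as sp

t = sp.symbols("t")
a = sp.symbols("a", positive=True)

# Cycloid: x = a(t - sin t), y = a(1 - cos t), one arch for t in [0, 2*pi]
x = a * (t - sp.sin(t))
y = a * (1 - sp.cos(t))

# Disk method about the x-axis: V = pi * Integral(y**2 * dx/dt, t)
V = sp.pi * sp.integrate(y ** 2 * sp.diff(x, t), (t, 0, 2 * sp.pi))
print(sp.simplify(V))  # 5*pi**2*a**3
```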

When facing more complex mathematical problems, Data Analysis sometimes shows its limits. To compare how different tools handle hard math problems, Michael Trott, chief scientist at Wolfram Alpha, selected one hundred problems drawn from a wide range of sources, including math journals, university math competitions, and mathematical olympiads. He then attempted these challenging problems with both the Wolfram Plugin and Data Analysis.

The results showed that the Wolfram Plugin, with its powerful computational engine, solved all of the problems, while Data Analysis solved only about 50% of them. This clearly indicates that the Wolfram Plugin has a significant advantage on complex mathematical problems.

He also compared the length of the code each tool needed to solve the problems. Notably, the Wolfram Language code was only 27% as long as the Python code used by Data Analysis, further confirming the Wolfram Language's conciseness and efficiency.

So although Data Analysis performs well in many areas, the Wolfram Plugin is still the recommended choice for complex mathematical problems: it delivers more accurate and complete solutions, and its more concise code saves time and effort.


