Multimodal Interaction with llama3.2-vision

License: CC 4.0 BY-SA

Contents

Preface
I. Multimodality
II. Usage Steps
Summary

Preface

I. Multimodality

This article tests multimodal (text + image) interaction with llama3.2-vision. For text-only interaction, see: llama3.2-vision

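II. Usage Steps

1. Task description

A Python script that takes an image and lets the user discuss it interactively over multiple turns. It was tested with test.jpg and fruts.jpeg, shown below: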

[Figure: test images]
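
Since the task starts from an image file on disk, it can help to check that the file exists before sending it to the model. The following is a minimal sketch of that idea. The helper name `build_image_message` is hypothetical and not part of the original script; the message shape (a `user` message with an `images` list of file paths) mirrors the one used in the demo.

```python
from pathlib import Path


def build_image_message(prompt, img_path):
    """Return a user message dict that attaches one image by file path.

    Hypothetical helper for illustration; not part of the original script.
    """
    path = Path(img_path)
    if not path.is_file():
        # Fail early with a clear error instead of a failure inside the client.
        raise FileNotFoundError(f"Image not found: {path}")
    return {"role": "user", "content": prompt, "images": [str(path)]}
```

A message built this way could be placed in the conversation history in place of the hand-written dict, for example `build_image_message('What is in this image?', 'test.jpg')`.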

2. Demo

The code is as follows (example):

import argparse

import ollama


def api_chat(img='test.jpg'):
    # Conversation history; the first user message carries the image path.
    conversation_history = [
        {'role': 'system', 'content': 'You are an assistant.'},
        {
            'role': 'user',
            # 'content': 'What is in this image?',
            # The prompt below means: "Can you describe this image in Chinese?"
            'content': '你可以用中文介绍一下这幅图片吗?',
            'images': [img],
        },
    ]
    response = ollama.chat(
        model='llama3.2-vision',
        messages=conversation_history,
    )
    # print(response['message']['content'])
    print(response['message'])
    # Keep the assistant reply so later turns have the full context.
    conversation_history.append(response['message'])

    try:
        while True:
            # Follow-up turns: the user keeps asking questions.
            user_message = input("\nAnything else I can help with? Please enter: ")
            if not user_message:  # Empty input (the user only pressed Enter)
                print("Input cannot be empty, please try again.")
                continue  # Ask for input again
            cmd = {'role': 'user', 'content': user_message}
            # Add the new user message to the conversation history.
            conversation_history.append(cmd)
            response = ollama.chat(
                model='llama3.2-vision',
                messages=conversation_history,
            )
            print(response['message'])
            conversation_history.append(response['message'])
    except KeyboardInterrupt:
        # Ctrl+C ends the session.
        print("\nProgram terminated.")


if __name__ == '__main__':
    # Create the argument parser.
    parser = argparse.ArgumentParser(description="Example program")
    # Optional argument --img, defaulting to "test.jpg".
    parser.add_argument('--img', type=str, default='test.jpg', help='Image file name')
    # Parse the command-line arguments.
    args = parser.parse_args()
    # Use the argument.
    print(f"Name of the image: {args.img}")

    api_chat(args.img)
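
The multi-turn loop in the demo works because every request resends the whole `conversation_history`, including the first message with the image, and each reply is appended before the next turn. The sketch below isolates one such turn so the bookkeeping can be checked without a running model. The names `chat_turn` and `chat_fn` are hypothetical; the sketch assumes `chat_fn` accepts the same `model=` and `messages=` keywords as the demo's `ollama.chat` call and returns something with a `'message'` entry.

```python
def chat_turn(history, user_text, chat_fn, model="llama3.2-vision"):
    """Append the user message, call the model, and record its reply.

    Hypothetical helper for illustration. chat_fn is assumed to take
    model= and messages= keywords, like the ollama.chat call in the demo,
    and to return a mapping with a 'message' entry.
    """
    history.append({"role": "user", "content": user_text})
    response = chat_fn(model=model, messages=history)
    reply = response["message"]
    # Store the assistant reply so the next turn sees the full context.
    history.append(reply)
    return reply["content"]
```

In the demo's loop, the body could then shrink to something like `print(chat_turn(conversation_history, user_message, ollama.chat))`, which prints only the reply text rather than the whole message object.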