A Practical Guide to Image Input and Grounding in Google Cloud Generative AI Projects

[Free download link] generative-ai: Sample code and notebooks for Generative AI on Google Cloud. Project URL: https://gitcode.com/GitHub_Trending/ge/generative-ai

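Struggling to get a model to interpret images accurately and combine them with external knowledge? This article walks through working examples from the GoogleCloudPlatform/generative-ai project that cover image input handling and grounding, and it does not assume an advanced programming background. By the end you will know how to use Gemini models for text-to-image generation and image editing, and how to link image information to a knowledge source so that your application can answer with more than what is visible in the picture.

Image generation in depth

Text-to-image basics

The Gemini 2.5 Flash Image model generates images from text: a short description is enough to produce a high-quality image. Configuration parameters control properties of the output such as its aspect ratio.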

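from google import genai
from google.genai.types import GenerateContentConfig, ImageConfig, Part

# Assumes credentials and project settings are configured in the environment.
client = genai.Client()

response = client.models.generate_content(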
    model="gemini-2.5-flash-image",
    contents="a cartoon infographic on flying sneakers",
    config=GenerateContentConfig(
        response_modalities=["IMAGE"],
        image_config=ImageConfig(
            aspect_ratio="9:16",
        ),
        candidate_count=1,
    ),
)

The code above uses Gemini 2.5 Flash Image to generate a cartoon infographic about flying sneakers. Changing aspect_ratio produces images in other proportions to suit different layouts.
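The call above returns the image inside the response object rather than writing a file. Here is a minimal sketch of pulling the bytes out and saving them, assuming the response follows the google-genai layout (candidates, then content, then parts, with image bytes in part.inline_data.data). The names extract_images and save_images and the file naming scheme are illustrative, not part of the SDK, and a stand-in response is used so the sketch runs offline.

```python
from pathlib import Path
from types import SimpleNamespace


def extract_images(response):
    """Return (mime_type, bytes) pairs for every image part in the first candidate."""
    images = []
    for part in response.candidates[0].content.parts:
        inline = getattr(part, "inline_data", None)
        if inline is not None and inline.data:
            images.append((inline.mime_type, inline.data))
    return images


def save_images(response, out_dir, stem="image"):
    """Write each image part to out_dir and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, (mime_type, data) in enumerate(extract_images(response)):
        extension = mime_type.split("/")[-1]  # "image/png" -> "png"
        path = out / f"{stem}-{index}.{extension}"
        path.write_bytes(data)
        paths.append(path)
    return paths


# Stand-in response with the same attribute layout (not a real API result).
fake_response = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            content=SimpleNamespace(
                parts=[
                    SimpleNamespace(text="caption", inline_data=None),
                    SimpleNamespace(
                        text=None,
                        inline_data=SimpleNamespace(
                            mime_type="image/png", data=b"\x89PNG"
                        ),
                    ),
                ]
            )
        )
    ]
)
```

With a real response, save_images(response, "out") would write one file per returned image part.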

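Multimodal content generation

Gemini models can also return text and images interleaved in a single response, which suits tutorials, guides, and similar content.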

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents="Create a tutorial explaining how to make a peanut butter and jelly sandwich in three easy steps. For each step, provide a title with the number of the step, an explanation, and also generate an image to illustrate the content. Label each image with the step number but no other words.",
    config=GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config=ImageConfig(
            aspect_ratio="4:3",
        ),
    ),
)

This request produces a three-step tutorial for making a peanut butter and jelly sandwich, with a written explanation and a matching image for each step.
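A mixed response arrives as an ordered list of parts, some carrying text and some carrying image bytes. One way to turn that into a readable page is sketched below, assuming each part exposes text and inline_data attributes as in the google-genai response types. render_html is a hypothetical helper, and the demo parts are stand-ins rather than real model output.

```python
import base64
import html
from types import SimpleNamespace


def render_html(parts):
    """Turn an ordered list of text/image parts into an HTML fragment."""
    chunks = []
    for part in parts:
        inline = getattr(part, "inline_data", None)
        if getattr(part, "text", None):
            # Escape model text so it cannot inject markup into the page.
            chunks.append(f"<p>{html.escape(part.text)}</p>")
        elif inline is not None and inline.data:
            # Embed the image bytes directly as a data URI.
            encoded = base64.b64encode(inline.data).decode("ascii")
            chunks.append(f'<img src="data:{inline.mime_type};base64,{encoded}">')
    return "\n".join(chunks)


# Stand-in parts mimicking one tutorial step followed by its illustration.
demo_parts = [
    SimpleNamespace(text="Step 1: Gather bread & fillings", inline_data=None),
    SimpleNamespace(
        text=None,
        inline_data=SimpleNamespace(mime_type="image/png", data=b"abc"),
    ),
]
demo_html = render_html(demo_parts)
```

Keeping the parts in their original order preserves the pairing between each step's explanation and its image.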

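Advanced image editing

Style transfer from a reference image

With Gemini you can carry the look of one image over to a new one. For example, applying a living room's style to a kitchen design: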

with open("living-room.png", "rb") as f:
    image = f.read()

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=[
        Part.from_bytes(
            data=image,
            mime_type="image/png",
        ),
        "Using the concepts, colors, and themes from this living room generate a kitchen and dining room with the same aesthetic.",
    ],
    config=GenerateContentConfig(
        response_modalities=["IMAGE"],
        image_config=ImageConfig(
            aspect_ratio="21:9",
        ),
        candidate_count=1,
    ),
)

This lets you reuse an existing design style in a new scene without starting from scratch.
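The example hard-codes mime_type="image/png", which only holds if the reference file really is a PNG. A small sketch of deriving the MIME type from the file name with the standard library instead. load_image is a hypothetical helper name, and guessing from the extension is an assumption that the file is named correctly.

```python
import mimetypes
from pathlib import Path


def load_image(path):
    """Read an image file and guess its MIME type from the file extension."""
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognised image file: {path}")
    return Path(path).read_bytes(), mime_type
```

The returned pair can then be passed as Part.from_bytes(data=data, mime_type=mime_type), so a JPEG reference image is not mislabelled as PNG.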

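Multi-turn image editing and refinement

Gemini supports conversational, multi-turn image editing: a sequence of prompts refines the image step by step, which is useful when a result needs fine adjustment.

# Start a chat session
chat = client.chats.create(model="gemini-2.5-flash-image")

# First edit: change the perfume color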
response = chat.send_message(
    message=[
        Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/image/perfume.jpg",
            mime_type="image/jpeg",
        ),
        "change the perfume color to a light purple",
    ],
    config=GenerateContentConfig(
        response_modalities=["IMAGE"],
        image_config=ImageConfig(
            aspect_ratio="3:2",
        ),
    ),
)

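# Second edit: add text, starting from the image returned by the first edit
image_part = next(
    part for part in response.candidates[0].content.parts if part.inline_data
)
data = image_part.inline_data.data

response = chat.send_message(
    message=[
        Part.from_bytes(
            data=data,
            mime_type=image_part.inline_data.mime_type,
        ),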
        "inscribe the word flowers in French on the perfume bottle in a delicate white cursive font",
    ],
    config=GenerateContentConfig(
        response_modalities=["IMAGE"],
        image_config=ImageConfig(
            aspect_ratio="3:2",
        ),
    ),
)

Editing over several turns lets you refine the image incrementally and keep tight control over the final result.
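When there are more than two edits, the same pattern can be written as a loop that feeds each result into the next step and keeps every intermediate version for rollback. This is only a sketch: edit_fn is a hypothetical wrapper you would write around one chat.send_message call plus extraction of the returned image bytes, and fake_edit is an offline stand-in that does no real editing.

```python
def apply_edits(edit_fn, image, prompts):
    """Apply prompts one at a time, feeding each result into the next edit.

    edit_fn(image_bytes, prompt) -> image_bytes is assumed to wrap a single
    model call. The returned list holds the original plus every version.
    """
    history = [image]
    for prompt in prompts:
        history.append(edit_fn(history[-1], prompt))
    return history


def fake_edit(image, prompt):
    """Offline stand-in for the model: tags the bytes with the prompt."""
    return image + b"|" + prompt.encode("utf-8")


versions = apply_edits(fake_edit, b"perfume", ["purple", "inscribe fleurs"])
```

Keeping the full history means a poor edit can be discarded by resuming from an earlier entry instead of starting over.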

Advanced use of Imagen 3 models

Imagen 3 vs Imagen 3 Fast

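The Imagen 3 family offers two models: Imagen 3 prioritizes quality, while Imagen 3 Fast returns results faster and still keeps good quality.

from google.genai import types

# Imagen 3: quality-focused image generation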
image = client.models.generate_images(
    model="imagen-3.0-generate-002",
    prompt="a photorealistic image of the inside of an amethyst crystal on display in a museum",
    config=types.GenerateImagesConfig(
        number_of_images=1,
        aspect_ratio="3:4",
        safety_filter_level="BLOCK_MEDIUM_AND_ABOVE",
        person_generation="DONT_ALLOW",
    ),
)

# Imagen 3 Fast: faster image generation
fast_image = client.models.generate_images(
    model="imagen-3.0-fast-generate-001",
    prompt="a photorealistic image of the inside of an amethyst crystal on display in a museum",
    config=types.GenerateImagesConfig(
        number_of_images=1,
        aspect_ratio="3:4",
        safety_filter_level="BLOCK_MEDIUM_AND_ABOVE",
        person_generation="DONT_ALLOW",
    ),
)

Choose the model that fits the project to balance quality against speed.
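To make that trade-off concrete for your own prompts, you can time both models on the same input. The sketch below is a generic timing harness, not a benchmark from the source: slow_stub and fast_stub stand in for the two generate_images calls, and their sleep durations are arbitrary placeholders rather than measured latencies.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_latency(generators, prompt):
    """Call each named generator with the same prompt and collect timings."""
    timings = {}
    for name, generate in generators.items():
        _, elapsed = time_call(generate, prompt)
        timings[name] = elapsed
    return timings


def slow_stub(prompt):
    """Placeholder for the Imagen 3 call; the sleep is arbitrary."""
    time.sleep(0.05)
    return f"image for {prompt}"


def fast_stub(prompt):
    """Placeholder for the Imagen 3 Fast call; the sleep is arbitrary."""
    time.sleep(0.01)
    return f"image for {prompt}"


timings = compare_latency(
    {"imagen-3": slow_stub, "imagen-3-fast": fast_stub}, "amethyst"
)
```

In practice you would replace the stubs with small functions that call client.models.generate_images for each model, and repeat the measurement several times before drawing conclusions.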

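Prompt enhancement and image quality

Imagen 3 models support prompt enhancement: with enhance_prompt=True, the model rewrites the input prompt before generating, which tends to yield better images.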

image = client.models.generate_images(
    model="imagen-3.0-generate-002",
    prompt="an art museum",
    config=types.GenerateImagesConfig(
        number_of_images=1,
        aspect_ratio="1:1",
        enhance_prompt=True,
        safety_filter_level="BLOCK_MEDIUM_AND_ABOVE",
        person_generation="DONT_ALLOW",
    ),
)

# Print the enhanced prompt
print(image.generated_images[0].enhanced_prompt)

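With prompt enhancement enabled, the model expands the original prompt with extra detail and context so that the image matches the intent more closely.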

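Grounding and external information

Linking image content to a knowledge source

With function calling, you can connect the results of image analysis to an external knowledge source. This grounding lets the model go beyond "seeing" an image and draw on background knowledge about what it shows.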

get_multiple_location_coordinates = FunctionDeclaration(
    name="get_location_coordinates",
    description="Get coordinates of multiple locations",
    parameters={
        "type": "object",
        "properties": {
            "locations": {
                "type": "array",
                "description": "A list of locations",
                "items": {
                    "description": "Components of the location",
                    "type": "object",
                    "properties": {
                        "point_of_interest": {
                            "type": "string",
                            "description": "Name or type of point of interest",
                        },
                        "city": {"type": "string", "description": "City"},
                        "country": {"type": "string", "description": "Country"},
                    },
                    "required": [
                        "point_of_interest",
                        "city",
                        "country",
                    ],
                },
            }
        },
    },
)

geocoding_tool = Tool(
    function_declarations=[get_multiple_location_coordinates],
)

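The code above declares a function for looking up location coordinates and registers it as a tool. Given an image analysis result, the model can request a call to this tool to obtain coordinates for the places involved, combining image content with geographic knowledge.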
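The declaration only tells the model what it may ask for; your code still has to run the function when the model requests it. A minimal sketch of that application-side half, assuming the model's request exposes a name and an args dict matching the schema above. KNOWN_COORDINATES is a hypothetical lookup table standing in for a real geocoding service, and dispatch and HANDLERS are illustrative names.

```python
from types import SimpleNamespace

# Hypothetical lookup table standing in for a real geocoding service.
KNOWN_COORDINATES = {
    "Eiffel Tower, Paris, France": (48.8584, 2.2945),
}


def get_location_coordinates(locations):
    """Resolve each location dict to coordinates, or None when unknown."""
    results = []
    for location in locations:
        query = ", ".join(
            location[key] for key in ("point_of_interest", "city", "country")
        )
        results.append(
            {"query": query, "coordinates": KNOWN_COORDINATES.get(query)}
        )
    return results


HANDLERS = {"get_location_coordinates": get_location_coordinates}


def dispatch(function_call):
    """Route a model-issued function call (name + args) to a local handler."""
    handler = HANDLERS.get(function_call.name)
    if handler is None:
        raise KeyError(f"no handler registered for {function_call.name!r}")
    return handler(**function_call.args)


# Stand-in for a function call the model might return.
call = SimpleNamespace(
    name="get_location_coordinates",
    args={
        "locations": [
            {"point_of_interest": "Eiffel Tower", "city": "Paris", "country": "France"},
            {"point_of_interest": "Town Hall", "city": "Nowhere", "country": "Atlantis"},
        ]
    },
)
result = dispatch(call)
```

The handler's return value would then be sent back to the model as the function response so it can write the final answer.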

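Combining web page content with image information

With Vertex AI Search, you can combine image analysis results with web page content for richer grounding. The example below links product information found in an image to product details from a web page: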

def get_page_contents(search_query: str) -> str | None:
    response = get_relevant_snippets(search_query)
    link = get_first_link(response)
    if link:
        details = load_and_format_page_content(link)
        return details if details else None
    return None

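# analyze_product_image and gemini are placeholders for your own Gemini wrapper;
# get_relevant_snippets, get_first_link, and load_and_format_page_content are
# assumed to be defined elsewhere (for example, on top of Vertex AI Search).

# Use a Gemini model to extract product information from the image
product_info = analyze_product_image(image_data)

# Search for related web content based on the product name
web_content = get_page_contents(product_info["name"])

# Combine the image information and web content into one answer
combined_answer = gemini.generate_content(
    "Based on the product image and web content, provide a comprehensive "
    f"product description: {product_info} {web_content}"
)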

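This way the model can combine what it extracted from the image with detailed content from the web and produce a more complete and accurate answer.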
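Dropping both sources into one f-string works, but the model cannot tell which facts came from the image and which from the web, and a long page can crowd out everything else. A sketch of a more structured prompt follows. build_grounded_prompt and the max_chars limit are illustrative choices, not part of the project's code.

```python
import json


def build_grounded_prompt(product_info, web_content, max_chars=2000):
    """Combine image-derived facts and page text into one labelled prompt."""
    sections = [
        "Write a comprehensive product description using only the sources below.",
        "Facts extracted from the product image:",
        json.dumps(product_info, ensure_ascii=False, sort_keys=True),
    ]
    if web_content:
        # Truncate long pages so the image facts are not crowded out.
        sections += ["Web page content:", web_content[:max_chars]]
    else:
        sections.append(
            "No web page content was found; rely on the image facts alone."
        )
    return "\n\n".join(sections)
```

Handling the missing-page case explicitly matters here because get_page_contents returns None when no link is found, which would otherwise end up in the prompt as the literal text "None".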

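Application examples

Automating retail product displays

With Gemini's image generation and editing, retailers can produce product display assets quickly, and grounding can supply specifications, prices, and other details to add automatically.

# Generate the base product image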
product_base = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents="a high-quality image of wireless headphones on a white background",
    config=GenerateContentConfig(
        response_modalities=["IMAGE"],
        image_config=ImageConfig(aspect_ratio="1:1"),
    ),
)

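# Pull the generated image out of the first response
base_image = next(
    part.inline_data
    for part in product_base.candidates[0].content.parts
    if part.inline_data
)

# Edit the image to add product information
product_with_info = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=[
        Part.from_bytes(data=base_image.data, mime_type=base_image.mime_type),
        "Add price tag $199.99 and feature list: Noise cancellation, 30h battery life, Water resistant",
    ],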
    config=GenerateContentConfig(
        response_modalities=["IMAGE"],
        image_config=ImageConfig(aspect_ratio="1:1"),
    ),
)

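Tourist attraction guide

Combining image generation with grounding makes it possible to build an interactive attraction guide that generates images of a site and attaches practical information such as historical background and opening hours.

# Generate an image of the attraction (Imagen models use generate_images with a prompt)
attraction_image = client.models.generate_images(
    model="imagen-3.0-generate-002",
    prompt="photorealistic image of the Eiffel Tower at sunset",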
    config=types.GenerateImagesConfig(
        number_of_images=1,
        aspect_ratio="16:9",
        enhance_prompt=True,
    ),
)

# Fetch attraction information
attraction_info = get_page_contents("Eiffel Tower visitor information")

# Generate the guide text (gemini is a placeholder for your own model wrapper)
guide_content = gemini.generate_content(f"Create a tourist guide for the Eiffel Tower using the image and information: {attraction_info}")

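Summary and outlook

This article covered how to handle image input and apply grounding in Google Cloud generative AI projects. The techniques apply to content creation, retail, tourism, education, and other areas.

As the Gemini and Imagen model families continue to evolve, expect stronger image understanding and generation along with smoother knowledge integration. Keep an eye on the official documentation and the sample repository to pick up new features as they arrive.

I hope this article helps with your project. If you have questions or ideas, feel free to open an issue or PR in the project's GitHub repository.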

Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.