Florence-2-large-ft Web Interface: A Visual Interaction Platform
[Free download link] Florence-2-large-ft project page: https://ai.gitcode.com/mirrors/Microsoft/Florence-2-large-ft
Overview
Florence-2-large-ft is an advanced vision foundation model from Microsoft that handles a wide range of vision and vision-language tasks through simple text prompts. This article walks through building a fully featured web-based interface for Florence-2-large-ft so that users can explore the model's capabilities interactively.
Core Feature Architecture
Multi-Task Support
Florence-2-large-ft supports the following core vision tasks: brief and detailed image captioning, object detection, plain and region-level OCR, caption-to-phrase grounding, and referring-expression segmentation; the table in the task-selection section lists the exact prompts. A minimal direct-inference sketch follows below.
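Everything in this platform ultimately reduces to one prompt-driven call into the model. The following minimal sketch shows that call directly, following the model card's documented usage; the image path example.jpg is a placeholder:

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-large-ft", torch_dtype=dtype, trust_remote_code=True
).to(device)
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-large-ft", trust_remote_code=True
)

image = Image.open("example.jpg").convert("RGB")  # placeholder path
task = "<CAPTION>"  # any prompt from the task table works here
inputs = processor(text=task, images=image, return_tensors="pt").to(device, dtype)
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3,
)
text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
print(processor.post_process_generation(text, task=task, image_size=image.size))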
Technical Architecture
At a high level the platform has three layers: a browser front end for image upload, task selection, and result rendering; a FastAPI backend that exposes the model over HTTP; and the Florence-2-large-ft model itself, loaded once at startup. The sections below implement each layer in turn.
Detailed Feature Implementation
1. Image Input Module
File upload component
<div class="image-upload-container">
<input type="file" id="imageInput" accept="image/*" />
<div class="drop-zone" id="dropZone">
<p>拖放图像文件到这里或点击选择</p>
</div>
<div class="url-input">
<input type="text" id="imageUrl" placeholder="输入图像URL" />
<button onclick="loadImageFromUrl()">加载</button>
</div>
</div>
Image preview and preprocessing
class ImageProcessor {
  constructor() {
    this.canvas = document.createElement('canvas');
    this.ctx = this.canvas.getContext('2d');
    this.maxSize = 1024; // longest side after client-side downscaling
  }

  async processImage(file) {
    const image = await this.loadImage(file);
    const resizedCanvas = this.resizeImage(image);
    return this.canvasToBlob(resizedCanvas);
  }

  loadImage(file) {
    return new Promise((resolve, reject) => {
      const img = new Image();
      img.onload = () => resolve(img);
      img.onerror = reject;
      img.src = URL.createObjectURL(file);
    });
  }

  // Scale the image so neither side exceeds maxSize, preserving the aspect
  // ratio; the Math.min(1, ...) clamp avoids upscaling small images.
  resizeImage(img) {
    const scale = Math.min(1, this.maxSize / img.width, this.maxSize / img.height);
    const width = Math.floor(img.width * scale);
    const height = Math.floor(img.height * scale);
    this.canvas.width = width;
    this.canvas.height = height;
    this.ctx.drawImage(img, 0, 0, width, height);
    return this.canvas;
  }

  canvasToBlob(canvas) {
    return new Promise(resolve => {
      canvas.toBlob(resolve, 'image/jpeg', 0.9);
    });
  }
}
2. Task Selection and Configuration
Task type reference table
| Task | Prompt | Input | Output format |
|---|---|---|---|
| Image captioning | <CAPTION> | Image only | Short text caption |
| Detailed captioning | <DETAILED_CAPTION> | Image only | Detailed text description |
| Object detection | <OD> | Image only | Bounding boxes + labels |
| OCR | <OCR> | Image only | Recognized text |
| Region OCR | <OCR_WITH_REGION> | Image only | Quadrilateral boxes + text |
| Phrase grounding | <CAPTION_TO_PHRASE_GROUNDING> | Image + text | Bounding boxes for phrases |
| Referring segmentation | <REFERRING_EXPRESSION_SEGMENTATION> | Image + text | Polygon masks |
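As the last two rows show, text-conditioned tasks simply append the user text to the task prompt. A tiny helper sketch (our own naming, mirroring what the backend later does inline):

from typing import Optional

# Sketch: build the full prompt string for a given task. For text-conditioned
# tasks the user text is concatenated directly after the task token.
def build_prompt(task: str, text_input: Optional[str] = None) -> str:
    if text_input:
        return f"{task}{text_input}"
    return task

# build_prompt("<CAPTION_TO_PHRASE_GROUNDING>", "a red car")
# -> "<CAPTION_TO_PHRASE_GROUNDING>a red car"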
Task configuration UI
const taskConfigs = {
  '<CAPTION>': {
    name: 'Image captioning',
    description: 'Generate a short caption for the image',
    requiresText: false,
    parameters: {}
  },
  '<DETAILED_CAPTION>': {
    name: 'Detailed captioning',
    description: 'Generate a detailed description of the image',
    requiresText: false,
    parameters: {}
  },
  '<OD>': {
    name: 'Object detection',
    description: 'Detect and localize objects in the image',
    requiresText: false,
    parameters: {
      confidenceThreshold: {
        type: 'slider',
        min: 0.1,
        max: 0.9,
        step: 0.1,
        default: 0.5,
        label: 'Confidence threshold'
      }
    }
  },
  '<OCR>': {
    name: 'Text recognition',
    description: 'Recognize text in the image',
    requiresText: false,
    parameters: {}
  },
  '<CAPTION_TO_PHRASE_GROUNDING>': {
    name: 'Phrase grounding',
    description: 'Locate the image regions that match a text description',
    requiresText: true,
    parameters: {
      textInput: {
        type: 'textarea',
        placeholder: 'Enter a description...',
        required: true
      }
    }
  }
};
3. Backend API Service
FastAPI backend implementation
from fastapi import FastAPI, File, Form, UploadFile, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
import io
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

app = FastAPI(title="Florence-2 Web Interface")

# CORS configuration
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Global model state
model = None
processor = None
device = "cuda" if torch.cuda.is_available() else "cpu"


@app.on_event("startup")
async def load_model():
    global model, processor
    try:
        model = AutoModelForCausalLM.from_pretrained(
            "microsoft/Florence-2-large-ft",
            torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
            trust_remote_code=True
        ).to(device)
        processor = AutoProcessor.from_pretrained(
            "microsoft/Florence-2-large-ft",
            trust_remote_code=True
        )
        print("Model loaded successfully")
    except Exception as e:
        print(f"Model loading failed: {e}")


@app.post("/api/process")
async def process_image(
    task: str = Form(...),
    image: UploadFile = File(...),
    text_input: str = Form(None),
    confidence_threshold: float = Form(0.5)  # reserved for client-side filtering
):
    try:
        # Read and decode the uploaded image
        image_data = await image.read()
        pil_image = Image.open(io.BytesIO(image_data)).convert("RGB")

        # Build the prompt: text-conditioned tasks append the user text to the task token
        inputs = processor(
            text=task if text_input is None else f"{task}{text_input}",
            images=pil_image,
            return_tensors="pt"
        ).to(device, torch.float16 if torch.cuda.is_available() else torch.float32)

        # Run generation
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            num_beams=3
        )
        generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

        # Task-specific post-processing (boxes, polygons, plain text, ...)
        result = processor.post_process_generation(
            generated_text,
            task=task,
            image_size=(pil_image.width, pil_image.height)
        )

        return JSONResponse(content={
            "success": True,
            "result": result,
            "task": task
        })
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


@app.get("/api/tasks")
async def get_available_tasks():
    return {
        "tasks": [
            {"id": "<CAPTION>", "name": "Image captioning", "requires_text": False},
            {"id": "<DETAILED_CAPTION>", "name": "Detailed captioning", "requires_text": False},
            {"id": "<OD>", "name": "Object detection", "requires_text": False},
            {"id": "<OCR>", "name": "Text recognition", "requires_text": False},
            {"id": "<OCR_WITH_REGION>", "name": "Region OCR", "requires_text": False},
            {"id": "<CAPTION_TO_PHRASE_GROUNDING>", "name": "Phrase grounding", "requires_text": True}
        ]
    }
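To exercise the endpoint without the front end, here is a small client sketch using the requests library; the server address and the test.jpg file name are assumptions:

# Sketch: call the /api/process endpoint from Python. Assumes the server above
# is running on localhost:8000 and that test.jpg exists locally.
import requests

with open("test.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/api/process",
        data={"task": "<OD>"},  # form fields, matching the Form(...) parameters
        files={"image": ("test.jpg", f, "image/jpeg")},
        timeout=300,
    )
resp.raise_for_status()
print(resp.json()["result"])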
4. Result Visualization Components
Bounding box renderer
class BoundingBoxRenderer {
  constructor(canvas, image) {
    this.canvas = canvas;
    this.ctx = canvas.getContext('2d');
    this.image = image;
    this.boxes = [];
    this.scale = 1;
    this.setupCanvas();
  }

  setupCanvas() {
    const maxWidth = 800;
    const maxHeight = 600;
    this.scale = Math.min(
      maxWidth / this.image.width,
      maxHeight / this.image.height
    );
    this.canvas.width = this.image.width * this.scale;
    this.canvas.height = this.image.height * this.scale;
    this.ctx.drawImage(
      this.image,
      0, 0,
      this.canvas.width, this.canvas.height
    );
  }

  // Boxes arrive in original-image coordinates; store them scaled to the canvas.
  addBox(box, label, confidence) {
    const [x1, y1, x2, y2] = box;
    this.boxes.push({
      box: [
        x1 * this.scale,
        y1 * this.scale,
        x2 * this.scale,
        y2 * this.scale
      ],
      label,
      confidence
    });
  }

  render() {
    this.ctx.clearRect(0, 0, this.canvas.width, this.canvas.height);
    this.ctx.drawImage(
      this.image,
      0, 0,
      this.canvas.width, this.canvas.height
    );
    this.boxes.forEach(({box, label, confidence}) => {
      const [x1, y1, x2, y2] = box;
      const width = x2 - x1;
      const height = y2 - y1;
      // Draw the bounding box
      this.ctx.strokeStyle = '#ff4757';
      this.ctx.lineWidth = 2;
      this.ctx.strokeRect(x1, y1, width, height);
      // Draw the label background (set the font before measuring the text)
      this.ctx.fillStyle = '#ff4757';
      this.ctx.font = '12px Arial';
      const text = `${label} (${(confidence * 100).toFixed(1)}%)`;
      const textWidth = this.ctx.measureText(text).width;
      this.ctx.fillRect(x1, y1 - 20, textWidth + 10, 20);
      // Draw the label text
      this.ctx.fillStyle = 'white';
      this.ctx.fillText(text, x1 + 5, y1 - 5);
    });
  }

  clear() {
    this.boxes = [];
    this.ctx.clearRect(0, 0, this.canvas.width, this.canvas.height);
    this.ctx.drawImage(
      this.image,
      0, 0,
      this.canvas.width, this.canvas.height
    );
  }
}
Text result renderer
class TextResultRenderer {
  constructor(container) {
    this.container = container;
  }

  renderResult(result, taskType) {
    this.container.innerHTML = '';
    switch (taskType) {
      case '<CAPTION>':
      case '<DETAILED_CAPTION>':
      case '<MORE_DETAILED_CAPTION>':
        this.renderCaption(result, taskType);
        break;
      case '<OCR>':
        this.renderOCR(result);
        break;
      case '<OD>':
      case '<DENSE_REGION_CAPTION>':
        this.renderDetection(result, taskType);
        break;
      case '<OCR_WITH_REGION>':
        this.renderOCRWithRegion(result);
        break;
      default:
        this.renderRawResult(result);
    }
  }

  renderCaption(result, taskType) {
    const div = document.createElement('div');
    div.className = 'caption-result';
    // The post-processed result object is keyed by the task prompt
    div.innerHTML = `
      <h3>Caption</h3>
      <div class="caption-text">${result[taskType]}</div>
    `;
    this.container.appendChild(div);
  }

  renderDetection(result, taskType = '<OD>') {
    const detectionData = result[taskType];
    const div = document.createElement('div');
    div.className = 'detection-result';
    let html = `
      <h3>Detection results</h3>
      <div class="detection-stats">
        ${detectionData.bboxes.length} object(s) detected
      </div>
      <div class="detection-list">
    `;
    detectionData.labels.forEach((label, index) => {
      const bbox = detectionData.bboxes[index];
      html += `
        <div class="detection-item">
          <span class="label">${label}</span>
          <span class="coordinates">
            [${bbox[0].toFixed(1)}, ${bbox[1].toFixed(1)},
            ${bbox[2].toFixed(1)}, ${bbox[3].toFixed(1)}]
          </span>
        </div>
      `;
    });
    html += '</div>';
    div.innerHTML = html;
    this.container.appendChild(div);
  }

  renderOCR(result) {
    const ocrData = result['<OCR>'];
    const div = document.createElement('div');
    div.className = 'ocr-result';
    div.innerHTML = `
      <h3>Recognized text</h3>
      <div class="ocr-text">${ocrData}</div>
    `;
    this.container.appendChild(div);
  }

  // renderOCRWithRegion and renderRawResult follow the same pattern and are
  // omitted here.
}
5. Performance Optimization Strategies
Optimized model loading
import asyncio

import torch
from transformers import AutoModelForCausalLM, AutoProcessor


class ModelManager:
    def __init__(self):
        self.model = None
        self.processor = None
        self.device = self.get_device()
        self.is_loaded = False

    def get_device(self):
        if torch.cuda.is_available():
            return f"cuda:{torch.cuda.current_device()}"
        elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
            return "mps"
        else:
            return "cpu"

    async def load_model_async(self):
        """Load the model asynchronously."""
        if self.is_loaded:
            return True
        try:
            # Run the blocking load in a thread pool so the event loop stays responsive
            loop = asyncio.get_event_loop()
            await loop.run_in_executor(
                None,
                self._load_model_sync
            )
            self.is_loaded = True
            return True
        except Exception as e:
            print(f"Model loading failed: {e}")
            return False

    def _load_model_sync(self):
        """Load the model synchronously (blocking)."""
        torch_dtype = torch.float16 if self.device.startswith('cuda') else torch.float32
        self.model = AutoModelForCausalLM.from_pretrained(
            "microsoft/Florence-2-large-ft",
            torch_dtype=torch_dtype,
            trust_remote_code=True,
        ).to(self.device)
        self.processor = AutoProcessor.from_pretrained(
            "microsoft/Florence-2-large-ft",
            trust_remote_code=True
        )
        # Switch to evaluation mode (disables dropout etc.)
        self.model.eval()
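One possible way to wire the manager into the FastAPI app from the backend section (hypothetical glue code; this startup hook would replace the simpler load_model shown earlier):

# Sketch: use ModelManager from the FastAPI app defined above.
manager = ModelManager()

@app.on_event("startup")
async def startup():
    ok = await manager.load_model_async()
    if not ok:
        # Fail fast if the weights cannot be loaded
        raise RuntimeError("Florence-2 model failed to load")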
Request batching and caching
import asyncio
from datetime import datetime, timedelta
from functools import lru_cache


class RequestProcessor:
    def __init__(self):
        self.request_cache = {}
        self.batch_queue = []
        self.batch_size = 4
        self.batch_timeout = 0.1  # 100 ms

    # Note: lru_cache on a method keeps self alive; acceptable for a singleton.
    @lru_cache(maxsize=1000)
    def get_cache_key(self, task, image_hash, text_input=None):
        """Build a cache key from the task, image hash, and optional text."""
        key_parts = [task, image_hash]
        if text_input:
            key_parts.append(text_input)
        return hash(tuple(key_parts))

    async def process_request(self, task, image, text_input=None):
        """Handle one request with caching and simple timeout-based batching."""
        # Hash the image so identical uploads hit the cache
        image_hash = self._hash_image(image)
        cache_key = self.get_cache_key(task, image_hash, text_input)

        # Serve from cache if a fresh entry exists
        if cache_key in self.request_cache:
            cached_result = self.request_cache[cache_key]
            if datetime.now() - cached_result['timestamp'] < timedelta(hours=1):
                return cached_result['result']

        # Enqueue for batch processing
        request_data = {
            'task': task,
            'image': image,
            'text_input': text_input,
            'cache_key': cache_key,
            'future': asyncio.Future()
        }
        self.batch_queue.append(request_data)

        # Flush when the batch is full, otherwise after a short timeout
        if len(self.batch_queue) >= self.batch_size:
            await self._process_batch()
        else:
            await asyncio.sleep(self.batch_timeout)
            if len(self.batch_queue) > 0:
                await self._process_batch()

        # Wait for this request's result
        result = await request_data['future']

        # Cache the result
        self.request_cache[cache_key] = {
            'result': result,
            'timestamp': datetime.now()
        }
        return result

    async def _process_batch(self):
        """Process up to batch_size queued requests together."""
        if not self.batch_queue:
            return
        batch_requests = self.batch_queue[:self.batch_size]
        self.batch_queue = self.batch_queue[self.batch_size:]
        try:
            batch_results = await self._process_batch_requests(batch_requests)
            # Resolve each request's future with its result
            for request, result in zip(batch_requests, batch_results):
                request['future'].set_result(result)
        except Exception as e:
            # Propagate the error to every waiting caller
            for request in batch_requests:
                request['future'].set_exception(e)
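The class above calls two helpers it never defines, _hash_image and _process_batch_requests. A hedged sketch of both follows; run_inference is a hypothetical stand-in for the processor → generate → post-process pipeline from the FastAPI section, and a real batch would pad and stack inputs rather than loop:

import hashlib

# Sketch: helper methods assumed by RequestProcessor above.
def _hash_image(self, image_bytes: bytes) -> str:
    # Key the cache by image content, not by object identity
    return hashlib.sha256(image_bytes).hexdigest()

async def _process_batch_requests(self, batch_requests):
    # Naive "batch": run requests sequentially. A real implementation would
    # pad and stack input_ids/pixel_values and call model.generate once.
    results = []
    for req in batch_requests:
        # run_inference is a placeholder for the model call shown earlier
        results.append(
            await run_inference(req['task'], req['image'], req['text_input'])
        )
    return results

RequestProcessor._hash_image = _hash_image
RequestProcessor._process_batch_requests = _process_batch_requests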
Deployment and Operations
Docker containerized deployment
# Dockerfile
FROM python:3.9-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    libglib2.0-0 \
    libsm6 \
    libxext6 \
    libxrender-dev \
    && rm -rf /var/lib/apt/lists/*

# Copy project files
COPY requirements.txt .
COPY app.py .
COPY model_loader.py .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Create a non-root user
RUN useradd -m -u 1000 webuser
USER webuser

# Expose the service port
EXPOSE 8000

# Start command
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
Optimized Nginx configuration
# nginx.conf
events {
    worker_connections 1024;
}

http {
    upstream florence2_app {
        server localhost:8000;
        server localhost:8001;
        keepalive 32;
    }

    server {
        listen 80;
        server_name florence2.example.com;

        # Static file serving
        location /static/ {
            alias /app/static/;
            expires 1y;
            add_header Cache-Control "public, immutable";
        }

        # API routes
        location /api/ {
            proxy_pass http://florence2_app;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

            # Generous timeouts: model inference can take a while
            proxy_connect_timeout 300s;
            proxy_send_timeout 300s;
            proxy_read_timeout 300s;
        }

        # Front-end routes (single-page app fallback)
        location / {
            root /app/frontend;
            try_files $uri $uri/ /index.html;
        }

        # Enable gzip compression
        gzip on;
        gzip_vary on;
        gzip_min_length 1024;
        gzip_types text/plain text/css text/xml text/javascript
                   application/javascript application/xml+rss
                   application/json;
    }
}
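The upstream block above expects two application instances on ports 8000 and 8001. One hedged way to produce them is a docker-compose file like the following (service names and image choices are assumptions); note that with Docker networking the upstream servers would be app1:8000 and app2:8000 rather than localhost:

# docker-compose.yml (illustrative)
services:
  app1:
    build: .
    ports:
      - "8000:8000"
  app2:
    build: .
    ports:
      - "8001:8000"   # second instance for the nginx upstream
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - app1
      - app2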
Monitoring and Logging
# monitoring.py
import logging
import time

from prometheus_client import Counter, Histogram, generate_latest
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request

# Metric definitions
REQUEST_COUNT = Counter(
    'florence2_requests_total',
    'Total requests',
    ['method', 'endpoint', 'status_code']
)
REQUEST_LATENCY = Histogram(
    'florence2_request_latency_seconds',
    'Request latency',
    ['method', 'endpoint']
)
ERROR_COUNT = Counter(
    'florence2_errors_total',
    'Total errors',
    ['error_type']
)


class MonitoringMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        method = request.method
        endpoint = request.url.path

        # Record the request start time
        start_time = time.time()
        try:
            response = await call_next(request)

            # Count the request by method, endpoint, and status code
            REQUEST_COUNT.labels(
                method=method,
                endpoint=endpoint,
                status_code=response.status_code
            ).inc()

            # Record the latency
            latency = time.time() - start_time
            REQUEST_LATENCY.labels(
                method=method,
                endpoint=endpoint
            ).observe(latency)

            return response
        except Exception as e:
            ERROR_COUNT.labels(error_type=type(e).__name__).inc()
            raise


# Logging configuration
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('app.log'),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger("florence2-web")
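The troubleshooting section later calls /health and /metrics, which the backend above never defines. A hedged wiring sketch that registers the middleware and both endpoints on the FastAPI app:

# Sketch: register the middleware and expose /health and /metrics on the app.
from fastapi import Response

app.add_middleware(MonitoringMiddleware)

@app.get("/health")
async def health():
    # Report readiness based on whether the model finished loading
    return {"status": "ok", "model_loaded": model is not None}

@app.get("/metrics")
async def metrics():
    # Prometheus scrape endpoint, using generate_latest imported above
    return Response(generate_latest(), media_type="text/plain; version=0.0.4")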
Usage Guide
Quick Start
- Environment setup
# Clone the project
git clone https://gitcode.com/mirrors/Microsoft/Florence-2-large-ft
# Install dependencies
pip install -r requirements.txt
# Start the web service
python app.py
- Model download
# Download the model on first run (trust_remote_code is required for Florence-2)
python -c "
from transformers import AutoModelForCausalLM, AutoProcessor
model = AutoModelForCausalLM.from_pretrained('microsoft/Florence-2-large-ft', trust_remote_code=True)
processor = AutoProcessor.from_pretrained('microsoft/Florence-2-large-ft', trust_remote_code=True)
"
- Access the interface: open a browser and go to http://localhost:8000 to use the web UI.
Feature Walkthrough
Image captioning
- Upload an image file
- Select the "Image captioning" task
- Click "Process"
- Review the generated caption
Object detection
- Upload an image that contains several objects
- Select the "Object detection" task
- Adjust the confidence threshold
- Review the detections and the visualized bounding boxes
OCR text recognition
- Upload an image that contains text
- Select the "Text recognition" task
- Review the recognized text
Advanced Configuration
Performance tuning
# config.yaml
model:
  device: auto
  dtype: float16
  max_batch_size: 4

server:
  workers: 2
  timeout: 300
  max_requests: 1000

cache:
  enabled: true
  max_size: 1000
  ttl: 3600
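A small sketch for reading this file at startup with PyYAML (the load_config helper is our own naming, not part of the project, and PyYAML must be installed):

# Sketch: load config.yaml into a plain dict at startup.
import yaml

def load_config(path: str = "config.yaml") -> dict:
    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)

config = load_config()
print(config["model"]["dtype"])  # -> "float16"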
Custom tasks
# Add a custom task handler
class CustomTaskHandler:
    def handle_custom_task(self, image, parameters):
        # Custom processing logic goes here, e.g. chaining <OD> with region OCR
        pass
Troubleshooting
Common issues
| Issue | Fix |
|---|---|
| Model fails to load | Check network connectivity and access to Hugging Face |
| Out of memory | Reduce the batch size and use float16 precision |
| Slow processing | Enable GPU acceleration and downscale input images |
| Inaccurate results | Adjust the confidence threshold and check image quality |
Monitoring endpoints
# Check service health
curl http://localhost:8000/health
# View performance metrics
curl http://localhost:8000/metrics
# Tail the application log
tail -f app.log
Summary
The Florence-2-large-ft web platform gives users an intuitive, accessible way to work with this powerful multi-task vision model. With the architecture and implementation described in this article, developers can quickly stand up their own visualization platform and put Florence-2-large-ft's strengths in image understanding, object detection, and text recognition to work.
Beyond its feature set, the platform covers production-grade concerns such as performance optimization, monitoring, and operations, which help keep the system stable and scalable. Researchers and developers alike can use it to better understand and apply the Florence-2-large-ft model.
Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.



