High-Quality Portrait Segmentation and Hair Color Replacement with MODNet and Face Parsing

License: CC 4.0 BY-SA

Tags: #Artificial Intelligence #Computer Vision

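1. Hair Color Replacement

In scenarios such as virtual avatar editing, hairstyle try-on, and face beautification, high-quality portrait segmentation and hair color replacement have become important application requirements in image processing.

Traditional image segmentation methods often struggle with complex backgrounds and fine structures. In hair regions in particular, they tend to produce jagged edges and unnatural blending.

This article attempts to implement the feature by combining deep learning with traditional image processing.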

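2. Algorithm Pipeline

The overall flow of deep-learning-based hair color replacement is as follows:

[Figure: algorithm flowchart (算法流程.png)]

The main steps are: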

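Portrait foreground extraction: run MODNet on the original image to segment the foreground and obtain a mask.
Portrait region cropping: based on the mask, analyze the portrait contour and crop an expanded region around it. The expansion accounts for cases such as long hair so that the hair is not cut off. The crop yields a local portrait ROI that is used for fine-grained face semantic segmentation.
Face parsing to obtain the hair region: feed the cropped portrait ROI into the face parsing model to get a per-pixel semantic label map, then extract the pixels labeled as hair to build the hair mask.
Mask mapping and alignment: resize the hair mask produced by parsing back to the size of the portrait ROI, then map it into the full original image to form a hair_mask with the same size as the original.
Hair color replacement and blending (HSV model): convert the original image to the HSV color space. Inside the hair_mask region, replace the H channel with the target hue, moderately boost the S channel (saturation), and keep the V channel (brightness) so that hair structure, shadows, and highlights are preserved. Finally, convert the result back to BGR and blend pixels only inside the hair region.
Final output: the result keeps the background, facial features, and lighting of the original image, and only the hair region is naturally replaced with the target color.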
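
The cropping and mask-mapping steps above can be sketched without any imaging library. This is a minimal illustration under stated assumptions, not the article's implementation: `Rect`, `foregroundBox`, `expandBox`, and `pasteMask` are hypothetical names, the 0.5 threshold and the uniform expansion ratio are assumptions, and the resize between the parsing output and the ROI size is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type for illustration; real code would likely use cv::Rect.
struct Rect { int x, y, w, h; };

// Bounding box of all pixels whose alpha exceeds `thresh` in a row-major W x H mask.
// Returns a zero-sized rect when no pixel qualifies.
Rect foregroundBox(const std::vector<float>& alpha, int W, int H, float thresh = 0.5f) {
    int x0 = W, y0 = H, x1 = -1, y1 = -1;
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            if (alpha[y * W + x] > thresh) {
                x0 = std::min(x0, x); y0 = std::min(y0, y);
                x1 = std::max(x1, x); y1 = std::max(y1, y);
            }
    if (x1 < 0) return {0, 0, 0, 0};
    return {x0, y0, x1 - x0 + 1, y1 - y0 + 1};
}

// Grow the box by `ratio` of its size on every side, clamped to the image bounds.
Rect expandBox(const Rect& r, int W, int H, float ratio) {
    int dx = static_cast<int>(r.w * ratio), dy = static_cast<int>(r.h * ratio);
    int x0 = std::max(0, r.x - dx), y0 = std::max(0, r.y - dy);
    int x1 = std::min(W, r.x + r.w + dx), y1 = std::min(H, r.y + r.h + dy);
    return {x0, y0, x1 - x0, y1 - y0};
}

// Paste an ROI-sized mask into a zero-initialised mask of the full image size.
std::vector<uint8_t> pasteMask(const std::vector<uint8_t>& roiMask, const Rect& roi, int W, int H) {
    assert(roi.x >= 0 && roi.y >= 0 && roi.x + roi.w <= W && roi.y + roi.h <= H);
    assert(roiMask.size() == static_cast<size_t>(roi.w) * roi.h);
    std::vector<uint8_t> full(static_cast<size_t>(W) * H, 0);
    for (int y = 0; y < roi.h; ++y)
        for (int x = 0; x < roi.w; ++x)
            full[(roi.y + y) * W + (roi.x + x)] = roiMask[y * roi.w + x];
    return full;
}
```

A real pipeline would probably expand the box more toward the bottom than the top so that long hair stays inside the crop. The uniform ratio here only keeps the sketch short.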
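
The HSV replacement step can also be illustrated per pixel. The sketch below is an assumption-laden example rather than the author's code: the `Rgb` and `Hsv` structs and the functions `rgbToHsv`, `hsvToRgb`, and `recolorHair` are hypothetical, channels are floats in [0, 1], hue is in degrees, and `hairWeight` stands in for the hair_mask value at that pixel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Rgb { float r, g, b; };  // channels in [0, 1]
struct Hsv { float h, s, v; };  // h in degrees [0, 360), s and v in [0, 1]

Hsv rgbToHsv(const Rgb& c) {
    float mx = std::max({c.r, c.g, c.b}), mn = std::min({c.r, c.g, c.b});
    float d = mx - mn, h = 0.0f;
    if (d > 0.0f) {
        if (mx == c.r)      h = 60.0f * std::fmod((c.g - c.b) / d, 6.0f);
        else if (mx == c.g) h = 60.0f * ((c.b - c.r) / d + 2.0f);
        else                h = 60.0f * ((c.r - c.g) / d + 4.0f);
        if (h < 0.0f) h += 360.0f;
    }
    return {h, mx > 0.0f ? d / mx : 0.0f, mx};
}

Rgb hsvToRgb(const Hsv& c) {
    float k = c.v * c.s;  // chroma
    float hp = c.h / 60.0f;
    float x = k * (1.0f - std::fabs(std::fmod(hp, 2.0f) - 1.0f));
    float m = c.v - k;
    float r = 0.0f, g = 0.0f, b = 0.0f;
    if      (hp < 1.0f) { r = k; g = x; }
    else if (hp < 2.0f) { r = x; g = k; }
    else if (hp < 3.0f) { g = k; b = x; }
    else if (hp < 4.0f) { g = x; b = k; }
    else if (hp < 5.0f) { r = x; b = k; }
    else                { r = k; b = x; }
    return {r + m, g + m, b + m};
}

// Replace H, scale S by satGain (clamped to 1), keep V, then blend with the
// original pixel using the hair-mask weight (0 = untouched, 1 = fully dyed).
Rgb recolorHair(const Rgb& src, float hairWeight, float targetHue, float satGain) {
    assert(hairWeight >= 0.0f && hairWeight <= 1.0f);
    Hsv hsv = rgbToHsv(src);
    hsv.h = targetHue;
    hsv.s = std::min(1.0f, hsv.s * satGain);
    Rgb dyed = hsvToRgb(hsv);
    auto mix = [hairWeight](float a, float b) { return a + hairWeight * (b - a); };
    return {mix(src.r, dyed.r), mix(src.g, dyed.g), mix(src.b, dyed.b)};
}
```

Because V is left unchanged, the brightest channel of the dyed pixel equals that of the source pixel, which is what preserves shadows and highlights. Pixels with zero saturation stay grey, since only H is replaced and S is scaled. With OpenCV's 8-bit HSV, H is stored as degrees divided by 2 (range 0 to 179), so a target hue in degrees has to be halved first.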

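3. Overall Implementation

The pipeline involves two models. Both are deployed and accelerated with ONNXRuntime, and I have wrapped the calls to each of them.

The following uses the Modnet model as an example: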

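#pragma once

#include <string>
#include <vector>

#include "../onnxruntime/OnnxRuntimeBase.h"

using namespace cv;
using namespace std;
using namespace Ort;

// MODNet portrait-matting wrapper built on the shared ONNX Runtime base class.
class Modnet : public OnnxRuntimeBase {
public:
    Modnet(std::string modelPath, const char* logId, const char* provider);

    // Runs MODNet on src and writes a CV_32FC1 alpha mask (0~1), sized like src, to mask.
    void inferImage(Mat& src, Mat& mask);

private:
    void preprocess(const cv::Mat& image);
    Mat postprocess(float* output_data, int width, int height);
    vector<float> input_image_;  // input tensor data in CHW order

    int inpWidth;   // network input width
    int inpHeight;  // network input height
};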

#include <array>

#include "../../include/faceBeauty/Modnet.h"

Modnet::Modnet(std::string modelPath, const char* logId, const char* provider) : OnnxRuntimeBase(modelPath, logId, provider)
{
    this->inpHeight = 512;
    this->inpWidth = 512;
}

void Modnet::preprocess(const cv::Mat& image) {
    // Resize to the network input size and scale pixel values to 0~1.
    cv::Mat resized, float_img;
    cv::resize(image, resized, cv::Size(this->inpWidth, this->inpHeight));
    resized.convertTo(float_img, CV_32FC3, 1.0 / 255.0);
    // The loop below assumes a 3-channel image, so size the buffer for 3 channels.
    this->input_image_.resize(this->inpWidth * this->inpHeight * 3);

    // Repack from interleaved HWC (OpenCV layout) to planar CHW (tensor layout).
    for (int c = 0; c < 3; ++c)
        for (int h = 0; h < this->inpHeight; ++h)
            for (int w = 0; w < this->inpWidth; ++w)
                this->input_image_[c * this->inpHeight * this->inpWidth + h * this->inpWidth + w] = float_img.at<cv::Vec3f>(h, w)[c];
}

// postprocess() is assumed to convert the MODNet float* output into a CV_32FC1 alpha mask (values in 0~1).
cv::Mat Modnet::postprocess(float* output_data, int width, int height) {
    cv::Mat alpha(height, width, CV_32FC1, output_data);
    // Return a copy so the mask stays valid after output_data is released.
    return alpha.clone();
}

void Modnet::inferImage(Mat& src, Mat& mask) {
    this->preprocess(src);
    std::array<int64_t, 4> input_shape{1, 3, this->inpHeight, this->inpWidth};

    Ort::Value input_tensor_ = Ort::Value::CreateTensor<float>(memory_info_handler, input_image_.data(), input_image_.size(), input_shape.data(), input_shape.size());

    vector<Value> ort_outputs = this->forward(input_tensor_);

    auto output_data = ort_outputs[0].GetTensorMutableData<float>();

    cv::Mat alpha = this->postprocess(output_data, this->inpWidth, this->inpHeight);  // CV_32FC1
    cv::resize(alpha, alpha, src.size());                // scale back to the original image size
    alpha.convertTo(alpha, CV_32FC1);                    // make sure the type is float32

    mask = alpha;
}

Calling the Modnet model requires …
