Finding Eyes with Ultra-Light-Fast-Generic-Face-Detector-1MB

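User Linzaer released an ultra-lightweight face detection model of only 1 MB, suited to edge computing and mobile devices. At 320x240 resolution it needs only 90 MFLOPs. This post describes how to convert the FDDB dataset into the VOC format the model requires: obtain keypoint information with TengineKit, generate YOLO-style label files with Python code, and handle a problem that can make the training loss become NaN.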

1 Introduction to Ultra-Light-Fast-Generic-Face-Detector-1MB

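User Linzaer published on GitHub an ultra-lightweight, general-purpose face detection model for edge computing devices, mobile devices, and PCs. The model file is only 1 MB, and at 320x240 input it needs only 90 MFLOPs. The project drew attention soon after release and made it onto GitHub Trending today.
GitHub repository: https://github.com/Linzaer/Ultra-Light-Fast-Generic-Face-Detector-1MB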

2 Generating the labels you need

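Ultra-face expects data in VOC format, so here I convert a YOLO-style dataset format into VOC.
I use the FDDB dataset. FDDB has no eye labels, so we have to generate them ourselves. First, use the TengineKit FaceDemo sample
https://github.com/OAID/TengineKit/tree/master/Linux/sample/FaceDemo
TengineKit locates 212 facial keypoints and writes them to a txt file.
(Figure: TengineKit result)
Then, from the eye points among the 212 facial keypoints, locate the center of each eye and its w and h, and generate YOLO-style labels:
filename label x_c y_c w h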
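The format line above lists a center and a size, while a box is often easier to check by eye as two corners. As a minimal sketch (the function names `corners_to_center` and `center_to_corners` are illustrative and not from the original post), the two representations convert into each other as follows. Pixel units are assumed here; standard YOLO labels additionally divide the values by the image width and height.

```python
def corners_to_center(x1, y1, x2, y2):
    """Convert a corner box (x1, y1, x2, y2) to (x_c, y_c, w, h)."""
    w = x2 - x1
    h = y2 - y1
    return x1 + w / 2, y1 + h / 2, w, h


def center_to_corners(x_c, y_c, w, h):
    """Convert (x_c, y_c, w, h) back to integer corner coordinates."""
    return (int(x_c - w / 2), int(y_c - h / 2),
            int(x_c + w / 2), int(y_c + h / 2))


# Made-up box, used only to show the round trip
center = corners_to_center(10, 20, 50, 40)
corners = center_to_corners(*center)
print(center, corners)
```

Which of the two forms ends up in the label file matters for the later conversion step, so it is worth checking against whatever converter is used.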
Code:

# test.txt is the TengineKit output: image-path lines (even indices) alternate
# with comma-separated keypoint lines (odd indices).
# The original script hard-coded 16908 lines; len(lines) is used instead.
with open("test.txt") as f:
    lines = f.readlines()


def eye_box(arr, first, second):
    """Return (w, x1, y1, x2, y2) for the eye described by arr[first] and arr[second]."""
    x = (int(arr[first]) + int(arr[second])) / 2          # eye center x
    y = (int(arr[first + 1]) + int(arr[second + 1])) / 2  # eye center y
    w = (int(arr[second]) - int(arr[first])) / 2 * 6      # box width, enlarged around the eye
    # NOTE: indices kept exactly as in the original script. They are even, like the
    # x values above; if the file stores x,y pairs, the y values would sit at +1.
    h = (int(arr[first + 8]) - int(arr[second + 8])) / 2 * 8  # box height, enlarged
    if h <= 10:
        h = w / 2  # height too small: fall back to half the width
    return w, int(x - w / 2), int(y - h / 2), int(x + w / 2), int(y + h / 2)


def write_eye_labels(first, second, out_path, small_eyes):
    """Write one '1 x1 y1 x2 y2' line per image for a single eye."""
    with open(out_path, "w") as out:
        for i in range(1, len(lines), 2):
            w, x1, y1, x2, y2 = eye_box(lines[i].split(","), first, second)
            if w < 10:
                # Pick out the images in which the eyes are small
                small_eyes.write(lines[i - 1])
            out.write(f"1 {x1} {y1} {x2} {y2}\n")


with open("1.txt", "w") as small_eyes:
    write_eye_labels(252, 236, "righteye.txt", small_eyes)  # right eye
    write_eye_labels(204, 220, "lefteye.txt", small_eyes)   # left eye

# Generate image.txt: image names with the directory prefix and commas removed
with open("image.txt", "w") as images:
    for i in range(0, len(lines), 2):
        images.write(lines[i].replace("./resources/image/", "").replace(",", ""))


def paste(names_path, labels_path, out_path):
    """Join line k of names_path with line k of labels_path."""
    with open(names_path) as fa, open(labels_path) as fb, open(out_path, "w") as fc:
        for line in fa:
            # Drop the line ending of the name and keep exactly one space
            # between the file name and its label
            fc.write(line.strip() + " " + fb.readline())


paste("image.txt", "righteye.txt", "f1.txt")  # image name + right-eye label
paste("image.txt", "lefteye.txt", "f2.txt")   # image name + left-eye label

# Interleave the two files: each image gets a right-eye line followed by a left-eye line
with open("f1.txt") as fa, open("f2.txt") as fb, open("final.txt", "w") as fc:
    for line in fa:
        fc.write(line)
        fc.write(fb.readline())

Then generate the VOC XML files with a YOLO-to-XML conversion script.
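The post does not show the conversion script, so the following is only a rough sketch of what such a step might look like, not the script that was actually used. It assumes label lines of the form `name label x1 y1 x2 y2` (pixel corners), and the class name `eye`, the image size, and the sample file name are all made-up values supplied by the caller. The fields follow the usual Pascal VOC layout; which of them the training code actually reads should be checked against its dataset loader.

```python
import xml.etree.ElementTree as ET


def parse_label_line(line):
    """Split 'name label x1 y1 x2 y2' into (name, (x1, y1, x2, y2))."""
    name, _label, *coords = line.split()
    return name, tuple(int(c) for c in coords)


def voc_annotation(filename, boxes, width, height, class_name="eye"):
    """Build a Pascal VOC-style <annotation> element.

    boxes is a list of (x1, y1, x2, y2) pixel corners. class_name, width
    and height are assumptions supplied by the caller.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for box in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = class_name
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), box):
            ET.SubElement(bndbox, tag).text = str(value)
    return root


# Made-up label line and image size, for illustration only
name, box = parse_label_line("img_1.jpg 1 120 80 180 110")
xml_text = ET.tostring(voc_annotation(name, [box], 450, 300), encoding="unicode")
print(xml_text)
```

Since final.txt holds two lines per image (right eye, then left eye), a real converter would group both boxes under one annotation before writing the file.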
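However, when I ran the experiment the loss was NaN. Running check_gt_box.py from Ultra-face showed the data distribution in the figure below. (Figure: data distribution from check_gt_box.py)
The figure shows that some of the data sits at the point (0, 0), and my guess was that this part of the data was what made the loss NaN.
So in generate_txt I picked out that part of the data: I set a threshold and deleted everything below it. Training then ran normally, as expected. (Figure: training result)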
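The filtering step can be sketched roughly as below. This is an assumption-laden illustration rather than the code in generate_txt: it assumes the points at (0, 0) correspond to boxes with near-zero width and height, that labels are lines of the form `name label x1 y1 x2 y2`, and `min_size=5` is a hypothetical threshold, since the post does not state the value used.

```python
def keep_box(x1, y1, x2, y2, min_size=5):
    """Return True if the box is at least min_size pixels wide and tall.

    min_size is a hypothetical threshold, not the value from the post.
    """
    return (x2 - x1) >= min_size and (y2 - y1) >= min_size


def filter_labels(lines, min_size=5):
    """Keep only label lines whose box passes keep_box."""
    kept = []
    for line in lines:
        x1, y1, x2, y2 = map(int, line.split()[-4:])
        if keep_box(x1, y1, x2, y2, min_size):
            kept.append(line)
    return kept


# Made-up labels: one normal box, one empty box, one tiny box
sample = [
    "a.jpg 1 120 80 180 110",
    "b.jpg 1 0 0 0 0",
    "c.jpg 1 50 50 52 51",
]
kept = filter_labels(sample)
print(kept)
```

Zero-area or inverted boxes are a common source of NaN losses in detection training, so filtering them before generating the XML files is a cheap check to run.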