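Long Ge (龙哥) once said that if you don't understand assembly, you shouldn't claim to understand H.264. He has a point: assembly plays a huge role in video encoding and decoding. On platforms without GPU-side parallel acceleration such as OpenCL, SIMD is effectively the only way to parallelize an algorithm at the data level. The essence of the x264 code base lives in its assembly files. Every algorithm also has a C implementation, but the encoding speed x264 reaches today is largely down to its assembly optimizations.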
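x86inc.asm is the header file for x264's assembly code. It has no direct connection to the encoding algorithms; it only contains preprocessor macros that smooth over platform differences. The file is written in NASM syntax, so readers who already know Win32 assembly may still find some of its code unfamiliar. Fortunately the file is thoroughly commented, and reading those comments alongside Chapter 4 of the NASM manual, "The NASM Preprocessor", is basically enough to follow it. The trickier parts are how macro parameters are passed and how parameter counts and default values are declared.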
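Since parameter counts and defaults are the part that tends to confuse, here is a minimal sketch of the NASM syntax involved. The macro name `PAD_BYTES` is hypothetical and does not appear in x86inc.asm; it only illustrates the `0-1 16` form that the real file uses.

```nasm
; Hypothetical macro for illustration; PAD_BYTES is not part of x86inc.asm.
; "0-1" declares that the macro accepts zero or one argument.
; The trailing "16" is the default value of %1 when the argument is omitted.
%macro PAD_BYTES 0-1 16
    times %1 db 0       ; emit %1 zero bytes
%endmacro

SECTION .data
PAD_BYTES               ; no argument: expands to "times 16 db 0"
PAD_BYTES 4             ; one argument: expands to "times 4 db 0"
```

Running nasm with the `-E` option (preprocess only) on a file like this is a convenient way to see what each macro invocation expands to.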
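Below are some of my annotations on this file. They may contain mistakes, which I will correct later. Later posts will analyze the assembly implementations of DCT, SAD, QUANT, MC, deblock, and other algorithms.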
;*****************************************************************************
;* x86inc.asm
;*****************************************************************************
;* Copyright (C) 2005-2008 x264 project
;*
;* Authors: Loren Merritt <lorenm@u.washington.edu>
;* Anton Mitrofanov <BugMaster@narod.ru>
;*
;* This program is free software; you can redistribute it and/or modify
;* it under the terms of the GNU General Public License as published by
;* the Free Software Foundation; either version 2 of the License, or
;* (at your option) any later version.
;*
;* This program is distributed in the hope that it will be useful,
;* but WITHOUT ANY WARRANTY; without even the implied warranty of
;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;* GNU General Public License for more details.
;*
;* You should have received a copy of the GNU General Public License
;* along with this program; if not, write to the Free Software
;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
;*****************************************************************************
; If ARCH_X86_64 is defined, pick the target OS convention: define WIN64 when __OUTPUT_FORMAT__ is win32, otherwise define UNIX64.
%ifdef ARCH_X86_64
%ifidn __OUTPUT_FORMAT__,win32
%define WIN64
%else
%define UNIX64
%endif
%endif
; FIXME: All of the 64bit asm functions that take a stride as an argument
; via register, assume that the high dword of that register is filled with 0.
; This is true in practice (since we never do any 64bit arithmetic on strides,
; and x264's strides are all positive), but is not guaranteed by the ABI.
; Name of the .rodata section.
; Kludge: Something on OS X fails to align .rodata even given an align attribute,
; so use a different read-only section.
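; Defines the read-only data section and its alignment. "0-1" means the macro takes zero or one argument; with no argument, %1 defaults to 16, i.e. 16-byte alignment.
; .rodata is a read-only data section (noexec, nowrite). On Mach-O targets the macro falls back to .text, which is executable but likewise not writable.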
%macro SECTION_RODATA 0-1 16
%ifidn __OUTPUT_FORMAT__,macho64
SECTION .text align=%1
%elifidn __OUTPUT_FORMAT__,macho
SECTION .text align=%1
fakegot:
%else
SECTION .rodata align=%1
%endif
%endmacro
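; Usage sketch (added for illustration, not part of the original file; the label
; example_ones is hypothetical):
;   SECTION_RODATA                  ; no argument, so %1 = 16: the section is 16-byte aligned
;   example_ones: times 8 dw 1      ; eight words = 16 bytes of constant data
; A file that needs stricter alignment would instead open the section with:
;   SECTION_RODATA 32               ; one argument: %1 = 32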
; PIC support macros.
; x86_64 can't fit 64bit address literals in most instruction types,
; so shared objects (under the assumption that they might be anywhere
; in memory) must use an address mode that does fit.
; So all accesses to global variables must use this macro, e.g.
; mov eax, [foo GLOBAL]
; instead of
; mov eax, [foo]
;
; For PIC, see the NASM manual, section 7.9.3, "Position-Independent Code: elf Special Symbols and WRT".
; x86_32 doesn't require PIC.
; Some distros prefer shared objects to be PIC, but nothing breaks if
; the code contains a few textrels, so we'll skip that complexity.
; Position-independent code (PIC) support macros
%ifdef WIN64
%define PIC
%elifndef ARCH_X86_64
%undef PIC
%endif
%ifdef PIC
%define GLOBAL wrt rip
%else
%define GLOBAL
%endif
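; Expansion sketch (added for illustration, not part of the original file):
;   with PIC defined:   mov eax, [foo GLOBAL]   ->   mov eax, [foo wrt rip]
;   without PIC:        mov eax, [foo GLOBAL]   ->   mov eax, [foo]
; The same source line therefore becomes a RIP-relative access only in builds where PIC is defined.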
; Macros to eliminate most code duplication between x86_32 and x86_64:
; Currently this works only for leaf functions which load all their arguments
; into registers at the start, and make no other use of the stack. Luckily that
; covers most of x264's asm.
; PROLOGUE:
; %1 = number of arguments. loads them from stack if needed.
; %2 = number of registers used. pushes callee-saved regs if needed.
; %3 = number of xmm registers used. pushes callee-saved xmm regs if needed.
; %4 = list of names to define to registers
; PROLOGUE can also be invoked by adding the same options to cglobal
; e.g.
; cglobal foo, 2,3,0, dst, src, tmp
; declares a function (foo), taking two args (dst and src) and one local variable (tmp)
; TODO Some functions can use some args directly from the stack. If they're the




