Media Foundation Programming--Image Stride

本文详细介绍了视频图像在内存中的存储方式,包括像素排列、行间距(stride)、内存布局(Top-Down vs Bottom-Up),以及不同图像格式(如RGB、YUV)在内存中的组织形式。重点讨论了如何在处理视频图像时考虑stride的影响,提供了通用的图像处理函数模板,并通过具体代码示例展示了如何将32-bit RGB图像转换为AYUV和YV12格式。

When a video image is stored in memory, the memory buffer might contain extra padding bytes after each row of pixels. The padding bytes affect how the image is stored in memory, but do not affect how the image is displayed.

The stride is the number of bytes from one row of pixels in memory to the next row of pixels in memory. Stride is also called pitch. If padding bytes are present, the stride is wider than the width of the image, as shown in the following illustration.

 

Diagram showing padding bytes.

Diagram showing padding bytes.

 

Two buffers that contain video frames with equal dimensions can have two different strides. If you process a video image, you must take the stride into account.

In addition, there are two ways that an image can be arranged in memory. In a top-down image, the top row of pixels in the image appears first in memory. In a bottom-up image, the last row of pixels appears first in memory. The following illustration shows the difference between a top-down image and a bottom-up image.

 

Diagram showing the difference between a top-down image and a bottom-up image.

Diagram showing the difference between a top-down image and a bottom-up image.

 

A bottom-up image has a negative stride, because stride is defined as the number of bytes need to move down a row of pixels, relative to the displayed image. YUV images should always be top-down, and any image that is contained in a Direct3D surface must be top-down. RGB images in system memory are usually bottom-up.

Video transforms in particular need to handle buffers with mismatched strides, because the input buffer might not match the output buffer. For example, suppose that you want to convert a source image and write the result to a destination image. Assume that both images have the same width and height, but might not have the same pixel format or the same image stride.

The following example code shows a generalized approach for writing this kind of function. This is not a complete working example, because it abstracts many of the specific details.

void ProcessVideoImage(
    BYTE*       pDestScanLine0,     
    LONG        lDestStride,        
    const BYTE* pSrcScanLine0,      
    LONG        lSrcStride,         
    DWORD       dwWidthInPixels,     
    DWORD       dwHeightInPixels
    )
{
    for (DWORD y = 0; y < dwHeightInPixels; y++)
    {
        SOURCE_PIXEL_TYPE *pSrcPixel = (SOURCE_PIXEL_TYPE*)pDestScanLine0;
        DEST_PIXEL_TYPE *pDestPixel = (DEST_PIXEL_TYPE*)pSrcScanLine0;

        for (DWORD x = 0; x < dwWidthInPixels; x +=2)
        {
            pDestPixel[x] = TransformPixelValue(pSrcPixel[x]);
        }
        pDestScanLine0 += lDestStride;
        pSrcScanLine0 += lSrcStride;
    }
}

This function takes six parameters:

  • A pointer to the start of scan line 0 in the destination image.
  • The stride of the destination image.
  • A pointer to the start of scan line 0 in the source image.
  • The stride of the source image.
  • The width of the image in pixels.
  • The height of the image in pixels.

The general idea is to process one row at a time, iterating over each pixel in the row. Assume that SOURCE_PIXEL_TYPE and DEST_PIXEL_TYPE are structures representing the pixel layout for the source and destination images, respectively. (For example, 32-bit RGB uses the RGBQUAD structure. Not every pixel format has a pre-defined structure.) Casting the array pointer to the structure type enables you to access the RGB or YUV components of each pixel. At the start of each row, the function stores a pointer to the row. At the end of the row, it increments the pointer by the width of the image stride, which advances the pointer to the next row.

This example calls a hypothetical function named TransformPixelValue for each pixel. This could be any function that calculates a target pixel from a source pixel. Of course, the exact details will depend on the particular task. For example, if you have a planar YUV format, you must access the chroma planes independently from the luma plane; with interlaced video, you might need to process the fields separately; and so forth.

To give a more concrete example, the following code converts a 32-bit RGB image into an AYUV image. The RGB pixels are accessed using an RGBQUAD structure, and the AYUV pixels are accessed using a DXVA2_AYUVSample8 structure structure.

//-------------------------------------------------------------------
// Name: RGB32_To_AYUV
// Description: Converts an image from RGB32 to AYUV
//-------------------------------------------------------------------
void RGB32_To_AYUV(
    BYTE*       pDest,
    LONG        lDestStride,
    const BYTE* pSrc,
    LONG        lSrcStride,
    DWORD       dwWidthInPixels,
    DWORD       dwHeightInPixels
    )
{
    for (DWORD y = 0; y < dwHeightInPixels; y++)
    {
        RGBQUAD             *pSrcPixel = (RGBQUAD*)pSrc;
        DXVA2_AYUVSample8   *pDestPixel = (DXVA2_AYUVSample8*)pDest;
        
        for (DWORD x = 0; x < dwWidthInPixels; x++)
        {
            pDestPixel[x].Alpha = 0x80;
            pDestPixel[x].Y = RGBtoY(pSrcPixel[x]);   
            pDestPixel[x].Cb = RGBtoU(pSrcPixel[x]);   
            pDestPixel[x].Cr = RGBtoV(pSrcPixel[x]);   
        }
        pDest += lDestStride;
        pSrc += lSrcStride;
    }
}

The next example converts a 32-bit RGB image to a YV12 image. This example shows how to handle a planar YUV format. (YV12 is a planar 4:2:0 format.) In this example, the function maintains three separate pointers for the three planes in the target image. However, the basic approach is the same as the previous example.

void RGB32_To_YV12(
    BYTE*       pDest,
    LONG        lDestStride,
    const BYTE* pSrc,
    LONG        lSrcStride,
    DWORD       dwWidthInPixels,
    DWORD       dwHeightInPixels
    )
{
    assert(dwWidthInPixels % 2 == 0);
    assert(dwHeightInPixels % 2 == 0);

    const BYTE *pSrcRow = pSrc;
    
    BYTE *pDestY = pDest;

    // Calculate the offsets for the V and U planes.

    // In YV12, each chroma plane has half the stride and half the height  
    // as the Y plane.
    BYTE *pDestV = pDest + (lDestStride * dwHeightInPixels);
    BYTE *pDestU = pDest + 
                   (lDestStride * dwHeightInPixels) + 
                   ((lDestStride * dwHeightInPixels) / 4);

    // Convert the Y plane.
    for (DWORD y = 0; y < dwHeightInPixels; y++)
    {
        RGBQUAD *pSrcPixel = (RGBQUAD*)pSrcRow;
        
        for (DWORD x = 0; x < dwWidthInPixels; x++)
        {
            pDestY[x] = RGBtoY(pSrcPixel[x]);    // Y0
        }
        pDestY += lDestStride;
        pSrcRow += lSrcStride;
    }

    // Convert the V and U planes.

    // YV12 is a 4:2:0 format, so each chroma sample is derived from four 
    // RGB pixels.
    pSrcRow = pSrc;
    for (DWORD y = 0; y < dwHeightInPixels; y += 2)
    {
        RGBQUAD *pSrcPixel = (RGBQUAD*)pSrcRow;
        RGBQUAD *pNextSrcRow = (RGBQUAD*)(pSrcRow + lSrcStride);

        BYTE *pbV = pDestV;
        BYTE *pbU = pDestU;

        for (DWORD x = 0; x < dwWidthInPixels; x += 2)
        {
            // Use a simple average to downsample the chroma.

            *pbV++ = ( RGBtoV(pSrcPixel[x]) +
                       RGBtoV(pSrcPixel[x + 1]) +       
                       RGBtoV(pNextSrcRow[x]) +         
                       RGBtoV(pNextSrcRow[x + 1]) ) / 4;        

            *pbU++ = ( RGBtoU(pSrcPixel[x]) +
                       RGBtoU(pSrcPixel[x + 1]) +       
                       RGBtoU(pNextSrcRow[x]) +         
                       RGBtoU(pNextSrcRow[x + 1]) ) / 4;    
        }
        pDestV += lDestStride / 2;
        pDestU += lDestStride / 2;
        
        // Skip two lines on the source image.
        pSrcRow += (lSrcStride * 2);
    }
}
### 条件生成对抗网络(Conditional GAN)在图像到图像转换中的应用 条件生成对抗网络(Conditional GAN, cGAN)是一种扩展版的生成对抗网络(GAN),它允许模型基于额外的信息(如类别标签、另一张图片等)生成更具体的输出。cGAN 的核心思想是在生成器和判别器中都加入条件变量 \( y \),从而指导生成过程。 #### Conditional GAN 的基本架构 在一个典型的 cGAN 中,生成器 \( G \) 接收输入噪声向量 \( z \) 和条件变量 \( y \),并生成一张假图 \( G(z|y) \)。判别器 \( D \) 则接收真实图像或生成图像以及对应的条件变量 \( y \),判断该图像是否为真实的。这种机制使得生成器能够根据给定条件生成更加精确的结果[^1]。 对于 **图像到图像转换** 任务来说,\( y \) 往往是一幅输入图像而非简单的分类标签。例如,在 Pix2Pix 模型中,输入可以是一个边缘检测后的草图,而输出则是对应的真实照片;或者反过来,从卫星地图生成街道视图[^2]。 以下是实现此类功能的一个典型例子: ```python import torch.nn as nn class Generator(nn.Module): def __init__(self): super(Generator, self).__init__() # 定义编码部分... self.encoder = nn.Sequential( nn.Conv2d(in_channels=3, out_channels=64, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True), ... ) # 解码器层定义... self.decoder = nn.Sequential( nn.ConvTranspose2d(...), nn.ReLU(), ... nn.Tanh() ) def forward(self, input_image, condition_image): combined_input = torch.cat((input_image, condition_image), dim=1) encoded_features = self.encoder(combined_input) output_image = self.decoder(encoded_features) return output_image class Discriminator(nn.Module): def __init__(self): super(Discriminator, self).__init__() self.main = nn.Sequential( nn.Conv2d(in_channels=6, out_channels=64, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True), ... nn.Sigmoid() ) def forward(self, real_or_fake_image, condition_image): combined_input = torch.cat((real_or_fake_image, condition_image), dim=1) validity = self.main(combined_input) return validity ``` 在这个代码片段里,`Generator` 类接受两张图像作为输入——原始图像 `condition_image` 及其期望变换形式的目标域表示 `input_image` ——并通过一系列卷积操作将其映射至新的空间表征再重构回目标图像。同样地,`Discriminator` 负责评估所给出的一对图像的真实性程度[^3]。 #### 实现注意事项 当构建用于实际场景下的 conditional GANs 时需要注意几个方面: - 数据预处理:确保训练集中每一对匹配良好的源与目标样本; - 损失函数设计:除了标准的对抗损失外还可以考虑增加像素级 L1/L2 loss 或者感知损失(perceptual losses)来提升视觉质量; - 正则化技术的应用比如谱归一化(spectral normalization)可以帮助稳定训练过程减少模式崩溃(mode collapse)现象的发生概率[^4]。 --- ###
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值