C++ code for a four-way RTSP → hardware decode → CUDA remap → stitch → RTSP push pipeline on Jetson Xavier AGX, built on GStreamer + CUDA + OpenCV.
📌 一、Approach
1️⃣ Pull each RTSP stream with a GStreamer pipeline
2️⃣ Use nvv4l2decoder for hardware H.264/H.265 decoding
3️⃣ Hand decoded frames to appsink, then on to OpenCV (or CUDA)
4️⃣ Run distortion correction with cv::cuda::remap
5️⃣ Stitch with OpenCV CUDA
6️⃣ Feed the stitched frame back through appsrc and encode with nvv4l2h264enc
7️⃣ Push to an external RTSP server (e.g. mediamtx) via rtspclientsink or udpsink
The whole flow uses appsink/appsrc to connect GStreamer with OpenCV/CUDA.
Core idea
- Each RTSP stream gets its own GStreamer pipeline + appsink to pull frames.
- OpenCV CUDA (cv::cuda) performs the remap.
- After stitching, the frame is copied from GPU back to CPU (GpuMat::download) and pushed to appsrc.
- appsrc feeds CPU-memory frames back into GStreamer for encoding and the RTSP push-out.
nvv4l2decoder → Jetson's hardware H.264 decode unit
cv::cuda::remap → distortion correction on the GPU
appsrc/appsink → the GStreamer ↔ OpenCV bridge; the whole flow is GPU-accelerated, and the CPU only handles the stitch upload and result push.
✅ 二、Environment check
1. Confirm nvv4l2decoder and nvvidconv are available
Run in a terminal:
gst-inspect-1.0 nvv4l2decoder
gst-inspect-1.0 nvvidconv
If a plugin description is printed, for example:
gst-inspect-1.0 nvv4l2decoder
Factory Details:
Rank primary + 11 (267)
Long-name NVIDIA v4l2 video decoder
Klass Codec/Decoder/Video
Description Decode video streams via V4L2 API
Author Nicolas Dufresne <nicolas.dufresne@collabora.com>, Viranjan Pagar <vpagar@nvidia.com>
Plugin Details:
Name nvvideo4linux2
Description Nvidia elements for Video 4 Linux
Filename /usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstnvvideo4linux2.so
Version 1.14.0
License LGPL
Source module nvvideo4linux2
Binary package nvvideo4linux2
Origin URL http://nvidia.com/
GObject
+----GInitiallyUnowned
+----GstObject
+----GstElement
+----GstVideoDecoder
+----GstNvV4l2VideoDec
+----nvv4l2decoder
Pad Templates:
SRC template: 'src'
Availability: Always
Capabilities:
video/x-raw(memory:NVMM)
width: [ 1, 2147483647 ]
height: [ 1, 2147483647 ]
framerate: [ 0/1, 2147483647/1 ]
SINK template: 'sink'
Availability: Always
Capabilities:
image/jpeg
video/x-h264
stream-format: { (string)byte-stream }
alignment: { (string)au }
video/x-h265
stream-format: { (string)byte-stream }
alignment: { (string)au }
video/mpeg
mpegversion: 4
systemstream: false
parsed: true
width: [ 1, 2147483647 ]
height: [ 1, 2147483647 ]
video/mpeg
mpegversion: [ 1, 2 ]
systemstream: false
parsed: true
width: [ 1, 2147483647 ]
height: [ 1, 2147483647 ]
video/x-divx
divxversion: [ 4, 5 ]
width: [ 1, 2147483647 ]
height: [ 1, 2147483647 ]
video/x-vp8
video/x-vp9
width: [ 1, 2147483647 ]
height: [ 1, 2147483647 ]
Element has no clocking capabilities.
Element has no URI handling capabilities.
Pads:
SINK: 'sink'
Pad Template: 'sink'
SRC: 'src'
Pad Template: 'src'
Element Properties:
name : The name of the object
flags: readable, writable
String. Default: "nvv4l2decoder0"
parent : The parent of the object
flags: readable, writable
Object of type "GstObject"
device : Device location
flags: readable
String. Default: "/dev/nvhost-nvdec"
device-name : Name of the device
flags: readable
String. Default: ""
device-fd : File descriptor of the device
flags: readable
Integer. Range: -1 - 2147483647 Default: -1
output-io-mode : Output side I/O mode (matches sink pad)
flags: readable, writable
Enum "GstNvV4l2DecOutputIOMode" Default: 0, "auto"
(0): auto - GST_V4L2_IO_AUTO
(2): mmap - GST_V4L2_IO_MMAP
(3): userptr - GST_V4L2_IO_USERPTR
capture-io-mode : Capture I/O mode (matches src pad)
flags: readable, writable
Enum "GstNvV4l2DecCaptureIOMode" Default: 0, "auto"
(0): auto - GST_V4L2_IO_AUTO
(2): mmap - GST_V4L2_IO_MMAP
extra-controls : Extra v4l2 controls (CIDs) for the device
flags: readable, writable
Boxed pointer of type "GstStructure"
skip-frames : Type of frames to skip during decoding
flags: readable, writable, changeable in NULL, READY, PAUSED or PLAYING state
Enum "SkipFrame" Default: 0, "decode_all"
(0): decode_all - Decode all frames
(1): decode_non_ref - Decode non-ref frames
(2): decode_key - decode key frames
drop-frame-interval : Interval to drop the frames,ex: value of 5 means every 5th frame will be given by decoder, rest all dropped
flags: readable, writable, changeable only in NULL or READY state
Unsigned Integer. Range: 0 - 30 Default: 0
num-extra-surfaces : Additional number of surfaces in addition to min decode surfaces given by the v4l2 driver
flags: readable, writable, changeable only in NULL or READY state
Unsigned Integer. Range: 0 - 24 Default: 1
disable-dpb : Set to disable DPB buffer for low latency
flags: readable, writable
Boolean. Default: false
enable-full-frame : Whether or not the data is full framed
flags: readable, writable
Boolean. Default: false
enable-frame-type-reporting: Set to enable frame type reporting
flags: readable, writable
Boolean. Default: false
enable-error-check : Set to enable error check
flags: readable, writable
Boolean. Default: false
enable-max-performance: Set to enable max performance
flags: readable, writable
Boolean. Default: false
mjpeg : Set to open MJPEG block
flags: readable, writable
Boolean. Default: false
gst-inspect-1.0 nvvidconv
Factory Details:
Rank primary (256)
Long-name NvVidConv Plugin
Klass Filter/Converter/Video/Scaler
Description Converts video from one colorspace to another & Resizes
Author amit pandya <apandya@nvidia.com>
Plugin Details:
Name nvvidconv
Description video Colorspace conversion & scaler
Filename /usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstnvvidconv.so
Version 1.2.3
License Proprietary
Source module gstreamer-nvvconv-plugin
Binary package GStreamer nvvconv Plugin
Origin URL http://nvidia.com/
GObject
+----GInitiallyUnowned
+----GstObject
+----GstElement
+----GstBaseTransform
+----Gstnvvconv
Pad Templates:
SINK template: 'sink'
Availability: Always
Capabilities:
video/x-raw(memory:NVMM)
format: { (string)I420, (string)I420_10LE, (string)P010_10LE, (string)I420_12LE, (string)UYVY, (string)YUY2, (string)YVYU, (string)NV12, (string)NV16, (string)NV24, (string)GRAY8, (string)BGRx, (string)RGBA, (string)Y42B }
width: [ 1, 2147483647 ]
height: [ 1, 2147483647 ]
framerate: [ 0/1, 2147483647/1 ]
video/x-raw
format: { (string)I420, (string)UYVY, (string)YUY2, (string)YVYU, (string)NV12, (string)NV16, (string)NV24, (string)P010_10LE, (string)GRAY8, (string)BGRx, (string)RGBA, (string)Y42B }
width: [ 1, 2147483647 ]
height: [ 1, 2147483647 ]
framerate: [ 0/1, 2147483647/1 ]
SRC template: 'src'
Availability: Always
Capabilities:
video/x-raw(memory:NVMM)
format: { (string)I420, (string)I420_10LE, (string)P010_10LE, (string)UYVY, (string)YUY2, (string)YVYU, (string)NV12, (string)NV16, (string)NV24, (string)GRAY8, (string)BGRx, (string)RGBA, (string)Y42B }
width: [ 1, 2147483647 ]
height: [ 1, 2147483647 ]
framerate: [ 0/1, 2147483647/1 ]
video/x-raw
format: { (string)I420, (string)UYVY, (string)YUY2, (string)YVYU, (string)NV12, (string)NV16, (string)NV24, (string)GRAY8, (string)BGRx, (string)RGBA, (string)Y42B }
width: [ 1, 2147483647 ]
height: [ 1, 2147483647 ]
framerate: [ 0/1, 2147483647/1 ]
Element has no clocking capabilities.
Element has no URI handling capabilities.
Pads:
SINK: 'sink'
Pad Template: 'sink'
SRC: 'src'
Pad Template: 'src'
Element Properties:
name : The name of the object
flags: readable, writable
String. Default: "nvvconv0"
parent : The parent of the object
flags: readable, writable
Object of type "GstObject"
qos : Handle Quality-of-Service events
flags: readable, writable
Boolean. Default: false
silent : Produce verbose output ?
flags: readable, writable
Boolean. Default: false
flip-method : video flip methods
flags: readable, writable, controllable
Enum "GstNvVideoFlipMethod" Default: 0, "none"
(0): none - Identity (no rotation)
(1): counterclockwise - Rotate counter-clockwise 90 degrees
(2): rotate-180 - Rotate 180 degrees
(3): clockwise - Rotate clockwise 90 degrees
(4): horizontal-flip - Flip horizontally
(5): upper-right-diagonal - Flip across upper right/lower left diagonal
(6): vertical-flip - Flip vertically
(7): upper-left-diagonal - Flip across upper left/lower right diagonal
output-buffers : number of output buffers
flags: readable, writable, changeable in NULL, READY, PAUSED or PLAYING state
Unsigned Integer. Range: 1 - 4294967295 Default: 4
interpolation-method: Set interpolation methods
flags: readable, writable, controllable
Enum "GstInterpolationMethod" Default: 0, "Nearest"
(0): Nearest - Nearest
(1): Bilinear - Bilinear
(2): 5-Tap - 5-Tap
(3): 10-Tap - 10-Tap
(4): Smart - Smart
(5): Nicest - Nicest
left : Pixels to crop at left
flags: readable, writable
Integer. Range: 0 - 2147483647 Default: 0
right : Pixels to crop at right
flags: readable, writable
Integer. Range: 0 - 2147483647 Default: 0
top : Pixels to crop at top
flags: readable, writable
Integer. Range: 0 - 2147483647 Default: 0
bottom : Pixels to crop at bottom
flags: readable, writable
Integer. Range: 0 - 2147483647 Default: 0
bl-output : Blocklinear output, applicable only for memory:NVMM NV12 format output buffer
flags: readable, writable
Boolean. Default: true
If a plugin description is printed, the element is available.
If instead you see:
No such element or plugin 'nvv4l2decoder'
then your JetPack or GStreamer installation is misconfigured, and you need to reinstall GStreamer plus the NVIDIA plugins.
2. Check the CUDA build of OpenCV
You can verify from Python:
python3 -c "import cv2; print(cv2.getBuildInformation())" | grep CUDA
You should see:
NVIDIA CUDA: YES (ver 10.2, CUFFT CUBLAS FAST_MATH)
3. Streaming service
1️⃣ Server first: an RTSP server must already be running (mediamtx is the most common choice). It listens on rtsp://0.0.0.0:8554 and waits for clients to push streams in.
2️⃣ Then the publisher: the Jetson uses appsrc + x264enc + rtspclientsink to push to rtsp://192.168.10.20:8554/live.
3️⃣ Finally the consumers: elsewhere, use gst-launch-1.0 rtspsrc or VLC to pull rtsp://192.168.10.20:8554/live.
# Download the binary
wget https://github.com/bluenviron/mediamtx/releases/latest/download/mediamtx_linux_arm64.tar.gz
tar -xzvf mediamtx_linux_arm64.tar.gz
# Start it
./mediamtx
By default it listens on rtsp://0.0.0.0:8554.
With the server running, check the port status:
sudo lsof -i :8554
Output:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mediamtx 25235 nvidia 7u IPv6 525958 0t0 TCP *:8554 (LISTEN)
Here I actually used the RTMP service (note: change the IP to your actual IP).
To check that publishing works, verify the RTMP server side:
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,format=I420 ! x264enc tune=zerolatency ! flvmux streamable=true ! rtmpsink location="rtmp://192.168.10.236:1935/live/mystream"
Then, in another terminal:
sudo apt install ffmpeg
ffplay rtmp://192.168.10.236:1935/live/mystream
Or open the link in VLC:
rtmp://192.168.10.236:1935/live/mystream
If this test plays, it proves mediamtx works, the network works, and ffplay works.
✅ 三、Minimal experiment
First get a minimal runnable, playable pure GStreamer + OpenCV streaming example working, to validate the whole appsrc → x264enc → flvmux → rtmpsink chain.
main.cpp:
#include <gst/gst.h>
#include <gst/app/gstappsrc.h>
#include <opencv2/opencv.hpp>
#include <thread>
#include <chrono>
#include <iostream>
using namespace cv;
int main() {
gst_init(nullptr, nullptr);
int width = 1280;
int height = 720;
int fps = 30;
// ---- Build the pipeline ----
std::string launch =
"appsrc name=mysrc is-live=true block=true format=time ! "
"videoconvert ! video/x-raw,format=I420 ! "
"x264enc speed-preset=ultrafast tune=zerolatency ! "
"flvmux streamable=true name=mux ! "
"rtmpsink location=rtmp://192.168.10.236:1935/live/mystream";
GError* err = nullptr;
GstElement* pipeline = gst_parse_launch(launch.c_str(), &err);
if (!pipeline) {
std::cerr << "Failed to create pipeline: " << err->message << std::endl;
g_error_free(err);
return -1;
}
GstElement* appsrc = gst_bin_get_by_name(GST_BIN(pipeline), "mysrc");
// Set caps on appsrc
GstCaps* caps = gst_caps_new_simple("video/x-raw",
"format", G_TYPE_STRING, "I420",
"width", G_TYPE_INT, width,
"height", G_TYPE_INT, height,
"framerate", GST_TYPE_FRACTION, fps, 1,
nullptr);
gst_app_src_set_caps(GST_APP_SRC(appsrc), caps);
gst_caps_unref(caps);
gst_element_set_state(pipeline, GST_STATE_PLAYING);
guint64 timestamp = 0;
// Use a black frame instead of real video
Mat blackBGR(height, width, CV_8UC3, Scalar(0, 0, 0));
for (int i = 0; i < fps * 10; ++i) { // push 10 seconds only
// BGR -> I420
Mat frameI420;
cvtColor(blackBGR, frameI420, COLOR_BGR2YUV_I420);
// Allocate a buffer and copy the frame in
GstBuffer* buffer = gst_buffer_new_allocate(nullptr, frameI420.total() * frameI420.elemSize(), nullptr);
GstMapInfo map;
gst_buffer_map(buffer, &map, GST_MAP_WRITE);
memcpy(map.data, frameI420.data, frameI420.total() * frameI420.elemSize());
gst_buffer_unmap(buffer, &map);
GST_BUFFER_PTS(buffer) = timestamp;
GST_BUFFER_DURATION(buffer) = gst_util_uint64_scale_int(1, GST_SECOND, fps);
timestamp += GST_BUFFER_DURATION(buffer);
GstFlowReturn ret;
g_signal_emit_by_name(appsrc, "push-buffer", buffer, &ret);
gst_buffer_unref(buffer);
if (ret != GST_FLOW_OK) {
std::cerr << "Push buffer failed!" << std::endl;
break;
}
std::this_thread::sleep_for(std::chrono::milliseconds(1000 / fps));
}
// Signal end-of-stream through appsrc so flvmux/rtmpsink can finalize cleanly
gst_app_src_end_of_stream(GST_APP_SRC(appsrc));
gst_object_unref(appsrc);
gst_element_set_state(pipeline, GST_STATE_NULL);
gst_object_unref(pipeline);
std::cout << "Done." << std::endl;
return 0;
}
CMakeLists.txt
cmake_minimum_required(VERSION 3.10)
project(demo)
find_package(OpenCV REQUIRED)
find_package(PkgConfig REQUIRED)
pkg_check_modules(GST REQUIRED gstreamer-1.0>=1.14 gstreamer-app-1.0)
find_package(Threads REQUIRED)
include_directories(
${OpenCV_INCLUDE_DIRS}
${GST_INCLUDE_DIRS}
)
add_executable(demo main.cpp)
target_link_libraries(demo
${OpenCV_LIBS}
${GST_LIBRARIES}
${CMAKE_THREAD_LIBS_INIT}
)
In another terminal:
ffplay rtmp://192.168.10.236:1935/live/mystream
Or in VLC, open: rtmp://192.168.10.236:1935/live/mystream
✅ If this runs, it proves:
- the appsrc → x264enc → flvmux → rtmpsink chain has no structural bug;
- the mediamtx service, network, and push URL are fine;
- all that remains is to replace the black frame with the stitched remap result.
✅ 四、Main pipeline principles
The core principles and technical through-line of "convert to I420 → appsrc → x264enc → flvmux → rtmpsink":
🎯 1️⃣ Why convert to I420?
- OpenCV works in BGR by default; cameras and video files also commonly deliver BGR or YUV.
- Video encoders (H.264) generally require YUV input, most typically I420 (i.e. YUV420P).
- I420 layout: a full-resolution Y (luma) plane, followed by U and V chroma planes at 1/4 resolution each.
- Converting to I420 saves bandwidth and lets the H.264 encoder work correctly, avoiding color artifacts.
🎯 2️⃣ The role of appsrc:
- appsrc is GStreamer's "user-space data source".
- That is: GStreamer does not capture a camera or read a file itself; your C++ loop feeds it frames.
- It wraps your in-memory frames as GstBuffers and injects them into the downstream pipeline.
🎯 3️⃣ What x264enc does:
- x264enc is the GStreamer plugin that wraps the x264 library.
- It compresses raw I420 frames into H.264.
- speed-preset=ultrafast and tune=zerolatency are the typical low-latency live settings:
  - ultrafast trades compression ratio for speed;
  - zerolatency disables B-frames and other buffering to guarantee real-time output.
🎯 4️⃣ flvmux:
- flvmux wraps the compressed H.264 video in the FLV container format.
- The RTMP protocol requires FLV encapsulation before sending.
- Without the muxer you only have a raw H.264 stream, which RTMP does not accept.
🎯 5️⃣ rtmpsink:
- rtmpsink pushes the muxed FLV stream over RTMP to a media server (e.g. Nginx-RTMP, MediaMTX, SRS).
- This step turns your self-generated frames into an RTMP live stream that any viewer/player can pull.
🎯 6️⃣ The data path:
[OpenCV Mat (BGR)] → cv::cvtColor → I420 → GstBuffer → appsrc → x264enc → flvmux → rtmpsink → RTMP media server
Summary
The whole process is:
1️⃣ Capture/process frames (stitching, undistortion)
2️⃣ Convert the format (BGR → I420)
3️⃣ Wrap each frame in a GstBuffer
4️⃣ Inject it into GStreamer through appsrc
5️⃣ Compress to H.264 with x264enc
6️⃣ Mux into FLV with flvmux
7️⃣ Send to the server over RTMP with rtmpsink
Why this chain matters:
- Frame generation and content are entirely yours (custom imagery, stitched results, watermarks all work)
- GStreamer provides stable, low-latency, cross-platform encoding
- RTMP is compatible with the vast majority of live players, web pages, and streaming platforms
- Given enough bandwidth, the Jetson's hardware codecs + CUDA absorb most of the compute load
I420 → appsrc → x264enc → flvmux → rtmpsink
= the simplest, most controllable, most real-time standard recipe for DIY surround-view stitching + industrial-vision streaming.