Latest Technological Advancements in Video Streaming with AI

Video accounts for about 85% of the data consumed over the internet, and roughly 2.8 exabytes of data are transferred as streaming video. This growth is driven by VOD platforms like Netflix, video communication platforms like Zoom, social platforms like TikTok, esports, and live streaming, to name a few.

[Image by author, from Vadoo]

The Covid-19 pandemic has accelerated video consumption and pushed companies to move from offline to live online formats. With daily video consumption growing this fast, we need to be prepared for the coming demand.

In this article we discuss the latest advancements in video streaming technology and how they can improve the streaming experience.

Super-Resolution

Artificial intelligence has been disrupting all kinds of industries, and video streaming is no exception. An AI model can learn to generate a high-resolution image from a low-resolution one by training on a large number of images. This technique is called super-resolution.

Super-resolution belongs to the family of generative algorithms, which can generate information that was not present in the input. As the figure above shows, the network takes the image on the left and imagines the finer details to re-create the image on the right. This is possible because the model has been trained on a large amount of image data and has learned how to upscale a new image it has not seen before.

The same concept extends to video with minor modifications. For video, previously generated high-resolution frames are combined with the current low-resolution frame to generate the current high-resolution frame, but the underlying idea is the same. This makes it possible to send a video at 480p and watch it on the client device at, say, 1080p.
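The recurrence described above can be sketched in a few lines. This is only a toy illustration of the data flow, not TecoGAN or any real model: `upscale_nearest` and `blend` are hypothetical stand-ins for a trained network, and the sketch reuses a single previous output frame rather than several.

```python
from typing import Iterable, Iterator, List, Optional

Frame = List[List[float]]  # grayscale frame as rows of pixel intensities


def upscale_nearest(frame: Frame, factor: int) -> Frame:
    """Placeholder upscaler: repeat each pixel `factor` times in both directions."""
    out: Frame = []
    for row in frame:
        wide = [px for px in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def blend(current: Frame, previous: Frame, weight: float) -> Frame:
    """Mix the previous high-res output into the current estimate."""
    return [
        [(1 - weight) * c + weight * p for c, p in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous)
    ]


def super_resolve_stream(
    low_res_frames: Iterable[Frame], factor: int = 2, history_weight: float = 0.25
) -> Iterator[Frame]:
    """Yield one high-res frame per low-res frame, reusing the previous output."""
    previous_high_res: Optional[Frame] = None
    for low_res in low_res_frames:
        estimate = upscale_nearest(low_res, factor)
        if previous_high_res is not None:
            # A real model would learn this step; here it is a fixed weighted average.
            estimate = blend(estimate, previous_high_res, history_weight)
        previous_high_res = estimate
        yield estimate


frames = [[[0.0, 1.0], [1.0, 0.0]], [[1.0, 1.0], [0.0, 0.0]]]
high_res = list(super_resolve_stream(frames, factor=2))
print(len(high_res), "frames of", len(high_res[0]), "x", len(high_res[0][0]), "pixels")
```

The point of the loop is that each output depends on both the incoming low-resolution frame and what was generated before, which is what keeps consecutive frames consistent.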

This technology is possible thanks to recent advances in deep learning and the substantial compute power now available on client devices. A recent paper, TecoGAN, produced high-resolution results that are eerily similar to real-world images, as can be seen in the image above. Using this technology can save up to 30% of bandwidth consumption, improving the overall user experience.

P2P Streaming

Video streaming follows a client-server model. Content is delivered from the edge locations of a content delivery network (CDN), which cache the content from the origin server. Client devices (mobile, laptop, or TV) fetch the content from the CDN to start playing the video. Streaming via a CDN has a few limitations:

Video streaming through a CDN is expensive
Viewership spikes lead to heavy buffering
Delivery to remote locations suffers because of poor CDN coverage

For these reasons it is difficult for a video streaming company to scale its service effectively and provide a good user experience. These problems can be addressed by extending a CDN with P2P streaming, also known as a hybrid CDN.

Adding a peer-to-peer layer on top of the traditional CDN reduces load on the main CDN by distributing content from neighboring peers, which act like small CDNs themselves and bring the edge much closer to the user. With less load on the CDN, costs can drop by about 40%, and the user experience improves because re-buffering falls when content is fetched from a nearby peer rather than a far-away CDN. More info about how the technology works is available here.
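A minimal sketch of the peer-first, CDN-fallback lookup behind a hybrid CDN might look like the following. The peers and the CDN are modeled as in-memory dictionaries, and `fetch_segment` is a hypothetical helper; a real deployment would involve peer discovery, transport (for example WebRTC), and integrity checks that are omitted here.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def fetch_segment(
    segment_id: str,
    peers: Iterable[Dict[str, bytes]],
    fetch_from_cdn: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return (data, source): try nearby peers first, fall back to the CDN."""
    for peer_cache in peers:
        data: Optional[bytes] = peer_cache.get(segment_id)
        if data is not None:
            return data, "peer"
    return fetch_from_cdn(segment_id), "cdn"


# Hypothetical in-memory stand-ins for a CDN and two peers that already hold some segments.
cdn_store = {f"seg{i}": f"video-{i}".encode() for i in range(4)}
peer_a = {"seg0": cdn_store["seg0"]}
peer_b = {"seg1": cdn_store["seg1"], "seg2": cdn_store["seg2"]}

sources = [
    fetch_segment(f"seg{i}", [peer_a, peer_b], cdn_store.__getitem__)[1]
    for i in range(4)
]
offload = sources.count("peer") / len(sources)
print(sources, f"served by peers: {offload:.0%}")
```

Every segment served by a peer is a request the CDN never sees, which is where the cost reduction comes from.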

Multi-CDN

As we have seen, content reaches client devices through a CDN that caches it from the server. A CDN consists of data centers in multiple locations, called points of presence (PoPs). Ideally we want as many PoPs as possible, but not all CDNs have equal reach worldwide. For example, some of the most popular CDNs do not have a single PoP in China.

In addition, the performance of every CDN varies over time, and a CDN can suffer an outage at any moment; Murphy's law applies. To manage these risks, the ideal option is to have access to multiple CDNs at once. You can either strike deals with several CDNs individually or work with a multi-CDN provider that manages all of this for you.

Powering your video streaming service with a multi-CDN setup brings several advantages that improve the overall user experience:

Preventing sudden service outages by switching to a working CDN
Fetching content from the best-performing CDN at that moment, using Real User Monitoring (RUM) metrics to improve Quality of Service (QoS)
Mid-stream switching, which can help reduce CDN costs
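The three advantages above can be combined into one selection rule. The sketch below is one possible policy under stated assumptions, not how any particular multi-CDN provider works: the `CdnStats` fields, the CDN names, and the prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CdnStats:
    name: str
    healthy: bool            # False when recent requests are failing
    throughput_mbps: float   # median throughput reported by real users (RUM)
    cost_per_gb: float       # hypothetical contract price


def pick_cdn(stats: List[CdnStats], min_throughput_mbps: float) -> Optional[CdnStats]:
    """Pick the cheapest healthy CDN that is fast enough, else the fastest healthy one."""
    healthy = [s for s in stats if s.healthy]
    if not healthy:
        return None
    fast_enough = [s for s in healthy if s.throughput_mbps >= min_throughput_mbps]
    if fast_enough:
        return min(fast_enough, key=lambda s: s.cost_per_gb)
    return max(healthy, key=lambda s: s.throughput_mbps)


snapshot = [
    CdnStats("cdn-a", healthy=False, throughput_mbps=40.0, cost_per_gb=0.02),
    CdnStats("cdn-b", healthy=True, throughput_mbps=25.0, cost_per_gb=0.05),
    CdnStats("cdn-c", healthy=True, throughput_mbps=18.0, cost_per_gb=0.03),
]
choice = pick_cdn(snapshot, min_throughput_mbps=15.0)
print(choice.name if choice else "no CDN available")
```

Re-running the selection before each segment request is what makes mid-stream switching possible: the player simply requests the next segment from whichever CDN the rule currently prefers.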

CMAF

A live stream of a sports match typically lags the television broadcast by 25–30 seconds, which hurts the user experience. This is a consequence of how video streaming technology is designed. The most common way to stream video is HTTP-based streaming, in which content is delivered in the following steps:

The video is encoded with a codec such as H.264 to reduce its size
The video is then packaged into a streamable format such as HLS or MPEG-DASH
The content is distributed from the server via a CDN
The video player buffers a few segments before it starts playing

HLS and MPEG-DASH deliver video in segments, which the video player downloads sequentially. A segment generally contains around 10 seconds of video. The encoder has to finish encoding an entire segment before making it available to the CDN. The CDN has to receive the entire segment before passing it on to the video player. And the video player has to buffer at least a few segments before it starts playing, to keep playback smooth.

This entire process leads to a delay of 25–30 seconds. In addition, because the two protocols traditionally use different container formats (.ts for HLS and .mp4 for MPEG-DASH), the same content needs double the encoding, double the storage, and double the CDN capacity. CMAF tries to solve both of these problems.
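A rough back-of-the-envelope model shows where a delay of this size comes from. The function below is a simplification under stated assumptions: the number of buffered segments and the network transfer time are illustrative values chosen here, not figures from the article.

```python
def segment_latency_seconds(
    segment_s: float, buffered_segments: int, network_s: float = 0.0
) -> float:
    """Approximate live latency when every stage waits for whole segments."""
    encoder_wait = segment_s                        # encoder finishes a full segment first
    player_buffer = segment_s * buffered_segments   # player holds several segments before playing
    return encoder_wait + network_s + player_buffer


# Assumed example: 10-second segments and a player that buffers two of them.
latency = segment_latency_seconds(segment_s=10.0, buffered_segments=2)
print(f"approximate live latency: {latency:.0f} s")
```

Because every term scales with the segment duration, shortening the unit that each stage has to wait for is the most direct way to reduce latency.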

CMAF provides a single common format, fragmented MP4 (fMP4), which both HLS and MPEG-DASH support. It can also be combined with chunked transfer encoding. This means the encoder does not wait for an entire segment before transmitting: each segment is divided into smaller chunks, which are sent as soon as they are encoded and passed on by the CDN to the video player as they arrive. The video player assembles the chunks and plays the segment.
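The effect of chunking can be simulated with a generator that releases each chunk as soon as it is "encoded". This is a timing model only, with hypothetical function names and durations; it does not produce or parse real fMP4 data.

```python
from typing import Iterator, List, Tuple


def encode_segment_in_chunks(
    segment_index: int, segment_s: float, chunk_s: float
) -> Iterator[Tuple[int, int, float]]:
    """Yield (segment, chunk, ready_time) as soon as each chunk is encoded."""
    chunks_per_segment = round(segment_s / chunk_s)
    start = segment_index * segment_s
    for chunk_index in range(chunks_per_segment):
        ready_time = start + (chunk_index + 1) * chunk_s
        yield segment_index, chunk_index, ready_time


def first_playable_time(segment_s: float, chunk_s: float, chunks_to_buffer: int) -> float:
    """Time at which the player has buffered enough chunks to start playback."""
    received: List[float] = []
    for _, _, ready_time in encode_segment_in_chunks(0, segment_s, chunk_s):
        received.append(ready_time)  # the CDN forwards each chunk as it arrives
        if len(received) == chunks_to_buffer:
            return ready_time
    return segment_s  # fewer chunks than requested: wait for the whole segment


# Assumed example: 10-second segments split into 1-second chunks, player buffers two chunks.
print(first_playable_time(segment_s=10.0, chunk_s=1.0, chunks_to_buffer=2))
```

With whole-segment delivery the same player could not start before the first 10-second segment was complete; with chunks it can start after only the buffered chunks have arrived.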

CMAF therefore reduces encoding and storage costs, and it reduces latency through chunked transfer encoding.

Per-Title Encoding

Video encoding is the process of compressing video with a codec such as H.264, which exploits the information shared across consecutive frames and stores only what changes from frame to frame. To cope with varying network conditions and unstable connections, the content is encoded at multiple resolutions and bitrates, and the player switches between them intelligently. This technique is called adaptive bitrate (ABR) streaming.
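A simple throughput-based version of the switching logic can be sketched as follows. The ladder values and the safety factor are illustrative assumptions; production players typically also take buffer level and other signals into account.

```python
from typing import List, Tuple

# Illustrative ladder: (height in pixels, bitrate in kbps). These values are assumptions.
LADDER: List[Tuple[int, int]] = [
    (240, 400),
    (360, 800),
    (480, 1400),
    (720, 3000),
    (1080, 5800),
]


def choose_rendition(throughput_kbps: float, safety_factor: float = 0.8) -> Tuple[int, int]:
    """Pick the highest rendition whose bitrate fits within a fraction of the throughput."""
    budget = throughput_kbps * safety_factor
    best = LADDER[0]  # always fall back to the lowest rendition
    for rendition in LADDER:
        if rendition[1] <= budget:
            best = rendition
    return best


for measured_kbps in (100, 2000, 9000):
    print(measured_kbps, "kbps ->", choose_rendition(measured_kbps))
```

The safety factor leaves headroom so that a small dip in throughput does not immediately stall playback.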

The image below depicts multiple resolutions and the bitrate allocated to each. The key point is that content at the same resolution can be encoded at multiple bitrates: a 1920 x 1080 video can be encoded at 5800 kbps, 6800 kbps, or even 4800 kbps. Because video compression is lossy, the lower the chosen bitrate, the less information survives in the compressed video. The loss shows up as encoding artifacts like those in the image above, which you have probably noticed while watching video on a streaming platform.

Since a video at a given resolution can be encoded at different bitrates, in practice a generic bitrate is chosen that keeps artifacts low without consuming a lot of bandwidth. The picture above shows such a generic bitrate ladder, applied to all videos. But this one-size-fits-all approach has a problem. Some videos, such as high-octane action films, contain rich detail and need a higher bitrate, while a simple cartoon can reach the same quality at a lower bitrate.

Netflix therefore proposed encoding each video at a given resolution at multiple bitrates and then choosing the lowest bitrate that renders the video without visible artifacts. The choice is guided by VMAF, a perceptual quality metric that scores the visual quality of the encoded video. With it, a custom bitrate ladder can be designed for each title. A cartoon can be published at a lower bitrate with similar quality, while an action film can be published at a higher bitrate without artifacts, improving the user experience.
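The selection step can be sketched as picking the lowest bitrate whose measured quality meets a target. The VMAF scores below are made-up illustrative numbers, and `pick_bitrate` is a hypothetical helper; in practice each score comes from actually encoding the title and measuring it, and the method Netflix describes is more elaborate than a single threshold.

```python
from typing import Dict, Optional


def pick_bitrate(vmaf_by_bitrate: Dict[int, float], target_vmaf: float) -> Optional[int]:
    """Return the lowest bitrate (kbps) whose measured VMAF meets the target."""
    candidates = [kbps for kbps, score in vmaf_by_bitrate.items() if score >= target_vmaf]
    return min(candidates) if candidates else None


# Hypothetical VMAF scores for 1080p trial encodes of two titles.
cartoon = {2800: 94.1, 3800: 96.0, 4800: 96.8, 5800: 97.2}
action = {2800: 81.5, 3800: 88.9, 4800: 92.7, 5800: 95.3}

print("cartoon:", pick_bitrate(cartoon, target_vmaf=93.0), "kbps")
print("action:", pick_bitrate(action, target_vmaf=93.0), "kbps")
```

Repeating this per resolution yields a bitrate ladder tailored to the title rather than one generic ladder for the whole catalog.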

Summary

To summarize, we have seen several techniques that help video streaming scale, improve the user experience, and reduce latency. The advent of 5G, the growth of virtual reality (VR) and augmented reality (AR), and the rise of esports will send video bandwidth consumption skyrocketing. It is time to adopt the latest technologies to keep up with and serve this growth.

Translated from: https://towardsdatascience.com/latest-technological-advancements-in-video-streaming-with-ai-293d3b8b2a7e
