Written on August 11, 2016.
Our earlier "smart HD" experiment showed that a viewer's area of attention is actually quite small: playing high-definition video only in the main area of attention is enough to give a good result. However, capturing the viewer's gaze is difficult, and there is extra delay in sending the gaze position to the server and then loading a small video clip. Handled poorly, this could make for a bad experience. Could the work be done in advance instead?
Here we propose a scheme that dynamically overlays HD video in the time and space domains. It works as follows:
⦁ First, compress the high-definition video (HD) into a lower-resolution version (LD).
⦁ Invite a varied group of viewers (men and women, young and old, from different places and occupations) to watch the HD video, and use dedicated equipment to record where they look throughout the video.
⦁ Analyze the recorded gaze points along both space and time to find the regions and time spans where attention is most concentrated. In the figures below, the red dots are the viewers' gaze points. (This is only an example: in practice there can be more than two regions, regions can be large or small, and time spans can be long or short.)
Figure 1: From 0:00 to 0:30, viewers pay most attention to regions A and B.
Figure 2: From 0:30 to 0:50, viewers pay most attention to regions C and D.
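One way to picture the gaze analysis step is as a histogram over time windows and screen cells. The sketch below is only an illustration under stated assumptions, not the method used in the experiment: the sample format `(t, x, y)` with coordinates normalized to the range 0 to 1, the function name `hot_cells`, and the fixed window length and grid size are all hypothetical. A real analysis would let region sizes and time spans vary, as noted above.

```python
from collections import Counter

def hot_cells(samples, window_s=10.0, grid=4, top_k=2):
    """Find the most-viewed grid cells in each time window.

    samples  : iterable of (t, x, y); t in seconds, x and y normalized to [0, 1]
               (assumed format, pooled over all viewers)
    window_s : length of each time window in seconds (assumed fixed here)
    grid     : the frame is split into grid x grid cells
    top_k    : how many cells to keep per window
    Returns {window_index: [(col, row), ...]}, most-viewed cell first.
    """
    counts = {}
    for t, x, y in samples:
        window = int(t // window_s)
        # Clamp so that x == 1.0 or y == 1.0 falls into the last cell.
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        counts.setdefault(window, Counter())[(col, row)] += 1
    return {
        window: [cell for cell, _ in counter.most_common(top_k)]
        for window, counter in sorted(counts.items())
    }

# Hypothetical gaze samples: two near the top-left, one bottom-right, one centered later.
samples = [(1.0, 0.10, 0.10), (2.0, 0.12, 0.05), (3.0, 0.90, 0.90), (12.0, 0.50, 0.50)]
regions = hot_cells(samples, window_s=10.0, grid=4, top_k=1)
```

Adjacent hot cells in consecutive windows could then be merged into larger regions and longer time spans, which is how regions such as A and B over a 30-second span might be obtained.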
⦁ The server crops clips V1 to V4 from the HD video as follows and stores this information in a table T.
Region A from 0:00 to 0:30 is clip V1, at position P1 with size S1.
Region B from 0:00 to 0:30 is clip V2, at position P2 with size S2.
Region C from 0:30 to 0:50 is clip V3, at position P3 with size S3.
Region D from 0:30 to 0:50 is clip V4, at position P4 with size S4.
⦁ The player loads the LD video and table T, then starts playing the LD video.
⦁ During playback the player keeps checking table T. When it sees that HD clips will be needed soon, for example at 0:25, it loads V3 and V4 ahead of time.
⦁ When playback reaches 0:30, the player plays HD clip V3 at position P3 with size S3 and, at the same time, HD clip V4 at position P4 with size S4. The playback timing is shown in the accompanying diagram.
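The table lookup and preloading logic can be sketched as follows. This is a minimal illustration, not a player implementation: the `Overlay` record, the function names `to_preload` and `active`, the 5-second lead time (taken from the 0:25 example), and the concrete pixel values standing in for P1 to P4 and S1 to S4 are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Overlay:
    video: str    # clip identifier, e.g. "V3"
    start: float  # start time in seconds
    end: float    # end time in seconds
    pos: tuple    # (x, y) of the top-left corner; placeholder for P1..P4
    size: tuple   # (width, height); placeholder for S1..S4

# Table T from the example; positions and sizes are made-up placeholder values.
TABLE_T = [
    Overlay("V1", 0, 30, (100, 80), (320, 180)),
    Overlay("V2", 0, 30, (600, 300), (320, 180)),
    Overlay("V3", 30, 50, (200, 120), (400, 225)),
    Overlay("V4", 30, 50, (700, 400), (240, 135)),
]

def to_preload(table, now, lead=5.0, loaded=frozenset()):
    """Clips that are needed within `lead` seconds (or already due) and not yet loaded."""
    return [
        o.video
        for o in table
        if o.video not in loaded and o.start - lead <= now < o.end
    ]

def active(table, now):
    """Overlays that should be drawn on top of the LD video at time `now`."""
    return [o for o in table if o.start <= now < o.end]

# At 0:25, with V1 and V2 already loaded, the player should fetch V3 and V4.
pending = to_preload(TABLE_T, now=25.0, loaded={"V1", "V2"})
# At 0:30 the player switches to drawing V3 and V4.
showing = [o.video for o in active(TABLE_T, now=30.0)]
```

A player would call `to_preload` and `active` on each playback time update. Keeping the table as a flat list works for a handful of clips; a longer table could be sorted by start time and searched more efficiently.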
The above is only a simple example. Real cases can be far more complicated, for example: