Preface
UDP itself is unreliable, so any transport protocol built on UDP has to deal with packet loss. There are generally two approaches:
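- Retransmission-based NACK and ACK mechanisms. With NACK, the receiver tells the sender which sequence numbers it has not received, and the sender retransmits those packets. With ACK, the receiver reports the sequence numbers it has received, and the sender uses that information to decide whether a retransmission is needed.
- FEC, which transmits redundant information. When packets are lost in transit, the receiver rebuilds the complete data from the packets it did receive, provided the loss does not exceed what the redundancy can cover.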
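To make the FEC idea concrete, here is a minimal sketch that protects a group of equal-sized packets with a single XOR parity packet and rebuilds one missing packet from it. It is the simplest possible scheme, chosen only for illustration; it is not the FEC used by WebRTC or mediasoup, and the names `Packet`, `MakeParity` and `RecoverMissing` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Packet = std::vector<uint8_t>;

// XOR all packets of a group together to build one parity packet.
// Assumption: every packet in the group has the same length.
Packet MakeParity(const std::vector<Packet>& group)
{
    Packet parity(group.empty() ? 0 : group[0].size(), 0);

    for (const Packet& packet : group)
        for (std::size_t i = 0; i < parity.size(); ++i)
            parity[i] ^= packet[i];

    return parity;
}

// Rebuild the single missing packet of a group. Missing packets are passed
// as nullptr. Returns false unless exactly one packet is missing, because one
// XOR parity packet can only repair one loss.
bool RecoverMissing(
  const std::vector<const Packet*>& received, const Packet& parity, Packet& recovered)
{
    std::size_t missing = 0;
    Packet candidate    = parity;

    for (const Packet* packet : received)
    {
        if (packet == nullptr)
        {
            ++missing;
            continue;
        }

        // XORing out every packet that arrived leaves the missing one.
        for (std::size_t i = 0; i < candidate.size(); ++i)
            candidate[i] ^= (*packet)[i];
    }

    if (missing != 1)
        return false;

    recovered = candidate;
    return true;
}
```

Unlike NACK, this costs bandwidth even when nothing is lost, but recovery needs no round trip to the sender.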
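WebRTC uses NACK for retransmission. This article analyzes the key points of NACK design based on the mediasoup NACK module. The mediasoup NACK design follows the same principles as the NACK design in WebRTC.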
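1. NACK design
For real-time audio and video, retransmission has to be prompt so that lost data is recovered quickly, yet it must not be excessive or drag on for too long, so that bandwidth is not wasted. This is the key point of WebRTC's NACK design: balancing audio/video latency against the bandwidth consumed by retransmission.
mediasoup manages NACKs with a NackGenerator class, which maintains a retransmission queue (the NACK list) and a key frame list. To avoid excessive or prolonged retransmission, the NACK list holds at most 1000 RTP packets and covers a range of at most 10000 sequence numbers, and each packet is requested at most 10 times.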
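The limits described above can be pictured as a few constants plus an ordered map keyed by sequence number. The sketch below is an assumption reconstructed from the numbers in this article and from the field names that appear in the code in section 2 (`seq`, `sendAtSeq`, `sentAtMs`, `retries`). The actual declarations in mediasoup may differ, and `TimerIntervalMs`, `SeqLowerThan` and `SimpleNackList` are names chosen here for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Limits quoted in the text.
constexpr std::size_t MaxNackPackets = 1000; // max entries in the NACK list
constexpr uint16_t MaxPacketAge      = 10000; // max sequence-number span covered
constexpr uint8_t MaxNackRetries     = 10;    // max NACKs per sequence number
constexpr uint32_t TimerIntervalMs   = 40;    // period of the retry timer

// Bookkeeping for one missing sequence number.
struct NackInfo
{
    NackInfo() = default;
    NackInfo(uint16_t seq, uint16_t sendAtSeq) : seq(seq), sendAtSeq(sendAtSeq)
    {
    }

    uint16_t seq{ 0 };       // missing sequence number
    uint16_t sendAtSeq{ 0 }; // first NACK goes out once lastSeq reaches this
    uint64_t sentAtMs{ 0 };  // time of the last NACK, 0 if never sent
    uint8_t retries{ 0 };    // NACKs sent so far
};

// Wraparound-aware ordering: a sorts before b if b is less than half the
// 16-bit range ahead of a. This is only a valid ordering while the keys in
// the map span less than 32768 sequence numbers, which MaxPacketAge ensures.
struct SeqLowerThan
{
    bool operator()(uint16_t a, uint16_t b) const
    {
        return a != b && static_cast<uint16_t>(b - a) < 0x8000;
    }
};

using SimpleNackList = std::map<uint16_t, NackInfo, SeqLowerThan>;
```

With this ordering, sequence numbers 65534, 65535 and 2 iterate in that order, so the oldest entries stay at the front of the map even across the 16-bit wraparound.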
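mediasoup sends NACKs from two places:
- Every time an RTP packet arrives, mediasoup checks whether any packets are missing. If so, the first retransmission request is sent immediately.
- A 40 ms timer periodically scans the NACK list. If the time since the last request for a sequence number is at least the current RTT estimate, the request is sent again. A sequence number is requested at most 10 times; after that it is removed from the list.
If retransmission keeps failing and the NACK list reaches its maximum size, mediasoup tries to discard the oldest entries, up to the most recent key frame. If the list still exceeds the limit after this cleanup, mediasoup clears the whole list and sends an RTCP PLI to ask the sender for a key frame directly, which avoids spending even more bandwidth on retransmission.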
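As a worked example of the timing rules above, the sketch below computes when NACKs for a single lost packet would be sent. It is an idealized model, not mediasoup code: it assumes the loss is detected at t = 0, the timer fires at exact multiples of the interval, the RTT estimate stays constant, and the retransmission never arrives. `NackSchedule` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Times (in ms) at which NACKs for one lost packet are sent under the
// idealized assumptions stated above. timerIntervalMs must be greater than 0.
std::vector<uint64_t> NackSchedule(
  uint64_t rttMs, uint64_t timerIntervalMs = 40, std::size_t maxRetries = 10)
{
    // The first NACK is sent immediately when the gap is detected.
    std::vector<uint64_t> times{ 0 };
    uint64_t lastSentAt = 0;

    // Every timer tick resends the NACK if at least one RTT has elapsed
    // since the previous one, until the retry limit is reached.
    for (uint64_t now = timerIntervalMs; times.size() < maxRetries; now += timerIntervalMs)
    {
        if (now - lastSentAt >= rttMs)
        {
            times.push_back(now);
            lastSentAt = now;
        }
    }

    return times;
}
```

With an RTT of 100 ms this model sends NACKs at 0, 120, 240 ms and so on, and gives up after the tenth one at 1080 ms. Retries are quantized to timer ticks, so the effective spacing is the RTT rounded up to a multiple of 40 ms.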
2. Code analysis
mediasoup's NACK implementation
// Returns true if this is a found nacked packet. False otherwise.
bool NackGenerator::ReceivePacket(RTC::RtpPacket* packet, bool isRecovered)
{
    MS_TRACE();

    uint16_t seq    = packet->GetSequenceNumber();
    bool isKeyFrame = packet->IsKeyFrame();

    if (!this->started)
    {
        this->started = true;
        this->lastSeq = seq;

        if (isKeyFrame)
            this->keyFrameList.insert(seq);

        return false;
    }

    // Obviously never nacked, so ignore.
    if (seq == this->lastSeq)
        return false;

    // May be an out of order packet, or already handled retransmitted packet,
    // or a retransmitted packet.
    if (SeqManager<uint16_t>::IsSeqLowerThan(seq, this->lastSeq))
    {
        auto it = this->nackList.find(seq);

        // It was a nacked packet.
        if (it != this->nackList.end())
        {
            MS_DEBUG_DEV(
              "NACKed packet received [ssrc:%" PRIu32 ", seq:%" PRIu16 ", recovered:%s]",
              packet->GetSsrc(),
              packet->GetSequenceNumber(),
              isRecovered ? "true" : "false");

            this->nackList.erase(it);

            return true;
        }

        // Out of order packet or already handled NACKed packet.
        if (!isRecovered)
        {
            MS_WARN_DEV(
              "ignoring older packet not present in the NACK list [ssrc:%" PRIu32 ", seq:%" PRIu16 "]",
              packet->GetSsrc(),
              packet->GetSequenceNumber());
        }

        return false;
    }

    // If we are here it means that we may have lost some packets so seq is
    // newer than the latest seq seen.

    if (isKeyFrame)
        this->keyFrameList.insert(seq);

    // Remove old keyframes.
    {
        auto it = this->keyFrameList.lower_bound(seq - MaxPacketAge);

        if (it != this->keyFrameList.begin())
            this->keyFrameList.erase(this->keyFrameList.begin(), it);
    }

    if (isRecovered)
    {
        this->recoveredList.insert(seq);

        // Remove old ones so we don't accumulate recovered packets.
        auto it = this->recoveredList.lower_bound(seq - MaxPacketAge);

        if (it != this->recoveredList.begin())
            this->recoveredList.erase(this->recoveredList.begin(), it);

        // Do not let a packet pass if it's newer than last seen seq and came via
        // RTX.
        return false;
    }

    AddPacketsToNackList(this->lastSeq + 1, seq);

    this->lastSeq = seq;

    // Check if there are any nacks that are waiting for this seq number.
    std::vector<uint16_t> nackBatch = GetNackBatch(NackFilter::SEQ);

    if (!nackBatch.empty())
        this->listener->OnNackGeneratorNackRequired(nackBatch);

    // This is important. Otherwise the running timer (filter:TIME) would be
    // interrupted and NACKs would never be sent more than once for each seq.
    if (!this->timer->IsActive())
        MayRunTimer();

    return false;
}
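The function above first sorts each incoming packet into one of three cases (duplicate of the latest, older than the latest, newer than the latest) before deciding what to do. Because RTP sequence numbers are 16 bits and wrap around, that comparison cannot be a plain `<`. The article does not show the body of `SeqManager<uint16_t>::IsSeqLowerThan`, so the following is only a sketch of the usual serial-number technique, with hypothetical names (`Arrival`, `Classification`, `Classify`); mediasoup's handling of edge cases may differ.

```cpp
#include <cstdint>

enum class Arrival
{
    Duplicate, // same sequence number as the latest one seen
    Older,     // reordered or retransmitted packet
    Newer      // advances the stream; may reveal a gap
};

struct Classification
{
    Arrival arrival;
    uint16_t missing; // packets skipped between lastSeq and seq (Newer only)
};

// Compare seq against lastSeq with serial-number arithmetic: the 16-bit
// difference is reinterpreted as signed, so 2 counts as newer than 65534.
// (The unsigned-to-signed cast is two's complement on all mainstream
// compilers and guaranteed to be so since C++20.)
Classification Classify(uint16_t lastSeq, uint16_t seq)
{
    const auto diff = static_cast<int16_t>(static_cast<uint16_t>(seq - lastSeq));

    if (diff == 0)
        return { Arrival::Duplicate, 0 };

    if (diff < 0)
        return { Arrival::Older, 0 };

    return { Arrival::Newer, static_cast<uint16_t>(diff - 1) };
}
```

For example, with `lastSeq` 65534 and an incoming `seq` of 2, the packet is classified as newer with three packets missing (65535, 0 and 1), which is the range that `AddPacketsToNackList(this->lastSeq + 1, seq)` would add to the NACK list.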
Retransmission queue
void NackGenerator::AddPacketsToNackList(uint16_t seqStart, uint16_t seqEnd)
{
    MS_TRACE();

    // Remove old packets.
    auto it = this->nackList.lower_bound(seqEnd - MaxPacketAge);

    this->nackList.erase(this->nackList.begin(), it);

    // If the nack list is too large, remove packets from the nack list until
    // the latest first packet of a keyframe. If the list is still too large,
    // clear it and request a keyframe.
    uint16_t numNewNacks = seqEnd - seqStart;

    if (this->nackList.size() + numNewNacks > MaxNackPackets)
    {
        // clang-format off
        while (
            RemoveNackItemsUntilKeyFrame() &&
            this->nackList.size() + numNewNacks > MaxNackPackets
        )
        // clang-format on
        {
        }

        if (this->nackList.size() + numNewNacks > MaxNackPackets)
        {
            MS_WARN_TAG(
              rtx, "NACK list full, clearing it and requesting a key frame [seqEnd:%" PRIu16 "]", seqEnd);

            this->nackList.clear();
            this->listener->OnNackGeneratorKeyFrameRequired();

            return;
        }
    }

    for (uint16_t seq = seqStart; seq != seqEnd; ++seq)
    {
        MS_ASSERT(this->nackList.find(seq) == this->nackList.end(), "packet already in the NACK list");

        // Do not send NACK for packets that are already recovered by RTX.
        if (this->recoveredList.find(seq) != this->recoveredList.end())
            continue;

        this->nackList.emplace(std::make_pair(seq, NackInfo{ seq, seq }));
    }
}
Retransmission timing
std::vector<uint16_t> NackGenerator::GetNackBatch(NackFilter filter)
{
    MS_TRACE();

    uint64_t nowMs = DepLibUV::GetTimeMs();
    std::vector<uint16_t> nackBatch;

    auto it = this->nackList.begin();

    while (it != this->nackList.end())
    {
        NackInfo& nackInfo = it->second;
        uint16_t seq       = nackInfo.seq;

        // First request: sent as soon as the gap is detected (filter SEQ).
        // clang-format off
        if (
            filter == NackFilter::SEQ &&
            nackInfo.sentAtMs == 0 &&
            (
                nackInfo.sendAtSeq == this->lastSeq ||
                SeqManager<uint16_t>::IsSeqHigherThan(this->lastSeq, nackInfo.sendAtSeq)
            )
        )
        // clang-format on
        {
            nackBatch.emplace_back(seq);
            nackInfo.retries++;
            nackInfo.sentAtMs = nowMs;

            if (nackInfo.retries >= MaxNackRetries)
            {
                MS_WARN_TAG(
                  rtx,
                  "sequence number removed from the NACK list due to max retries [filter:seq, seq:%" PRIu16
                  "]",
                  seq);

                it = this->nackList.erase(it);
            }
            else
            {
                ++it;
            }

            continue;
        }

        // Timer-driven retry: request again once at least one RTT has passed
        // since the last request (filter TIME).
        if (filter == NackFilter::TIME && nowMs - nackInfo.sentAtMs >= this->rtt)
        {
            nackBatch.emplace_back(seq);
            nackInfo.retries++;
            nackInfo.sentAtMs = nowMs;

            if (nackInfo.retries >= MaxNackRetries)
            {
                MS_WARN_TAG(
                  rtx,
                  "sequence number removed from the NACK list due to max retries [filter:time, seq:%" PRIu16
                  "]",
                  seq);

                it = this->nackList.erase(it);
            }
            else
            {
                ++it;
            }

            continue;
        }

        ++it;
    }

    return nackBatch;
}
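The batch returned by `GetNackBatch` is just a list of sequence numbers. On the wire, an RTCP Generic NACK (RFC 4585) carries them as items made of a 16-bit packet ID (PID) and a 16-bit bitmask (BLP), where bit i of the mask, counting from 1 at the least significant bit, marks packet PID + i as lost too. The sketch below shows one way to pack a batch into such items. It is not taken from mediasoup; `NackItem` and `PackNackItems` are hypothetical names, and the input is assumed to be in ascending (wraparound-aware) order, as produced by iterating the NACK list.

```cpp
#include <cstdint>
#include <vector>

// One Generic NACK item as laid out in RFC 4585.
struct NackItem
{
    uint16_t pid; // first lost sequence number of this item
    uint16_t blp; // bitmask of further losses among the next 16 packets
};

// Pack sequence numbers (ascending, wraparound-aware) into NACK items.
std::vector<NackItem> PackNackItems(const std::vector<uint16_t>& seqs)
{
    std::vector<NackItem> items;

    for (uint16_t seq : seqs)
    {
        if (!items.empty())
        {
            // Distance from the current item's PID, modulo 2^16.
            const auto distance = static_cast<uint16_t>(seq - items.back().pid);

            // Losses 1..16 packets after the PID fit in the bitmask.
            if (distance >= 1 && distance <= 16)
            {
                items.back().blp |= static_cast<uint16_t>(1u << (distance - 1));
                continue;
            }
        }

        // Otherwise start a new item with this sequence number as PID.
        items.push_back({ seq, 0 });
    }

    return items;
}
```

A batch of 100, 101, 103, 117, 65535, 0 packs into three items: PID 100 with BLP 0x0005, PID 117 with BLP 0, and PID 65535 with BLP 0x0001. Bursts of nearby losses therefore cost far fewer bytes than one item per packet.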
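3. Potential issues
mediasoup can send unnecessary retransmission requests when packets arrive out of order. It detects loss by checking whether sequence numbers are contiguous, so a reordered packet triggers an immediate NACK. In practice the reordered packets may arrive only a short time apart, and waiting briefly would have produced a contiguous sequence with no NACK at all.
To address this, WebRTC has a WebRTC-SendNackDelayMs setting that delays the sending of NACKs.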
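One way to picture such a delay is to remember when each gap was first noticed and to report it only once it has stayed open for a grace period. The sketch below is a hypothetical illustration of that idea; it is neither WebRTC's nor mediasoup's implementation. `DelayedNackList` and its methods are made-up names, and it uses a plain `std::map` without wraparound handling to stay short.

```cpp
#include <cstdint>
#include <map>
#include <vector>

class DelayedNackList
{
public:
    explicit DelayedNackList(uint64_t sendNackDelayMs) : sendNackDelayMs(sendNackDelayMs)
    {
    }

    // Record a gap noticed at time nowMs. A gap already recorded keeps its
    // original timestamp.
    void OnMissing(uint16_t seq, uint64_t nowMs)
    {
        this->missingSince.emplace(seq, nowMs);
    }

    // A late (reordered) packet arrived: forget the gap before it is NACKed.
    void OnReceived(uint16_t seq)
    {
        this->missingSince.erase(seq);
    }

    // Sequence numbers whose gap has been open for at least the delay.
    std::vector<uint16_t> Due(uint64_t nowMs) const
    {
        std::vector<uint16_t> due;

        for (const auto& entry : this->missingSince)
        {
            if (nowMs - entry.second >= this->sendNackDelayMs)
                due.push_back(entry.first);
        }

        return due;
    }

private:
    uint64_t sendNackDelayMs;
    std::map<uint16_t, uint64_t> missingSince;
};
```

The trade-off is the one described in section 1: a longer delay suppresses more spurious NACKs caused by reordering, but it adds the same amount of latency to the recovery of every packet that really was lost.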