Code Poetry and Text Adventures

by catid posted 5:04pm Fri. Jul 13th 2012 PDT
Background: The Opus Audio Codec

I have been considering adding lossless forward error correction (FEC) to the Opus codec (http://opus-codec.org) with Wirehair.  At low data rates, Opus uses the SILK codec's built-in FEC scheme.  That scheme seems to be lossy, in that it reproduces lower-quality audio after packet loss.  For the fullband (FB) and wideband (WB) modes of Opus (see their site), FEC does not appear to be used.  I agree with these design decisions: at low bandwidth something is better than nothing, and lossless recovery (of packets, not lossless compression) might be impossible to achieve.  Furthermore, good lossless FEC like Wirehair is outside the scope of an audio codec.

Adding FEC to Opus would be a good way to reduce audio drop-outs when streaming music to wireless devices.  The way I see it being used is to add redundancy across a large part of the audio buffer: if the audio buffer is 400 ms, then the FEC window would be something like 200 ms.
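To put the 200 ms window in packet terms, here is a small back-of-the-envelope calculation.  It assumes 20 ms Opus frames (a common real-time frame size) and, purely as a hypothetical choice, two check packets per FEC window.

```python
# Hypothetical parameters: 20 ms frames and 2 check packets per FEC window.
FRAME_MS = 20
BUFFER_MS = 400
FEC_WINDOW_MS = BUFFER_MS // 2  # "something like 200 ms"
CHECK_PACKETS = 2

def fec_window_stats(window_ms: int, frame_ms: int, check_packets: int):
    """Return (audio packets per window, total packets sent, overhead fraction)."""
    data_packets = window_ms // frame_ms
    total = data_packets + check_packets
    return data_packets, total, check_packets / data_packets

data, total, overhead = fec_window_stats(FEC_WINDOW_MS, FRAME_MS, CHECK_PACKETS)
print(f"{data} audio packets + {CHECK_PACKETS} check packets = {total}, "
      f"overhead {overhead:.0%}")
```

Under these assumptions each window carries 10 audio packets plus 2 check packets, a 20% bandwidth overhead that could survive the loss of any 2 packets in the window with an ideal erasure code.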


Approach 1: Accumulate

In code, I would use a simple framing scheme (length + data, length + data, ...) to store the variable-length audio packets in a large buffer, break the buffer into equal-size chunks, and then generate check symbols from the chunks as usual.  Decoding would involve walking the chunks in the buffer until the lost chunk is found.  Since 200 ms of audio data yields only a small number of chunks, random access is not a necessity.
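A minimal sketch of the accumulate-and-chunk framing, assuming a hypothetical 2-byte big-endian length field per packet and zero padding at the end of the buffer.  The erasure-coding step itself is left out; the chunks are simply what a code like Wirehair would consume.

```python
import struct

HEADER = struct.Struct(">H")  # hypothetical 2-byte big-endian length field

def accumulate(packets: list[bytes]) -> bytes:
    """Frame each packet as length + data and concatenate into one large buffer."""
    return b"".join(HEADER.pack(len(p)) + p for p in packets)

def split_chunks(buffer: bytes, chunk_size: int) -> list[bytes]:
    """Break the buffer into equal-size chunks, zero-padding the final chunk.
    An erasure code would generate its check symbols from these chunks."""
    padded_len = -(-len(buffer) // chunk_size) * chunk_size  # round up to a multiple
    buffer = buffer.ljust(padded_len, b"\x00")
    return [buffer[i:i + chunk_size] for i in range(0, padded_len, chunk_size)]

def walk(buffer: bytes) -> list[bytes]:
    """Walk the framed buffer from the start, recovering packets in order."""
    packets, pos = [], 0
    while pos + HEADER.size <= len(buffer):
        (length,) = HEADER.unpack_from(buffer, pos)
        if length == 0:  # assumption: real packets are non-empty, so 0 marks padding
            break
        pos += HEADER.size
        packets.append(buffer[pos:pos + length])
        pos += length
    return packets

packets = [b"\x01" * 37, b"\x02" * 61, b"\x03" * 12]
chunks = split_chunks(accumulate(packets), chunk_size=32)
assert len(chunks) == 4 and all(len(c) == 32 for c in chunks)
assert walk(b"".join(chunks)) == packets
```

The walk is purely sequential, which is acceptable here because a 200 ms window holds only a handful of chunks.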

Problems with this approach: Because the audio packets are variable-length, the receiver does not know the length of a lost packet, so the redundant data would need to include this information in each packet.  Furthermore, if a packet is broken across two chunks, then two check packets would be needed to recover one packet of the original data.
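To illustrate the second problem, this sketch computes which chunks each framed packet occupies in the accumulated buffer.  It assumes a hypothetical 2-byte length header per packet and, for illustration only, that every packet length is known (the receiver would not actually know it for a lost packet).

```python
def chunk_spans(packet_lengths: list[int], chunk_size: int, header: int = 2) -> list[range]:
    """Return, for each packet framed as (header bytes + data) in the accumulated
    buffer, the range of chunk indices that its bytes occupy."""
    spans, offset = [], 0
    for length in packet_lengths:
        end = offset + header + length  # exclusive end of this framed packet
        spans.append(range(offset // chunk_size, (end - 1) // chunk_size + 1))
        offset = end
    return spans

for i, span in enumerate(chunk_spans([37, 61, 12], chunk_size=32)):
    print(f"packet {i} -> chunks {list(span)}")
# packet 0 -> chunks [0, 1]
# packet 1 -> chunks [1, 2, 3]
# packet 2 -> chunks [3]
```

With 32-byte chunks, losing packet 1 alone damages three chunks, so recovering that single packet would cost three check symbols.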

Clearly, accumulating the data is not the right approach.


Approach 2: Length-Prepending

Instead of accumulating packets into a large buffer, the length of each message can be prepended to its data.  Wirehair would need to be modified to accept variable-length message parts instead of equal-length parts.  The encoder input would be a list of message parts, from which the largest message size would be found.  The internal buffers would be allocated for this largest size, because with so much data mixing it is likely that all of the buffers would end up full-size.  Check data would be generated from the internal buffers, so every check block would be the same size as the largest message.
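A minimal sketch of the encoder-side preparation, assuming a hypothetical 2-byte length prefix.  It is not Wirehair's API: a single XOR parity block stands in for Wirehair's check data, which is enough to show that check data comes out the size of the largest message.

```python
import struct

LENGTH_PREFIX = struct.Struct(">H")  # hypothetical 2-byte big-endian length prefix

def prepare_blocks(messages: list[bytes]) -> list[bytes]:
    """Prepend each message's length, then zero-pad every block to the size of the
    largest prefixed message, mirroring the full-size internal buffers."""
    block_size = LENGTH_PREFIX.size + max(len(m) for m in messages)
    return [(LENGTH_PREFIX.pack(len(m)) + m).ljust(block_size, b"\x00") for m in messages]

def xor_check_block(blocks: list[bytes]) -> bytes:
    """Stand-in for Wirehair check data: one XOR parity block over all blocks.
    It has the same size as the (equal-size) message blocks."""
    check = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            check[i] ^= byte
    return bytes(check)

blocks = prepare_blocks([b"abc", b"hello world", b"x"])
check = xor_check_block(blocks)
print(len(blocks[0]), len(check))  # 13 13
```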

Prepending the lengths solves the problem of missing message lengths: when a message is recovered from the check data, its length is recovered along with it.
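A round-trip sketch of length-prepending under simple assumptions: a hypothetical 2-byte length prefix, and a single XOR parity block standing in for Wirehair.  XOR parity can only repair one lost message per group (Wirehair would handle more), and the receiver is assumed to work with the same prefixed, padded blocks as the sender.

```python
import struct
from typing import Optional

PREFIX = struct.Struct(">H")  # hypothetical 2-byte big-endian length prefix

def encode(messages: list[bytes]) -> tuple[list[bytes], bytes]:
    """Length-prefix each message, zero-pad all blocks to a common size, and compute
    one XOR parity block (a simple stand-in for Wirehair's check data)."""
    size = PREFIX.size + max(len(m) for m in messages)
    blocks = [(PREFIX.pack(len(m)) + m).ljust(size, b"\x00") for m in messages]
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return blocks, bytes(parity)

def recover(received: list[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the one missing block (None) by XOR-ing the parity with every received
    block, then use the recovered length prefix to strip the zero padding."""
    block = bytearray(parity)
    for blk in received:
        if blk is not None:
            for i, byte in enumerate(blk):
                block[i] ^= byte
    (length,) = PREFIX.unpack_from(block, 0)
    return bytes(block[PREFIX.size:PREFIX.size + length])

messages = [b"opus-1", b"a longer opus packet", b"x"]
blocks, parity = encode(messages)
assert recover([blocks[0], None, blocks[2]], parity) == messages[1]
assert recover([blocks[0], blocks[1], None], parity) == b"x"  # padding stripped
```

The second assertion shows the point of the prefix: the short message comes back at its exact length rather than padded to the largest message's size.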

Wirehair does not currently support this mode, and supporting variable-length data would be interesting and useful.  I am sure other minor changes would be needed to optimize for this usage pattern.


Conclusion

With length-prepending, Wirehair FEC can be adapted to variable-length message streams like compressed audio.  It would also be useful for messaging protocols in general.

For instance, an online game could easily combine every 100 ms worth of messages into an FEC cluster and add another message or two for redundancy.  The online game experience is directly tied to message latency, and I see this as a new and useful tool.
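A minimal sketch of the grouping step, assuming each game message carries a send timestamp in milliseconds (a hypothetical field).  Windows here are aligned to multiples of 100 ms; each resulting cluster would then go to the FEC encoder, which would add the one or two redundant messages.

```python
from collections import defaultdict

def cluster_by_window(messages: list[tuple[int, bytes]], window_ms: int = 100) -> list[list[bytes]]:
    """Group (send_time_ms, payload) pairs into one FEC cluster per window_ms
    interval, keeping send order inside each cluster; empty windows are skipped."""
    clusters = defaultdict(list)
    for send_time_ms, payload in messages:
        clusters[send_time_ms // window_ms].append(payload)
    return [clusters[window] for window in sorted(clusters)]

msgs = [(0, b"move"), (40, b"shoot"), (99, b"chat"), (100, b"move"), (250, b"jump")]
print(cluster_by_window(msgs))
# [[b'move', b'shoot', b'chat'], [b'move'], [b'jump']]
```

Aligning windows to fixed boundaries keeps sender and receiver in agreement about cluster membership without extra signaling; a real protocol might instead number clusters explicitly.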
Last edited by catid 6:26pm Fri. Jul 13th 2012 PDT
by (Anonymous) posted 9:51pm Sat. Feb 3rd 2018 PST
I posted this 6 years ago.  Feel free to email me at mrcatid at gmail dot com if you want to discuss it.  I can show you some cool erasure codes that are fast on low-end hardware.
by (Anonymous) posted 2:31am Sat. Apr 15th 2017 PDT
Hello, while I appreciate the good intent, I'd warn you that real-time audio as implemented in VoIP applications uses 20 ms frames.  RFC 6716, "Definition of the Opus Audio Codec", section 2.1.4 "Frame Duration", allows packets of up to 120 ms, but for real-time audio recommends 20 ms, the same as for other codecs like GSM.
The next thing I would consider is that not all software runs on an IBM-style PC with gigabytes of RAM, a GHz CPU, and AVX2 instructions.  Some apps are embedded in MCUs with limited RAM.
I wonder whether the CD audio's short RS code (28, 24, 5) would be a better fit than Golay (24, 12, 8) for such apps...
 
by (Anonymous) posted 1:05pm Mon. Jun 15th 2015 PDT
Amazing: your second approach is exactly the idea I had when making https://github.com/lrq3000/pyFileFixity/blob/master/structural_adaptive_ecc.py, and you had it 3 years before me.