So I explored this more and learned more about CRCs. This was a particularly good review: http://www.tc.faa.gov/its/worldpac/techrpt/tc14-49.pdf

CRC error detection/recovery really only makes sense for wireless sensor networks (or other link layer-type situations) since it's designed for a few randomly-located bit flips in the data. Wireless channels have a quantifiable *bit* error rate, but on disk we typically lose whole blocks of data at once...

CRC error correction is more efficiently implemented by rolling the CRC backwards revealing the error syndrome, which can be identified by where the CRC register is small during roll-back. This works for single bit errors and (short) burst errors.

But even when it makes sense to use them for error recovery there are definitely pitfalls:

(1) CRC generator polynomials designed for error correction are less efficient than those designed for error detection. This is because the best ones don't work for burst error correction, but may work for single bit errors. I found that (luckily) the best one for CRC16 for 240 bits can do single bit error correction and that's what is now up here:

https://github.com/catid/CRC16Recovery

Counter: It might be possible to come up with one that works for burst error correction that also has decent detection. I would use Koopman's open source hdlen project to do the grunt work of choosing between options and evaluating the result.

(2) Recovering more than one bit may introduce an unacceptably large number of false positives. This is a legitimate concern based on my testing. I found that the reversed-CRC recovery approach has a high chance of resulting in the CRC matching with different data.

Niedermayer's filter just checks if reversed CRC < 256 to see if it found a solution. This can be problematic if any two (or more) randomly-located bit errors ends up rolling back to a CRC with high bits cleared. If the CRC is designed to detect up to 5 single bit errors, for example, then the error recovery code also needs to be validated for up to 5 single bit errors. I don't think he did that as it would require a lot more work than I found on his blog.

In that specific case, when more than 5 errors occur, I think it can be accepted that the CRC is no better than a random hash, and then error recovery would give false positives just as often as the CRC would pass bad data anyway. This is supported by comments from "Selected CRC Polynomials Can Correct Errors and Thus Reduce Retransmission" by Travis Mandel, Jens Mache.

(3) Multiple bit error correction, ie. 2 randomly located bits, only seems possible via a large table. The table can be compressed slightly. I'm really worried about collisions and false positives.

Cut to the conclusion:

So my thought is maybe a few more burst bits can be added with careful work on top of the one bit correction code on my Github. Maybe it can be extended to short patterns like 11, 101, 111 without hurting error detection too much.

CRC error detection/recovery really only makes sense for wireless sensor networks (or other link layer-type situations) since it's designed for a few randomly-located bit flips in the data. Wireless channels have a quantifiable *bit* error rate, but on disk we typically lose whole blocks of data at once...

CRC error correction is more efficiently implemented by rolling the CRC backwards revealing the error syndrome, which can be identified by where the CRC register is small during roll-back. This works for single bit errors and (short) burst errors.

But even when it makes sense to use them for error recovery there are definitely pitfalls:

(1) CRC generator polynomials designed for error correction are less efficient than those designed for error detection. This is because the best ones don't work for burst error correction, but may work for single bit errors. I found that (luckily) the best one for CRC16 for 240 bits can do single bit error correction and that's what is now up here:

https://github.com/catid/CRC16Recovery

Counter: It might be possible to come up with one that works for burst error correction that also has decent detection. I would use Koopman's open source hdlen project to do the grunt work of choosing between options and evaluating the result.

(2) Recovering more than one bit may introduce an unacceptably large number of false positives. This is a legitimate concern based on my testing. I found that the reversed-CRC recovery approach has a high chance of resulting in the CRC matching with different data.

Niedermayer's filter just checks if reversed CRC < 256 to see if it found a solution. This can be problematic if any two (or more) randomly-located bit errors ends up rolling back to a CRC with high bits cleared. If the CRC is designed to detect up to 5 single bit errors, for example, then the error recovery code also needs to be validated for up to 5 single bit errors. I don't think he did that as it would require a lot more work than I found on his blog.

In that specific case, when more than 5 errors occur, I think it can be accepted that the CRC is no better than a random hash, and then error recovery would give false positives just as often as the CRC would pass bad data anyway. This is supported by comments from "Selected CRC Polynomials Can Correct Errors and Thus Reduce Retransmission" by Travis Mandel, Jens Mache.

(3) Multiple bit error correction, ie. 2 randomly located bits, only seems possible via a large table. The table can be compressed slightly. I'm really worried about collisions and false positives.

Cut to the conclusion:

So my thought is maybe a few more burst bits can be added with careful work on top of the one bit correction code on my Github. Maybe it can be extended to short patterns like 11, 101, 111 without hurting error detection too much.

last edit by catid
edited (>30 days ago) 10:35pm Sun. Nov 8th 2015 PST