Update

With more optimization the latest results on the same laptop are:

Comparing performance from all my erasure correction code libraries, on my laptop:

To summarize, a set of 128 of 64 KB data files are supplemented by about 128 redundant code pieces (encoded) meaning a code rate of 1/2. From those redundant code pieces the original set is recovered (decoded).

The results are all from libraries I've written over the past few years. They all have the same vector-optimized inner loops, but the types of error correction codes are different.

For 128 data pieces of input and 128 data pieces of redundancy:

Fastest to encode: Leopard (1.2 GB/s)

Distant second-place: WH256 (660 MB/s), FEC-AL (515 MB/s)

Slowest encoders: Longhair, CM256

Fastest to decode: WH256 (830 MB/s)

Distant second-place: Leopard (482 MB/s)

Slowest decoders: FEC-AL, CM256, Longhair

There are a lot of variables that affect when each of these libraries should be used.

Each one is ideal in a different situation, and no one library can be called the best overall.

The situation tested mainly helps explore the trade-offs of WH256, FEC-AL and Leopard for code rate 1/2.

CM256: Traditional O(N^2) Cauchy matrix MDS Reed-Solomon codec

Runs at about 100 MB/s encode and decode for this case.

This is an MDS code that uses a Cauchy matrix for structure.

Other examples of this type would be most MDS Reed-Solomon codecs online: Jerasure, Zfec, ISA-L, etc.

It requires SSSE3 or newer Intel instruction sets for this speed. Otherwise it runs much slower.

This type of software gets slower as O(K*M) where K = input count and M = recovery count.

It is practical for either small data or small recovery set up to 255 pieces.

It is available for production use under BSD license here:

http://github.com/catid/cm256

(Note that the inner loops can be optimized more by applying the GF256 library.)

Longhair: Binary O(N^2) Cauchy matrix MDS Reed-Solomon codec

Runs at about 100 MB/s encode and decode for this case.

This is an MDS code that uses a Cauchy matrix for structure.

This one only requires XOR operations so it can run fast on low-end processors.

Requires data is a multiple of 8 bytes.

This type of software gets slower as O(K*M) where K = input count and M = recovery count.

It is practical for either small data or small recovery set up to 255 pieces.

There is no other optimized software available online for this type of error correction code. There is a slow version available in the Jerasure software library.

It is available for production use under BSD license here:

http://github.com/catid/longhair

(Note that the inner loops can be optimized more by applying the GF256 library.)

Wirehair: O(N) Hybrid LDPC Erasure Code

Encodes at 660 MB/s, and decodes at 830 MB/s for ALL cases.

This is not an MDS code. It has about a 3% chance of failing to recover and requiring one extra block of data.

It uses mostly XOR so it only gets a little slower on lower-end processors.

This type of software gets slower as O(K) where K = input count.

This library incorporates some novel ideas that are unpublished. The new ideas are described in the source code.

It is practical for data up to 64,000 pieces and can be used as a "fountain" code.

There is no other optimized software available online for this type of error correction code. I believe there are some public (slow) implementations of Raptor codes available online for study.

It is available for production use under BSD license here:

http://github.com/catid/wirehair

There's a pre-production version that needs more work here using GF256 for more speed,

which is what I used for the benchmark:

http://github.com/catid/wh256

FEC-AL *new*: O(N^2/8) XOR Structured Convolutional Matrix Code

Encodes at 510 MB/s. Decodes at 121 MB/s.

This is not an MDS code. It has about a 1% chance of failing to recover and requiring one extra block of data.

This library incorporates some novel ideas that are unpublished. The new ideas are described in the README.

It uses mostly XOR operations so only gets about 2-4x slower on lower-end processors.

It gets slower as O(K*M/8) for larger data, bounded by the speed of XOR.

This new approach is ideal for streaming erasure codes; two implementations are offered one for files and another for real-time streaming reliable data.

It is practical for data up to about 4,000 pieces and can be used as a "fountain" code.

There is no other software available online for this type of error correction code.

It is available for production use under BSD license here:

http://github.com/catid/fecal

It can also be used as a convolutional streaming code here for e.g. rUDP:

http://github.com/catid/siamese

Leopard-RS *new*: O(K Log M) FFT MDS Reed-Solomon codec

Encodes at 1.2 GB/s, and decodes at 480 MB/s for this case.

12x faster than existing MDS approaches to encode, and almost 5x faster to decode.

This uses a recent result from 2014 introducing a novel polynomial basis permitting FFT over fast Galois fields.

This is an MDS Reed-Solomon similar to Jerasure, Zfec, ISA-L, etc, but much faster.

It requires SSSE3 or newer Intel instruction sets for this speed. Otherwise it runs much slower.

Requires data is a multiple of 64 bytes.

This type of software gets slower as O(K Log M) where K = input count, M = recovery count.

It is practical for extremely large data.

There is no other software available online for this type of error correction code.

I'm still working on Leopard-RS so it's not ready for use yet (still bugs to fix).

Check the github page for the latest: http://github.com/catid/leopard

With more optimization the latest results on the same laptop are:

Leopard Encoder(8.192 MB in 128 pieces, 128 losses): Input=2102.67 MB/s, Output=2102.67 MB/s

Leopard Decoder(8.192 MB in 128 pieces, 128 losses): Input=686.212 MB/s, Output=686.212 MB/s

Leopard Encoder(64 MB in 1000 pieces, 200 losses): Input=2194.94 MB/s, Output=438.988 MB/s

Leopard Decoder(64 MB in 1000 pieces, 200 losses): Input=455.633 MB/s, Output=91.1265 MB/s

Leopard Encoder(2097.15 MB in 32768 pieces, 32768 losses): Input=451.168 MB/s, Output=451.168 MB/s

Leopard Decoder(2097.15 MB in 32768 pieces, 32768 losses): Input=190.471 MB/s, Output=190.471 MB/s

Leopard Decoder(8.192 MB in 128 pieces, 128 losses): Input=686.212 MB/s, Output=686.212 MB/s

Leopard Encoder(64 MB in 1000 pieces, 200 losses): Input=2194.94 MB/s, Output=438.988 MB/s

Leopard Decoder(64 MB in 1000 pieces, 200 losses): Input=455.633 MB/s, Output=91.1265 MB/s

Leopard Encoder(2097.15 MB in 32768 pieces, 32768 losses): Input=451.168 MB/s, Output=451.168 MB/s

Leopard Decoder(2097.15 MB in 32768 pieces, 32768 losses): Input=190.471 MB/s, Output=190.471 MB/s

Comparing performance from all my erasure correction code libraries, on my laptop:

To summarize, a set of 128 of 64 KB data files are supplemented by about 128 redundant code pieces (encoded) meaning a code rate of 1/2. From those redundant code pieces the original set is recovered (decoded).

The results are all from libraries I've written over the past few years. They all have the same vector-optimized inner loops, but the types of error correction codes are different.

For 64KB data chunks:

CM256 Encoder: 64000 bytes k = 128 m = 128 : 82194.7 usec, 99.6658 MBps

CM256 Decoder: 64000 bytes k = 128 m = 128 : 78279.5 usec, 104.651 MBps

Longhair Encoded k=128 data blocks with m=128 recovery blocks in 81641.2 usec : 100.342 MB/s

Longhair Decoded 128 erasures in 85000.7 usec : 96.3757 MB/s

WH256 wirehair_encode(N = 128) in 12381.3 usec, 661.644 MB/s after 127.385 avg losses

WH256 wirehair_decode(N = 128) average overhead = 0.025 blocks, average reconstruct time = 9868.65 usec, 830.103 MB/s

FEC-AL Encoder(8.192 MB in 128 pieces, 128 losses): Input=518.545 MB/s, Output=518.545 MB/s, (Encode create: 3762.73 MB/s)

FEC-AL Decoder(8.192 MB in 128 pieces, 128 losses): Input=121.093 MB/s, Output=121.093 MB/s, (Overhead = 0 pieces)

Leopard Encoder(8.192 MB in 128 pieces, 128 losses): Input=1242.62 MB/s, Output=1242.62 MB/s

Leopard Decoder(8.192 MB in 128 pieces, 128 losses): Input=482.53 MB/s, Output=482.53 MB/s

CM256 Encoder: 64000 bytes k = 128 m = 128 : 82194.7 usec, 99.6658 MBps

CM256 Decoder: 64000 bytes k = 128 m = 128 : 78279.5 usec, 104.651 MBps

Longhair Encoded k=128 data blocks with m=128 recovery blocks in 81641.2 usec : 100.342 MB/s

Longhair Decoded 128 erasures in 85000.7 usec : 96.3757 MB/s

WH256 wirehair_encode(N = 128) in 12381.3 usec, 661.644 MB/s after 127.385 avg losses

WH256 wirehair_decode(N = 128) average overhead = 0.025 blocks, average reconstruct time = 9868.65 usec, 830.103 MB/s

FEC-AL Encoder(8.192 MB in 128 pieces, 128 losses): Input=518.545 MB/s, Output=518.545 MB/s, (Encode create: 3762.73 MB/s)

FEC-AL Decoder(8.192 MB in 128 pieces, 128 losses): Input=121.093 MB/s, Output=121.093 MB/s, (Overhead = 0 pieces)

Leopard Encoder(8.192 MB in 128 pieces, 128 losses): Input=1242.62 MB/s, Output=1242.62 MB/s

Leopard Decoder(8.192 MB in 128 pieces, 128 losses): Input=482.53 MB/s, Output=482.53 MB/s

For 128 data pieces of input and 128 data pieces of redundancy:

Fastest to encode: Leopard (1.2 GB/s)

Distant second-place: WH256 (660 MB/s), FEC-AL (515 MB/s)

Slowest encoders: Longhair, CM256

Fastest to decode: WH256 (830 MB/s)

Distant second-place: Leopard (482 MB/s)

Slowest decoders: FEC-AL, CM256, Longhair

There are a lot of variables that affect when each of these libraries should be used.

Each one is ideal in a different situation, and no one library can be called the best overall.

The situation tested mainly helps explore the trade-offs of WH256, FEC-AL and Leopard for code rate 1/2.

CM256: Traditional O(N^2) Cauchy matrix MDS Reed-Solomon codec

Runs at about 100 MB/s encode and decode for this case.

This is an MDS code that uses a Cauchy matrix for structure.

Other examples of this type would be most MDS Reed-Solomon codecs online: Jerasure, Zfec, ISA-L, etc.

It requires SSSE3 or newer Intel instruction sets for this speed. Otherwise it runs much slower.

This type of software gets slower as O(K*M) where K = input count and M = recovery count.

It is practical for either small data or small recovery set up to 255 pieces.

It is available for production use under BSD license here:

http://github.com/catid/cm256

(Note that the inner loops can be optimized more by applying the GF256 library.)

Longhair: Binary O(N^2) Cauchy matrix MDS Reed-Solomon codec

Runs at about 100 MB/s encode and decode for this case.

This is an MDS code that uses a Cauchy matrix for structure.

This one only requires XOR operations so it can run fast on low-end processors.

Requires data is a multiple of 8 bytes.

This type of software gets slower as O(K*M) where K = input count and M = recovery count.

It is practical for either small data or small recovery set up to 255 pieces.

There is no other optimized software available online for this type of error correction code. There is a slow version available in the Jerasure software library.

It is available for production use under BSD license here:

http://github.com/catid/longhair

(Note that the inner loops can be optimized more by applying the GF256 library.)

Wirehair: O(N) Hybrid LDPC Erasure Code

Encodes at 660 MB/s, and decodes at 830 MB/s for ALL cases.

This is not an MDS code. It has about a 3% chance of failing to recover and requiring one extra block of data.

It uses mostly XOR so it only gets a little slower on lower-end processors.

This type of software gets slower as O(K) where K = input count.

This library incorporates some novel ideas that are unpublished. The new ideas are described in the source code.

It is practical for data up to 64,000 pieces and can be used as a "fountain" code.

There is no other optimized software available online for this type of error correction code. I believe there are some public (slow) implementations of Raptor codes available online for study.

It is available for production use under BSD license here:

http://github.com/catid/wirehair

There's a pre-production version that needs more work here using GF256 for more speed,

which is what I used for the benchmark:

http://github.com/catid/wh256

FEC-AL *new*: O(N^2/8) XOR Structured Convolutional Matrix Code

Encodes at 510 MB/s. Decodes at 121 MB/s.

This is not an MDS code. It has about a 1% chance of failing to recover and requiring one extra block of data.

This library incorporates some novel ideas that are unpublished. The new ideas are described in the README.

It uses mostly XOR operations so only gets about 2-4x slower on lower-end processors.

It gets slower as O(K*M/8) for larger data, bounded by the speed of XOR.

This new approach is ideal for streaming erasure codes; two implementations are offered one for files and another for real-time streaming reliable data.

It is practical for data up to about 4,000 pieces and can be used as a "fountain" code.

There is no other software available online for this type of error correction code.

It is available for production use under BSD license here:

http://github.com/catid/fecal

It can also be used as a convolutional streaming code here for e.g. rUDP:

http://github.com/catid/siamese

Leopard-RS *new*: O(K Log M) FFT MDS Reed-Solomon codec

Encodes at 1.2 GB/s, and decodes at 480 MB/s for this case.

12x faster than existing MDS approaches to encode, and almost 5x faster to decode.

This uses a recent result from 2014 introducing a novel polynomial basis permitting FFT over fast Galois fields.

This is an MDS Reed-Solomon similar to Jerasure, Zfec, ISA-L, etc, but much faster.

It requires SSSE3 or newer Intel instruction sets for this speed. Otherwise it runs much slower.

Requires data is a multiple of 64 bytes.

This type of software gets slower as O(K Log M) where K = input count, M = recovery count.

It is practical for extremely large data.

There is no other software available online for this type of error correction code.

I'm still working on Leopard-RS so it's not ready for use yet (still bugs to fix).

Check the github page for the latest: http://github.com/catid/leopard

last edit by catid
edited (>30 days ago) 7:10pm Sun. Jun 4th 2017 PDT