Code Poetry
and Text Adventures

by catid posted (>30 days ago) 6:45pm Tue. Mar 13th 2012 PDT
Continued from

It is clear from the invertibility data, in the previous post, that if the matrix size is DxD, that when D Mod 4 = 0, a Shuffle-2 Code matrix is not invertible.  And when D Mod 4 = 2, then it is better than the random matrix for small D.  However for D > 22, even the best sizes are not as often invertible as the random matrix.

However this is not a huge issue.  Skipping over a lot of details out of scope of this article, the way that the Shuffle-2 Codes are used is as follows:

How the codes are actually used:

(1) Several DxD random matrices are produced with Shuffle-2 Codes.  Call these matrices {R0, R1, R2, R3...}.

(2) Only a few columns (M) are selected at random from these matrices to form a new matrix, called the GE matrix.  M is slightly larger than the square root of the total number of columns across all random R# matrices.

(3) The resulting matrix must be rank M or it leads to lower error correcting performance in Wirehair.

(4) Furthermore, due to the other things going on around this algorithm, the bits in the GE matrix are somewhat randomly flipped after the first third of them.  This makes it easier for the matrix to be full rank, but is also a challenge because it means that if the columns have low Hamming weight that they are in danger of all being flipped off.

(5) I want to be able to use the same random-looking R# matrices for any given D and I want it to behave well.  This allows me to use a short table of PRNG seeds for each value of D to generate a best-performing set of R# matrices.

From the way the Shuffle-2 Code is used, some requirements are apparent:

Wirehair Shuffle-2 Code requirements:

(1) Should be able to generate the R# matrices from a seed.

(2) The average rank of randomly-selected columns should be high to satisfy the primary goal of the code.

(2a) To achieve (2), the average Hamming distance between columns should be maximized.

(2b) The minimum Hamming distance between columns is 2.

(2c) Based on the empirical data from before, D is chosen so that D Mod 4 = 2.  In practice D will be rounded up to the next "good" one.

(3) The minimum Hamming weight of each column should be 3 to avoid being flipped into oblivion.

To find the best matrices, I tried all seeds from 0..65535 and generated D of the DxD matrices.  Then, I verified requirement (2b) and (3).  Of the remaining options, the one with the highest average rank was chosen.  I am only interested in values of D between 14 and 486 for practical use.  Smaller D are special cases that do not need to be in the table, and larger D are unused.

I found that (2b) is not possible to satisfy.  The minimum Hamming distance will always be 1.

Here's some example data for 22x22:

Seed 4504 minimum Hamming distance of 1 and average = 14.4 and minimum Hamming weight of 13
Rank 2 at 0.999
Rank 3 at 0.993
Rank 4 at 0.986
Rank 5 at 0.977
Rank 6 at 0.96
Rank 7 at 0.952
Rank 8 at 0.92
Rank 9 at 0.918
Rank 10 at 0.887
Rank 11 at 0.876
Rank 12 at 0.84
Rank 13 at 0.829
Rank 14 at 0.804
Rank 15 at 0.762
Rank 16 at 0.722
Rank 17 at 0.723
Rank 18 at 0.648
Rank 19 at 0.586
Rank 20 at 0.504
Rank 21 at 0.36
Rank 22 at 0.174

I revised the seed search to look at the best two seeds in terms of average Hamming distance, and picked the one that is more often invertible for rank D-4 through D-2.  This comes from the fact that often times the average Hamming distance being higher doesn't always mean it is a better choice.  The real test is how often it is full rank.

I set a 4 minute timeout for the best seed search and let it run overnight.  The result is a small 118-element table that determines all of the unchanging dense rows in the Wirehair check matrix for N=2 up to N=64000, providing best performance for a given number of dense rows.

Some other random thoughts:
+ Since D is always even, it is not necessary to handle the odd case in the algorithm.
+ The average rank of randomly selected columns drops off pretty sharply near D.  To achieve 90% average invertibility, the number of rows needs to roughly double and add one.  Adding just one row puts it above normal random matrices for invertibility rate.
+ The seeds are not necessarily the best that could be found, since for each D, a range of N use that value of D, and this is much less than D*D -- it is up to 64,000 tops.
last edit by catid edited (>30 days ago) 9:59pm Tue. Mar 13th 2012 PDT