I came up with some ideas to improve the LZ bitstream format for GCIF that are paying off:

(1) Include "same as last 1-4 distance" code and use it as often as possible.

(2) Use the most common 2D distances as escape codes, in addition to common lengths. As a result, LZ matches shorter than 5 pixels often take only 7-9 bits to represent overall.

(3) Reserve escape codes only for short literal lengths. This works well because only long matches fall through to the default "complex LZ" escape code, where the bit representation matters much less relative to the match size.

(4) Only match up to 256 pixels, and when a match length must be emitted, use a Huffman code over the literal range 2-256. Longer matches do not matter in practice, and capping the range makes full use of the symbol statistics.

(5) To encode distances, an extended "local region" of pixels is assigned symbols in a Huffman code: up to 16 pixels to the left/right in the current and previous rows, and -8 through +8 in each of the 7 rows above those. The same code also contains a bit-count encoding that indicates how many bits the distance occupies, along with some of its high bits.

(6) Low bits of the distances are encoded with 1-3 Huffman codes to further compress the literal distances.

It's all about the distance encoding.

When the distance can use a fast escape code for common/recent values, a match can be represented in 7-9 bits. When it falls in the local neighborhood of common positions, it takes 11-15 bits. When the distance fits in under 16 bits before compression, it takes about 21-22 bits. And a global match takes 24-29 bits.

While this bitstream format is much nicer than the old approach on average, it is still sometimes better not to use LZ and to let the normal natural image compressor do its work. This is because the natural image compressor can often emit a long string of zeroes as a single symbol, which can be more efficient than an LZ match.

The next step is to determine when to use LZ or not for RGBA data and tune it to work well across our test set.

My plan for this is to:

(1) Perform L1-norm-based filter selection and residual computation as a first-order approximation of the residuals, to determine how well the natural image compressor works for each pixel.

(2) Come up with heuristics that map residuals to encoding cost, probably via a lookup table, and calculate the approximate bit savings for each match. The table gives an upper bound on how many bits each residual costs to encode, so if an LZ match encoding costs more than the residuals it replaces, the match should not be emitted.

(3) Come up with some simple heuristics to apply during match searching, since the distance encoding is a strong indicator of how expensive a match is to represent.

(4) After finding all the best matches, simulate each of the LZ match encodings to generate an approximate bit cost.

(5) Reject matches that are too expensive compared to natural image compression.
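Steps (2) and (5) can be sketched in code. Everything here is hypothetical: the table values, thresholds, and names (ResidualBits, NaturalBits, ShouldEmitMatch) are placeholders for the heuristics to be tuned against the test set, not GCIF's actual cost model.

```cpp
#include <cstdint>

// Hypothetical lookup mapping a residual magnitude to an approximate
// entropy-coded bit cost.  Small residuals are assumed nearly free;
// large residuals approach their raw cost.  These numbers are
// placeholders to be tuned against the test set.
static int ResidualBits(uint8_t r) {
    if (r == 0)  return 1;
    if (r < 4)   return 4;
    if (r < 16)  return 7;
    return 10;
}

// Upper-bound cost of leaving a region to the natural image
// compressor: sum the per-byte residual costs over the match.
static int NaturalBits(const uint8_t *residuals, int count) {
    int bits = 0;
    for (int i = 0; i < count; ++i)
        bits += ResidualBits(residuals[i]);
    return bits;
}

// Step (5): emit an LZ match only when its simulated encoding is
// cheaper than the residual cost of the pixels it would replace.
static bool ShouldEmitMatch(int lzMatchBits, const uint8_t *residuals, int count) {
    return lzMatchBits < NaturalBits(residuals, count);
}
```

Because the table is an upper bound on the natural compressor's cost, rejecting a match this way is always safe: if the LZ encoding costs more than the bound, the residual path can only do as well or better.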


* RGBA LZ Escape Bitstream Format

*

* Attached to the Y channel of the YUV encoder output are a number of

* extra "escape codes" that cause the decoder to go into an LZ decoding

* mode for the next few bits.

*

*

* The extra (23) escape codes are:

*

* 256 = "Distance Same as 1st Last"

* 257 = "Distance Same as 2nd Last"

* 258 = "Distance Same as 3rd Last"

* 259 = "Distance Same as 4th Last"

* 260 = "Distance = 1"

* 261 = "Distance = 3"

* 262 = "Distance = 4"

* 263 = "Distance = 5"

* 264 = "Distance = 6"

* 265 = "Distance = (-2, -1)"

* 266 = "Distance = (-1, -1)"

* 267 = "Distance = (0, -1)"

* 268 = "Distance = (1, -1)"

* 269 = "Distance = (2, -1)"

* 270 = "Length = 2"

* 271 = "Length = 3"

* 272 = "Length = 4"

* 273 = "Length = 5"

* 274 = "Length = 6"

* 275 = "Length = 7"

* 276 = "Length = 8"

* 277 = "Length = 9"

* 278 = Complex match follows (matching at least 10 pixels).

*

* For codes 256-269, the following bits are a Length code.

* For codes 270-277, the following bits are a Distance code.

* For code 278, the following bits are a Length code and then a Distance code.

*

*

* Length Huffman code:

*

* 0 = "Length = 2"

* 1 = "Length = 3"

* ...

* 254 = "Length = 256"

*

*

* Distance Huffman code:

*

* 0 = "Distance = 2"

* 1 = "Distance = 7"

* 2 = "Distance = 8"

* 3 = "Distance = 9"

* 4 = "Distance = 10"

* 5 = "Distance = 11"

* 6 = "Distance = 12"

* 7 = "Distance = 13"

* 8 = "Distance = 14"

* 9 = "Distance = 15"

* 10 = "Distance = 16"

*

* 11 = "Distance = (-16, -1)"

* 12 = "Distance = (-15, -1)"

* ...

* 24 = "Distance = (-3, -1)"

* 25 = "Distance = (3, -1)"

* ...

* 38 = "Distance = (16, -1)"

*

* 39 - 55 = "Distance (-8, -2) ... (8, -2)"

* 56 - 72 = "Distance (-8, -3) ... (8, -3)"

* 73 - 89 = "Distance (-8, -4) ... (8, -4)"

* 90 - 106 = "Distance (-8, -5) ... (8, -5)"

* 107 - 123 = "Distance (-8, -6) ... (8, -6)"

* 124 - 140 = "Distance (-8, -7) ... (8, -7)"

* 141 - 157 = "Distance (-8, -8) ... (8, -8)"

*

* Let D = Distance - 17.

*

* 158 = D in [0, 1] + 1 extra bit emitted

* 159 = D in [2, 3] + 1 extra bit emitted

* 160 = D in [4, 5] + 1 extra bit emitted

* 161 = D in [6, 7] + 1 extra bit emitted

* 162 = D in [8, 11] + 2 extra bits emitted

* 163 = D in [12, 15] + 2 extra bits emitted

* 164 = D in [16, 19] + 2 extra bits emitted

* 165 = D in [20, 23] + 2 extra bits emitted

* 166 = D in [24, 31] + 3 extra bits emitted

* 167 = D in [32, 39] + 3 extra bits emitted

* 168 = D in [40, 47] + 3 extra bits emitted

* 169 = D in [48, 55] + 3 extra bits emitted

* 170 = D in [56, 71] + 4 extra bits emitted

* ...

* 222 = D in [524280, 655351] + 17 extra bits emitted

* 223 = D in [655352, 786423] + 17 extra bits emitted

* 224 = D in [786424, 917495] + 17 extra bits emitted

* 225 = D in [917496, 1048567] + 17 extra bits emitted

*

* Binary layout of D, showing where each extra-bit group begins and ends:

* 0000000 - 0000111 <- 1 extra bit

* 0001000 - 0010111 <- 2

* 0011000 - 0110111 <- 3

* 0100000

* 0101000 <- starts of the remaining 3 codes in the 3-extra-bit group

* 0110000

* 0111000 - 1110111 <- 4

*

* Converting from D to (Code, EB):

* EB = Bits((D >> 3) + 1), where Bits(x) is the bit length of x (Bits(1) = 1, Bits(3) = 2, ...)

* C0 = ((1 << (EB - 1)) - 1) << 3

* Code = ((D - C0) >> EB) + 158 + ((EB - 1) * 4)

*

* Converting from Code to (D0, EB):

* EB = ((Code - 158) >> 2) + 1

* C0 = ((1 << (EB - 1)) - 1) << 3

* D0 = (((Code - 154) - (EB * 4)) << EB) + C0;

* D = D0 + ExtraBits(EB)

*
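The Code/EB conversions above can be sanity-checked with a short round-trip sketch. This is an illustrative reimplementation, not the GCIF source; it assumes Bits(x) is the bit length of x and derives EB from (D >> 3) + 1 so that the group boundaries land at D = 8, 24, 56, ... exactly as in the table.

```cpp
#include <cstdint>

// Bit length: Bits(0) = 0, Bits(1) = 1, Bits(3) = 2, Bits(4) = 3, ...
static int Bits(uint32_t x) {
    int n = 0;
    while (x) { x >>= 1; ++n; }
    return n;
}

// D -> (Code, EB, extra bits).  EB increments every 4 codes, so each
// group of 4 codes starts at D = ((1 << (EB - 1)) - 1) << 3.
static void DistToCode(uint32_t d, int &code, int &eb, uint32_t &extra) {
    eb = Bits((d >> 3) + 1);
    uint32_t c0 = ((1u << (eb - 1)) - 1) << 3;  // first D in this group
    code = (int)((d - c0) >> eb) + 158 + (eb - 1) * 4;
    extra = (d - c0) & ((1u << eb) - 1);        // the EB extra bits
}

// (Code, extra bits) -> D, mirroring the decoder side above.
static uint32_t CodeToDist(int code, uint32_t extra) {
    int eb = ((code - 158) >> 2) + 1;
    uint32_t c0 = ((1u << (eb - 1)) - 1) << 3;
    uint32_t d0 = (uint32_t)((code - 154) - eb * 4) << eb;
    return d0 + c0 + extra;
}
```

Spot-checking against the table: D = 16 maps to code 164 with 2 extra bits, and D = 56 maps to code 170 with 4 extra bits, matching the ranges listed above.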



last edit by catid
edited 12:41pm Thu. Jul 11th 2013 PDT