Code Poetry
and Text Adventures

by catid posted (>30 days ago) 12:37am Mon. Jun 12th 2017 PDT
I just updated my website to a modern design here:

https://catid.io
by catid posted (>30 days ago) 2:07am Tue. Jun 6th 2017 PDT
Results from hand-tuned (Windows-only) worker thread-pool for Leopard:

It's actually pretty hard to tell why it's not 8x faster.  I'm guessing that it is hitting a memory bandwidth limit on the processor or something like that, based on my previous multi-threaded XOR test.

[code]
Packet-sized data 1 MB with 10% redundancy:

Parameters: [original count=1000] [recovery count=100] [buffer by... (more)
last edit by catid edited (>30 days ago) 3:27am Tue. Jun 6th 2017 PDT
by catid posted (>30 days ago) 4:19pm Mon. Jun 5th 2017 PDT
I recently ran an experiment to see how fast, using multi-threading on the CPU, we can XOR two sets of buffers in memory that do not overlap.  A worker pool of the same size as the CPU count fights over the work items.

Spoilers: For any memory bound work-load over large data the correct solution is to create as many work items as there are CPU cores (even counting hyperthreading).

Her... (more)
last edit by catid edited (>30 days ago) 4:26pm Mon. Jun 5th 2017 PDT
by catid posted (>30 days ago) 3:34am Sun. May 28th 2017 PDT
Update

With more optimization the latest results on the same laptop are:

[code]
Leopard Encoder(8.192 MB in 128 pieces, 128 losses): Input=2102.67 MB/s, Output=2102.67 MB/s
Leopard Decoder(8.192 MB in 128 pieces, 128 losses): Input=686.212 MB/s, Output=686.212 MB/s

Leopard Encoder(64 MB in 1000 pieces, 200 losses): Input=2194.94 MB/s, Output=438.988 MB/s
Leopard Decoder(64 MB in... (more)
last edit by catid edited (>30 days ago) 7:10pm Sun. Jun 4th 2017 PDT
by catid posted (>30 days ago) 12:53am Wed. May 17th 2017 PDT
For implementing math modulo Q, a number of the form Q = 2^^n - c, there's a neat trick I learned a while back called partial reduction.

The trick is to allow the inputs and outputs to your "add" and "subtract" functions to go outside the range 0..(Q-1).  Instead of having to constantly fix the result of every operation to go back in that range, the code can be sloppy (and faster) during calcul... (more)
last edit by catid edited (>30 days ago) 12:59am Wed. May 17th 2017 PDT
by catid posted (>30 days ago) 12:29am Mon. May 15th 2017 PDT
I just released a few new software libraries on github:

https://github.com/catid/fecal

FEC-AL is a simple, portable, fast library for Forward Error Correction. FEC-AL is a block codec derived from the Siamese streaming FEC library. From a block of input data it generates recovery symbols that can be used to recover lost originals.

It requires that data pieces are all a fixed si... (more)
by catid posted (>30 days ago) 5:21am Sat. Mar 18th 2017 PDT
I've been reading up on ML and CV lately and noticed some things that seem to be largely overlooked:


We should investigate doing machine learning (ML) for computer vision (CV) differently:

(1) We should try modelling an eyeball scanning the world rather than running it against lens-distorted camera images.  This means we should add a new step to the front-end of image recognition (and othe... (more)
last edit by catid edited (>30 days ago) 12:20pm Sat. Mar 18th 2017 PDT
by catid posted (>30 days ago) 10:16pm Tue. Dec 13th 2016 PST
Determine which bit 0..7 to flip to generate the next Gray code value, without a table: (Haven't seen this before)

[code]
// Get the next bit to flip to produce the 8-bit Gray code at the provided index
// Precondition: index > 0 and index < 256
static int GetBitFlipForGrayCode8(int index)
{
   if (index & 1)
      return 0;

   if (index & 15)
      return (0x6764 >> (index & 14... (more)
last edit by catid edited (>30 days ago) 10:54pm Tue. Dec 13th 2016 PST
by catid posted (>30 days ago) 4:47pm Tue. Dec 13th 2016 PST
4-bit Gray Codes are binary sequences that flip one bit instead of adding one each time to get to the next sequence element.

[code]
#include <iostream>
#include <iomanip>
#include <cassert>
using namespace std;

int only_one_bit_set_to_one(uint32_t b)
{
   return b && !(b & (b - 1));
}

bool hamming1(uint32_t a, uint32_t b)
{
   return only_one_bit_set_to_one(a ^ b);
}

void s... (more)
last edit by catid edited (>30 days ago) 4:58pm Tue. Dec 13th 2016 PST
by catid posted (>30 days ago) 9:23pm Sat. Oct 1st 2016 PDT
I've been building helper classes with a certain structure these days and it's paid off:

(1) Encapsulate all the class state in a separate struct with default values for all the state members
(2) Give the state object a Json serializer
(3) Reset the state with MyState = StateT(); in the class
(4) Return the state with a getter instead of providing lots of accessors

The wins are that it's ... (more)
by catid posted (>30 days ago) 1:14am Sun. Aug 7th 2016 PDT
These are all interesting because they have ARM NEON targets and generally work very well on mobile:

For linear algebra:
Google MathFu based on Vectorial: http://google.github.io/mathfu/

For compression:
Zstd: https://github.com/Cyan4973/zstd

For encryption:
AEAD: https://github.com/norx/norx/tree/master/norx6441

For key agreement:
DH: [url]https://... (more)
last edit by catid edited (>30 days ago) 1:16am Sun. Aug 7th 2016 PDT
by catid posted (>30 days ago) 11:10pm Sat. Jul 2nd 2016 PDT
Just a little toy project to share:

[quote]
/*

Generate a tight polynomial fit from example data
and calculate error level in the exact solution

(1) Use scaled input data.

Based on the included benchmark:
   For unscaled data, average error level = 2.19262e-06
   For (0..1) scaled data, average error level = 2.54577e-07
The result is that scaled data interpolates 10x better on a... (more)
by catid posted (>30 days ago) 1:52am Wed. May 4th 2016 PDT
For game DRM-type security I would use this product if possible to avoid writing something that will get defeated easily in practice:
http://denuvo.com/

Encryption is a lot of fun to work on.  These days I think this might be one of the best libraries for it:
https://nacl.cr.yp.to/

Other awesome related algorithms for...

Password hashing:
[url]https://password-has... (more)
by catid posted (>30 days ago) 7:34pm Thu. Nov 19th 2015 PST
Compare CatsChoice PRNG with this one: http://vigna.di.unimi.it/ftp/papers/xorshift.pdf
by catid posted (>30 days ago) 10:30pm Sun. Nov 8th 2015 PST
So I explored this more and learned more about CRCs.  This was a particularly good review: http://www.tc.faa.gov/its/worldpac/techrpt/tc14-49.pdf

CRC error detection/recovery really only makes sense for wireless sensor networks (or other link layer-type situations) since it's designed for a few randomly-located bit flips in the data.  Wireless channels have a quantifiable *bit* error... (more)
last edit by catid edited (>30 days ago) 10:35pm Sun. Nov 8th 2015 PST
by catid posted (>30 days ago) 3:41pm Wed. Nov 4th 2015 PST
CLMUL is useful for accelerating CRC calculations and binary extension fields for cryptography acceleration.

https://msdn.microsoft.com/en-us/library/cc664767(v=vs.120).aspx

https://en.wikipedia.org/wiki/CLMUL_instruction_set#CPUs_with_CLMUL_instruction_set

[url]http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-poly... (more)
last edit by catid edited (>30 days ago) 4:05pm Wed. Nov 4th 2015 PST
by catid posted (>30 days ago) 8:26pm Fri. Oct 30th 2015 PDT
The CM256 block erasure codec is up here: https://github.com/catid/cm256

It uses a Cauchy matrix modified so that the first row is all ones, for better performance.

The encoder is just a matrix-vector product.  I think there might be a way to speed that up more.

The latest version of the decoder takes some subset of the matrix and LDU-decomposes it into three matrices that can ... (more)
last edit by catid edited (>30 days ago) 8:31pm Fri. Oct 30th 2015 PDT
by catid posted (>30 days ago) 1:00pm Sat. Oct 24th 2015 PDT
I put up this library today on github: https://github.com/catid/wh256/.

It's built on Wirehair (hybrid GF(256)+LDPC) and my new CM256 reed solomon block codec.  The idea is to use RS for small numbers of blocks where its O(N^2) performance is still practically fast, and switch to Wirehair for larger block counts.  The switch-over seems to happen at about N=32.

I also optimized Wir... (more)
by catid posted (>30 days ago) 1:34am Mon. Oct 12th 2015 PDT
Can tweak things a bit more of course to suit the expected size of the data, and unrolling it once more for 80 bytes at a time seems to be marginally faster but it's down to tuning at that point.

[code]
void bulk_xor(void * __restrict vdest,
                  const void * __restrict vsrc, int bytes)
{
   __m128i * __restrict dest16 = reinterpret_cast<__m128i *>(vdest);
   const __m128i ... (more)
last edit by catid edited (>30 days ago) 1:34am Fri. Oct 16th 2015 PDT
by catid posted (>30 days ago) 12:07pm Mon. Mar 30th 2015 PDT
Dead Reckoning for 0 acceleration:

Integrate between (x, v, a=0) at t = 0, to (x, v, a=0) at t = dt, where acceleration 'a' is zero over the time interval [0, dt].

x += v * dt;

Velocity Verlet for constant acceleration:

Integrate between (x, v, a) at t = 0, to (x, v, a) at t = dt, where acceleration 'a' is constant over the time interval [0, dt].

[code... (more)
last edit by catid edited (>30 days ago) 12:10pm Mon. Mar 30th 2015 PDT
by catid posted (>30 days ago) 12:09am Tue. Feb 10th 2015 PST
[code]
Matrix rows of all ones is desired.

To get these, one row symbol is taken as preferred.

All the other rows are chosen from this symbol.

Some rows have more ones than others.  Prefer ones with fewer ones.

When a small amount of traffic is being sent, 5-bit CRS codes are more ideal.

OOB data is one option.

For original data:
+ OOB flag (1 bit)
+ Code group (7 bits)
+ Ori... (more)
last edit by catid edited (>30 days ago) 12:10am Tue. Feb 10th 2015 PST
by catid posted (>30 days ago) 10:06pm Sun. Dec 14th 2014 PST
It looks like Dr. Plank was threatened with patent infringement litigation by StreamScale.  His software page now indicates that Jerasure is unsupported.

Some more details came out recently on this site: http://erasure-code-patents.xyz/2014/11/

There are two papers that have been published explaining what the patent is mainly about:

There is prior art in the Linux kernel relate... (more)
last edit by catid edited (>30 days ago) 10:28pm Sun. Dec 14th 2014 PST
by (Anonymous) posted (>30 days ago) 10:34am Mon. Jun 15th 2015 PDT
by catid posted (>30 days ago) 8:30pm Thu. Dec 4th 2014 PST
In the past when I've looked at checksums they've only had a power of two number of symbols, like 2^5, 2^32, etc.  So the go-to choice has been a CRC.  Recently I had reason to look at symbol counts other than a power of two.

First I resisted and implemented a CRC with 5 bits:

[code]
static const char CRC5_TAB[32] = {
   0, 11, 22, 29, 5, 14, 19, 24, 10, 1, 28, 23, 15, 4, 25, 18,
   20,... (more)
by catid posted (>30 days ago) 6:38pm Tue. Nov 25th 2014 PST
Given a constraint on a matrix size (MxK), what's the best Galois field MDS (Reed Solomon) matrix (of any polynomial or bit count) for multiplication in the context of erasure codes?

Some background:

Galois field (GF) multiplication boils down to XORs.

GF elements are numbers between 0 and 2^^(bit count).  Let the bit count be denoted B.

GF elements can be expanded out into BxB binary ... (more)
last edit by catid edited (>30 days ago) 10:43pm Tue. Nov 25th 2014 PST
by catid posted (>30 days ago) 9:29pm Sun. Nov 23rd 2014 PST
The MDS-AES paper, while somewhat broken as it turns out, is interesting for erasure codes.  First a glossary of terms:

MDS code: Maximum distance separable code.  This means it can provide the same optimal recovery performance as Reed-Solomon codes.

Cauchy/Vandermonde matrix: Good references on wikipedia.  Just ways to construct MDS codes.

Punctured code: This means that rows or columns ... (more)
by catid posted (>30 days ago) 2:03pm Tue. Aug 5th 2014 PDT
So the last 4 months have been incredibly busy.  I took on a job at Oculus VR, which was then quickly acquired by Facebook.

In that time I've been working roughly 12-15 hrs/day and weekends on the software that accompanies the headset.  A lot of my interests happen to overlap.  For instance, they're interested in using my erasure code software for wireless sensor updates, and the encryption too... (more)
by (Anonymous) posted (>30 days ago) 12:25pm Sat. Sep 19th 2015 PDT
by catid posted (>30 days ago) 7:07pm Thu. Mar 6th 2014 PST
Here is the performance I'm getting on the iPad 2:

[code]
Generating a 256-bit entropy server key...
+ Successfully created a new server key in 11017.000000 usec (one sample)
+ Tabby sign: `587.550000` avg usec
+ Tabby verify signature: `1538.910000` avg usec
+ Tabby reject signature: `1509.560000` avg usec
+ Signature validation test successful!
Generating a 256-bit entropy client key..... (more)
last edit by catid edited (>30 days ago) 3:25pm Fri. Mar 7th 2014 PST
by catid posted (>30 days ago) 1:58am Sat. Mar 1st 2014 PST
I just finished implementing periodic rekeying for Calico, my AEAD construction based on ChaCha and SipHash.

It's all unit tested and working properly, and the mobile version is updated.

If you want to take a look at how it works, the docs are here: https://github.com/catid/calico/blob/master/README.md.

The rekeying is documented around here: [url]https://github.com/catid/calic... (more)
by catid posted (>30 days ago) 1:18pm Mon. Feb 24th 2014 PST
I revised Calico again this weekend: https://github.com/catid/calico.

It now refers to itself as an AEAD and uses more of the right words for things, as well as switching to SipHash for MAC, which is somewhat faster for small/medium sized packets as well as being simpler.  I added more unit tests and checked my SipHash against the test vectors for SipHash to make sure it was solid.  ... (more)
last edit by catid edited (>30 days ago) 5:24pm Mon. Feb 24th 2014 PST
by catid posted (>30 days ago) 12:53am Thu. Feb 13th 2014 PST
The Longhair project ( http://github.com/catid/longhair ) is a pretty good example of a fast UDP packet erasure code.  It is fast for the range of values that are most useful for real-time video/games.

The way I imagined applying the Longhair codec is to switch to a new code every RTT/2 seconds.  Let's call this the "RTT/2-switch" method.  In this method redundant data from the previ... (more)
last edit by catid edited (>30 days ago) 10:41am Thu. Feb 13th 2014 PST
by catid posted (>30 days ago) 11:10am Wed. Oct 29th 2014 PDT
by (Anonymous) posted (>30 days ago) 2:43pm Thu. Aug 28th 2014 PDT
by catid posted (>30 days ago) 12:14am Mon. Feb 10th 2014 PST
Useful webtools:

http://gcc.godbolt.org/

http://decompiler.fit.vutbr.cz/decompilation/
by catid posted (>30 days ago) 12:59am Mon. Feb 3rd 2014 PST
My hobby project for the past two weeks has been implementing Cauchy Reed-Solomon (CRS) codes with low overhead, so that they can be used for real-time communication over wireless Internet.  It's finally finished and I have some results to share! =)

The code is somewhat hidden away in a branch of Wirehair right now: https://github.com/catid/wirehair/tree/crs/crs  It's not quite finis... (more)
last edit by catid edited (>30 days ago) 1:25am Mon. Feb 3rd 2014 PST
by catid posted (>30 days ago) 9:36pm Tue. Jan 21st 2014 PST
I looked at the V8 Math.Random() internal implementation today from a cryptographer perspective. It's based on two Marsaglia MWC generators running in parallel. The low 16-bit outputs of each are concatenated into a 32-bit output. It's seeded with /dev/urandom, which is good.

The problem is that each output gives away half the internal state of the generator, so you can recover the full state g... (more)
last edit by catid edited (>30 days ago) 9:36pm Tue. Jan 21st 2014 PST
by catid posted (>30 days ago) 1:22pm Wed. Jan 15th 2014 PST
Great consulting website: http://www.dlbeer.co.nz/
I really need to ditch this 90s aesthetic.
by catid posted (>30 days ago) 10:41am Wed. Jan 15th 2014 PST
This is the best one so far:

http://www.math.brown.edu/~jhs/Presentations/WyomingEllipticCurve.pdf
last edit by catid edited (>30 days ago) 10:41am Wed. Jan 15th 2014 PST
by catid posted (>30 days ago) 11:53am Tue. Jan 14th 2014 PST
Car updates over RF are a great application for fountain codes (e.g. https://github.com/catid/wirehair).

Using cheap per-unit GPRS radios, the cars can dial home for updates periodically, and log into an update server when updates are required.  The protocol can run over GPRS and would be UDP-based containing:

[code]
[CRC(8 bytes)][Control Header(1 byte)][Error Correction Header(... (more)
last edit by catid edited (>30 days ago) 10:23pm Tue. Jan 14th 2014 PST
by catid posted (>30 days ago) 4:18am Sun. Jan 12th 2014 PST
Tabby now supports an augmented PAKE scheme at high speed:

https://github.com/catid/tabby/blob/master/PASSWORD.md
last edit by catid edited (>30 days ago) 4:18am Sun. Jan 12th 2014 PST
by catid posted (>30 days ago) 6:13pm Mon. Jan 6th 2014 PST
The accelerometer on the mobile device will include the gravitational force, which is a constant.  To normalize the input x,y,z acceleration data to be relative to g = 1 across all devices, I take a very low-pass filter to the amplitude of each measurement to arrive at the long-term magnitude average, and then normalize the x,y,z values with that magnitude.

These samples are stored with their a... (more)
last edit by catid edited (>30 days ago) 6:31pm Mon. Jan 6th 2014 PST
by catid posted (>30 days ago) 6:55pm Sat. Jan 4th 2014 PST
The Tabby project (http://github.com/catid/tabby) is going great so far.  It seems to provide optimal PKI handshakes.

The next obvious step is password authentication.  If I'm going to unseat SRP6a from its throne there needs to be a pretty good argument in favor of its successor.

The disadvantages of SRP6a right now are:

+ Lack of integration with PKI.
+ Slow performance: 256... (more)
last edit by catid edited (>30 days ago) 3:50pm Thu. Jan 16th 2014 PST
by catid posted (>30 days ago) 12:42pm Thu. Jan 2nd 2014 PST
I found that the Linaro version of libjpeg-turbo has a tjDecompress2() function that can write at least 1 byte beyond the end of the provided buffer.  This bug was causing hard to trace crashes in other parts of the code or right after image loading at 0xdeadbaad on Android.  It took me embarrassingly long to find this bug in our product.  Here's an example of the work-around:

[code]
   unsig... (more)
last edit by catid edited (>30 days ago) 1:05pm Thu. Jan 2nd 2014 PST
by catid posted (>30 days ago) 9:30pm Sun. Dec 1st 2013 PST
Version 1.0 of Snowshoe is up!

https://github.com/catid/snowshoe/

This is now the world's fastest open-source Elliptic Curve cryptography library.  Meaning that each function that it provides completes faster than any alternative library available online.

Snowshoe is a conservative library designed for simplicity and testability that has undergone rigorous testing.  The functio... (more)
by catid posted (>30 days ago) 11:42am Fri. Nov 1st 2013 PDT
Useful simple library project: Ultrasonic 4-QAM+audible chirp+error correction => Send short sonic handset discovery codes for social apps.

http://chirp.io is doing this already.

It would be great to have an open-source library people can use as a sonic software modem.  I envision it being useful for sending short invite codes to nearby friends in online social mobile games, and then complet... (more)
last edit by catid edited (>30 days ago) 11:43am Fri. Nov 1st 2013 PDT
by catid posted (>30 days ago) 3:27am Sun. Oct 20th 2013 PDT
Noticed that curves with CM endomorphisms can use slightly more efficient Twisted Edwards addition laws with a change of variables:

https://github.com/catid/snowshoe/blob/master/LaineyCurves.md
last edit by catid edited (>30 days ago) 11:39am Fri. Nov 1st 2013 PDT
by catid posted (>30 days ago) 12:34am Fri. Oct 11th 2013 PDT
Here's my MAGMA script (first I've ever written, and all from scratch, and it worked! :) to test the claim of http://www.chesworkshop.org/ches2010/presentations/CHES2010_Session02_Talk03.pdf that the curve they specify for "ted1271gls" on slide 20 has secure group order (though not on its twist):

[code]
p := 2^127-1;
K<w> := GF(p^2);

mu := 2 + w;
aa := -mu;
dd := 109*mu;

A2... (more)
last edit by catid edited (>30 days ago) 12:44am Fri. Oct 11th 2013 PDT
by catid posted (>30 days ago) 12:12pm Wed. Oct 9th 2013 PDT
i ported libstrophe to iOS and rewrote their TLS to use Apple's built-in Secure Transport so no need for heavy multi-MB OpenSSL libraries:

https://github.com/gameclosure/libstrophe-ios

the library works on armv7 armv7s and i386 simulator
playing with TLS for mobile was interesting
because i want to release my own
now i know what OpenSSL, gnuTLS, Windows schannel and Apple Secur... (more)
by catid posted (>30 days ago) 1:08pm Wed. Sep 25th 2013 PDT
Here's our mobile product reacting to a CONSTANT FLURRY OF MEMORY WARNINGS from iOS 7 on iPhone 4s.  Whee.  I'm going to just have to ignore memory warnings on iPhone 4s, since apparently it's broken.

iPhone 4s on earlier versions of iOS did not do this.  Nor does iOS 7 on iPhone 5 or tablets..

[code]
2013-09-25 13:12:31.726 by.wee.bubble.pang[7937:60b] Received memory warning.
2013-09-25 ... (more)
last edit by catid edited (>30 days ago) 1:12pm Wed. Sep 25th 2013 PDT
by catid posted (>30 days ago) 6:07pm Sat. Sep 21st 2013 PDT
TL;DR Check out the code here: https://github.com/catid/brook/blob/master/Redundancy.cpp


Motivation

Streaming data over a channel in packets (like Internet or deep space) is incredibly well-studied and modeled.  The major variables are: Transmission delay average/maximum.  Ambient random packet loss probability.  Maximum "normal" burst loss duration in milliseconds.  Ther... (more)
last edit by catid edited (>30 days ago) 6:42pm Sat. Sep 21st 2013 PDT
by catid posted (>30 days ago) 2:58pm Tue. Sep 17th 2013 PDT
I've launched a new github project at https://github.com/catid/brook that will shortly be a wrapper for Wirehair for the streaming data case.  In this case, data is buffered into chunks all the same size and sent across at whatever the input rate is without flow control.  The Wirehair error correction code is used to add check symbols for redundancy.  Calico is used for data security an... (more)
last edit by catid edited (>30 days ago) 2:59pm Tue. Sep 17th 2013 PDT
by catid posted (>30 days ago) 9:26pm Tue. Aug 27th 2013 PDT
New library: secure random number generator.

+ Split out the existing stuff.


New library: genus 2 montgomery with Q-curves

+ http://www.lix.polytechnique.fr/~smith/NT-AC/Smith.pdf


New library: Tabby.

+ My HMQV and Schnorr stuff.
+ Improve SYN-cookies to include puzzles.  Make it load-reactive.
+ Include server data in the first message.
+ c2s add a random number so client can... (more)
last edit by catid edited (>30 days ago) 3:30pm Sat. Aug 31st 2013 PDT
by catid posted (>30 days ago) 2:49am Fri. Aug 16th 2013 PDT
So I recognize that my benchmarks for GCIF + lossy formats were flawed, since the diff tool was not being used correctly, and so the GCIF residual inputs are not in any way representative of the actual data.  After testing correctly, the residuals are actually harder to compress as you may expect since they're somewhat unpredictable.

But ASTC is interesting on its own:

I just finished readin... (more)
last edit by catid edited (>30 days ago) 12:46pm Fri. Aug 16th 2013 PDT
by catid posted (>30 days ago) 5:04pm Mon. Aug 5th 2013 PDT
There seem to be two ways to do LZ decoding interleaved with other decoders:

(1) You can limit LZ matches to row length, escape into LZ mode, handle the whole match in an inner loop, and then skip ahead a number of x pixels.

(2) Or you can incorporate it into the big x,y outer loop and for each pixel have if (lz_copy_count > 0), and copy each pixel one at a time.

The advantage of approach... (more)
by catid posted (>30 days ago) 2:01pm Wed. Jul 17th 2013 PDT
This is a very exciting development from my favorite rockstar cryptogenius/codeguru D. J. Bernstein:

http://binary.cr.yp.to/mcbits-20130616.pdf

He's been working on this on the backburner for years now and finally made some breakthroughs.  He's releasing software in August (around the 20th for a conference).

Why is this exciting?

Well right now we're using RSA for SSL on the... (more)
last edit by catid edited (>30 days ago) 2:02pm Wed. Jul 17th 2013 PDT
by catid posted (>30 days ago) 2:33am Sun. Jul 14th 2013 PDT
I've been able to achieve some additional compression by better tuning the LZ77 matches and revising the bitstream format.

Based on some test files I came up with a rough approximation of how residuals map to bits used in the final file to represent RGB data.  I'm doing a rough pass through the RGBA data to come up with filters quickly, then generating residuals using L1 norm instead of entropy... (more)
last edit by catid edited (>30 days ago) 2:56am Sun. Jul 14th 2013 PDT
by catid posted (>30 days ago) 6:43pm Thu. Jul 11th 2013 PDT
Something that I came up with ages ago is actually somewhat useful to identify phones:

What you do is run a simple time synchronization protocol over the Internet to calculate the difference in clock ticks since the iOS device started up with the server ticks.

The API is mach_absolute_time and you can convert it to nanoseconds from ticks.

The clock delta is fairly unique for each device, ... (more)
by catid posted (>30 days ago) 2:14am Thu. Jul 11th 2013 PDT
I came up with some ideas to improve the LZ bitstream format for GCIF that are paying off:

(1) Include "same as last 1-4 distance" code and use it as often as possible.

(2) Use most common 2D distance matches for escape codes in addition to common lengths.  This results in LZ matches under 5 pixels that often take only 7-9 bits to represent overall.

(3) Use only short literal lengths for ... (more)
last edit by catid edited (>30 days ago) 12:41pm Thu. Jul 11th 2013 PDT
by catid posted (>30 days ago) 1:22am Tue. Jul 9th 2013 PDT
For GCIF I tried a bunch of ways to encode the LZ77 data (length + distance) into the bitstream that did not help at all today.  The solution that seems to work the best is to use the same entropy encoder for the Y channel to encode the LZ match lengths also, indicating a match start, followed by a distance code generated from a separate encoder.


Things that did not work well:

- Using a fi... (more)
by catid posted (>30 days ago) 4:40pm Mon. Jul 1st 2013 PDT
Today I think I finally understood how prefix codes are supposed to be used for compression.  It's not really what I've been lead to believe from what I've read online.

Typically people talk about prefix codes like Huffman codes as an "entropy encoding."  A well-known issue with Huffman codes is that they have a minimum length of 1 bit, which is solved in arithmetic/range coders by design.  For... (more)
last edit by catid edited (>30 days ago) 5:47pm Mon. Jul 1st 2013 PDT
by catid posted (>30 days ago) 3:27pm Mon. Jul 1st 2013 PDT
I've been working on improving GCIF (my image format) to support an LZ77 mode so that it will perform better on computer art.  These images are ~90% LZ matches when encoded efficiently, so having a good LZ subsystem is important.  Right now I have a really fast 2D-LZ subsystem that preprocesses rectangular LZ matches with a simple hash table for search.  Unsurprisingly it turns out for most images... (more)
last edit by catid edited (>30 days ago) 3:28pm Mon. Jul 1st 2013 PDT
by catid posted (>30 days ago) 6:05pm Wed. May 22nd 2013 PDT
If you have physical access to the device and can change the public key stored on the device with all FFFFFF's here's how you could generate valid-looking RSA-encrypted messages in 20 minutes with just access to a web browser.

Background: http://en.wikipedia.org/wiki/RSA_(algorithm)

The public key consists of {n, e}.  Assume e is fixed at 2^^16+1 = 65537.  So the effect of overwriting the pu... (more)
last edit by catid edited (>30 days ago) 6:25pm Wed. May 22nd 2013 PDT
by catid posted (>30 days ago) 12:40pm Tue. May 21st 2013 PDT
I was contacted this morning to do some subcontracting for NASA for forward error correction.  This got me looking at Wirehair code again, and I realized that I hadn't used the restrict keyword.

So I added it in: https://github.com/catid/wirehair/commit/7861ba6dc2a67b66f27a835071d117a634f3113f

The result is another 20 MB/s of performance on my desktop!

New results on my iMac:
... (more)
by catid posted (>30 days ago) 6:47pm Thu. May 16th 2013 PDT
Right now GCIF doesn't have good performance for small palettes.  So I'm going to do something like what WebP does in this case: Combine several pixels together into one byte and then do palette-mode compression with the result.

I'm going to need to add byte-wise modes to the preprocessing steps since currently they expect to work with RGBA data.  Preprocessing during encoding is currently incr... (more)
by catid posted (>30 days ago) 6:28pm Tue. May 14th 2013 PDT
Let's write bots for all of the somewhat popular cellphone games that allow you to gift currency.  The bot wallets will just sit on the network with real phone numbers.

To send money into the exchange you gift a bot wallet in your game of choice, which then credits you on our server.

To receive money from the exchange you text the bot wallet with a code, which then gifts your phone.

Then,... (more)
by catid posted (>30 days ago) 3:52pm Tue. May 7th 2013 PDT
Choosing the palette index for each of the <= 256 colors is essential for producing good compression results using a PNG-like filter-based approach.

Palette index assignment does not affect LZ or mask results, nor does it offer any direct improvement in entropy encoding.

However, when neighboring pixels have similar values, the filters are more effective at predicting them, which increases t... (more)
by catid posted (>30 days ago) 4:55pm Thu. May 2nd 2013 PDT
Through working on compressing 2D RGBA data I've noticed a few parts of the data are effectively monochrome data:
(1) Which color filter to use for each SF/CF filter zone.
(2) Which spatial filter to use for each SF/CF filter zone.
(3) Palette mode encoding.
(4) The alpha channel encoding.
(5) Which alpha filter to use for each alpha filter zone. (new)

What I realized while working more on... (more)
last edit by catid edited (>30 days ago) 6:28pm Thu. May 2nd 2013 PDT
by catid posted (>30 days ago) 8:23pm Sun. Apr 28th 2013 PDT
This is version 1.0 of our file format: https://github.com/gameclosure/gcif

Please visit the repo and let me know if I missed something important!
last edit by catid edited (>30 days ago) 8:23pm Sun. Apr 28th 2013 PDT
by catid posted (>30 days ago) 6:49pm Mon. Apr 22nd 2013 PDT
PNG has a couple of spatial filters like "same as left pixel" and "same as above pixel" that it applies before trying to compress the data.

My GCIF project works similarly and has filters that it applies first.  The filters chosen are much more interesting and come from a few places including two written from scratch.

Linear filters are interesting because as you can imagine there are a lot ... (more)
last edit by catid edited (>30 days ago) 6:54pm Mon. Apr 22nd 2013 PDT
by catid posted (>30 days ago) 11:30am Mon. Apr 22nd 2013 PDT
Writing this down so I don't forget:

- 1-RLE and 255-RLE in addition to zRLE

- Using left chaos metric only

- Recent post-filter color palette with escape symbols

- Same-as-last post-filter escape symbol

- LZ4HC of post-filter data followed by entropy encoding

- LZ4HC of non-zero post-filter data (seemed to be better but this didn't include non-zero escape symbols and stuff..  se... (more)
last edit by catid edited (>30 days ago) 12:33pm Mon. Apr 22nd 2013 PDT
by catid posted (>30 days ago) 6:01pm Sun. Apr 21st 2013 PDT
Just sharing my experience getting gperftools working on mac.  The goal was to do CPU profiling of the decompressor to see where I need to optimize the C code.

So I checked out gprof first, which doesn't work with clang.  So I looked at Google's gperftools.

gperftools haven't been updated in a year and look abandoned at this point.  They require code changes to do ProfilerStart() and Profile... (more)
by catid posted (>30 days ago) 2:28am Sat. Apr 20th 2013 PDT
The image compression project is coming along.  It's been about a month since I started working on it, and I've decided it's time to get the decoder finished.

This week I refactored and otherwise improved the Huffman table compression significantly, so re-zipping the output files no longer reduces the file size.  That's a good quick test to see if an image compression scheme is decent.

Here'... (more)
last edit by catid edited (>30 days ago) 6:58pm Mon. Apr 22nd 2013 PDT
by catid posted (>30 days ago) 2:25am Sun. Apr 14th 2013 PDT
So I've been exploring using color palettes in specific areas of an image to try to effect compression.

Some images have high complexity in the colors chosen, but there are spatial correlations between the color values.  For instance, a gradient has a lot of color values but a very simple pattern.  This compresses very well with context modeling that does spatial filtering and color filtering f... (more)
last edit by catid edited (>30 days ago) 2:24am Sat. Apr 20th 2013 PDT
by catid posted (>30 days ago) 2:49am Wed. Apr 10th 2013 PDT
It was actually surprisingly easy to write a 2D LZ-like algorithm so that blocks of pixel matches can be written out.  It just took me a few hours to do from scratch.  And while I don't care much about how fast it runs, it runs at an interactive rate.  That being said I'm sure there's some room for improvement.

There is still some work to be done to incorporate the LZ code into the rest of the ... (more)
last edit by catid edited (>30 days ago) 5:01am Wed. Apr 10th 2013 PDT
by catid posted (>30 days ago) 2:37am Wed. Apr 3rd 2013 PDT
http://www.eurasip.org/Proceedings/Eusipco/Eusipco2012/Conference/papers/1569551007.pdf

This is a recent paper surveying efficient color filters.  Figure 4 is amazing.

It's definitely the best work I've been able to find on the topic and is exactly what I needed to know to decide the best color filters to apply to each 4x4 pixel zone.

Right now I'm using the same color filters ... (more)
last edit by catid edited (>30 days ago) 3:21am Wed. Apr 3rd 2013 PDT
by catid posted (>30 days ago) 8:24pm Thu. Mar 28th 2013 PDT
The way our average cellphone game lifecycle works is:

1. Money goes into ads to drive it up the charts.
2. Users jump on the game in droves and we have a huge spike in active users.
3. Users slowly lose interest and fall off the game.
4. Some stay behind and we have a steady-state of active users.

We want the best impression for users to be in those critical few weeks after launch.

Du... (more)
last edit by catid edited (>30 days ago) 8:37pm Thu. Mar 28th 2013 PDT
by catid posted (>30 days ago) 7:13pm Thu. Mar 28th 2013 PDT
Mini-announcement today:

Game Closure Image Format (GCIF) is functional for 1-bit alpha channel now:

https://github.com/gameclosure/gcif

Lots of progress on the pre-release side on RGB data and it's looking goood =)
last edit by catid edited (>30 days ago) 7:13pm Thu. Mar 28th 2013 PDT
by catid posted (>30 days ago) 11:57pm Tue. Mar 26th 2013 PDT
For lossless compression, the filtering has to be done byte-wise in the decoder.  This prompted me to consider alternative pixel representations to do the math word-wise instead.  I came up with three options that all failed.  But read on if you're interested...

The basic operations to speed up would be similar to this:

A[0] += B[0] + (C[0] - D[0]) / 2
A[1] += B[1] + (C[1] - D[1]) / 2
A[2]... (more)
last edit by catid edited (>30 days ago) 12:05am Wed. Mar 27th 2013 PDT
by catid posted (>30 days ago) 7:58pm Thu. Mar 21st 2013 PDT
BCIF initially does spatial filtering on each color plane, and then it does color filtering.

After that it uses a small number of static Huffman tables, each one selected per-pixel based on the pixel context.

There are a few things left out of BCIF that leave room for improvement.

(1) The Huffman decoder they implement is inefficient, leaving room for better compression at the same speed ... (more)
last edit by catid edited (>30 days ago) 8:01pm Thu. Mar 21st 2013 PDT
by catid posted (>30 days ago) 11:49am Mon. Mar 18th 2013 PDT
At Game Closure we often compress sprite sheets with PNG for deployment to cellphones.  This is pretty terrible for a number of reasons:

(1) PNG is OLD.

Really old.  And everything that goes with that.  There have been a ton of improvements in image compression since then.

(2) PNG isn't designed for sprite sheets.

So it cannot take advantage of common features of sprite sheets.


So... (more)
last edit by catid edited (>30 days ago) 12:32pm Mon. Mar 18th 2013 PDT
by catid posted (>30 days ago) 5:21am Sat. Mar 16th 2013 PDT
I am stunned sometimes that brilliant papers go unnoticed.

"In the author's opinion there is a missing piece that would be of use
in many real world situation: there is no practical algorithm, i.e. fast and
with compression ratios comparable to JPEG2000 or JPEG-LS, that aims
to optimize the decompression time."

In other words perhaps the #1 most performed activity in the world on cellphon... (more)
last edit by catid edited (>30 days ago) 6:37pm Sun. Mar 17th 2013 PDT
by catid posted (>30 days ago) 2:22pm Mon. Mar 11th 2013 PDT
Maybe this will get finished sometime:

[code]
/*
* s[0..3] : four 32-bit words, secret key
*/

exports.ChaCha = function(s0, s1, s2, s3) {
  var x0 = 9, x1 = 9, x2 = 9, x3 = 9;
  var x4 = 9, x5 = 9, x6 = 9, x7 = 9;
  var x8 = 9, x9 = 9, xa = 9, xb = 9;
  var xc = 9, xd = 9, xe = 9, xf = 9;

  var y0 = 1779033703 ^ s0, y1 = 3144134277 ^ s1, y2 = 1013904242 ^ s2, y3 = 2773480762 ^ s3;
  v... (more)
by catid posted (>30 days ago) 4:45pm Wed. Mar 6th 2013 PST
Long story short.  The magic line:

curl --data "entry.1063614982=DATATOSEND" https://docs.google.com/a/gameclosure.com/forms/d/1zYBFvzAlUZX4JWmWY1RkPxYlRJXUszLs4aWu2nt_kPt/formResponse

Henretty (at gameclosure) pointed me at a modern way to have users add themselves to a online list:

I added a new Google Form to my Google Drive account.  Once it's done, I viewed the form ... (more)
last edit by catid edited (>30 days ago) 5:21pm Wed. Mar 6th 2013 PST
by catid posted (>30 days ago) 11:04am Thu. Feb 14th 2013 PST
Decloaking out of stealth mode, guns blazing! ;D http://www.gameclosure.com/
by catid posted (>30 days ago) 1:56pm Sat. Feb 9th 2013 PST
Software is a really new thing, and we don't have it all figured out yet, philosophically or practically.  Something that I've believed for a while found the right words in me today, and maybe they gel with the way you see things too:

There are two main camps of software developers:

Engineers and Hackers.  Both noble and respectable approaches to development in general!

Your Hackers are g... (more)
by catid posted (>30 days ago) 4:38pm Sat. Jan 12th 2013 PST
Say you're writing an online multiplayer game, where players report their scores to the server after each game.  You want to make this more tamper-proof to help prevent cheating.  Here are some ideas:

Problems With Requiring Unanimity

The problem I see with all players having to upload the same scores is that if one player changes the score, then all of the scores are invalidated.  Th... (more)
last edit by catid edited (>30 days ago) 8:18pm Sat. Jan 12th 2013 PST
by catid posted (>30 days ago) 8:23pm Tue. Dec 25th 2012 PST
Okay straight off this is a great movie!

Unlike what you may expect almost ALL of the actors can sing.  Not having to project might have helped with this.  But the cool thing is, they can sing *and* they can act at the SAME TIME.  Like they're not just beautiful to listen to but you can follow the evolution of their thoughts by watching their facial features change.  This is some seriously good... (more)
by catid posted (>30 days ago) 12:41pm Mon. Nov 26th 2012 PST
This has been rattling around in my head for a while, just thought I'd share it in case anyone finds it interesting.

For a standard 20 character product key, redeem code (Google play cards), etc, you get about 103 bits to play with.  Throw a few away for a quick parity validation check that can be coded in JavaScript in the browser, and you have 101 bits for data.

The way I'd do it that seem... (more)
last edit by catid edited (>30 days ago) 12:43pm Mon. Nov 26th 2012 PST
by catid posted (>30 days ago) 11:44am Sat. Oct 13th 2012 PDT
Deciding what to do with my Decay Opus project (http://dkop.us).

The initial plan was to just put up some poetry that decays over a period of months until it is unreadable.

But, I have been wanting to do some WebGL for a while so the current plan is a lot more complex.

I want to build a limitless 3D terrain simulation you can walk around on.  There will be limitless stars and galaxies ove... (more)
last edit by catid edited (>30 days ago) 11:46am Sat. Oct 13th 2012 PDT
by catid posted (>30 days ago) 4:49pm Wed. Aug 22nd 2012 PDT
I think the most valuable place for me to be hobbying about is in the mobile marketplace right now.  It took me a very long time to get up to speed on online game networking libraries, and the market is now too mature to break into.  However, there is room at the bottom.

It seems like there is a need for a native Android/iPhone library that provides secure, real-time messaging.  It would just b... (more)
last edit by catid edited (>30 days ago) 5:12pm Wed. Aug 22nd 2012 PDT
by catid posted (>30 days ago) 6:42pm Fri. Jul 13th 2012 PDT
I've been looking at ZMQ again recently for something at work and decided against using it, but the investigation got me thinking about ZMQ and where it fits into various applications.

ZMQ has a few different usage patterns that it is optimized for: Router, Dealer, and Pub-Sub.  The router will act as a server and accept data from clients.  The dealer will push data out to a set of end-points i... (more)
by catid posted (>30 days ago) 5:04pm Fri. Jul 13th 2012 PDT
Background: The Opus Audio Codec

I have been considering adding lossless forward error correction (FEC) to the Opus codec (http://opus-codec.org) with Wirehair.  For low data rates, Opus uses the SILK codec's built-in FEC scheme.  The built-in FEC scheme seems to be lossy in that it provides a lower-quality audio reproduction after packet loss.  For the FB and WB modes of Opus... (more)
last edit by catid edited (>30 days ago) 6:26pm Fri. Jul 13th 2012 PDT
by (Anonymous) posted (>30 days ago) 9:51pm Sat. Feb 3rd 2018 PST
by (Anonymous) posted (>30 days ago) 2:31am Sat. Apr 15th 2017 PDT
by (Anonymous) posted (>30 days ago) 1:05pm Mon. Jun 15th 2015 PDT
by catid posted (>30 days ago) 4:12am Sat. Jun 9th 2012 PDT
Just a brief note before I crash.  Had an interesting idea for combating lag for browser games that I am going to be experimenting with soon:

Right now UDP is not widely available for JavaScript running in browsers, but TCP sockets are made available.  There is often a limit of 6 TCP connections per domain, but that is still a lot of connexions.  My thought is to open up the limit, and then tra... (more)
by catid posted (>30 days ago) 9:26am Thu. May 24th 2012 PDT
I've been trying to come up with more uses for Wirehair FEC to get the most benefit out of my work on that problem.  It seems like you could use it for Threshold Cryptography.

What's Threshold Cryptography?  Here's an example:

You have a bunch of people with parts of a key.  No one person knows the whole key.  If enough of them (N) people get together and share their key parts with each othe... (more)
last edit by catid edited (>30 days ago) 9:30am Thu. May 24th 2012 PDT
by catid posted (>30 days ago) 4:50pm Thu. Apr 19th 2012 PDT
I added a new send buffer allocator to LibCat to take advantage of allocation patterns during file transfer and other common packet exchanges.

For small sizes, the Windows allocator is fairly efficient.  However, for the size buffers used during file transfer (~1500 bytes), it starts getting slow.  To avoid allocating these larger buffers repeatedly, I added several slab allocators that re-use ... (more)
last edit by catid edited (>30 days ago) 5:03pm Thu. Apr 19th 2012 PDT
by catid posted (>30 days ago) 9:07pm Mon. Apr 16th 2012 PDT
Been crunching through a fairly long TODO list getting FEC-based file transfer finished in LibCat.  Here are some highlights:

Bugfix for WirehairFEC : Odd-length messages now encode and decode properly, so it's stable enough for use now in LibCat.  I've incorporated it into LibCat so it is easy to use from the parent project.

Crypto optimizations for LibCat :
+ HMAC-MD5 was replaced with VM... (more)
last edit by catid edited (>30 days ago) 9:12pm Mon. Apr 16th 2012 PDT
by catid posted (>30 days ago) 1:31am Thu. Apr 5th 2012 PDT
For LibCat's Tunnel Authenticated Encryption "Calico" protocol, I have replaced HMAC-MD5 with a custom VMAC-ChaCha implementation, which is a much more efficient Wegman-Carter-based MAC.  It now takes about 5-6 usec to do short message encryption/decryption in the TEST_ECC demo app, whereas it used to take 8-9 usec.

Here's it running on my laptop:

[code]
Key Pair Generation time = 139.738 u... (more)
last edit by catid edited (>30 days ago) 1:32am Thu. Apr 5th 2012 PDT
by catid posted (>30 days ago) 1:08am Tue. Apr 3rd 2012 PDT
Taking a break from homework to leave this stub here.  Fill this in with ARM + Android development notes.

The goal is to write piconquest using wirehair through NDK to transfer pictures over UDP from my VPS.  The UI will be presented with javascript in a WebKit view.  Phone ID will be collected from the phone and used to identify each user so login is optional.  Custom login with active google ... (more)
last edit by catid edited (>30 days ago) 1:09am Tue. Apr 3rd 2012 PDT
by catid posted (>30 days ago) 3:14pm Thu. Mar 15th 2012 PDT
Now that the matrix parameters are somewhat tuned, I can start benchmarking the performance of my FEC codec, Wirehair, available here: http://wirehairfec.com

At the heart of the codec is a sparse matrix solver that operates in linear time.  About SQRT(N) of the matrix rows are dense, but they are built using Shuffle-2 Codes (see previous posts) so the MultiplyDenseValues step takes 2... (more)
last edit by catid edited (>30 days ago) 9:34pm Sun. Mar 18th 2012 PDT
by catid posted (>30 days ago) 6:45pm Tue. Mar 13th 2012 PDT
Continued from http://catid.mechafetus.com/news/news.php?view=298.

It is clear from the invertibility data, in the previous post, that if the matrix size is DxD, that when D Mod 4 = 0, a Shuffle-2 Code matrix is not invertible.  And when D Mod 4 = 2, then it is better than the random matrix for small D.  However for D > 22, even the best sizes are not as often invertible as the rando... (more)
last edit by catid edited (>30 days ago) 9:59pm Tue. Mar 13th 2012 PDT
by catid posted (>30 days ago) 6:15pm Tue. Mar 13th 2012 PDT
For Wirehair, I wanted to find a way to generate a random-looking binary matrix that is very easy to multiply, with the added requirement that it is invertible about as often as a normal random binary matrix.  This is useful for making Wirehair run much faster while keeping the same level of error-correcting performance.

After stumbling through a couple of options, I designed a new matrix form ... (more)
last edit by catid edited (>30 days ago) 9:53pm Tue. Mar 13th 2012 PDT
by catid posted (>30 days ago) 2:56am Sun. Mar 4th 2012 PST
I've invented a constant-time algorithm that brings wirehair overhead down from 1.7 to 0.03.  So for very little additional processing time, the overhead is significantly improved: Instead of seeing sometimes 4+ additional blocks required during simulation, it's more common to see 0 and sometimes 1.  This is something I've never seen done before in any whitepapers I've read on the topic and gives ... (more)
last edit by catid edited (>30 days ago) 3:56am Sun. Mar 4th 2012 PST
by catid posted (>30 days ago) 6:02pm Fri. Mar 2nd 2012 PST
Just a little interesting code snippet:

    To compute (j - 1) % 255, given that j is already in the range 0..254:

      --j;
      j = (u8)(j + (j >> 8));

    To compute (j + 1) % 255, given that j is already in the range 0..254:

      ++j;
      j = (u8)(j + ((j + 1) >> 8));
last edit by catid edited (>30 days ago) 3:52am Sun. Mar 4th 2012 PST