Monday, June 12, 2006

Compression, Statistics and such

In the process of doing the usual stuff that I do when I do not struggle with my studies, I ran into the problem of having a number of streams with a very even distribution of byte values. I know that these bytes are executable code somehow encoded. I have a lot of reason to suspect that they are compressed, not encrypted, but I have not been able to make sense of it yet.

This brought me to the natural question: Do common encryption algorithms have statistical fingerprints that would allow them to be distinguished from one another, more-or-less irrespective of the underlying data ? It is clear that this gets harder as the amount of redundancy decreases.

It was surprising (at least for me) that nobody else has worked on this yet (publically).

Also, it made me regret that due to some time constraints involving some more algebraic courses I was unable to attend the Statistics I and II lectures given at my University by Prof. Dette. Had I attended, I would know better how to make sense of the capabilities that software like R could give me.

Another example of the fundamental law of mathematics: For every n denoting the number of days you have studied mathematics there exists a practical problem that make you wish you had studied 2n days already.

No comments: