Peter J. Philipp The PCP compression algorithm December 2010 File for the future. A compression algorithm for super-fast computers ------------------------------------------------ This paper is written in 2010. The idea of this has been around since 2005. Personal Computers are not fast enough for what I'm about to write. I have theorized around a compression algorithm that can be stacked, meaning you can compress a compressed block of data. This formula can then be used indefinitely to grow a 19 byte block to endlessly long. Pretend you have a 38 byte packet all consisting of X's (88 decimal) then the packet will look like this: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 38 X's I can now count up a few things: 38 * 88 = 3344 If I XOR all the X's together I get a final XOR result of 0. If I checksum (with md5) the 38 X's then I get: dione$ echo -n "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" | md5 625d4570022c770c2565ca7d1bd5cb3f so then I stick the results together into a packet: the 3344 goes into a 16 bit (2 byte) field the 0 goes into an 8 bit (1 bytes) field the md5 checksum goes into a 16 byte field -- +--+-+----------------+ | | | | +--+-+----------------+ a PCP packet (Peter's Compression Program) -- This gives me 19 bytes, exactly half of the 38 bytes used to hold the X's. This is the compression. The decompression then is done as follows: 3344 is laid out as all sums[1] that result in 3344. So 1,3343;2,3342... and so on. To the advantage of the algorithm the sums can only be 38 spots maximum for the 38 bytes so that eliminates a great chunk of what is valid. Also anything over 255 cannot fit into a byte so this isn't valid either, we're aiming for a small window in a large set of possible numbers. Every round of laying out the new sums is XOR'ed and if the result is not 0 then the next round continues, if the result is 0 though a number of things happen. The spaces are reverse-bubble sorted and every round is md5 checksummed. If any combination of the 38 spots (can contain zeros) match the md5 checksum then it is believed that this is the original payload of 38 X's. No collisions are expected. [1] at current speeds we can only lay out the sums of the number 20 for all sums. Here's a demonstration: $ time ./numtest 20 > /dev/null 0m2.53s real 0m2.52s user 0m0.01s system $ ./numtest 4 4, 3, 1, 3, 1, 2, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 3, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3, As you can see numtest program displays the sums of 4 in all possible ways and sometimes multiple. Here is the numtest.c program: #include #include #include void collapse(char *pf, int num, int); void print_nums(char *pf, int num); int main(int argc, char **argv) { int num, orignum; int i; u_char pf[255]; if (argc != 2) { fprintf(stderr, "must provide a number\n"); exit(1); } orignum = num = atoi(argv[1]); if (num > 255) { fprintf(stderr, "number must be below 256\n"); exit(1); } memset(&pf, 0, sizeof(pf)); pf[0] = num; collapse(pf, num, 0); } void collapse(char *pf, int num, int spot) { char buf[512]; memcpy(&buf, pf, sizeof(buf)); while (buf[spot] > 1) { print_nums(buf, 0); buf[spot]--; buf[spot + 1]++; collapse(buf, buf[spot + 1], spot + 1); } print_nums(buf, 0); } void print_nums(char *pf, int num) { int i; char buf[512]; for (i = 0; pf[i]; i++) printf("%d, ", pf[i]); printf("\n"); } -- Uses: The use of this compression algorithm is interesting. It can be used everywhere where large processing facilities exist. (1) It may be used for trans-atlantic compression of data where large installation or grid computers on either side of the atlantic can decompress the eithers compression. (2) As compression is cheap (very cheap compared to decompression) it can also be used for space travel for satellites that can't send back too much data, a grid of computers on earth would then speed up extracting the entire data. Perhaps it could be used when the first inter-stellar missions are conducted to another star system. By then computers will surely be formidable.