Python implementation of a new integer encoding algorithm.
Dave Angel
davea at davea.name
Tue Feb 17 09:12:56 EST 2015
On 02/17/2015 06:22 AM, janhein.vanderburg at gmail.com wrote:
> In http://optarbvalintenc.blogspot.nl/ I propose a new way to encode arbitrarily valued integers and the python code that can be used as a reference for practical implementations of codecs.
>
> The encoding algorithm itself is optimized for transmission and storage requirements without any processor load consideration whatsoever.
>
> The next step is the development of the python code that minimizes processor requirements without compromising the algorithm.
>
> Is this the right forum to ask for help with this step or to obtain references to more suitable platforms?
>
This is a fine forum for such a discussion. I for one would love to
participate. However, note that it isn't necessarily true that "the
smaller the better" makes a good algorithm. In practice, there are
frequently a number of tradeoffs, even ignoring processor time (as you
imply).
Many years ago I worked on some code that manipulated the intermediate
files for a compiler. I had no say in the format, I just had to deal
with it.
They had a field type called a "compressed integer." It could vary
between one byte and I think about six. And the theory was that it took
less space than the equivalent fixed-size integer format. The catch,
from my point of view, was that these integers could appear in the middle of a
struct, and thus access to the later fields of the struct required a
dynamic calculation. This put a huge onus on my code to read or write
the data serially. I ended up solving it by writing code that generated
40 thousand lines of C++ header and body code, so that the rest of the
code didn't care.
Was it worth it? To reduce the size of some files that only lived a few
seconds on disk? I seriously doubt it. But I learned a lot in the process.
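For readers who haven't met that sort of field: here is a minimal
sketch of one common variable-length scheme (LEB128-style varints,
seven payload bits per byte with a continuation flag; not necessarily
the exact format that compiler used):

def encode_varint(n):
    """Encode a non-negative integer, seven payload bits per byte,
    high bit set on every byte except the last."""
    if n < 0:
        raise ValueError("non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data, offset=0):
    """Decode one varint from data starting at offset.
    Returns (value, next_offset)."""
    value = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7

Note how the decoder must walk the bytes serially; that is exactly the
property that makes random access into a struct containing such fields
impossible without scanning everything that comes before them.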
On another project, the goal was to be able to recover data from
archives in spite of physical damage to some files. So I had to add
selective redundancy for that. In the process I also compressed the
data, but confined the compression algorithm to relatively small pieces
of data, and labeled those pieces independently, so that any single
decompression error affected only one unit of data.
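A stripped-down sketch of that idea, reusing the varint helpers above
and assuming zlib plus a made-up per-chunk framing of length and CRC32
(the real project's format was of course more involved):

import zlib

CHUNK = 4096  # arbitrary; small chunks limit the damage one error can do

def pack_chunks(data, chunk_size=CHUNK):
    """Compress data in independent chunks, each framed as
    varint(compressed length) + 4-byte CRC32 + compressed bytes."""
    out = bytearray()
    for i in range(0, len(data), chunk_size):
        comp = zlib.compress(data[i:i + chunk_size])
        out += encode_varint(len(comp))
        out += zlib.crc32(comp).to_bytes(4, "big")
        out += comp
    return bytes(out)

def unpack_chunks(blob):
    """Yield (ok, payload) per chunk; a corrupt chunk is reported,
    not fatal to everything after it."""
    offset = 0
    while offset < len(blob):
        length, offset = decode_varint(blob, offset)
        crc = int.from_bytes(blob[offset:offset + 4], "big")
        comp = blob[offset + 4:offset + 4 + length]
        offset += 4 + length
        if zlib.crc32(comp) != crc:
            yield False, b""   # damaged unit; later units still decode
        else:
            yield True, zlib.decompress(comp)

(A real recovery scheme would also need resynchronization markers, in
case a length field itself is damaged.)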
So going back to your problem, and assuming that the other issues are
moot, what's your proposal? Are you compressing relative to a straight
binary form of storage? Are you assuming anything about the relative
likelihood of various values of integers? Do you provide anything to
allow for the possibility that your predicted probability distribution
isn't met?
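That last question matters because any variable-length code is a bet
on the value distribution. With the varint sketch above, for example,
the bet only pays off while small values dominate:

# Encoded size versus a fixed 4-byte field, using the sketch above.
for n in (1, 300, 2**21, 2**28, 2**31):
    print(n, "->", len(encode_varint(n)), "bytes (fixed: 4)")

From 2**28 upward the varint is already longer than the fixed-size
field, so a distribution skewed the wrong way turns the compression
into expansion.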
--
DaveA