Re: [Python-Dev] Base-96
Guido van Rossum wrote:
This sounds more like something to bring up in python-ideas@python.org. Also, rather than being vague about the motivation ("would be very interesting"), you ought to think of a realistic use case. For example, are there existing encodings of binary data using base-96? I'm not aware of any.
On Fri, Aug 1, 2008 at 4:06 PM, Kless wrote:
I think it would be very interesting if Python had a module for working with base 96 too. [1]
The digests from the hashlib module could be converted to base 96, as could random bytes used in crypto (to create the salt, the IV, or a key).
As you can see here [2], there are 94 printable ASCII characters (decimal code range 33-126). So only 2 more characters need to be added: the space (code 32), and one non-printable character (which doesn't create any problem) at the end.
[1] http://svn.python.org/view/python/trunk/Modules/binascii.c
[2] http://en.wikipedia.org/wiki/ISO/IEC_8859-1
96 is approximately 2^6.585
For some reason, integral powers of two seem so much more, well, POWERFUL, if you know what I mean. Frankly I think you are being either optimistic or charitable in suggesting that such a use case might exist.
There's a reason that DEC called their equivalent of base64 "6-bit encoding".
But then I wanted to keep integer division as it was, so I am clearly a techno-luddite. If the world wants fractional bits I'm sure it's only a matter of time before some genius decides to design a 67.9-bit computer.
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
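
The "fractional bits" point can be checked directly: the information carried by one character of an N-symbol alphabet is log2(N) bits. The following is only an illustrative sketch; the helper name `bits_per_char` is made up for this example and the list of bases is simply the set of alphabet sizes that come up in this thread.

```python
import math


def bits_per_char(base):
    """Bits of information carried by one symbol of a `base`-symbol alphabet."""
    return math.log2(base)


# Alphabet sizes mentioned in the thread. 94 is the number of printable
# ASCII characters excluding the space (codes 33-126).
for base in (64, 85, 94, 96, 128):
    print(f"base-{base}: {bits_per_char(base):.3f} bits per character")
```

Only the powers of two (64 and 128) give a whole number of bits per character, which is why encodings in other bases have to work on multi-character blocks.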
It's true, I didn't pay attention to that.
So the next possible encoding would be base-128 (7-bit encoding),
although I don't know whether that would be feasible, since it would then
use non-printable characters, which could alter the text (through
characters such as Backspace or Delete).
The standard high-bit-density encoding past base-64 is base-85
(http://en.wikipedia.org/wiki/Ascii85), which encodes 4 binary bytes
as 5 ascii bytes, versus 3 binary bytes as 4 ascii bytes. It works,
is an RFC somewhere, ... and maybe should find its way into the
Python standard library's codec package at some point.
- Josiah
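
For readers who want to see the 4-to-5 versus 3-to-4 ratio in practice, here is a small sketch. It assumes a Python version (3.4 or later) in which the standard `base64` module provides `a85encode` and `a85decode`; that is much newer than this thread, so it illustrates the encoding rather than anything that was available at the time.

```python
import base64

# 12 bytes: a multiple of both 3 and 4, so neither encoding needs padding.
data = bytes(range(12))

b64 = base64.b64encode(data)   # 3 bytes -> 4 characters
a85 = base64.a85encode(data)   # 4 bytes -> 5 characters

print(len(data), len(b64), len(a85))

# Both encodings are reversible.
assert base64.b64decode(b64) == data
assert base64.a85decode(a85) == data
```

Note that Ascii85 abbreviates an all-zero 4-byte group to a single `z`, so the 5-characters-per-4-bytes figure is an upper bound rather than an exact ratio for every input.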
Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/josiah.carlson%40gmail.com
Josiah Carlson wrote:
The standard high-bit-density encoding past base-64 is base-85 (http://en.wikipedia.org/wiki/Ascii85), which encodes 4 binary bytes as 5 ascii bytes, versus 3 binary bytes as 4 ascii bytes. It works, is an RFC somewhere,
RFC 1924, published on April 1, 1996, to shorten the representation of IPv6 addresses, so that you can write
ssh '4)+k&C#VzJ4br>0wv%Yp'
instead of having to write
ssh 1080:0:0:0:8:800:200C:417A
Most notably, section 7 (implementation issues) points out
Many current processors do not find 128 bit integer arithmetic, as required for this technique, a trivial operation. This is not considered a serious drawback in the representation, but a flaw of the processor designs.
For arbitrary-sized data, you'd have to give up 128-bit arithmetic, of course, and represent the input data to encode as a long integer.
Regards, Martin
P.S. Just in case it isn't clear: I would oppose any specific proposal to add this Ascii85 algorithm to the standard library. It would sound like we don't have any real problems to solve.
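
The "long integer" approach mentioned above is easy to sketch in Python, where integers are arbitrary precision. The code below is an illustration of the scheme as I read RFC 1924 (treat the address as one 128-bit number and write it as 20 base-85 digits, most significant first); the function names are made up for this example, and the `ipaddress` module it relies on is from Python 3.3 and later.

```python
import ipaddress

# The 85-character alphabet listed in RFC 1924, in digit order.
ALPHABET = (
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "!#$%&()*+-;<=>?@^_`{|}~"
)
assert len(ALPHABET) == 85


def ipv6_to_base85(text):
    """Encode an IPv6 address as 20 base-85 digits, most significant first."""
    value = int(ipaddress.IPv6Address(text))  # arbitrary-precision integer
    digits = []
    for _ in range(20):  # 85**20 > 2**128, so 20 digits always suffice
        value, rem = divmod(value, 85)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def base85_to_ipv6(encoded):
    """Inverse of ipv6_to_base85; returns the compressed textual form."""
    value = 0
    for ch in encoded:
        value = value * 85 + ALPHABET.index(ch)
    return str(ipaddress.IPv6Address(value))


print(ipv6_to_base85("1080:0:0:0:8:800:200C:417A"))
```

Because the whole address is one number, no block boundaries are involved; that is what makes the scheme awkward for arbitrary-sized data and why block-oriented variants such as Ascii85 work on 4 bytes at a time instead.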
Original intent (encoding IPV6 addresses) != current usefulness (a more efficient ascii encoding of binary data). Generally, I'm of the opinion that base64 (as an ascii encoding of binary data) is sufficient for any needs I have, but there are cases where having a more efficient representation would be useful.
I would also not suggest addition in the 2.6/3.0 timeframe, at best it would be 2.7/3.1, and only if someone submits a patch with testcases (note that the wiki page provides C source for one-shot encoding and decoding that doesn't require 128-bit arithmetic). Sounds to me like a project for the OP.
- Josiah
On Sat, Aug 2, 2008 at 10:09 AM, "Martin v. Löwis" wrote:
P.S. Just in case it isn't clear: I would oppose any specific proposal to add this Ascii85 algorithm to the standard library. It would sound like we don't have any real problems to solve.
Same here.
Original intent (encoding IPV6 addresses) != current usefulness (a more efficient ascii encoding of binary data).
That was an April Fool's RFC.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
On Sat, Aug 2, 2008 at 11:57 AM, Guido van Rossum
That was an April Fool's RFC.
See also http://en.wikipedia.org/wiki/April_Fools%27_Day_RFC -- it has a ton of these. Great fun reading through some of them on an idle Saturday afternoon. :-)
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
On Sat, Aug 02, 2008 at 02:15:29PM -0700, Guido van Rossum wrote:
On Sat, Aug 2, 2008 at 11:57 AM, Guido van Rossum
wrote: That was an April Fool's RFC.
See also http://en.wikipedia.org/wiki/April_Fools%27_Day_RFC -- it has a ton of these. Great fun reading through some of them on an idle Saturday afternoon. :-)
There were a lot of Python jokes for the Apr 1st. What a pity we have ceased to make such jokes.
http://mail.python.org/pipermail/python-list/2001-April/076593.html
http://mail.python.org/pipermail/python-list/2003-April/197232.html
http://mail.python.org/pipermail/python-list/2004-April/256320.html (Despite being a joke it really works!)
http://mail.python.org/pipermail/python-list/2005-April/315453.html
http://mail.python.org/pipermail/python-list/2005-April/315457.html
http://mail.python.org/pipermail/python-list/2006-April/375866.html
Oleg.
--
Oleg Broytmann http://phd.pp.ru/ phd@phd.pp.ru
Programmers don't die, they just GOSUB without RETURN.
Oleg Broytmann wrote:
There were a lot of Python jokes for the Apr 1st. What a pity we have ceased to make such jokes.
You forget the April 1st PEPs:
http://www.python.org/dev/peps/pep-0313/
http://www.python.org/dev/peps/pep-3117/
Georg
Georg Brandl wrote:
You forget the April 1st PEPs:
http://www.python.org/dev/peps/pep-0313/ http://www.python.org/dev/peps/pep-3117/
Not to mention the April 1 Licensing blog entry:
http://pyfound.blogspot.com/2006/04/python-25-licensing-change.html
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Hm, I'm sure there were many more, perhaps in different places. I
recall participating with Larry Wall in the announcement of Parrot, a
Python/Perl merger -- hence the name of the Perl 6 VM. And others. I'd
love to see people post more references here! (Georg already posted
the April Fool's PEPs.)
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido> Hm, I'm sure there were many more, perhaps in different places.
I figured it's a slow Sunday so I'd collect them on the wiki:
http://wiki.python.org/moin/AprilFools
I found the Python/Perl joint development press release, but only on the Wayback machine. It appears that when redesigning the python.org website that page was deemed inappropriate.
Skip
On Sun, Aug 3, 2008 at 10:37 AM,
Guido> Hm, I'm sure there were many more, perhaps in different places.
I figured it's a slow Sunday so I'd collect them on the wiki:
Great!
I found the Python/Perl joint development press release, but only on the Wayback machine. It appears that when redesigning the python.org website that page was deemed inappropriate.
Alas, way too much stuff was dropped by the redesign. ;-(
I should track down the Tim Peters award (or whatever it was called) and link that.
We should probably cross-link with the Python humor page on python.org (unless that's also been axed).
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
On Sun, Aug 03, 2008, Guido van Rossum wrote:
We should probably cross-link with the Python humor page on python.org (unless that's also been axed).
IIRC, the humor page was axed due to lack of updates -- I recommend finding the material using Wayback and just adding it to the wiki.
--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/
Adopt A Process -- stop killing all your children!
>> We should probably cross-link with the Python humor page on
>> python.org (unless that's also been axed).
aahz> IIRC, the humor page was axed due to lack of updates -- I
aahz> recommend finding the material using Wayback and just adding it to
aahz> the wiki.
It's still there:
http://www.python.org/doc/humor/
it's just been absorbed into the documentation. ;-)
I can't find Tim Peters' award page and don't know if you can search the Wayback Machine. (I suspect you have to know a precise URL.)
Skip
On Mon, Aug 4, 2008 at 11:12 AM,
We should probably cross-link with the Python humor page on python.org (unless that's also been axed).
aahz> IIRC, the humor page was axed due to lack of updates -- I aahz> recommend finding the material using Wayback and just adding it to aahz> the wiki.
It's still there:
http://www.python.org/doc/humor/
it's just been absorbed into the documentation. ;-)
I added a link to the wiki page.
I can't find Tim Peters' award page and don't know if you can search the Wayback Machine. (I suspect you have to know a precise URL.)
Added this too; searching for <pythonic award tim peters> gave me an email that had the correct URLs. It's still one of my favorites. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
Hm, I'm sure there were many more, perhaps in different places.
Although it wasn't April 1, here's one I posted in response to python-dev discussions.
http://mail.python.org/pipermail/python-list/2001-May/084169.html
There was also another one concerning how to reduce the number of ways of copying a list, but Google doesn't seem to want to find it.
--
Greg
Martin v. Löwis writes:
P.S. Just in case it isn't clear: I would oppose any specific proposal to add this Ascii85 algorithm to the standard library. It would sound like we don't have any real problems to solve.
According to Wikipedia, "its main modern use is in Adobe's PostScript and Portable Document Format file formats".
It is also used by git for diffs of binary files, and those diffs are supposedly understood by other VCSes like Mercurial... indeed, Mercurial has a Python extension for base85 encoding (but licensed under the GPL):
http://selenic.com/hg/index.cgi/file/cbdfd08eabc9/mercurial/base85.c
(I suppose Bazaar has something similar)
Finally, since this encoding packs more bytes into the same number of ASCII characters than its traditional alternatives, it is likely to gain traction in applications which need to create a pure ASCII representation of binary data.
Regards
Antoine.
On Sat, Aug 2, 2008 at 12:58 PM, Antoine Pitrou
It is also used by git for diffs of binary files, and those diffs are supposedly understood by other VCSes like Mercurial...
I'm very interested in this (for Rietveld). Where can I learn more about how git handles diffs of binary files? Does it actually show adds and deletes of sections of the file? -- --Guido van Rossum (home page: http://www.python.org/~guido/)
On Saturday, 2 August 2008 at 14:07 -0700, Guido van Rossum wrote:
On Sat, Aug 2, 2008 at 12:58 PM, Antoine Pitrou
wrote: It is also used by git for diffs of binary files, and those diffs are supposedly understood by other VCSes like Mercurial...
I'm very interested in this (for Rietveld). Where can I learn more about how git handles diffs of binary files? Does it actually show adds and deletes of sections of the file?
Well, I'm not sure. I just tried with Mercurial, first committing a
binary file with the following structure:
part1 part3
and then changing it to the following structure:
part1 part2 part3 part2
(part{1,2,3} being some binary chunks of 400 bytes each
from /dev/urandom)
The "git-style" diff given by Mercurial is then:
diff --git a/binfile b/binfile
index acfa6ffc5287c6e9cd400af7b8ab09d072a28b02..5b9a69212ae8f39bf41fbf2194db2b730dcb0ae9
GIT binary patch
literal 1600
zc%1Fi`#%#1003~2SQat~^D2)~Q>OB2Y-`ijkz&}wZbC#-o=0t7jaVepQ|D0_8*3gJ
zMSY0rj%^W39zC2fTEo0@yVCs|_rrbvhcAJotX5m|hB+RX(Aa5xSa4Y^GkS%y10Hva
z3^q{I&mAF9vs@GpEP1!lCxtq*vdKD&+M&87%65P%egC&>7+Bgzx0-lUziyCW?%ELc
z;eHsAnXOY+YY~y3f6~CD+?JujZGa*JV=V-x-twhC^~z}e+->VcW=&UqfNg97Mxf3d
zP2!VM#<4|n+(B|5rOUMBfQ=w}vEdoi_TK&saEG1S{mn@ndj^rKLkR~K7EJZGGO3U9
z<g>qYkn__U%akFI(fHNxLoP?qI1sW!5@Div?MoBQU<}W8T2DXa{`gkjO1RO?{-Yz3
z-yd-sVx%pSu0elCXI*-RPErV~&bEbl*yk6ff?mV
From that I don't know what can be done with the diff. Looking at the
Mercurial source code suggests that you can encode deltas in the patch,
but that Mercurial doesn't support it (see "# TODO: deltas"):
http://www.selenic.com/hg/index.cgi/file/cbdfd08eabc9/mercurial/patch.py#l11...
A basic explanation of binary diffs here:
http://www.selenic.com/pipermail/mercurial/2008-July/020184.html
The explanation mentions base-64 but it was corrected in a later message
here:
http://www.selenic.com/pipermail/mercurial/2008-July/020192.html
Regards
Antoine.
PS: here are the commands I've typed:
$ hg init bindiff
$ cd bindiff/
$ dd if=/dev/urandom of=part1 bs=1 count=400
[snip output]
$ dd if=/dev/urandom of=part2 bs=1 count=400
[snip output]
$ dd if=/dev/urandom of=part3 bs=1 count=400
[snip output]
$ cat part1 part3 > binfile
$ hg add binfile
$ hg ci -m "added binfile"
$ cat part1 part2 part3 > binfile
$ hg di
diff -r 19cfb10c4a01 binfile
Binary file binfile has changed
$ hg di --git
[produces the patch above]
On Aug 2, 2008, at 13:58 PM, Antoine Pitrou wrote:
Martin v. Löwis
writes: P.S. Just in case it isn't clear: I would oppose any specific proposal to add this Ascii85 algorithm to the standard library. It would sound like we don't have any real problems to solve.
According to Wikipedia, "its main modern use is in Adobe's PostScript and Portable Document Format file formats". ... git ... mercurial ... bzr
It's sort of too bad about the April Fool's RFC, because now people tend to think that an encoding with a non-power-of-2 base is just a joke.
I had to overcome that when working with my programming partner, but he eventually decided that base-62 was indeed a useful encoding for our purposes. :-)
I've written a few ascii encoders over the years, mostly in Python, plus an optimized C version of base-32 (with a real live Duff's Device):
base62.py:
http://allmydata.org/source/z-base-62/trunk-hashedformat/z-base-62/base62/base62.py
base36.py:
http://allmydata.org/source/z-base-36/trunk-hashedformat/z-base-36/base36/base36.py
base32.py:
http://allmydata.org/source/z-base-32/trunk-hashedformat/base32/base32/base32.py
base32.c:
http://allmydata.org/source/z-base-32/trunk-hashedformat/base32/base32.c
Regards,
Zooko
On Mon, Aug 11, 2008 at 8:43 AM, zooko
On Aug 2, 2008, at 13:58 PM, Antoine Pitrou wrote:
Martin v. Löwis
writes: P.S. Just in case it isn't clear: I would oppose any specific proposal to add this Ascii85 algorithm to the standard library. It would sound like we don't have any real problems to solve.
According to Wikipedia, "its main modern use is in Adobe's PostScript and Portable Document Format file formats".
... git ... mercurial ... bzr
It's sort of too bad about the April Fool's RFC, because now people tend to think that an encoding with a non-power-of-2 base is just a joke.
The best April Fool's jokes (imo) are the ones that are obviously silly right off, but that 1) work, 2) no sane person would ever use, and 3) offer up something useful hidden in the joke. The April Fool's RFC fits the bill perfectly, because out of it all comes base85, which is an actual improvement over base64 (25% expansion of data vs. 33%). That some people missed that part of the joke isn't terribly surprising (I have in other situations).
- Josiah
Kless wrote:
So the next possible encoding would be base-128 (7-bit encoding)
A while ago I wanted to pack as much information as possible into a string of printable characters, and I came up with a base-95 encoding that packs 9 bytes into 11 characters. The application involved representing data using Python string literals, so it was important that only printable characters were used.
I settled on the 9/11 combination as a reasonable compromise between packing efficiency and not having the block size too long.
If anyone's interested, I could dig out the encoding and decoding routines I wrote.
--
Greg
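
The 9-bytes-into-11-characters figure works because 95**11 is slightly larger than 256**9, while 95**10 is not. The sketch below shows one way such a block code could look; it is not Greg's code, and the alphabet (ASCII codes 32 through 126 in order), the big-endian digit order and the function names are all assumptions made for illustration.

```python
BLOCK_BYTES, BLOCK_CHARS, BASE = 9, 11, 95

# 11 base-95 digits can represent every 9-byte value; 10 digits cannot.
assert BASE ** BLOCK_CHARS >= 256 ** BLOCK_BYTES
assert BASE ** (BLOCK_CHARS - 1) < 256 ** BLOCK_BYTES


def encode_block(block):
    """Encode exactly 9 bytes as 11 printable ASCII characters."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("block must be exactly 9 bytes")
    value = int.from_bytes(block, "big")
    chars = []
    for _ in range(BLOCK_CHARS):
        value, digit = divmod(value, BASE)
        chars.append(chr(32 + digit))  # assumed alphabet: codes 32..126
    return "".join(reversed(chars))


def decode_block(text):
    """Decode 11 characters produced by encode_block back into 9 bytes."""
    if len(text) != BLOCK_CHARS:
        raise ValueError("text must be exactly 11 characters")
    value = 0
    for ch in text:
        value = value * BASE + (ord(ch) - 32)
    return value.to_bytes(BLOCK_BYTES, "big")


print(repr(encode_block(b"123456789")))
```

With this alphabet the output can contain quotes and backslashes, so text embedded in a Python string literal would still need escaping (or a different choice of 95 characters); the sketch leaves that aside.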
participants (12)
- "Martin v. Löwis"
- Aahz
- Antoine Pitrou
- Georg Brandl
- Greg Ewing
- Guido van Rossum
- Josiah Carlson
- Kless
- Oleg Broytmann
- skip@pobox.com
- Steve Holden
- zooko