[Python-bugs-list] [ python-Bugs-473009 ] binascii_b2a_base64() improper str limit
noreply@sourceforge.net
Sun, 21 Oct 2001 18:26:43 -0700
Bugs item #473009, was opened at 2001-10-19 21:42
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=473009&group_id=5470
Category: Python Library
Group: Python 2.1.1
Status: Open
Resolution: None
Priority: 5
Submitted By: Dave Cinege (dcinege)
Assigned to: Nobody/Anonymous (nobody)
Summary: binascii_b2a_base64() improper str limit
Initial Comment:
Modules/binascii.c
binascii_b2a_base64() contains the following
restrictive code:
    if ( bin_len > BASE64_MAXBIN ) {
        PyErr_SetString(Error, "Too much data for base64 line");
        return NULL;
    }
This is an error. The base64 method of encoding data
has no length limitation. The MIME message RFC has
such a limitation on base64 encoded data. The
function should not assume its only input must be
MIME compatible. The base64 Python module itself
is designed for MIME I/O only, and properly limits
itself. The binascii function should be left raw.
binascii_a2b_base64() properly accepts input of any
size.
How I came across this bug: I use base64 to ASCII-armor
binary data in log entries in a distributed
network monitoring system. For the sake of ease of
parsing (human and machine) each log entry is
confined to a single line. I commonly have unbroken
base64 encoded fields of 64KB in size or greater.
Unfortunately I am unable to encode this data like
this:
result64 = binascii.b2a_base64(s)
I must do this:
result64 = re.sub('[ |\n]','',base64.encodestring(s))
Which is *much* slower. : <
I feel this is an outright bug and should be
corrected. If there is some argument for backward
compatibility, an optional function argument should be
present to allow bypassing this limitation.
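
A possible interim workaround that stays within the per-call limit is
to feed binascii.b2a_base64() chunks small enough to fit on one encoded
line and join the results. The sketch below is only an illustration:
the chunk size of 57 bytes is an assumption (57 bytes encode to exactly
76 characters, and 57 is a multiple of 3, so no padding appears
mid-stream), the helper name b2a_base64_unbroken is made up, and the
code targets current Python 3 rather than 2.1.1.

```python
import binascii

CHUNK = 57  # assumed chunk size: a multiple of 3 that encodes to 76 characters


def b2a_base64_unbroken(data: bytes) -> bytes:
    """Encode data as one unbroken base64 line (hypothetical helper)."""
    pieces = []
    for i in range(0, len(data), CHUNK):
        # b2a_base64 appends a newline to each encoded chunk; strip it.
        pieces.append(binascii.b2a_base64(data[i:i + CHUNK]).rstrip(b"\n"))
    return b"".join(pieces)
```

Because every chunk except the last is a whole number of 3-byte groups,
the joined output should be identical to encoding the data in one call.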
----------------------------------------------------------------------
>Comment By: Guido van Rossum (gvanrossum)
Date: 2001-10-21 18:26
Message:
Logged In: YES
user_id=6380
I'm with David. It's up to the higher level code (e.g. the
base64 module) to avoid writing lines longer than 76
characters; the underlying function in binascii doesn't have
to act as a policeman here. There may be other applications
of the same encoding where the 76-char limit does not apply.
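
The division of labor described here could look roughly like the
following sketch, in which the low-level encoder returns raw output and
a higher layer applies the 76-character rule. The helper name
encode_mime_lines is hypothetical, and the newline=False keyword is
assumed to be available (Python 3.6 or later).

```python
import binascii

MAXLINE = 76  # line length limit quoted from RFC 2045 in this thread


def encode_mime_lines(data: bytes) -> bytes:
    """Wrap raw base64 output into lines of at most 76 characters
    (hypothetical higher-level helper)."""
    # newline=False requests raw output with no trailing newline.
    raw = binascii.b2a_base64(data, newline=False)
    lines = [raw[i:i + MAXLINE] for i in range(0, len(raw), MAXLINE)]
    return (b"\n".join(lines) + b"\n") if lines else b""
```

Callers that do not need MIME-compatible output would call the raw
encoder directly and skip the wrapping step.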
----------------------------------------------------------------------
Comment By: Dave Cinege (dcinege)
Date: 2001-10-20 21:34
Message:
Logged In: YES
user_id=314434
>Can you cite any relevant standard that defines base64 to
>work in that way? Base64 is defined in RFC 2045 section
>6.8., which clearly says
>The encoded output stream must be represented in lines
>of no more than 76 characters each.
This is difficult to do because base64 itself has not (yet) been
separately defined in its own RFC. It should be, and this issue has
been brought up recently on the W3 lists.
E.g.:
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2001AprJun/0212.html
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2001AprJun/0210.html
The part of the RFC you have quoted is relevant to the use of base64
encoding in the context of MIME, the purpose clearly being to
ensure compatibility with email (SMTP, POP3, MUA, etc.) standards.
However, this 76-character line length rule is irrelevant when dealing
with arbitrary binary data not meant for MIME-encapsulated
transmission.
This is clearly seen in the description of the actual base64 algorithm
itself:
The encoding process represents 24-bit groups of input bits as output
strings of 4 encoded characters. Proceeding from left to right, a
24-bit input group is formed by concatenating 3 8bit input groups.
These 24 bits are then treated as 4 concatenated 6-bit groups, each
of which is translated into a single digit in the base64 alphabet.
When encoding a bit stream via the base64 encoding, the bit stream
must be presumed to be ordered with the most-significant-bit first.
That is, the first bit in the stream will be the high-order bit in
the first 8bit byte, and the eighth bit will be the low-order bit in
the first 8bit byte, and so on.
...
In base64 data, characters other than those in Table 1, line breaks,
and other white space probably indicate a transmission error, about
which a warning message or even a message rejection might be
appropriate under some circumstances.
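
As a rough illustration of the quoted algorithm, the Python sketch
below turns each 24-bit input group into four 6-bit alphabet indexes.
The function name encode_base64 is made up for this example, and the
"=" padding of a short final group follows RFC 2045 rather than the
excerpt quoted above. Note that nothing in the algorithm itself refers
to line length.

```python
import string

# The 64-character base64 alphabet (Table 1 of RFC 2045).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_base64(data: bytes) -> str:
    """Illustrative encoder (hypothetical helper): 24-bit groups to 4 characters."""
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        # Concatenate up to three 8-bit bytes, most significant bit first,
        # zero-filling a short final group on the right.
        bits = int.from_bytes(group.ljust(3, b"\0"), "big")
        # Split the 24 bits into four 6-bit indexes into the alphabet.
        chars = [ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A final group of 1 or 2 bytes keeps 2 or 3 characters, then "=" padding.
        keep = len(group) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)
```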
Additionally, the use of 'unlimited length' base64 encoding of binary
data has reached critical mass. For a broad-based example, HTTP-based
authorization 'encrypts' the username:password in base64. However, no
length limit can be used, else it would arbitrarily limit the amount
of data that could be passed without interfering with the HTTP
protocol itself.
E.g.: (Lines should not appear wrapped)
'Logging in' to a webserver with
Username:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
Password:
test
will have the web browser send the AUTH request header as
follows:
Authorization: Basic YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXpBQkNERUZHSElKS0xNTk9QUVJTVFVWV1hZWjAxMjM0NTY3ODlhYmNkZWZnaGlqa2xtbm9wcXJzdHV2d3h5ekFCQ0RFRkdISUpLTE1OT1BRUlNUVVZXWFlaMDEyMzQ1Njc4OTp0ZXN
The latter field is an 'unlimited' length base64 encoding.
(Testing done with KDE Konqueror, other browsers may vary)
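
A header of this kind can be reproduced with a few lines of Python.
This is only a sketch: basic_auth_header is a hypothetical helper, not
what Konqueror or any other browser actually runs, and it assumes
ASCII-only credentials.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header line (hypothetical helper)."""
    credentials = f"{username}:{password}".encode("ascii")
    # b64encode emits no line breaks, whatever the input length.
    token = base64.b64encode(credentials).decode("ascii")
    return "Authorization: Basic " + token


alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
header = basic_auth_header(alphabet * 2, "test")
```

With the 124-character username and the password from the example, the
credentials are 129 bytes, so the encoded field is 172 characters long,
well past the 76-character MIME line limit.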
Due to its simple application you will find many a reference stating:
''The Base64 algorithm has become "the standard" for encoding binary
data.''
Clearly, line length limitations are counterproductive to such use.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2001-10-20 06:30
Message:
Logged In: YES
user_id=21627
Can you cite any relevant standard that defines base64 to
work in that way? Base64 is defined in RFC 2045 section
6.8., which clearly says
The encoded output stream must be represented in lines
of no more than 76 characters each.
----------------------------------------------------------------------