[Python-3000] bytes and dicts (was: PEP 3137: Immutable Bytes and Mutable Buffer)

Jeffrey Yasskin jyasskin at gmail.com
Sat Sep 29 20:10:07 CEST 2007


On 9/29/07, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 07:33 AM 9/29/2007 -0700, Guido van Rossum wrote:
> >Until just before 3.0a1, they were unequal. We decided to raise
> >TypeError because we noticed many bugs in code that was doing things
> >like
> >
> >   data = f.read(4096)
> >   if data == "": break
>
> Thought experiment: what if read() always returned strings, and to
> read bytes, you had to use something like 'f.readinto(ob, 4096)',
> where 'ob' is a mutable bytes instance or memory view?
>
> In Python 2.x, there's only one read() method because (prior to
> unicode) there was only one type of reading to do.
>
> But as the above example makes clear, in 3.x you simply *can't* write
> code that works correctly with an arbitrary file that might be binary
> or text, at least not without typechecking the return value from
> read().  (In which case, you might as well inspect the file
> object.)  So, the above problem could be fixed by having .read()
> raise an error (or simply not exist) on a binary file object.
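Phillip's split can be approximated with the io module as it exists in released Python 3. Binary streams (io.BufferedIOBase and io.RawIOBase) provide readinto(), which fills a caller-supplied writable buffer and returns the number of bytes read. Unlike the 'f.readinto(ob, 4096)' spelling above, the real method takes no size argument; the buffer's length sets the size. The helper name read_binary_chunks is hypothetical, a sketch rather than a proposed API.

```python
import io


def read_binary_chunks(f, size=4096):
    """Yield successive chunks of a binary stream via readinto().

    Hypothetical helper: one preallocated bytearray is reused for every
    read, and a memoryview slices out the filled prefix without building
    an intermediate bytearray.
    """
    buf = bytearray(size)
    view = memoryview(buf)
    while True:
        n = f.readinto(buf)
        # 0 means EOF; None means a non-blocking raw stream had no data
        # ready, which this simple sketch also treats as "stop".
        if not n:
            break
        # tobytes() copies the filled prefix so the caller keeps its own
        # data even after buf is overwritten by the next read.
        yield view[:n].tobytes()


with io.BytesIO(b"\x00\x01binary payload") as f:
    for chunk in read_binary_chunks(f, size=8):
        print(chunk)
```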

Perhaps write
  if len(data) == 0: break
since that's what you really mean.
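The len() test works unchanged on text and binary streams, because len() is defined for both str and bytes. A minimal sketch, using io.StringIO and io.BytesIO as stand-ins for real files; the read_all name is hypothetical:

```python
import io


def read_all(f, size=4096):
    """Read f to EOF in fixed-size chunks, whether f is text or binary."""
    chunks = []
    while True:
        data = f.read(size)
        # len() is defined for both str and bytes, so this test never mixes
        # the two types, unlike comparing data against "" or b"".
        if len(data) == 0:
            break
        chunks.append(data)
    # At EOF, data is an empty str or empty bytes matching the stream's
    # type, so it can serve as the separator for join().
    return data.join(chunks)


print(read_all(io.StringIO("some text"), size=4))
print(read_all(io.BytesIO(b"some bytes"), size=4))
```

The more idiomatic spelling `if not data: break` behaves the same here, since an empty str and an empty bytes object are both false.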

Any other code that compares the result of read() to either a bytes or
a str really is taking a text or binary file object specifically and
not working on an arbitrary file.

> In this way, the problem is fixed at the point where it really
> occurs: i.e., at the point of not having decided whether the stream
> is bytes or text.
>
> This also seems to fit better (IMO) with the best practice of
> enforcing str/unicode/encoding distinctions at the point where data
> enters the program, rather than delaying the error to later.
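Deciding the type at the point where data enters the program can be done in released Python 3 by wrapping a binary stream in io.TextIOWrapper with an explicit encoding; everything downstream then sees only str. A sketch, with UTF-8 as an assumed encoding and open_as_text as a hypothetical name:

```python
import io


def open_as_text(raw, encoding="utf-8"):
    """Wrap a binary stream so every read() returns str.

    The encoding is an assumption the caller must get right; decoding
    errors surface when the data is read, at the input boundary, rather
    than later as failed comparisons.
    """
    return io.TextIOWrapper(raw, encoding=encoding)


f = open_as_text(io.BytesIO("caf\u00e9\n".encode("utf-8")))
data = f.read()
# data is str, so comparing it with "" is a well-typed check.
print(repr(data), data == "")
```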
>
>
> >I thought about using warnings too, but since nobody wants warnings,
> >that would be pretty much the same as raising TypeError except for the
> >most dedicated individuals (and if I were really dedicated I'd just
> >write my own eq() function anyway).
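For comparison, released Python 3 kept bytes/str equality returning False and made the warning opt-in: the interpreter's -b option emits a BytesWarning on such comparisons, and -bb turns it into an error. A small sketch that runs a child interpreter to show the modes; the helper name compare_under is hypothetical:

```python
import subprocess
import sys

# The comparison from Guido's example, evaluated in a child process.
SNIPPET = "print(b'' == '')"


def compare_under(*flags):
    """Run SNIPPET with the given interpreter flags.

    Returns (exit code, stripped stdout, stderr).
    """
    proc = subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip(), proc.stderr


print(compare_under())       # default: no warning, prints False
print(compare_under("-b"))   # BytesWarning on stderr, still prints False
print(compare_under("-bb"))  # BytesWarning raised as an error, nonzero exit
```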
>
> The use case I'm concerned about is code that's not type-specific
> getting a TypeError by comparing arbitrary objects.  For example, if
> you write Python code to create a Python code object (e.g. the
> compiler package or my own BytecodeAssembler), you need to create a
> list of constants as you generate the code, and you need to be able
> to search the list for an equal constant.  Since strings and bytes
> can both be constants, a simple list.index() test could now raise a
> TypeError, as could "item in list".
>
> So raising an error to make bad code fail sooner will also take down
> unsuspecting code that isn't really broken, and *force* the writing
> of special comparison code -- which won't be usable with things like
> list.remove and the "in" operator.
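The constant-pool case can be sketched as follows. Under the semantics Python 3 eventually shipped, bytes and str simply compare unequal, so list.index() keeps them as separate constants. StrictBytes is a hypothetical subclass that emulates the TypeError proposal, to show how a plain "in" test would fail; add_const is likewise a hypothetical helper, not code from the compiler package or BytecodeAssembler.

```python
class StrictBytes(bytes):
    """Hypothetical bytes subclass emulating the proposal under discussion:
    comparing it with a str raises TypeError instead of returning False."""

    def __eq__(self, other):
        if isinstance(other, str):
            raise TypeError("cannot compare bytes and str")
        return bytes.__eq__(self, other)

    # Overriding __eq__ sets __hash__ to None unless it is restored.
    __hash__ = bytes.__hash__


def add_const(consts, value):
    """Return the index of value in consts, appending it if absent.

    Plain equality also merges 1, 1.0 and True, so a real code generator
    would need a stricter key; this sketch only shows the str/bytes issue.
    """
    try:
        return consts.index(value)
    except ValueError:
        consts.append(value)
        return len(consts) - 1


consts = []
print(add_const(consts, "abc"), add_const(consts, b"abc"))  # 0 1

try:
    "abc" in [StrictBytes(b"abc")]
except TypeError as exc:
    print("membership test failed:", exc)
```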
>
> In comparison, forcing code to be bytes vs. text aware at the point
> of I/O directs attention to the place where you can best decide what
> to do about it.  (After all, the comparison that raises the TypeError
> might occur deep in a library that's expecting to work with text.)
>
>
> >And the warning would do nothing
> >about the issue brought up by Jim Jewett, the unpredictable behavior
> >of a dict with both bytes and strings as keys.
>
> I've looked at all of Jim's messages for September, but I don't see
> this.  I do see where raising TypeError for comparisons causes a
> problem with dictionaries, but I don't see how an unequal comparison
> creates "unpredictable" behavior (as opposed to predictable failure to match).
>
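The dictionary question can be illustrated in released Python 3, where a str key and a bytes key with the same characters are simply two different keys. If comparisons raised TypeError instead, a dict would only compare two keys whose hashes match, so whether mixing the types failed would depend on hash values. That is one plausible reading of the "unpredictable" behavior, not necessarily Jim's exact argument. StrictKey is hypothetical, with an explicit hash so a collision can be forced on demand.

```python
# Released Python 3: b"abc" == "abc" is False, so these are two keys.
mixed = {"abc": "text", b"abc": "bytes"}
print(len(mixed), mixed["abc"], mixed[b"abc"])  # 2 text bytes


class StrictKey:
    """Hypothetical key that, like the TypeError proposal, refuses to be
    compared with a str. Its hash is set explicitly so we can choose
    whether it collides with a str key."""

    def __init__(self, data, hash_value):
        self.data = data
        self.hash_value = hash_value

    def __hash__(self):
        return self.hash_value

    def __eq__(self, other):
        if isinstance(other, str):
            raise TypeError("cannot compare StrictKey and str")
        if isinstance(other, StrictKey):
            return self.data == other.data
        return NotImplemented


def try_insert(d, key, value):
    """Insert key into d; return False if a comparison raised TypeError."""
    try:
        d[key] = value
    except TypeError:
        return False
    return True


d = {"abc": 1}
# Same hash as the str key: the dict must call __eq__, which raises.
print(try_insert(d, StrictKey(b"abc", hash("abc")), 2))      # False
# Different hash: the dict never compares the keys, so insertion succeeds.
print(try_insert(d, StrictKey(b"abc", hash("abc") + 2), 3))  # True
```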


-- 
Namasté,
Jeffrey Yasskin
http://jeffrey.yasskin.info/

"Religion is an improper response to the Divine." — "Skinny Legs and
All", by Tom Robbins

