<br><br><div><span class="gmail_quote">On 9/29/07, <b class="gmail_sendername">Jeffrey Yasskin</b> <<a href="mailto:jyasskin@gmail.com">jyasskin@gmail.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On 9/29/07, Phillip J. Eby <<a href="mailto:pje@telecommunity.com">pje@telecommunity.com</a>> wrote:<br>> At 07:33 AM 9/29/2007 -0700, Guido van Rossum wrote:<br>> >Until just before 3.0a1, they were unequal. We decided to raise
<br>> >TypeError because we noticed many bugs in code that was doing things<br>> >like<br>> ><br>> > data = f.read(4096)<br>> > if data == "": break<br>><br>> Thought experiment: what if read() always returned strings, and to
<br>> read bytes, you had to use something like 'f.readinto(ob, 4096)',<br>> where 'ob' is a mutable bytes instance or memory view?<br></blockquote><div><br>Using what encoding? In that case read() should raise an exception on a file opened as binary. And instead of readinto(), how about a readbytes() that just returns bytes and raises an exception on non-binary-mode files? (readinto() for buffers is a good idea and I think we should have it, but that idea could be taken further to allow for even more scattered I/O into a mutable buffer; that's another discussion and should be a PEP of its own.)
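<br><br>A rough sketch of that split, to make the idea concrete. Everything here is an assumption for illustration: the wrapper name ModeCheckedFile is hypothetical, readbytes() is not an existing file method, and raising io.UnsupportedOperation is just one plausible choice of exception.

```python
import io


class ModeCheckedFile:
    """Hypothetical wrapper: read() is text-only, readbytes() is binary-only."""

    def __init__(self, raw):
        self._raw = raw
        # Assumption: anything derived from io.TextIOBase counts as text mode.
        self._is_text = isinstance(raw, io.TextIOBase)

    def read(self, size=-1):
        # Refuse to hand back bytes from a method that promises str.
        if not self._is_text:
            raise io.UnsupportedOperation("read() needs a text-mode file")
        return self._raw.read(size)

    def readbytes(self, size=-1):
        # Refuse to hand back str from a method that promises bytes.
        if self._is_text:
            raise io.UnsupportedOperation("readbytes() needs a binary-mode file")
        return self._raw.read(size)
```

With something like this, the mistake shows up at the call that does the I/O rather than at a later comparison.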
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> But as the above example makes clear, in 3.x you simply *can't* write
<br>> code that works correctly with an arbitrary file that might be binary<br>> or text, at least not without typechecking the return value from<br>> read(). (In which case, you might as well inspect the file<br>
> object.) So, the above problem could be fixed by having .read()<br>> raise an error (or simply not exist) on a binary file object.<br><br>Perhaps write<br> if len(data) == 0: break<br>since that's what you really mean.
</blockquote><div><br>data = f.read()<br>if not data: break<br></div><br>is the preferred way to write that. Regardless, I agree: having read() return a different type depending on the file's open mode is going to cause problems. I do -NOT- like the idea of a bytes vs. string comparison raising an exception. read() and readbytes() methods that raise exceptions when used on a file opened in the wrong mode would "solve" the problem in a more obvious way.
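<br><br>For what it's worth, the truthiness test is the one form of the loop that does not care which type read() returns, since both an empty str and an empty bytes are false. A minimal sketch, assuming a helper named count_chunks (hypothetical, for illustration only) and in-memory streams standing in for real files:

```python
import io


def count_chunks(f, size=4096):
    """Count the non-empty chunks read() yields before end of file."""
    chunks = 0
    while True:
        data = f.read(size)
        if not data:  # empty str and empty bytes are both falsy
            break
        chunks += 1
    return chunks


# The same loop runs against a text stream and a binary stream.
text_chunks = count_chunks(io.StringIO("x" * 10), size=4)
binary_chunks = count_chunks(io.BytesIO(b"x" * 10), size=4)
```

That only sidesteps the comparison, of course; it does nothing about code further down that still assumes one type or the other.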
<br><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Any other code that compares the result of read() to either a bytes or<br>a str really is taking a text or binary file object specifically and
<br>not working on an arbitrary file.<br><br>> In this way, the problem is fixed at the point where it really<br>> occurs: i.e., at the point of not having decided whether the stream<br>> is bytes or text.<br>>
<br>> This also seems to fit better (IMO) with the best practice of<br>> enforcing str/unicode/encoding distinctions at the point where data<br>> enters the program, rather than delaying the error to later.<br>>
<br>><br>> >I thought about using warning too, but since nobody wants warnings,<br>> >that would be pretty much the same as raising TypeError except for the<br>> >most dedicated individuals (and if I were really dedicated I'd just
<br>> >write my own eq() function anyway).<br>><br>> The use case I'm concerned about is code that's not type-specific<br>> getting a TypeError by comparing arbitrary objects. For example, if<br>> you write Python code to create a Python code object (
e.g. the<br>> compiler package or my own BytecodeAssembler), you need to create a<br>> list of constants as you generate the code, and you need to be able<br>> to search the list for an equal constant. Since strings and bytes
<br>> can both be constants, a simple list.index() test could now raise a<br>> TypeError, as could "item in list".<br>><br>> So raising an error to make bad code fail sooner, will also take down<br>> unsuspecting code that isn't really broken, and *force* the writing
<br>> of special comparison code -- which won't be usable with things like<br>> list.remove and the "in" operator.<br>><br>> In comparison, forcing code to be bytes vs. text aware at the point<br>
> of I/O directs attention to the place where you can best decide what<br>> to do about it. (After all, the comparison that raises the TypeError<br>> might occur deep in a library that's expecting to work with text.)
<br>><br>><br>> >And the warning would do nothing<br>> >about the issue brought up by Jim Jewett, the unpredictable behavior<br>> >of a dict with both bytes and strings as keys.<br>><br>> I've looked at all of Jim's messages for September, but I don't see
<br>> this. I do see where raising TypeError for comparisons causes a<br>> problem with dictionaries, but I don't see how an unequal comparison<br>> creates "unpredictable" behavior (as opposed to predictable failure to match).
<br>><br>> _______________________________________________<br>> Python-3000 mailing list<br>> <a href="mailto:Python-3000@python.org">Python-3000@python.org</a><br>> <a href="http://mail.python.org/mailman/listinfo/python-3000">
http://mail.python.org/mailman/listinfo/python-3000</a><br>> Unsubscribe: <a href="http://mail.python.org/mailman/options/python-3000/jyasskin%40gmail.com">http://mail.python.org/mailman/options/python-3000/jyasskin%40gmail.com
</a><br>><br><br><br>--<br>Namasté,<br>Jeffrey Yasskin<br><a href="http://jeffrey.yasskin.info/">http://jeffrey.yasskin.info/</a><br><br>"Religion is an improper response to the Divine." — "Skinny Legs and
<br>All", by Tom Robbins
<br></blockquote></div><br>