[Python-Dev] Allowing u.encode() to return non-strings

"Martin v. Löwis" martin at v.loewis.de
Wed Jun 30 02:15:43 EDT 2004


Bill Janssen wrote:

> Unicode is really the only kind of *string* type that's supported,
> which is problematic, as it's not integrated with the file streams
> support.  For instance, how do I write a function that opens a file
> containing text in some multi-byte format (which, we'll assume, I know
> the name of -- perhaps from a content-type field), and reads the first
> three characters of the text?  Can't.  

That's really not true. To process such a file, you do

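import codecs
# codecs.open decodes the Big5 bytes, so read() returns a Unicode string
f = codecs.open(filename, "r", encoding="big5")
data = f.read()
f.close()
first_three = data[:3]  # three characters, not three bytes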
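To make the byte/character distinction concrete, here is a small self-contained sketch for current Python 3 (not the Python 2.3 of this thread). It assumes a hypothetical sample string of three CJK characters and writes it out as Big5, so each of those characters takes two bytes. It then compares slicing the raw bytes with slicing the text that codecs.open decoded.

```python
import codecs
import os
import tempfile

# Hypothetical sample: three CJK characters (U+4E2D U+6587 U+5B57) plus ASCII,
# written with escapes so this source file stays pure ASCII.
text = "\u4e2d\u6587\u5b57 abc"

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sample.txt")
    # Store the text as Big5, a multi-byte encoding (2 bytes per CJK char here).
    with open(path, "wb") as raw:
        raw.write(text.encode("big5"))

    # Binary read: slicing counts bytes, so [:3] cuts the second character in half.
    with open(path, "rb") as raw:
        first_three_bytes = raw.read()[:3]

    # codecs.open decodes Big5 on read, so [:3] counts characters.
    with codecs.open(path, "r", encoding="big5") as f:
        first_three_chars = f.read()[:3]

print(len(first_three_bytes))                 # 3
print(len(first_three_chars.encode("big5")))  # 6
print(first_three_chars == text[:3])          # True
```

In Python 3 the built-in open() also accepts an encoding argument, which gives the same decoded result.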


> Any file that is not explicitly opened as binary (with the 'b' flag
> (and, by the way, why isn't the 'b' flag the default for file opening?

Because it isn't in C either: fopen() opens a file in text mode unless
the mode string includes 'b'.
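As a point of comparison, Python 3's open() keeps the C convention (text mode is the default, 'b' selects binary), but its text mode also decodes to str and translates line endings. The sketch below shows this for current Python 3 only, writing a hypothetical file with Windows-style "\r\n" endings and reading it back both ways.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "lines.txt")
    # Write Windows-style line endings as raw bytes.
    with open(path, "wb") as f:
        f.write(b"one\r\ntwo\r\n")

    # Binary mode: the bytes come back untouched.
    with open(path, "rb") as f:
        as_bytes = f.read()

    # Text mode (default newline=None): decoded to str, "\r\n" becomes "\n".
    with open(path, "r", encoding="ascii") as f:
        as_text = f.read()

print(repr(as_bytes))  # b'one\r\ntwo\r\n'
print(repr(as_text))   # 'one\ntwo\n'
```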

> I'd go further.  I'd introduce the notation
> 
>     v = b"abc"

Yes, introducing byte string literals, and changing standard string
literals to Unicode, has been proposed before. The interpreter already
has a -U option that turns all string literals into Unicode literals.
Unfortunately, a lot of code breaks under this option, so that code
needs to be fixed before the change can happen.
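For comparison, this is roughly where the idea ended up: in Python 3, b"abc" is a bytes literal and a plain "abc" literal is Unicode (str). The sketch below runs only on Python 3 and simply shows that the two types are distinct and must be converted explicitly with an encoding.

```python
# In Python 3, b"..." is a bytes literal and plain "..." is a str (Unicode).
v = b"abc"
s = "abc"

print(type(v).__name__, type(s).__name__)  # bytes str

# Indexing bytes yields integers; indexing str yields one-character strings.
print(v[0], s[0])  # 97 a

# bytes and str never compare equal; convert explicitly with an encoding.
print(v == s)                  # False
print(v.decode("ascii") == s)  # True
print(s.encode("ascii") == v)  # True
```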

Regards,
Martin
