[Python-Dev] Allowing u.encode() to return non-strings
"Martin v. Löwis"
martin at v.loewis.de
Wed Jun 30 02:15:43 EDT 2004
Bill Janssen wrote:
> Unicode is really the only kind of *string* type that's supported,
> which is problematic, as it's not integrated with the file streams
> support. For instance, how do I write a function that opens a file
> containing text in some multi-byte format (which, we'll assume, I know
> the name of -- perhaps from a content-type field), and reads the first
> three characters of the text? Can't.
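That's really not true. To process such a file, you do:
import codecs
f = codecs.open(filename, "r", encoding="big5")  # the codec is registered as "big5"
data = f.read()          # decoded text, not raw bytes
first_three = data[:3]   # slices characters, not bytes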
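A self-contained sketch of the same idea may help show why decoding first matters. It assumes the Big5 codec is available under the name "big5"; the helper name first_chars, the sample text, and the temporary file are hypothetical and only there to make the example runnable.

```python
import codecs
import os
import tempfile


def first_chars(filename, encoding, n=3):
    """Return the first n characters (not bytes) of an encoded text file."""
    # codecs.open decodes the bytes on read, so slicing counts characters.
    with codecs.open(filename, "r", encoding=encoding) as f:
        data = f.read()
    return data[:n]


# Hypothetical sample: six characters that each take two bytes in Big5.
sample = u"中文字元測試"

# Write the encoded bytes to a temporary file.
fd, path = tempfile.mkstemp()
os.write(fd, sample.encode("big5"))
os.close(fd)

first_three = first_chars(path, "big5")
os.remove(path)

print(first_three)
```

Slicing the raw bytes instead (the first three bytes of the file) would cut the second character in half, which is why the decoding step comes before the slice.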
> Any file that is not explicitly opened as binary (with the 'b' flag
> (and, by the way, why isn't the 'b' flag the default for file opening?
Because it isn't in C.
> I'd go further. I'd introduce the notation
> v = b"abc"
Yes, introduction of byte string literals, and changing standard
string literals, has been proposed before. There is the -U option
for the interpreter that changes all literals to Unicode literals.
Unfortunately, a lot of code breaks under this change, so such
breakage needs to be fixed before the change can happen.
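To illustrate the distinction the b"abc" notation is meant to express, here is a minimal sketch. It assumes an interpreter that accepts both u"..." and b"..." literals (recent Python versions do); the variable names are illustrative only.

```python
text = u"abc"   # Unicode (text) literal
raw = b"abc"    # byte string literal, in the notation discussed above

# Encoding turns text into bytes; decoding turns bytes back into text.
encoded = text.encode("ascii")
decoded = raw.decode("ascii")

# For non-ASCII text, the byte length depends on the encoding chosen.
name = u"L\u00f6wis"
utf8 = name.encode("utf-8")       # the o-umlaut takes two bytes
latin1 = name.encode("latin-1")   # the o-umlaut takes one byte

print(len(name), len(utf8), len(latin1))
```

The point of keeping the two literal forms separate is that the same text can map to different byte sequences, so code has to state which one it means.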