[I18n-sig] Pre-PEP: Proposed Python Character Model

Paul Prescod paulp@ActiveState.com
Wed, 07 Feb 2001 18:24:37 -0800


"Martin v. Loewis" wrote:
> 
> ...
> 
> I'd admit that codecs.open seems wrong also - it is not a codec that
> is being opened. New builtins are worse, IMO (what is an f, a str, or
> a bin?). Adding flags to open looks acceptable, though.

open already has two optional arguments, and I want to add a new
mandatory one. I don't see a clean way to do that directly. Actually, I
have thought of something, which I explain in more detail further down.

"fopen" stands for "file open". Now that you mention it, "fileopen" is
probably the best name -- more descriptive even than today's "open". It
would have a mandatory encoding argument, which can be None only if you
use the "b" flag to indicate that you want binary data.
----
   fileopen(filename, encoding[, mode[, bufsize]])

Return a new file object (described earlier under Built-in Types). The
first and third arguments are the same as for stdio's fopen(): filename
is the file name to be opened, mode indicates how the file is to be
opened: 'r' for reading, 'w' for writing (truncating an existing file),
and 'a' opens it for appending (which on some Unix systems means that
all writes append to the end of the file, regardless of the current seek
position).

Modes 'r+', 'w+' and 'a+' open the file for updating (note that 'w+'
truncates the file). If the file cannot be opened, IOError is raised. If
mode is omitted, it defaults to 'r'.

The encoding argument should be a string naming the encoding of the
file. Common values are "ASCII" (for English-only text) and "ISO Latin 1"
(ISO-8859-1, for most Western European languages). "UTF-8" and "UTF-16"
are often used for mixed-language documents. "Shift-JIS" (Japanese) and
"Big5" (Traditional Chinese) are typically used for East Asian scripts.

The special value "RAW" means that the file object should return bytes
as-is with no translation into a "byte string".

The optional bufsize argument specifies the file's desired buffer size:
0 means unbuffered, 1 means line buffered, any other positive value
means use a buffer of (approximately) that size. A negative bufsize
means to use the system default, which is usually line buffered for
tty devices and fully buffered for other files. If omitted, the system
default is used.
---
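
For illustration, a minimal Python sketch of the proposed fileopen()
signature, layered on today's io.open(). fileopen is only a proposed
name, not an existing built-in, and two behaviours are assumptions
rather than part of the spec above: "RAW" is taken to imply binary
mode, and an encoding combined with a "b" mode is rejected.

```python
import io


def fileopen(filename, encoding, mode="r", bufsize=-1):
    # Hypothetical sketch of the proposed fileopen() built-in; not a real API.
    binary = "b" in mode
    if encoding == "RAW":
        # Assumption: "RAW" implies binary mode, so bytes come back untranslated.
        if not binary:
            mode += "b"
        return io.open(filename, mode, bufsize)
    if encoding is None:
        # None is allowed only when the caller explicitly asked for binary data.
        if not binary:
            raise ValueError("encoding may be None only with a 'b' mode")
        return io.open(filename, mode, bufsize)
    if binary:
        # Assumption: an explicit encoding makes no sense for raw bytes.
        raise ValueError("an encoding cannot be combined with a 'b' mode")
    # Text mode: bytes are decoded on read and encoded on write.
    return io.open(filename, mode, bufsize, encoding=encoding)
```

One caveat: modern io.open() refuses bufsize=0 for text files, so
unbuffered access in this sketch works only in binary mode.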

"open" could actually be extended to be like "fileopen" if we look at
the second parameter and interpret it according to its contents. If it
matches the regexp [rwa]\+?b? (a mode letter, an optional "+" and an
optional "b"), we treat it as the "deprecated form." Otherwise we treat
it as an encoding. I don't think we have to worry about an encoding
whose name matches that pattern any time soon!
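
A sketch of that dispatch in Python. It assumes the pattern is meant as
one mode letter, an optional literal "+", and an optional "b"; written
unescaped, "+?" would be a lazy quantifier rather than a literal plus,
so a mode like "r+" would not match. The helper name second_arg_kind is
hypothetical. Matching the whole string (re.fullmatch) matters: a prefix
match would classify "ascii" as a mode, because it starts with "a".

```python
import re

# Assumed reading of the pattern: one mode letter, an optional literal "+",
# and an optional "b" (e.g. "r", "w+", "ab", "r+b").
_MODE_RE = re.compile(r"[rwa]\+?b?")


def second_arg_kind(arg):
    """Hypothetical helper: "mode" for an old-style mode string, else "encoding"."""
    # fullmatch, not match: "ascii" must not be mistaken for mode "a".
    return "mode" if _MODE_RE.fullmatch(arg) else "encoding"


print(second_arg_kind("r+b"), second_arg_kind("UTF-8"))
```

The pattern does not cover the equivalent spelling "rb+", which open()
also accepts, so a real implementation would need a slightly wider one.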

So in the documentation the encoding would NOT be optional, but in
practice there would be a transition period during which it remained
optional, so that people could migrate their code.
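
The transition period could look roughly like the Python sketch below:
the new form passes the encoding second, while the old form is still
accepted but raises a DeprecationWarning. The function name
transitional_open and the warning text are hypothetical, and the old
form here simply falls back to io.open()'s platform-default encoding.

```python
import io
import re
import warnings

# Assumed mode pattern: one mode letter, optional literal "+", optional "b".
_OLD_MODE = re.compile(r"[rwa]\+?b?")


def transitional_open(filename, encoding_or_mode="r", *rest):
    # Hypothetical transition-period open(); not an existing API.
    if _OLD_MODE.fullmatch(encoding_or_mode):
        # Deprecated form: open(filename[, mode[, bufsize]]).
        warnings.warn("open() without an encoding is deprecated",
                      DeprecationWarning, stacklevel=2)
        return io.open(filename, encoding_or_mode, *rest)
    # New form: open(filename, encoding[, mode[, bufsize]]).
    mode = rest[0] if rest else "r"
    bufsize = rest[1] if len(rest) > 1 else -1
    return io.open(filename, mode, bufsize, encoding=encoding_or_mode)
```

Old code keeps working unchanged during the migration, and running it
with warnings enabled points at every call that still lacks an encoding.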

 Paul Prescod