[Web-SIG] WSGI adoption
Phillip J. Eby
pje at telecommunity.com
Tue Nov 30 17:00:49 CET 2004
At 08:59 AM 11/30/04 +0000, Alan Kennedy wrote:
>[Phillip J. Eby]
> > Yes, I meant decode-after-load, specifying the encoding as one of the
> > configuration variables.
>I'm a little confused. When you say "specifying the encoding as one of the
>configuration variables", do you mean a configuration variable that is
>specified inside or outside the ConfigParser .ini configuration file? Or
>Obviously, if you put the encoding declaration inside the config file
>itself, then you face the chicken and egg problem of needing to know what
>encoding the file is in before you can decode it to find out what its
>contents are, including what encoding it is in .......
That was why I said it would only work for encodings that don't require
escaping [, ], #, ;, =, and whitespace.
>XML solves this problem with the "<?xml" declaration: it is a fixed set of
>characters at the very beginning of the file from which you can guess the
>character encoding of the file. More here
Reading this section makes it seem to me that we can easily support:
"""UTF-8, ISO 646, ASCII, some part of ISO 8859, Shift-JIS, EUC, or any
other 7-bit, 8-bit, or mixed-width encoding which ensures that the
characters of ASCII have their normal positions, width, and values"""
...as long as the configuration keys (and the string specifying the
encoding) are guaranteed to be ASCII. It seems to me that most of the
Asian codecs use unusual characters for escaping, such as $, \, and the
ASCII escape character, so it shouldn't be too hard to steer clear of these
in our keys.
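As a rough illustration of the "normal positions, width, and values" property, one can compare how a codec encodes the structural characters of an .ini file against plain ASCII. This is only a sketch: `ascii_transparent` and the sample text are hypothetical names, not part of any proposal here. The check is also necessary rather than sufficient, since it says nothing about multi-byte sequences whose trailing bytes happen to fall in the ASCII range.

```python
# Sample text made only of the characters ConfigParser treats as structure,
# plus ASCII letters for a section name, a key, and a value.
STRUCTURAL_SAMPLE = "[section]\n# comment\n; comment\nkey = value\n"


def ascii_transparent(encoding, sample=STRUCTURAL_SAMPLE):
    """Return True if `encoding` stores the sample exactly as ASCII would."""
    return sample.encode(encoding) == sample.encode("ascii")


if __name__ == "__main__":
    for name in ("utf-8", "iso-8859-1", "shift_jis", "euc_jp", "utf-16"):
        print(name, ascii_transparent(name))
```

Wide encodings such as UTF-16 fail this check, which is why they would need a BOM (or an out-of-band declaration) to be supported at all.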
I would also recommend that application authors inform their users if their
deployment files are in an encoding that is not bundled with Python.
>So if we're going to use ConfigParser *and* support encodings, then we
>need to either
>A: Make the user specify the encoding *outside* the configuration file
>B: Require some form of "magic string" at the top of the file so that we
>can guess the encoding. And write the guessing algorithm.
As long as the encoding is restricted to basically the same set of
encodings that work for Python source code, it should only be necessary to
have the encoding specified as a configuration variable in the file.
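A minimal sketch of the in-file approach, written against the modern `configparser` module and assuming the file is available as bytes. The section name `deployment` and the option name `encoding` are placeholders invented for this example; nothing in this thread fixes them. The first pass decodes as Latin-1 only to find the ASCII-spelled encoding name, and the second pass decodes the whole file with that encoding.

```python
import configparser


def load_config(raw_bytes, section="deployment", option="encoding"):
    """Parse config bytes whose encoding is named by one of its own options."""
    # Pass 1: Latin-1 maps every byte to exactly one code point, so ASCII
    # keys and the ASCII encoding name come through regardless of the real
    # encoding (as long as that encoding keeps ASCII in its usual place).
    probe = configparser.RawConfigParser()
    probe.read_string(raw_bytes.decode("latin-1"))
    encoding = probe.get(section, option, fallback="ascii")

    # Pass 2: decode with the declared encoding and parse for real.
    parser = configparser.RawConfigParser()
    parser.read_string(raw_bytes.decode(encoding))
    return parser


if __name__ == "__main__":
    raw = "[deployment]\nencoding = utf-8\ntitle = caf\u00e9\n".encode("utf-8")
    print(load_config(raw).get("deployment", "title"))
```

Parsing twice keeps the sketch short and avoids decoding individual values after the fact; a real implementation might prefer to scan for the option directly.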
However, if it's considered desirable to also detect a BOM, we can
implement that by reading the first four bytes of the file, and then either
backing up if there's no BOM, or wrapping the file object with the
appropriate decoding wrapper before passing it to ConfigParser.
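The BOM check described above might look roughly like the following. It is a sketch, not a settled design: `open_config_stream` and `read_config` are hypothetical helper names, the ASCII default is an assumption, and the file object is assumed to be seekable and opened in binary mode.

```python
import codecs
import configparser
import io

# Longer signatures come first: the UTF-32 LE BOM starts with the same two
# bytes as the UTF-16 LE BOM, so the order of the checks matters.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def open_config_stream(binary_file, default_encoding="ascii"):
    """Read up to four bytes, then skip the BOM or back up to the start."""
    start = binary_file.tell()
    head = binary_file.read(4)
    for bom, encoding in BOMS:
        if head.startswith(bom):
            # Position the file just past the BOM.
            binary_file.seek(start + len(bom))
            break
    else:
        # No BOM found: back up and fall back to the default encoding.
        binary_file.seek(start)
        encoding = default_encoding
    # Wrap with a decoder so the parser only ever sees text.
    return io.TextIOWrapper(binary_file, encoding=encoding)


def read_config(binary_file):
    parser = configparser.RawConfigParser()
    parser.read_file(open_config_stream(binary_file))
    return parser
```

For example, `read_config(open("deploy.ini", "rb"))` would handle both a BOM-prefixed UTF-16 file and a plain ASCII one; the filename is illustrative.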
Of course, at that point we could just as well implement the exact same
detection algorithm as PEP 263, except that we could also support wide
encodings as long as there's a BOM.
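For comparison, a simplified version of the PEP 263 declaration search could be applied to a config file, since ConfigParser already treats lines starting with `#` as comments. The regular expression below follows the one given in PEP 263; the function name `declared_encoding` and the ASCII default are assumptions made for this sketch. It looks at the first two lines only and does not handle BOMs.

```python
import re

# Pattern adapted from PEP 263: a comment containing "coding:" or "coding=".
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(raw_bytes, default="ascii"):
    """Return the encoding named in a coding comment on line 1 or 2."""
    for line in raw_bytes.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return default


if __name__ == "__main__":
    sample = b"# -*- coding: iso-8859-1 -*-\n[app]\nkey = value\n"
    print(declared_encoding(sample))
```

Because the declaration lives in a comment, it never has to be a configuration variable, which sidesteps the question of which section it would belong to.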