[Python-3000] basestring removal, __file__ and co_filename

Christian Heimes lists at cheimes.de
Thu Oct 11 17:07:59 CEST 2007


Hello Python!

I've written a patch that removes basestring from py3k:
http://bugs.python.org/issue1258
While testing the patch I hit a problem with __file__ and
codeobject.co_filename. Both __file__ and co_filename are byte strings
rather than unicode, which is causing some trouble. Guido asked me to
provide another patch which decodes the string using the default
filesystem encoding.

Most of the patch was straightforward and easy, but I hit one spot
that's causing some trouble. It's a chicken-and-egg issue.
codeobject.co_filename is a PyString instance. I'd like to perform

    filename = PyString_AsDecodedObject(filename,
        Py_FileSystemDefaultEncoding ? Py_FileSystemDefaultEncoding : "UTF-8",
        NULL);

in order to decode the string with either the fs encoding or UTF-8 but
it's not possible. It's way too early in the bootstrapping process of
Python and the codecs aren't registered yet. In fact, large parts of the
codecs package are implemented in Python ...

Ideas?

I could check if Py_FileSystemDefaultEncoding is one of the encodings
that are implemented in C (UTF-8, UTF-16, UTF-32, latin1, mbcs), but what
if the fs default encoding is some obscure encoding?
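
The check described above could be sketched roughly as follows. This is
only an illustration under stated assumptions, not code from the patch:
the function names (normalize_encoding, is_builtin_codec,
pick_bootstrap_encoding) and the table are hypothetical, the list simply
mirrors the encodings named above, and aliases such as "iso-8859-1" are
not handled. It uses no Python C API calls, only plain string handling.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: the encodings named in the message, in normalized
 * form (lowercase, without '-' or '_'). Assumed to be usable before the
 * codecs package has been imported. */
static const char *const builtin_codecs[] = {
    "utf8", "utf16", "utf32", "latin1", "mbcs", NULL
};

/* Lowercase the name and drop '-' and '_', so that "UTF-8" and "utf_8"
 * both become "utf8". Output is truncated to fit out_size. */
void normalize_encoding(const char *name, char *out, size_t out_size)
{
    size_t j = 0;
    assert(out_size > 0);
    for (size_t i = 0; name[i] != '\0' && j + 1 < out_size; i++) {
        unsigned char c = (unsigned char)name[i];
        if (c == '-' || c == '_')
            continue;
        out[j++] = (char)tolower(c);
    }
    out[j] = '\0';
}

/* Return 1 if the encoding is in the table above, 0 otherwise. */
int is_builtin_codec(const char *name)
{
    char norm[32];
    if (name == NULL)
        return 0;
    normalize_encoding(name, norm, sizeof(norm));
    for (size_t i = 0; builtin_codecs[i] != NULL; i++) {
        if (strcmp(norm, builtin_codecs[i]) == 0)
            return 1;
    }
    return 0;
}

/* Choose an encoding that is safe to use during bootstrapping:
 *   - NULL input falls back to "UTF-8", as in the snippet above;
 *   - a known encoding is returned unchanged;
 *   - anything else yields NULL, meaning the caller has to defer
 *     decoding until the codecs are registered. */
const char *pick_bootstrap_encoding(const char *fs_encoding)
{
    if (fs_encoding == NULL)
        return "UTF-8";
    return is_builtin_codec(fs_encoding) ? fs_encoding : NULL;
}
```

With this sketch, pick_bootstrap_encoding("koi8-r") returns NULL, which is
exactly the open case: the check tells you when early decoding is unsafe,
but not what to do about it.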

Christian

