[Python-Dev] deleting setdefaultencoding in site.py is evil

M.-A. Lemburg mal at egenix.com
Thu Aug 27 10:34:36 CEST 2009

Chris Withers wrote:
> M.-A. Lemburg wrote:
>> Let's look at this from another angle: sys.setdefaultencoding()
>> is only made available for use in site.py. 
> ...see this:
> http://mail.python.org/pipermail/python-dev/2009-August/091391.html
> I would like to use sitecustomize.py for all the very good reasons given
> in this thread:
> - I don't want to change the default encoding for every project that
> uses the python installation in question
> - I don't even want to change the default encoding for every python
> script run by the current user
> - I only want to change the default encoding for one particular project.
> Sadly, for the reasons I describe in the thread, site.py won't find a
> sitecustomize.py in this situation...
>> If you use it anywhere else, you're on your own. 
> No problem with that. To be specific, this is a Zope 2.12 instance
> driven by this buildout:
> [instance]
> recipe = zc.recipe.egg
> eggs = ${buildout:eggs}
> interpreter = py
> entry-points=
>    runzope=Zope2.Startup.run:run
>    zopectl=Zope2.Startup.zopectl:main
> scripts = runzope zopectl
> initialization =
>    import sys
>    reload(sys)
>    sys.setdefaultencoding('utf-8')
>    sys.argv[1:1] = ['-C','${buildout:directory}/etc/instance.conf']
> The call to sys.setdefaultencoding is *very* early in the scheme of
> things... The runzope script that gets run only has some sys.path
> manipulation before sys.setdefaultencoding gets called. What problems
> could there be by calling sys.setdefaultencoding there?
>> Such usage
>> is not supported and may very well break your interpreter 
> Can you give an example?

You can get strange effects caused by the fact that some 8-bit
string objects will now compare equal to Unicode objects without
necessarily having the same hash value.

Unicode objects and strings have the same hash value provided
that they are both ASCII.

With the ASCII default encoding, a non-ASCII string never compares
equal to a Unicode object, so the problem does not arise.
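Python 3 no longer has sys.setdefaultencoding(), so the effect cannot be reproduced directly today. The sketch below is a hypothetical model, not the real Python 2 `str` type: `Py2Str` compares to text through a class-level default encoding (mimicking implicit str-to-unicode coercion) but hashes its raw byte values. It shows that under UTF-8 a non-ASCII byte string compares equal to the matching text yet is not found when used as a dictionary key.

```python
class Py2Str:
    """Hypothetical stand-in for a Python 2 8-bit string (``str``).

    Equality with text goes through a process-wide default encoding,
    like Python 2's implicit str -> unicode coercion, while hashing
    only looks at the raw bytes.
    """

    default_encoding = "ascii"  # stands in for sys.getdefaultencoding()

    def __init__(self, data: bytes) -> None:
        self.data = data

    def __eq__(self, other):
        if isinstance(other, Py2Str):
            return self.data == other.data
        if isinstance(other, str):
            try:
                # Implicit coercion: decode with the default encoding, then compare.
                return self.data.decode(Py2Str.default_encoding) == other
            except UnicodeDecodeError:
                # Python 2.5+ warns and treats such values as unequal.
                return False
        return NotImplemented

    def __hash__(self) -> int:
        # Treat each byte as a code point 0-255: pure ASCII data hashes like
        # the equal text, but b"\xc3\xa9" hashes like "\u00c3\u00a9", not "\u00e9".
        return hash(self.data.decode("latin-1"))


def lookup_demo(encoding: str):
    """Return (compares_equal, found_as_dict_key) for UTF-8 bytes vs. the same text."""
    old = Py2Str.default_encoding
    Py2Str.default_encoding = encoding
    try:
        key = Py2Str("\u00e9".encode("utf-8"))
        return key == "\u00e9", key in {"\u00e9": 1}
    finally:
        Py2Str.default_encoding = old


print(lookup_demo("ascii"))  # (False, False): never equal, so nothing is inconsistent
print(lookup_demo("utf-8"))  # (True, False): equal, yet the dict lookup misses
```

The dictionary checks hash values before it ever calls `__eq__`, which is why equal-but-differently-hashed keys silently fail to match.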

>> or
>> cause data corruption (the default encoded versions of Unicode
>> objects are cached inside the objects).
> When called as early as in the above script, what objects would have
> encoded strings cached in them?

Difficult to say. This depends a lot on the environment
where you are running the script.

Note that the codecs are loaded at a very early stage in
the interpreter startup and a lot of them do use Unicode
strings. This wasn't the case in Python 1.6, when
the whole site.py approach to setting the default
encoding was designed; the early codec loading was added
later on, in Python 2.1 IIRC, when no one really considered
using a different default encoding anymore.

Using UTF-8 as the new default encoding will not cause much
trouble with this, since it is an ASCII superset.

However, changing it more than once will leave Unicode objects
that were encoded earlier still using the old default encoding.

Using an encoding that is not ASCII-compatible, such
as UTF-16, will cause breakage for the same reason.
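"ASCII-compatible" here means that pure-ASCII text encodes to the same bytes it would under ASCII. A rough heuristic check, written for illustration (the function name is hypothetical, and it only inspects the 128 ASCII characters rather than analysing the codec fully):

```python
def ascii_compatible(encoding: str) -> bool:
    """Heuristic: does `encoding` turn pure-ASCII text into the ASCII bytes?"""
    sample = "".join(map(chr, range(128)))  # every ASCII character once
    return sample.encode(encoding) == sample.encode("ascii")


for name in ("utf-8", "latin-1", "utf-16"):
    print(name, ascii_compatible(name))
# utf-8 True
# latin-1 True
# utf-16 False
```

UTF-16 fails on two counts: every character takes at least two bytes, and Python's `utf-16` codec also prepends a byte order mark.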

The default encoded string version of a Unicode object is
cached in the object and never recreated after it has
first been successfully encoded.
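The caching behaviour can be modelled in a few lines. This is a hypothetical sketch, not CPython's implementation: `CachedText` plays the role of a Python 2 Unicode object, and a class attribute stands in for the process-wide default encoding. An object encoded under UTF-8 keeps its UTF-8 bytes after the default is switched, while a new object picks up the new encoding.

```python
class CachedText:
    """Hypothetical stand-in for a Python 2 ``unicode`` object that caches
    its default-encoded bytes after the first successful encode."""

    default_encoding = "ascii"  # stands in for sys.getdefaultencoding()

    def __init__(self, text: str) -> None:
        self.text = text
        self._default_encoded = None  # filled lazily, never invalidated

    def default_encoded(self) -> bytes:
        if self._default_encoded is None:
            # A failed encode raises here and leaves the cache empty;
            # only a successful encode is stored.
            self._default_encoded = self.text.encode(CachedText.default_encoding)
        return self._default_encoded


def stale_cache_demo():
    """Encode once under UTF-8, switch the default, then compare results."""
    old = CachedText.default_encoding
    try:
        CachedText.default_encoding = "utf-8"
        early = CachedText("\u00e9")
        first = early.default_encoded()                 # cached as UTF-8
        CachedText.default_encoding = "utf-16-le"
        again = early.default_encoded()                 # stale UTF-8 bytes
        fresh = CachedText("\u00e9").default_encoded()  # new object: UTF-16-LE
        return first, again, fresh
    finally:
        CachedText.default_encoding = old


print(stale_cache_demo())  # (b'\xc3\xa9', b'\xc3\xa9', b'\xe9\x00')
```

Two objects holding the same text now disagree about their default-encoded form, which is the kind of silent inconsistency that leads to data corruption.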

When only changing the default encoding once and using
UTF-8 as the new default encoding, you'll only run into
the hash value problem.

If that's not an issue for your
application, e.g. you don't mix Unicode and string key
objects in your dictionaries and don't rely on the special
relationship between hashes and comparisons elsewhere,
you should be fine.

>> Now, in your particular case, you're probably better off just
>> tweaking site.py directly in your custom Python interpreter
>> rather than relying on sitecustomize.py (see setencoding() in
>> site.py).
> Why?

To get the job done :-)

You could rewrite setencoding() to get the encoding information
from e.g. an os.environ variable or some config file.
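A minimal sketch of the kind of helper a rewritten setencoding() could call. The environment variable `PYTHON_DEFAULT_ENCODING` and the `[site] default_encoding` config option are hypothetical names chosen for illustration. The sketch is written in Python 3 so it runs on its own; in a Python 2 site.py the module is `ConfigParser` (without the `fallback` keyword), and the returned name would be passed to sys.setdefaultencoding() from inside setencoding(), once, before anything else runs.

```python
import codecs
import configparser
import os
from typing import Mapping, Optional

ENV_VAR = "PYTHON_DEFAULT_ENCODING"  # hypothetical variable name


def pick_default_encoding(environ: Optional[Mapping[str, str]] = None,
                          config_file: Optional[str] = None,
                          fallback: str = "ascii") -> str:
    """Choose a default encoding from the environment, then a config file."""
    if environ is None:
        environ = os.environ
    encoding = environ.get(ENV_VAR)
    if not encoding and config_file:
        parser = configparser.ConfigParser()
        parser.read(config_file)  # a missing file is silently ignored
        # Hypothetical option: [site] default_encoding = utf-8
        encoding = parser.get("site", "default_encoding", fallback=None)
    if not encoding:
        return fallback
    # Normalise the name and reject unknown codecs (raises LookupError).
    return codecs.lookup(encoding).name


print(pick_default_encoding({ENV_VAR: "UTF8"}))  # utf-8
```

Normalising through codecs.lookup() means a typo in the variable fails loudly at startup instead of installing an unusable default.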

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, Aug 27 2009)

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
