Before I start complaining, I want to mention what a huge help it has
been to be able to directly compare the AST exposed by ast.py in
making Jython a better Python. Thanks for that!
Now on to the complaints: Though I recently added support for this in
Jython, I don't like that nodes can be defined without required
attributes, for example:
node = ast.Assign()
Is valid, even though it requires "node.targets" and "node.value" (I'm
less concerned about the required lineno and col_offset, as I can
understand holding off on these so that you can just use
fix_missing_locations to fill these in for you).
My other (bigger) gripe is that you can define these values with
arbitrary objects that will blow up at parse time. So for example you
node = ast.Assign()
node.targets = "whatever"
Which, when you try to parse, blows up with "TypeError: Assign field
"targets" must be a list, not a str". I'd be much happier if this
blew up right away when you try to make the assignment. At the
moment, this is how it works in Jython (though I could support this
with some contorting).
BTW -- I *am* volunteering to attempt to implement these things in
CPython if there is support :)
In an attempt to figure out some twisted.web code, I was reading
through the Python Standard Library’s mimetypes module today, and
was shocked at the poor quality of the code. I wonder how the
mimetypes code made it into the standard library, and whether anyone
has ever bothered to read it or update it: it is an embarrassment.
Much of the code is redundant, portions fail to execute, control
flow is routed through a horribly confusing mess of spaghetti, and
most of the complexity has no clear benefit as far as I can tell. I
probably should drop the subject and get back to work, but as a good
citizen, it’s hard to just ignore this sort of thing.
mimetypes.py stores its types in a pair of dictionaries, one for
"strict" use, and the other for "non-standard types". It creates the
strict dictionary by default out of apache's mime.types file, and
then overrides the entries it finds with a set of exceptions. Then
it creates the non-standard dictionary, which is set to match if the
strict parameter is set to False when guessing types. Just in this
basic design, and in the list of types in the file, there are
* Various apache mime types files are read, if found, but the
ordering of the files is such that older versions of apache are
sometimes read after newer ones, overriding updated mime types
with out-of-date versions if multiple versions of apache are
installed on the system.
* The vast majority of types declared in mimetypes.py are
duplicates of types already declared by Apache. In a few cases
this is to change the apache default (make an exception, that
is), but in most cases the mime type and extension are
completely identical. This huge number of redundant types makes
the file substantially harder to follow. No comments are
provided to explain why various sets of exceptions are made to
Apache's default mime types, and in several cases mimetypes.py
seems to just be out of date as compared to recent versions of
Apache, for instance not knowing about the 'text/troff' type
which was registered in January 2006 in RFC 4263.
* The 'non-standard' type dictionary is nearly useless, because
all of the types it declares are already in apache's mime.types
file, meaning that types are, as far as I can tell trying to
follow ugly program flow, *never* drawn from the non-strict
dictionary, except in the improbable situation where the
mimetypes module is initialized with a custom set of
apache-mime.types–like files, which does not include those
'non-standard' types. I personally cannot see a use case for
initializing the module with a custom set of mime types, but
then leaving the very few types included as non-strict to the
defaults: this seems like a fragile and pathological use case.
Given this, I don’t see any benefit to dragging the 'strict'
parameter along all the way through the code, and would advise
getting rid of it altogether. Does anyone know of any code that
uses the mimetypes module with strict set to False, where the
non-strict code path ever *actually* is executed?
But though these problems, which affect actual use of the code and
are therefore probably most important, are significant, they really
pale in comparison to the awful quality of implementation. I'll try
to briefly outline my understanding of how code flows in
mimetypes.py, and what the problems are. I haven't stepped through
the code in a debugger, this is just from reading it, so I apologize
in advance if I get something wrong. This is, however, some of the
worst code I’ve seen in the standard library or anywhere else.
* It defines __all__: I didn’t even realize __all__ could be used
for single-file modules (w/o submodules), but it definitely
shouldn’t be here. This specific __all__ oddly does not include
all of the documented variables and functions in the mimetypes
class. It’s not clear why someone calling import * here wouldn’t
want the bits not included.
* It creates a _default_mime_types() function which declares a
bunch of global variables, and then immediately calls
_default_mime_types() below the definition. There is literally
no difference in result between this and just putting those
variables at the top level of the file, so I have no idea why
this function exists, except to make the code more confusing.
* It allows command line usage: I don’t think this is necessary
for a part of the standard library like this. There are better
tools for finding mime types from the command line which ship
with most operating systems.
* Its API is pretty poorly designed. It offers 6 functions when
about 3 are needed, and it takes a couple reads-through of the
code to figure out exactly what any of them are supposed to do.
* The operation is crazy: It defines a MimeTypes class which
actually stores the type mappings, but this class is designed to
be a singleton. The way that such a design is enforced is
through the use of the module-global 'init' function, which
makes an instance of the class, and then maps all of the
functions in the module global namespace to instance methods.
But confusingly, all such functions are also defined
independently of the init function, with definitions such as:
def guess_type(url, strict=True):
if not inited:
return guess_type(url, strict)
I’d be amazed if anyone could guess what that code was trying to
do. I did a double-take when I saw it.
Of course, that return call is only ever reached the first time
this function is called, if init() has not happened yet. This
was all presumably done for lazy initialization, so that the
type information would only be loaded when needed. Needless to
say, there are more pythonic ways to accomplish such a goal.
Oh, also, the other good one here is that it means that someone
who writes `from mimetypes import guess_types` gets something
different than someone who writes:
`import mimetypes; guess_types = mimetypes.guess_types`. In the
former case, this wrapper function is saved as guess_type, which
each time just calls the (changed after init())
mimetypes.guess_types function. This caused a performance
nightmare before March of this year, when there was no check for
`if not inited` before running init() (amazing!?).
* Because the type datastore is set up to be a singleton, any time
init() is called in one section of code, it resets any types
which have been added manually: this means that if init() is
called by different pieces of code in the same python program,
they will interfere with each-others’ type databases, and break
each-other. This is extremely fragile and, in my opinion, crazy.
It is hard for me to imagine any use case that would benefit
from this ability to clobber custom type mappings, and I very
much doubt that any code calling the mimetypes module realizes
that the contract of the API is so flimsy by definition. In
practice, I would not advise consumers of this API to ever call
init() manually, or to ever add custom mime type mappings,
because they are setting themselves up for hard-to-track bugs
down the line.
* The 'inited' flag is a documented part of the interface, in the
standard library documentation. I cannot imagine any reason to
set this flag manually: setting it to false when it was true
will have no effect, because the top-level functions have
already been replaced by instance methods of the 'db' MimeTypes
instance. Setting it to true when it was false will make the
code just break outright.
* In python 3, this has been changed a bit. There’s still an
inited flag, and it still in the docs, but now awful code from
above has been changed slightly, to:
def guess_type(url, strict=True):
if _db is None:
return _db.guess_type(url, strict)
Which is still embarrassingly confusing. On the upside, the
inited flag now does literally nothing, but remains defined, and
in the docs.
* The 'types_map' and 'common_types' (for 'strict' and
'common' types, respectively) dictionaries are also a documented
part of the interface. When init() is called, a new MimeTypes
instance makes a (different) types_map which is a tuple of two
dictionaries, for 'strict' and 'common' types. Then this
instance reads the apache mime.types files and adds the types to
its pair of self.types_map dictionaries, and then after that
looks at the global types_map and common_types dictionaries and
adds *those* types to its self.types_map. Then at the end it
replaces the global types_map with self.types_map[True] and
replaces common_types with self.types_map[False]. Unfortunately,
while changing these dictionaries will have an effect on the
operation of the library, it will not update the types_map_inv
mapping, so inverse lookups will not behave as the changer
expects. If these dictionaries are going to remain documented,
the documentation should be clear to describe them as read only
to avoid very confusing bugs.
* Speaking of these dictionaries, .copy() is called on those two
and a few other inside MimeTypes.__init__(), which happens every
time the global init() function is called, but then init() puts
the copies back in the global namespace, meaning that the
original is discarded. Basically the only reason for the .copy()
is to make sure that the correct updates are applied to the
apache mimetype defaults, but the code will gladly re-read all
of the apache files even after its mapped types are already in
these dictionaries, essentially making re-initializing a (very
expensive) no-op. All we’re doing is a lot of unnecessary extra
disk reads and memory allocations and deallocations. The only
time this has any effect is when a non-singleton MimeTypes
instance is created, as in the read_mime_types function.
* And that read_mime_types function is a doozy. It tries to open a
filename, spits back None if there’s an IOError (instead of
raising the exception as it should), and then creates a new
MimeTypes instance (remember, this is identical to the singleton
MimeTypes instance because it starts itself from that one’s
mappings), adds any new types it finds in the file with that
name, and then returns the 'strict' types_map from it. I’m not
sure whether any sane user of this API would expect it to return
the existing type mappings *plus* the extra ones in the provided
filename, but I really can’t imagine this function ever being
particularly useful: it requires you are reading mime types in
apache format, but not the apache mime type files you already
looked at, and then the only way to find out what new mappings
were defined is to take the difference of the default mappings
with the result of the function.
* The code itself, on a line-by-line basis, is unpythonic and
unnecessarily verbose, confusing, and slow. The code should be
rewritten to use python 2.3–2.6 features: even leaving its
functionality identical it could be cut to about half the number
of lines, and made clearer.
In case the above doesn’t make this clear: this code is extremely
confusing. Trying to read it has caused all the people around me to
look up as I shout "what the fuck??!" at the screen every few
minutes, as each new revelation gives another surprise. I’m not
convinced that I completely understand what the code does, because
it has been quite effectively obfuscated, but I understand enough to
want to throw the whole thing out, and start essentially from
So the question is, what should be done about this? I’d like to hear
how people use the mimetypes module, and how they expect it to work,
to figure out the sanest possible mostly-backwards-compatible
replacement which could be dropped in (ideally this would just allow
the use of default mimetypes and rip out the ability to alter the
default datastore: or is there some easy way to change this away
from a singleton without breaking code which calls these methods?),
and then extend that replacement to support a somewhat saner model
for anyone who actually wants to extend the set of mappings. My
guess is that replacement code could actually fix subtle bugs in
existing uses of this module, by people who had a sane expectation
of how it was supposed to work.
At the very least, the parts about figuring out exactly which
exceptions to Apache’s set of default types are useful would be a
good idea, and I’d maybe even recommend including an up-to-date copy
of Apache’s mime.types file in the Python Standard Library, and then
only overriding its definitions for future versions of Apache (and
then overriding the combination of both of those with further
exceptions deemed useful for python, with comments explaining why
each exception), so that we’re not bothering to look up horribly
out-of-date types in multiple locations from Apache 1, 1.2, 1.3,
etc. I’d also recommend making the API for overriding definitions be
the same as the code used to declare the default overrides, because
as it is there are three ways do define types: a) in a mime.types
formatted file, b) in a python dictionary that gets initialized with
a confusing bit of code, and c) through the add_type function.
Does anyone else have thoughts about this, or maybe some good (it
had better be *really* good) explanations why this code is the way
it is? I'd be happy to try to rewrite it, but I think I’d need a bit
of help figuring out how to make the rewrite backwards-compatible.
Note: someone else has had fun with this module:
I sent fourteen requests for licenses in to Microsoft. I've asked them
to let me know which they grant (since they may choose to limit the
number) and will inform you all personally when I hear their decision.
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Watch PyCon on video now! http://pycon.blip.tv/
I'm cross-posting this to distributions@freedesktop and python-dev,
because the topic is relevant to both groups and should be solved in
In Python's default configuration (on linux), both purelib (location for
pure python modules) and platlib (location for platform-dependent binary
extensions) point to $prefix/lib/pythonX.Y/site-packages.
That is no good for two main reasons.
One, python depends on the "lib" directory. (from distro's point of
view, prefix is /usr, so let's talk /usr/lib) Due to this, it's
impossible to install python under /usr/lib64 without heavy patching.
Repeated attempts to bring python developers to acknowledge and rectify
the situation have all failed (common argument here is "that would mean
redesign of distutils and huge parts of whatnot").
Conversely, that also means that multiarch setup (/usr/lib or lib32 with
32bit python and /usr/lib64 with 64bit python) is not possible with
Two, the default configuration makes purelib and platlib identical,
which somehow defeats the purpose of the distinction in the first place.
You either need to patch the default, or supply some alternate
configuration to take advantage of this feature.
And that's not the end of it - the next step is to make python aware of
two different locations on sys.path, one for purelib and one for
platlib, which is a different story altogether.
As distributors, we like to take advantage of purelib/platlib separation
to package pure python modules as platform-independent (noarch for
rpm-speakers). And that's not easy to do properly.
Let's put our heads together and choose good default locations for
purelib and platlib. Then add support to python for recognizing the
locations by default, and possibly leave note in FHS that "this is the
This is IMO a good first step to making python multiarch-aware, and it
would also help a bit with LSB integration .
I've come up with three basic options for the configuration (substitute
"/usr" with "$prefix" if you're not a distributor). This list is by no
means comprehensive, it's just what looked reasonable at the time of
1 - the traditional way
purelib = /usr/lib/pythonX.Y/site-packages
platlib = /usr/lib(64)/pythonX.Y/site-packages
+ this is already the default for 32bit systems
+ major distributions (including Fedora, Mandriva and now finally
openSUSE too) do this
- 32bit systems have no separation, poor they!
- with multiarch setup, /usr/lib is "cluttered" by both
platform-dependent files for 32bit and platform-independent files shared
by the platforms. Also, 64bit python can pick up 32bit modules. That
doesn't cause problems in practice, but doesn't fell like a clean design.
2 - the sharedir way
purelib = /usr/share/python/X.Y
platlib = /usr/lib(64)/pythonX.Y/site-packages
+ clean separation of purelib - nice!
+ unheard of - a good place to start anew
- FHS states that /usr/share is for data. But OTOH, they don't say much
about platform-independent bytecode. We could probably get an exception
- unheard of - everyone will be surprised
3 - the perl way
purelib = /usr/lib/pythonX.Y
platlib = /usr/lib/pythonX.Y/lib-dynload-(platform-identifier)/site-packages
+ possibility of multiarch packages that would install pure python parts
into purelib and extensions or accelerators for more platforms at once -
and therefore, possibility to split large modules into
platform-dependent and platform-independent parts and save space on
+ "idea compatibility" with perl and ruby, one less install layout to learn
- completely different from what we have now - would require the most
work from both python developers and distributions
python packager for SUSE Linux
Is the Express Edition of Visual C++ 2008 suitable for compiling
packages for Python 2.6 on Windows?
(And Python 2.6 itself for that matter...)
Ditto for 2.5, 3.1 and the trunk (which I guess becomes 3.2?)
Simplistix - Content Management, Batch Processing & Python Consulting
I'd like to express additional interest in python patch 1660179, discussed
On several occasions, I've had the desire for something like this. I've
made due with lambda functions, but as was mentioned, the lambda is clumsy
and harder to read than functools.compose would be.
A potentially common use-case is when a library has a multi-decorator use
case in which they want to compose a meta decorator out of one or more
Consider the hypothetical library.
# we have three decorators we use commonly
# do something with func
# do something to func.__doc__
func.data = data # this may not be legal,
but assume it does something useful
# we could use these decorators explicitly throughout our project
@dec_inject_some_data('foo data 1')
@dec_inject_some_data('foo data 2')
For two functions, that's not too onerous, but if it's used throughout the
application, it would be nice to abstract the collection of decorators. One
could do this with lambdas.
return lambda func:
But to me, a compose function is much easier to read and much more
consistent with the decorator usage syntax itself.
return compose(dec_register_function_for_x, dec_alter_docstring,
The latter implementation seems much more readable and elegant. One doesn't
even need to know the decorator signature to effectively compose
I've heard it said that Python is not a functional language, but if that
were really the case, then functools would not exist. In addition to the
example described above, I've had multiple occasions where having a general
purpose function composition function would have simplified the
implementation by providing a basic functional construct. While Python isn't
primarily a functional language, it does have some functional constructs,
and this is one of the features that makes Python so versatile; one can
program functionally, procedurally, or in an object-oriented way, all within
the same language.
I admit, I may be a bit biased; my first formal programming course was
taught in Scheme.
Nevertheless, I believe functools is the ideal location for a very basic and
general capability such as composition.
I realize this patch was rejected, but I'd like to propose reviving the
patch and incorporating it into functools.
I wrote to this list a few weeks ago asking about Py_TPFLAGS_HEAPTYPE
(http://thread.gmane.org/gmane.comp.python.devel/105648). It occurred
to me today that I could probably make object instances INCREF and
DECREF my type appropriately, without setting Py_TPFLAGS_HEAPTYPE, by
writing my own tp_alloc and tp_dealloc functions. My tp_alloc function
PyObject *my_tp_alloc(PyTypeObject *type, Py_ssize_t nitems)
PyObject *obj = PyType_GenericAlloc(type, nitems);
This seems right since it is PyType_GenericAlloc that contains this
if (type->tp_flags & Py_TPFLAGS_HEAPTYPE)
I don't want to set Py_TPFLAGS_HEAPTYPE, but I want to get that
Py_INCREF(), so far so good.
But I couldn't find the corresponding Py_DECREF() in typeobject.c to
the above Py_INCREF(). Notably, object_dealloc() does not call
Py_DECREF(self->ob_type) if self->ob_type has the Py_TPFLAGS_HEAPTYPE
So where does the Py_DECREF() for the above Py_INCREF() live? I
expected to find this code snippet somewhere, but couldn't:
if (type->tp_flags & Py_TPFLAGS_HEAPTYPE)
Several days ago, around the time the python.org servers went down, I
submitted a PEP to editor(a)python.org. When things to have been worked,
I submitted the PEP again. I have not seen any activity on the PEP in
Python-Dev or any reply acknowledging that it was received. Did I
misunderstand the process of submitting a PEP?
Mail Delivery System wrote:
> This is the mail system at host server1.simplistix.co.uk.
> I'm sorry to have to inform you that your message could not
> be delivered to one or more recipients. It's attached below.
> For further assistance, please send mail to postmaster.
> If you do so, please include this problem report. You can
> delete your own text from the attached returned message.
> The mail system
> <john(a)arbash-meinel.com>: host mail2.domainpeople.com[18.104.22.168] said: 550
> relay not permitted (in reply to RCPT TO command)
Looks like something is not happy with your mail setup...
Simplistix - Content Management, Batch Processing & Python Consulting