[Python-Dev] standard library mimetypes module pathologically broken?
jacobolus at gmail.com
Sun Aug 2 20:56:29 CEST 2009
Jim Jewett wrote:
> [It may be worth creating a patch; I think most of these comments
> would be better on the bug-tracker.]
I'm going to do that shortly.
> (1) In a few cases, it looked like you were changing parameter names
> between "files" and "filenames". This might break code that was
> calling it with keyword arguments -- as I typically would for this
> type of function.
Sorry, that was a mistake.
> (1a) If you are going to change the .sig, you might as well do it
> right, and make the default be "knownfiles" rather than the empty
> (2) The comment about why inited was set true at the beginning of the
> function instead of the end should probably be kept, or at least
> (3) Default values:
> (3a) Why the list of known files going back to Apache 1.2, in that
> order? Is there any risk in using too *new* of a MimeTypes file?
> I would assume that the goal is to pick up whatever changes the user
> has made locally, but in that case, it still makes sense to have the
> newest file be the last one read, in case Apache has made bugfixes.
I did not change this in my patch, but I completely agree. Indeed, I
think it makes more sense to grab the newest Apache mime.types and
just include them with the standard library, either as an in-code
python object, or as a mime.types file to be parsed.
> (3b) Also, this would improve cross-platform consistency; if I read
> that correctly, the Apache files will override the python defaults on
> unix or a mac, but not on windows. That will change the results on
> the majority of items in _common_types. (application vs text, whether
> to put an x- in front of the word pict.)
Quite possibly true. It actually seems
> (3c) rtf is listed in non-standard, but
> http://www.iana.org/assignments/media-types/ does define it. (Though
> whether to guess application vs text is not defined, and python
> chooses differently from apache.)
> (3d) jpg is listed as non-standard. It turns out that this is just
> for the inverse mapping, where image/jpg is non-standard (for
> image/jpeg) but that is worth a comment. (see #5)
> (3e) In _types_map, the lines marked duplicates are duplicate keys,
> not duplicate values; it would be more clear to also comment out the
> (first) line itself, instead of just marking it a duplicate. (Or
> better yet, to mention that it is just being added for the inverse
> mapping, if that is the case.)
I completely agree that this whole section should be considered
carefully. Just any changes might have more impact on backwards
compatibility than the code flow changes I made, so I thought they
could be in a separate patch.
> (4) Why bother to lazyinit? Is there any sane usecase for a
> MimeTypes that hasn't been inited?
Only because the original was written that way, back in 1997 or
whatever. I don't think there's necessarily any need for it these
days: reading the default files even should be blazingly fast, unless
the disk is otherwise thrashing: each is about a a 37k file, and there
are at most going to be 3 or 4 of them installed on one machine for
different versions of Apache.
More information about the Python-Dev