[Python-Dev] standard library mimetypes module pathologically broken?

Jacob Rus jacobolus at gmail.com
Sun Aug 2 20:56:29 CEST 2009


Jim Jewett wrote:
> [It may be worth creating a patch; I think most of these comments
> would be better on the bug-tracker.]

I'm going to do that shortly.

> (1)  In a few cases, it looked like you were changing parameter names
> between "files" and "filenames".  This might break code that was
> calling it with keyword arguments -- as I typically would for this
> type of function.

Sorry, that was a mistake.

> (1a)  If you are going to change the .sig, you might as well do it
> right, and make the default be "knownfiles" rather than the empty
> tuple.

Seems reasonable.

> (2)  The comment about why inited was set true at the beginning of the
> function instead of the end should probably be kept, or at least
> reworded.
>
> (3) Default values:
>
> (3a) Why the list of known files going back to Apache 1.2, in that
> order?  Is there any risk in using too *new* of a MimeTypes file?
>
> I would assume that the goal is to pick up whatever changes the user
> has made locally, but in that case, it still makes sense to have the
> newest file be the last one read, in case Apache has made bugfixes.

I did not change this in my patch, but I completely agree. Indeed, I
think it makes more sense to grab the newest Apache mime.types and
just include them with the standard library, either as an in-code
python object, or as a mime.types file to be parsed.

> (3b)  Also, this would improve cross-platform consistency; if I read
> that correctly, the Apache files will override the python defaults on
> unix or a mac, but not on windows.  That will change the results on
> the majority of items in _common_types.  (application vs text, whether
> to put an x- in front of the word pict.)

Quite possibly true. It actually seems

> (3c)  rtf is listed in non-standard, but
> http://www.iana.org/assignments/media-types/ does define it.  (Though
> whether to guess application vs text is not defined, and python
> chooses differently from apache.)
>
> (3d)  jpg is listed as non-standard.  It turns out that this is just
> for the inverse mapping, where image/jpg is non-standard (for
> image/jpeg) but that is worth a comment.  (see #5)
>
> (3e)  In _types_map, the lines marked duplicates are duplicate keys,
> not duplicate values; it would be more clear to also comment out the
> (first) line itself, instead of just marking it a duplicate.  (Or
> better yet, to mention that it is just being added for the inverse
> mapping, if that is the case.)

I completely agree that this whole section should be considered
carefully. Just any changes might have more impact on backwards
compatibility than the code flow changes I made, so I thought they
could be in a separate patch.

> (4)  Why bother to lazyinit?    Is there any sane usecase for a
> MimeTypes that hasn't been inited?

Only because the original was written that way, back in 1997 or
whatever. I don't think there's necessarily any need for it these
days: reading the default files even should be blazingly fast, unless
the disk is otherwise thrashing: each is about a a 37k file, and there
are at most going to be 3 or 4 of them installed on one machine for
different versions of Apache.

Cheers,
Jacob Rus


More information about the Python-Dev mailing list