standard library mimetypes module pathologically broken?

[It may be worth creating a patch; I think most of these comments would be better on the bug-tracker.]

(1) In a few cases, it looked like you were changing parameter names between "files" and "filenames". This might break code that was calling it with keyword arguments -- as I typically would for this type of function.

(1a) If you are going to change the signature, you might as well do it right, and make the default be "knownfiles" rather than the empty tuple.

(2) The comment about why inited was set true at the beginning of the function instead of the end should probably be kept, or at least reworded.

(3) Default values:

(3a) Why the list of known files going back to Apache 1.2, in that order? Is there any risk in using too *new* of a MimeTypes file? I would assume that the goal is to pick up whatever changes the user has made locally, but in that case, it still makes sense to have the newest file be the last one read, in case Apache has made bugfixes.

(3b) Also, this would improve cross-platform consistency; if I read that correctly, the Apache files will override the Python defaults on unix or a mac, but not on windows. That will change the results on the majority of items in _common_types. (application vs text, whether to put an x- in front of the word pict.)

(3c) rtf is listed in non-standard, but http://www.iana.org/assignments/media-types/ does define it. (Though whether to guess application vs text is not defined, and Python chooses differently from Apache.)

(3d) jpg is listed as non-standard. It turns out that this is just for the inverse mapping, where image/jpg is non-standard (for image/jpeg), but that is worth a comment. (see #5)

(3e) In _types_map, the lines marked duplicates are duplicate keys, not duplicate values; it would be more clear to also comment out the (first) line itself, instead of just marking it a duplicate. (Or better yet, to mention that it is just being added for the inverse mapping, if that is the case.)

(4) Why bother to lazyinit? Is there any sane use case for a MimeTypes that hasn't been inited? I see value in not reading the default files, but none in not reading at least the files that were asked for. I could see value in only partial initialization if there were several long steps, but right now, initialization is all-or-nothing. If the thing is useless without an init, then it makes sense to just get it done immediately and skip the later checks; anyone who could have actually saved time should just remove the import.

-jJ
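To make points (1) and (3a) concrete, here is a minimal sketch, not taken from the patch under discussion. It assumes the keyword name "filenames" that MimeTypes accepts in the current standard library, and it uses a made-up ".demo" extension with two made-up type names fed in as in-memory mime.types data. It only illustrates that whichever source is read last wins for a given extension, which is why the read order of the known files matters.

```python
import io
import mimetypes

# Keyword call: this is the style of call that would break if the
# parameter were renamed. "filenames" is the name used by the current
# standard library; the thread discusses whether the patch changed it.
db = mimetypes.MimeTypes(filenames=())

# Two hypothetical mime.types sources, read in order. The ".demo"
# extension and both type names are invented for this illustration.
older_source = io.StringIO("text/x-old  demo\n")
newer_source = io.StringIO("text/x-new  demo\n")

db.readfp(older_source)
db.readfp(newer_source)  # read last, so its mapping for .demo replaces the first

guessed_type, encoding = db.guess_type("example.demo")
print(guessed_type, encoding)
```

With this ordering the second source determines the result; swapping the two readfp calls swaps the answer.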

Jim Jewett wrote:
[It may be worth creating a patch; I think most of these comments would be better on the bug-tracker.]
I'm going to do that shortly.
Sorry, that was a mistake.
Seems reasonable.
I did not change this in my patch, but I completely agree. Indeed, I think it makes more sense to grab the newest Apache mime.types and just include them with the standard library, either as an in-code python object, or as a mime.types file to be parsed.
Quite possibly true. It actually seems
I completely agree that this whole section should be considered carefully. It is just that any changes there might have more impact on backwards compatibility than the code flow changes I made, so I thought they could go in a separate patch.
(4) Why bother to lazyinit? Is there any sane usecase for a MimeTypes that hasn't been inited?
Only because the original was written that way, back in 1997 or whatever. I don't think there's necessarily any need for it these days: even reading the default files should be blazingly fast, unless the disk is otherwise thrashing. Each is about a 37k file, and there are going to be at most 3 or 4 of them installed on one machine for different versions of Apache.

Cheers,
Jacob Rus
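For readers unfamiliar with the lazy initialization being discussed, the following is a small sketch of how it shows up from the outside, assuming the module-level "inited" flag and guess_type function of the current standard library. It is not a description of the patched code. The file name "photo.jpeg" is just an example input.

```python
import mimetypes

# In a fresh interpreter this is normally False: nothing has been
# read yet. (It may already be True if other code used the module.)
inited_before = mimetypes.inited

# The first module-level lookup triggers initialization on demand.
kind, encoding = mimetypes.guess_type("photo.jpeg")

inited_after = mimetypes.inited
print(inited_before, inited_after, kind)
```

An eager design, as suggested in (4), would do the same work at import or construction time and drop the per-call check.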
