[Python-Dev] standard library mimetypes module pathologically broken?

Sat Aug 1 01:07:34 CEST 2009

Brett Cannon wrote:
>>>>  * It creates a _default_mime_types() function which declares a
>>>>    bunch of global variables, and then immediately calls
>>>>    _default_mime_types() below the definition. There is literally
>>>>    no difference in result between this and just putting those
>>>>    variables at the top level of the file, so I have no idea why
>>>>    this function exists, except to make the code more confusing.
>>>
>>> It could potentially be used for testing, but that's a guess.
>>
>> Here's an abridged version of this function. I can’t see any reason
>> for it to exist.
>>
>>    def _default_mime_types():
>>        global suffix_map
>>        global encodings_map
>>        global types_map
>>        global common_types
>>
>>        suffix_map = {
>>            '.tgz': '.tar.gz', #...
>>            }
>>
>>        encodings_map = {
>>            '.gz': 'gzip', #...
>>            }
>>
>>        types_map = {
>>            '.a'      : 'application/octet-stream', #...
>>            }
>>
>>        common_types = {
>>            '.jpg' : 'image/jpg', #...
>>            }
>>
>>    _default_mime_types()
>
> As R. David pointed out, it is being used by regrtest to clean up after
> running the test suite.

Yeah, basically the issue is that the default mime types should be
separate objects from the final set built after Apache's files have
been parsed and custom additions have been made. If the ones at the
top level are renamed and never modified after creation, if new
objects holding all the updated entries are bound to the current
names, and if the test code is changed to reset the objects at those
names from the default objects, I think that might fix things.  I'll
try to write some potential patches in the next day or two and submit
them here for advice.

>> The problem is that the semantics as documented are really ambiguous,
>> and what I would consider the reasonable interpretation is different
>> from what the code actually does. So anyone using this code naively is
>> going to run into trouble, and anyone relying on how the code actually
>> works is going behind the back of the docs, but they sort of have to
>> in order to use much of the functionality of the module. I agree this
>> puts us in a tricky spot.
>
> Well, perhaps the docs can be updated to match the code where cleanup would
> change the semantics.

I think that would make the docs extremely confusing, and I’m not even
sure it would be possible. The current semantics are vaguely okay if
an API consumer sticks to straightforward use cases, meaning those
that don’t break when the current docs are followed (anything
complicated is going to break unless the code is read a few times).
Assuming only such uses, it would be possible to swap out most of the
implementation for something relatively straightforward. But if any
of the edges are pushed, the semantics quickly turn insane, to the
point that I’m not sure they’re documentable. Anyone expecting the
code to work that way is going to have a buggy program anyway, so I’m
not sure it makes sense to bend over backwards to leave that
particular set of bugs unchanged.

Cheers,
Jacob Rus