[Python-Dev] standard library mimetypes module pathologically broken?

Jacob Rus jacobolus at gmail.com
Sat Aug 1 00:38:32 CEST 2009


Brett Cannon wrote:
> Jacob Rus wrote:
>>  * It defines __all__: I didn’t even realize __all__ could be used
>>    for single-file modules (w/o submodules), but it definitely
>>    shouldn’t be here.
>
> __all__ is used to control what a module exports when used in an import *,
> nothing more. Thus it's use in a module compared to a package is completely
> legitimate.
>
>> This specific __all__ oddly does not include
>>    all of the documented variables and functions in the mimetypes
>>    class. It’s not clear why someone calling import * here wouldn’t
>>    want the bits not included.
>
> If something is documented by not listed in __all__ that is a bug.

In this case, everything in the module is documented, including parts
that should be private, but only a small number are in __all__.  My
recommendation would be to make those private parts be _ variables and
remove them from the docs (using them has no legitimate use cases I
can see), and rip out __all__.

>>  * It creates a _default_mime_types() function which declares a
>>    bunch of global variables, and then immediately calls
>>    _default_mime_types() below the definition. There is literally
>>    no difference in result between this and just putting those
>>    variables at the top level of the file, so I have no idea why
>>    this function exists, except to make the code more confusing.
>
> It could potentially be used for testing, but that's a guess.

Here's an abridged version of this function. I don’t think there’s any
reason for this that I can see.

    def _default_mime_types():
        global suffix_map
        global encodings_map
        global types_map
        global common_types

        suffix_map = {
            '.tgz': '.tar.gz', #...
            }

        encodings_map = {
            '.gz': 'gzip', #...
            }

        types_map = {
            '.a'      : 'application/octet-stream', #...
            }

        common_types = {
            '.jpg' : 'image/jpg', #...
            }

    _default_mime_types()

> Probably came from someone who is very OO happy. Not everyone comes to
> Python ready to embrace its procedural or slightly functional facets.

Yes, it seems so to me too.

> So the problem of changing fundamentally how the code works, even for a
> cleanup, is that it will break someone's code out there because they
> depended on the module's crazy way of doing things. Now if they are cheating
> and looking at things that are meant to be hidden you might be able to clean
> things up, but if the semantics are exposed to the user, then there is not
> much we can do w/o breaking someone's code.

The problem is that the semantics as documented are really ambiguous,
and what I would consider the reasonable interpretation is different
from what the code actually does. So anyone using this code naively is
going to run into trouble, and anyone relying on how the code actually
works is going behind the back of the docs, but they sort of have to
in order to use much of the functionality of the module. I agree this
puts us in a tricky spot.

> Honestly, if the code is as bad as it seems -- including its API --, the
> best bet would be to come up with a new module for handling MIME types from
> scratch, put it up on the Cheeseshop/PyPI, and get the community behind it.
> If the community picks it up as the de-facto replacement for mimetypes and
> the code has settled we can then talk about adding it to the standard
> library and begin deprecating mimetypes.
> And thanks for willing to volunteer to fix this.

Okay.  Well I'd still like to hear a bit about what people really need
before trying to make a new API. I'm not such an experienced API
designer, and I haven’t really plumbed the depths of mimetypes use
cases (though it seems to me like quite a simple module of not more
than 100 lines of code or so would suffice). At the very least, I
think some changes can be made to this code without altering its basic
function, which would clean up the actual mime types it returns,
comment the exceptions to Apache and explain why they're there, and
make the code flow understandable to someone reading the code.

Cheers,
Jacob Rus


More information about the Python-Dev mailing list