[Python-Dev] standard library mimetypes module pathologically broken?

Brett Cannon brett at python.org
Sat Aug 1 00:52:17 CEST 2009


On Fri, Jul 31, 2009 at 15:38, Jacob Rus <jacobolus at gmail.com> wrote:

> Brett Cannon wrote:
> > Jacob Rus wrote:
> >>  * It defines __all__: I didn’t even realize __all__ could be used
> >>    for single-file modules (w/o submodules), but it definitely
> >>    shouldn’t be here.
> >
> > __all__ is used to control what a module exports when used in an import
> *,
> > nothing more. Thus it's use in a module compared to a package is
> completely
> > legitimate.
> >
> >> This specific __all__ oddly does not include
> >>    all of the documented variables and functions in the mimetypes
> >>    class. It’s not clear why someone calling import * here wouldn’t
> >>    want the bits not included.
> >
> > If something is documented by not listed in __all__ that is a bug.
>
> In this case, everything in the module is documented, including parts
> that should be private, but only a small number are in __all__.  My
> recommendation would be to make those private parts be _ variables and
> remove them from the docs (using them has no legitimate use cases I
> can see), and rip out __all__.
>

Well, if the module had stuff that did not lead with an underscore then you
can't remove it. You can deprecate it under the old name and rename it with
an underscore, but backwards-compatibility says someone out there is using
those functions so you can't just batch rename them w/o the proper warning.


>
> >>  * It creates a _default_mime_types() function which declares a
> >>    bunch of global variables, and then immediately calls
> >>    _default_mime_types() below the definition. There is literally
> >>    no difference in result between this and just putting those
> >>    variables at the top level of the file, so I have no idea why
> >>    this function exists, except to make the code more confusing.
> >
> > It could potentially be used for testing, but that's a guess.
>
> Here's an abridged version of this function. I don’t think there’s any
> reason for this that I can see.
>
>    def _default_mime_types():
>        global suffix_map
>        global encodings_map
>        global types_map
>        global common_types
>
>        suffix_map = {
>            '.tgz': '.tar.gz', #...
>            }
>
>        encodings_map = {
>            '.gz': 'gzip', #...
>            }
>
>        types_map = {
>            '.a'      : 'application/octet-stream', #...
>            }
>
>        common_types = {
>            '.jpg' : 'image/jpg', #...
>            }
>
>    _default_mime_types()
>

As R. David pointed out, it is being used by regrtest to clean up after
running the test suite.


>
> > Probably came from someone who is very OO happy. Not everyone comes to
> > Python ready to embrace its procedural or slightly functional facets.
>
> Yes, it seems so to me too.
>
> > So the problem of changing fundamentally how the code works, even for a
> > cleanup, is that it will break someone's code out there because they
> > depended on the module's crazy way of doing things. Now if they are
> cheating
> > and looking at things that are meant to be hidden you might be able to
> clean
> > things up, but if the semantics are exposed to the user, then there is
> not
> > much we can do w/o breaking someone's code.
>
> The problem is that the semantics as documented are really ambiguous,
> and what I would consider the reasonable interpretation is different
> from what the code actually does. So anyone using this code naively is
> going to run into trouble, and anyone relying on how the code actually
> works is going behind the back of the docs, but they sort of have to
> in order to use much of the functionality of the module. I agree this
> puts us in a tricky spot.
>

Well, perhaps the docs can be updated to match the code where cleanup would
change the semantics.


>
> > Honestly, if the code is as bad as it seems -- including its API --, the
> > best bet would be to come up with a new module for handling MIME types
> from
> > scratch, put it up on the Cheeseshop/PyPI, and get the community behind
> it.
> > If the community picks it up as the de-facto replacement for mimetypes
> and
> > the code has settled we can then talk about adding it to the standard
> > library and begin deprecating mimetypes.
> > And thanks for willing to volunteer to fix this.
>
> Okay.  Well I'd still like to hear a bit about what people really need
> before trying to make a new API. I'm not such an experienced API
> designer, and I haven’t really plumbed the depths of mimetypes use
> cases (though it seems to me like quite a simple module of not more
> than 100 lines of code or so would suffice).


I'm sure you can get help from the community with any of this.


> At the very least, I
> think some changes can be made to this code without altering its basic
> function, which would clean up the actual mime types it returns,
> comment the exceptions to Apache and explain why they're there, and
> make the code flow understandable to someone reading the code.


That all sounds reasonable.

-Brett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20090731/8786d75d/attachment.htm>


More information about the Python-Dev mailing list