mimetypes.guess_type should not return deprecated mimetype application/x-javascript

mimetypes are parsed from the file /etc/mime.types

    cat /etc/mime.types | grep javascript
    application/javascript js
    application/x-javascript js

actual: mimetypes.guess_type("x.js")[0] == "application/x-javascript" -> deprecated mimetype

expected: mimetypes.guess_type("x.js")[0] == "application/javascript"

spoiler: here, the "x-" part is deprecated.

mimetypes.guess_type returns the deprecated mimetype because python returns the last entry in the /etc/mime.types file, which is sorted alphabetically.

proposed solution: use a smarter conflict-resolution when parsing /etc/mime.types. when two entries are found for one file-extension, avoid using a deprecated mimetype. rfc4329 lists 16 items for "unregistered media type". these could be hard-coded as set-of-strings, or as regex.

related bug report
https://bugs.python.org/issue46035

mimetypes.guess_type
https://docs.python.org/3/library/mimetypes.html#mimetypes.guess_type

unregistered media type
https://datatracker.ietf.org/doc/html/rfc4329#section-3
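The behaviour described above can be reproduced without touching the system file. This is a minimal sketch: it assumes the two lines from the grep output are the only relevant entries, and it loads them into a private mimetypes.MimeTypes instance through readfp() instead of reading /etc/mime.types. The name MIME_TYPES_TEXT is made up for the example.

```python
import io
import mimetypes

# The two conflicting entries, in the same order as in the grep output.
# (Assumption: nothing else in the real file maps the "js" extension.)
MIME_TYPES_TEXT = """\
application/javascript js
application/x-javascript js
"""

# A private database, so the module-level tables are left alone.
db = mimetypes.MimeTypes()
db.readfp(io.StringIO(MIME_TYPES_TEXT))

# guess_type() returns a (type, encoding) tuple, not a bare string.
guessed_type, encoding = db.guess_type("x.js")
print(guessed_type, encoding)
```

Each later line for the same extension overwrites the earlier one while the file is read, so the type listed last is the one returned.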

On 18Jan2022 08:43, Paul Bryan <pbryan@anode.ca> wrote:
Maybe, but I disagree about the proposed solution.

There are circumstances where you want a specific MIME type guess, and the best way to do that is to apply a specific ordering to your mime.types file so that the desired type comes first, and is found first. And looking at milahu's example, that's exactly what was there:

    application/javascript js
    application/x-javascript js

It would be reasonable and generally desirable to take the first one.

The problem with saying "oh, let's examine the types and exclude or deprioritise the ones we don't like such as x-*" is that it removes control from the user (the caller of mimetypes.guess_type() and the author of /etc/mime.types). It embeds fixed policy _inside_ mimetypes.guess_type() where it can't be turned off without growing a heap of weird mode flags.

The better fix is to honour the order of the mime.types file as _expressing_ policy, which is what the OP already has - they just don't have guess_type() doing it that way.

Arguments for embedding _policy_ inside guess_type() will be met with my standard example: the ancient Netscape proxy config had regexp based redirect rules, not uncommon. But Netscape prioritised these by the length of the regexp instead of the config file ordering. Insanity abounded with regexp wackiness purely to make some rules longer than others.

Cheers,
Cameron Simpson <cs@cskk.id.au>
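As an illustration of the "first entry wins" ordering argued for above, here is a rough sketch of a parser that keeps the first type seen for each extension. The function parse_first_wins is hypothetical and is not part of the mimetypes module; its comment handling is simplified compared with the stdlib parser.

```python
def parse_first_wins(text):
    """Map '.ext' -> MIME type, keeping the FIRST type listed per extension.

    Hypothetical helper for illustration only; not a stdlib function.
    """
    types = {}
    for line in text.splitlines():
        # Drop a trailing comment, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, comment, or a type with no extensions
        mime_type, *extensions = fields
        for ext in extensions:
            # setdefault() leaves an existing entry alone, so file order
            # decides: the earliest line for an extension is kept.
            types.setdefault("." + ext.lower(), mime_type)
    return types


example = "application/javascript js\napplication/x-javascript js\n"
table = parse_first_wins(example)
print(table[".js"])
```

With this ordering rule the author of the file controls the result by moving a line up or down, and no knowledge of "x-" prefixes is built into the lookup.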

Perhaps there should be a guess_all_types() function in addition to guess_type() that returns all possible types, so that the user can select the type they want using any criterion. This would be nicely symmetrical with the existing guess_extension and guess_all_extensions functions.
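To make the idea concrete, here is one way such a function could look. This is only a sketch: guess_all_types() does not exist in the mimetypes module discussed here, and the signature below (taking a pre-parsed table), along with the helper parse_all_types, is an assumption made for the example.

```python
import posixpath


def parse_all_types(text):
    """Map '.ext' -> list of every MIME type listed for it, in file order.

    Hypothetical helper; simplified parsing of mime.types-style text.
    """
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        mime_type, *extensions = fields
        for ext in extensions:
            candidates = table.setdefault("." + ext.lower(), [])
            if mime_type not in candidates:
                candidates.append(mime_type)
    return table


def guess_all_types(filename, table):
    """Return all candidate types for filename (hypothetical function)."""
    ext = posixpath.splitext(filename)[1].lower()
    return list(table.get(ext, []))


table = parse_all_types("application/javascript js\napplication/x-javascript js\n")
candidates = guess_all_types("x.js", table)

# The caller applies whatever policy they like, e.g. skip "x-" subtypes.
preferred = next((t for t in candidates if "/x-" not in t), None)
print(candidates, preferred)
```

The selection rule stays in the caller's code, so it can be changed without any mode flags on the guessing function itself.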

participants (4): Cameron Simpson, Jelle Zijlstra, milahu@gmail.com, Paul Bryan