mimetypes.guess_type should not return deprecated mimetype application/x-javascript

mimetypes are parsed from the file /etc/mime.types
cat /etc/mime.types | grep javascript
application/javascript js
application/x-javascript js
actual: mimetypes.guess_type("x.js")[0] == "application/x-javascript" -> deprecated mimetype
expected: mimetypes.guess_type("x.js")[0] == "application/javascript"
spoiler: here, the "x-" part is deprecated.
mimetypes.guess_type returns the deprecated mimetype because python keeps the last entry it finds for a file-extension in /etc/mime.types, and that file is sorted alphabetically, so application/x-javascript comes after application/javascript.
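The last-entry-wins behaviour can be reproduced without touching the system configuration. The following is a minimal sketch, assuming a throwaway file that contains only the two lines from the grep output and a private `mimetypes.MimeTypes` instance; the temporary `mime.types` path is an illustration, not the real /etc/mime.types.

```python
import mimetypes
import os
import tempfile

# The two conflicting entries from the grep output, in the same order.
SAMPLE = "application/javascript js\napplication/x-javascript js\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mime.types")  # hypothetical stand-in file
    with open(path, "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    # A private instance, so the module-level tables are left alone.
    db = mimetypes.MimeTypes()
    db.read(path)

# guess_type() returns a (type, encoding) tuple; the type is element 0.
guessed_type, encoding = db.guess_type("x.js")
print(guessed_type)
```

With this file, the later line should replace the earlier one for `.js`.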
proposed solution: use smarter conflict resolution when parsing /etc/mime.types. when two entries are found for one file-extension, avoid using the deprecated mimetype. rfc4329 lists 16 items for "unregistered media type"; these could be hard-coded as a set of strings, or as a regex.
related bug report https://bugs.python.org/issue46035
mimetypes.guess_type https://docs.python.org/3/library/mimetypes.html#mimetypes.guess_type
unregistered media type https://datatracker.ietf.org/doc/html/rfc4329#section-3
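As a rough illustration of the proposed conflict resolution, here is a minimal sketch of a standalone parser. It is not how the `mimetypes` module works today, and the names `is_unofficial` and `parse_preferring_official` are hypothetical. As a simplifying assumption it treats any subtype starting with "x-" as the type to avoid, standing in for the hard-coded list from rfc4329 mentioned above.

```python
def is_unofficial(mime_type):
    """Assumed rule: a subtype starting with "x-" counts as unofficial."""
    _, _, subtype = mime_type.partition("/")
    return subtype.startswith("x-")


def parse_preferring_official(lines):
    """Map ".ext" -> type; an unofficial type never replaces an official one."""
    types = {}
    for line in lines:
        # Simplified comment handling: drop everything after a "#".
        words = line.split("#", 1)[0].split()
        if len(words) < 2:
            continue
        mime_type, exts = words[0], words[1:]
        for ext in exts:
            key = "." + ext
            current = types.get(key)
            # Otherwise the later entry wins, as in the existing behaviour.
            if current is None or is_unofficial(current) or not is_unofficial(mime_type):
                types[key] = mime_type
    return types


entries = ["application/javascript js", "application/x-javascript js"]
print(parse_preferring_official(entries)[".js"])
```

Under this rule the result for `.js` does not depend on the order of the two lines.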

+1
"x-" prefix indicates ad hoc (unofficial), not deprecated.
I agree, an official MIME type should be preferred over an unofficial one.
On Tue, 2022-01-18 at 16:26 +0000, milahu@gmail.com wrote:
> mimetypes are parsed from the file /etc/mime.types
> cat /etc/mime.types | grep javascript
> application/javascript js
> application/x-javascript js
> actual: mimetypes.guess_type("x.js") == "application/x-javascript" -> deprecated mimetype
> expected: mimetypes.guess_type("x.js") == "application/javascript"
> spoiler: here, the "x-" part is deprecated.
> mimetypes.guess_type returns the deprecated mimetype because python returns the last entry in the /etc/mime.types file which is sorted alphabetically
> proposed solution: use a smarter conflict-resolution when parsing /etc/mime.types. when two entries are found for one file-extension, avoid using a deprecated mimetype. rfc4329 lists 16 items for "unregistered media type" these could be hard-coded as set-of-strings, or as regex.
> related bug report https://bugs.python.org/issue46035
> mimetypes.guess_type https://docs.python.org/3/library/mimetypes.html#mimetypes.guess_type
> unregistered media type https://datatracker.ietf.org/doc/html/rfc4329#section-3

On 18Jan2022 08:43, Paul Bryan pbryan@anode.ca wrote:
> +1
> "x-" prefix indicates ad hoc (unofficial), not deprecated.
> I agree, an official MIME type should be preferred over an unofficial one.
Maybe, but I disagree about the proposed solution. There are circumstances where you want a specific MIME type guess, and the best way to do that is to apply a specific ordering to your mime.types file so that the desired type comes first, and is found first.
And looking at milahu's example, that's exactly what was there:
application/javascript js
application/x-javascript js
It would be reasonable and generally desirable to take the first one.
The problem with saying "oh, let's examine the types and exclude or deprioritise the ones we don't like, such as x-*" is that it removes control from the user (the caller of mimetypes.guess_type() and the author of /etc/mime.types). It embeds fixed policy _inside_ mimetypes.guess_type(), where it can't be turned off without growing a heap of weird mode flags.
The better fix is to honour the order of the mime.types file as _expressing_ policy, which is what the OP already has; they just don't have guess_type() doing it that way.
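To make the "first entry wins" idea concrete, here is a minimal sketch of a standalone parser that honours file order. The function name `parse_first_wins` is hypothetical and this is not the current `mimetypes` behaviour; it only illustrates the policy being argued for.

```python
def parse_first_wins(lines):
    """Map ".ext" -> type, keeping the first type listed for each extension."""
    types = {}
    for line in lines:
        # Simplified comment handling: drop everything after a "#".
        words = line.split("#", 1)[0].split()
        if len(words) < 2:
            continue
        mime_type, exts = words[0], words[1:]
        for ext in exts:
            # setdefault() leaves an existing entry alone, so earlier lines
            # in the file take priority over later ones.
            types.setdefault("." + ext, mime_type)
    return types


entries = ["application/javascript js", "application/x-javascript js"]
print(parse_first_wins(entries)[".js"])
```

With this approach the author of the mime.types file controls the result purely by reordering lines, and no type names are special-cased in the code.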
Arguments for embedding _policy_ inside guess_type() will be met with my standard example: the ancient Netscape proxy config had regexp-based redirect rules, which is not uncommon. But Netscape prioritised these by the length of the regexp instead of the config file ordering. Insanity abounded, with regexp wackiness purely to make some rules longer than others.
Cheers, Cameron Simpson cs@cskk.id.au

Perhaps there should be a guess_all_types() function in addition to guess_type() that returns all possible types, so that the user can select the type they want using any criterion. This would be nicely symmetrical with the existing guess_extension and guess_all_extensions functions.
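A minimal sketch of what such a function could look like follows. `guess_all_types` does not exist in the `mimetypes` module; both it and the helper `parse_all_types` are hypothetical, written here as standalone functions over mime.types-style lines.

```python
import os


def parse_all_types(lines):
    """Map ".ext" -> list of every type listed for it, in file order."""
    all_types = {}
    for line in lines:
        # Simplified comment handling: drop everything after a "#".
        words = line.split("#", 1)[0].split()
        if len(words) < 2:
            continue
        mime_type, exts = words[0], words[1:]
        for ext in exts:
            candidates = all_types.setdefault("." + ext.lower(), [])
            if mime_type not in candidates:
                candidates.append(mime_type)
    return all_types


def guess_all_types(filename, all_types):
    """Hypothetical counterpart to guess_all_extensions(): return all candidates."""
    ext = os.path.splitext(filename)[1].lower()
    return list(all_types.get(ext, []))


table = parse_all_types(["application/javascript js", "application/x-javascript js"])
candidates = guess_all_types("x.js", table)

# One possible caller-side policy: prefer types without an "x-" subtype prefix,
# falling back to file order if every candidate has one.
official = [t for t in candidates if "/x-" not in t]
chosen = (official or candidates)[0]
print(candidates, chosen)
```

The selection policy lives in the caller's code here, so it can be changed without any mode flags on the lookup function.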

On 18Jan2022 13:21, Jelle Zijlstra jelle.zijlstra@gmail.com wrote:
> Perhaps there should be a guess_all_types() function in addition to guess_type() that returns all possible types, so that the user can select the type they want using any criterion. This would be nicely symmetrical with the existing guess_extension and guess_all_extensions functions.
Indeed. +1
That would allow applying policy beyond the mime.types file ordering.
Cheers, Cameron Simpson cs@cskk.id.au