mimetypes broken on Windows

Hi folks,

The built-in mimetypes module is broken on Windows, and it has been since Python 2.7 alpha 1. On all Windows systems I've tried, guess_type() returns the wrong mime type for common types like .png and .jpg. For example (on Python 2.7.4 and 3.3.1):

These should be 'image/png' and 'image/jpeg', respectively.

There's an open issue for this: http://bugs.python.org/issue15207. However, it hasn't gotten any love in the last few months, so per r.david.murray's comment, I'm posting it here.

Dave Chambers, who opened the bug, has proposed a fix, which is significantly better (i.e., not totally broken for common types). However, as I mentioned in http://bugs.python.org/issue15207#msg177030, using the Windows registry for this at all is basically a bad idea, because:

1) Important keys like .jpg and .png aren't in the registry anyway.
2) Some that do exist are wrong in the Windows registry. This includes .zip, which is "application/x-zip-compressed" (at least in my registry) but should be "application/zip".
3) It makes the first call to guess_type() slow (~100ms), which isn't terrible, but given the above concerns, not worth it.
4) Perhaps most importantly: the keys in the Windows registry depend on what programs you have installed. And users and programs can change registry keys at will.

Obviously one can work around this bug, either by calling mimetypes.init(files=[]) before any calls to guess_type(), or by calling init() with your own mime types file. However, "broken out of the box" is going to cause a lot of people headaches. :-)

So my proposal is simply to get rid of read_windows_registry() altogether, and fall back to the default type mapping in mimetypes.py on Windows systems. This is correct and fast, even if not complete. As always, folks can use their own mimetypes file if they want.

In summary: the current behaviour is buggy and broken, the behaviour proposed in Issue 15207 is problematic, getting this from the Windows registry is a bad idea, and we should revert the whole registry thing. :-)

If folks agree with my reasoning above, I can provide a patch to fix this, along with a patch to the Windows unit tests.

-Ben

P.S. Kind of proving my point about the fragility of using the registry, the Python 2.7.4 unit test test_registry_parsing in test_mimetypes.py fails on my machine. It's because I've installed some SQL server, and text/plain in my registry is mapped from .sql (instead of .txt), causing this:

    Traceback (most recent call last):
      File "C:\python27\lib\test\test_mimetypes.py", line 85, in test_registry_parsing
        eq(self.db.guess_type("foo.txt"), ("text/plain", None))
    AssertionError: Tuples differ: (None, None) != ('text/plain', None)
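
The two workarounds mentioned in the message above (calling mimetypes.init(files=[]) first, or calling init() with your own mime types file) might look roughly like the sketch below. The temporary file, the ".exmpl" extension and the "application/x-example" type are made up for illustration. On the versions discussed in this thread, passing an explicit files list skipped the registry lookup; other Python versions may behave differently, so treat this as a sketch rather than a guaranteed fix.

```python
import mimetypes
import os
import tempfile

# Workaround 1: initialise with an explicit, empty list of mime.types files
# before any call to guess_type().
mimetypes.init(files=[])
png_type = mimetypes.guess_type("photo.png")[0]
jpg_type = mimetypes.guess_type("photo.jpg")[0]
print(png_type, jpg_type)

# Workaround 2: supply your own mime.types file. Each line has the form
# "<type> <ext> [<ext> ...]". The type and extension here are hypothetical.
with tempfile.NamedTemporaryFile("w", suffix=".types", delete=False) as f:
    f.write("application/x-example exmpl\n")
    custom_types_path = f.name

try:
    mimetypes.init(files=[custom_types_path])
    custom_type = mimetypes.guess_type("data.exmpl")[0]
finally:
    # Clean up the temporary file once it has been read.
    os.remove(custom_types_path)

print(custom_type)
```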

On 4/15/2013 10:04 PM, Ben Hoyt wrote:
The actual mapping is fixed and more or less system independent while the windows registry is for volatile system and user dependent mappings.
And change what a given key is mapped to.
I basically agree, but am not sure what to do about back-compatibility considerations. But we do not have to reproduce buggy behavior.

On Tue, 16 Apr 2013 14:00:53 -0400, Terry Jan Reedy <tjreedy@udel.edu> wrote:
I basically agree as well, but as a non-windows user I'm not willing to commit any change without approval from a committer who actually understands what's going on. My understanding is that referencing the windows registry is a relatively new feature (I'm not sure exactly how new), and that it is itself causing more backward compatibility problems than would likely be caused by removing it. But as I said, I'm not enough of a Windows expert to be comfortable making that decision. I'm glad this was brought up on python-dev; it's been nagging at me that this issue hasn't been getting resolved. --David

(Sorry if this reply doesn't thread as I intend -- I wasn't configured to get python-dev emails, so I'm replying to my original with copy-n-paste.) On Tue, 16 Apr 2013 14:00:53 -0400, Terry Jan Reedy <tjreedy at udel.edu> wrote:
Agreed. What we have is just plain wrong. Dave Chambers' fix is better, but still problematic.

What we *could* do is implement Dave Chambers' fix in read_windows_registry(), but not call this by default. So a user would have to explicitly call it if they really want the Windows registry. But I actually don't think even that's necessary.

I honestly can't see how anyone will be "depending" on the current behaviour, as it's just plain buggy (.png and .jpg give the wrong mime type). So I don't think backwards-compatibility is an issue here.

As R. David Murray mentioned, reading the registry is quite new (Python 2.7 alpha 1, I believe), and has caused several problems already. There have been encoding issues, and there's even a duplicate of issue 15207, "part 3" of http://bugs.python.org/issue10551

But yes, I would love to see a Windows Python committer chip in, even if it's just with "agreed, please provide a patch".

-Ben
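
To illustrate the "not called by default" idea from the reply above, here is a minimal sketch of what opt-in registry use could look like from the caller's side. The helper name build_mime_db and its use_registry flag are hypothetical and not part of the mimetypes module; only MimeTypes and its read_windows_registry() method are real. Note that on the affected versions, creating a MimeTypes instance may itself trigger the module-level init(), so this shows the shape of an opt-in API rather than a fix.

```python
import mimetypes
import sys


def build_mime_db(use_registry=False):
    """Hypothetical helper: build a MimeTypes database, consulting the
    Windows registry only when the caller explicitly asks for it."""
    db = mimetypes.MimeTypes()
    if use_registry and sys.platform == "win32":
        # Opt-in: merge the registry's mappings into this instance only.
        db.read_windows_registry()
    return db


# Default: no explicit registry read.
db = build_mime_db()
print(db.guess_type("photo.png"))
```

A caller who really wants the registry mappings would call build_mime_db(use_registry=True); everyone else gets whatever the MimeTypes instance provides without the extra lookup.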

On 16/04/2013 23:22, Ben Hoyt wrote:
But yes, I would love to see a Windows Python committer chip in, even if it's just with "agreed, please provide a patch".
I can chip in with an apology, at least. This has been on my to-do list for ages; but I have had absolutely minimal time to work on Python this last year. I'll set aside an hour later today to look over the different options and patches on offer and at least come back with an opinion on what should happen next, even if I have to ask someone else to apply the patch. Obviously should some other developer want to dive in, please do. Thanks for bringing it back to the table, Ben. TJG

On 17/04/2013 08:28, Tim Golden wrote:
I've responded over there for now --> http://bugs.python.org/issue15207#msg187158 TJG

participants (4): Ben Hoyt, R. David Murray, Terry Jan Reedy, Tim Golden