Are undocumented exceptions considered bugs?
Hi,

I'm not sure this is the right place to ask this question, but I thought I'd give it a shot since it also concerns the Python standard library.

I'm writing an automated test case generation tool for Python programs that explores all possible execution paths through a program. When applying this tool on Python's 2.7.3 urllib package, it discovered input strings for which the urllib.urlopen(url) call would raise a TypeError. For instance:

urllib.urlopen('\x00\x00\x00')

[...]
  File "/home/bucur/onion/python-bin/lib/python2.7/urllib.py", line 86, in urlopen
    return opener.open(url)
  File "/home/bucur/onion/python-bin/lib/python2.7/urllib.py", line 207, in open
    return getattr(self, name)(url)
  File "/home/bucur/onion/python-bin/lib/python2.7/urllib.py", line 462, in open_file
    return self.open_local_file(url)
  File "/home/bucur/onion/python-bin/lib/python2.7/urllib.py", line 474, in open_local_file
    stats = os.stat(localname)
TypeError: must be encoded string without NULL bytes, not str

In the urllib documentation it is only mentioned that the IOError is raised when the connection cannot be established. Since the input passed is a string (and not some other type), is the TypeError considered a bug (either in the documentation, or in the implementation)?

Thanks a lot,
Stefan
On Sat, Mar 23, 2013 at 4:05 AM, Stefan Bucur wrote:
Hi,
I'm not sure this is the right place to ask this question, but I thought I'd give it a shot since it also concerns the Python standard library.
It's the right place to ask :)
I'm writing an automated test case generation tool for Python programs that explores all possible execution paths through a program. When applying this tool on Python's 2.7.3 urllib package, it discovered input strings for which the urllib.urlopen(url) call would raise a TypeError.
That sounds like a really interesting tool.
For instance:
urllib.urlopen('\x00\x00\x00')
[...]
  File "/home/bucur/onion/python-bin/lib/python2.7/urllib.py", line 86, in urlopen
    return opener.open(url)
  File "/home/bucur/onion/python-bin/lib/python2.7/urllib.py", line 207, in open
    return getattr(self, name)(url)
  File "/home/bucur/onion/python-bin/lib/python2.7/urllib.py", line 462, in open_file
    return self.open_local_file(url)
  File "/home/bucur/onion/python-bin/lib/python2.7/urllib.py", line 474, in open_local_file
    stats = os.stat(localname)
TypeError: must be encoded string without NULL bytes, not str
In the urllib documentation it is only mentioned that the IOError is raised when the connection cannot be established. Since the input passed is a string (and not some other type), is the TypeError considered a bug (either in the documentation, or in the implementation)?
The general answer is that there are certain exceptions that usually aren't documented because almost all code can trigger them if you pass the right kind of invalid argument. For example, almost any API can emit TypeError or AttributeError if you pass an instance of the wrong type, and many can emit ValueError, IndexError or KeyError if you pass an incorrect value. Other errors like SyntaxError, ImportError, NameError and UnboundLocalError usually indicate bugs or environmental configuration issues, so are also typically omitted when documenting the possible exceptions for particular APIs.

In this specific case, the error message is confusing-but-not-really-wrong, due to the "two-types-in-one" nature of Python 2.x strings - 8-bit strings are used as both text sequences (generally not containing NUL characters) and also as arbitrary binary data, including encoded text (quite likely to contain NUL bytes).

I think a bug report for this would be appropriate, with the aim of making that error message less confusing (it's a fairly obscure case, though).

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
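As an illustration of the exception categories described in this reply, here is a minimal sketch that is not part of the original thread. It probes a few calls with invalid arguments and reports which exception class each one raises. The helper name raised_by is hypothetical. The last call mirrors the os.stat() call from the traceback: on Python 2.7 it is expected to report TypeError, as in the original report, while recent Python 3 releases are expected to report ValueError instead, so no output is annotated for it.

```python
import os


def raised_by(func, *args):
    """Return the name of the exception class raised by func(*args), or None."""
    try:
        func(*args)
    except Exception as exc:
        return type(exc).__name__
    return None


# Wrong type: an int has no length.
print(raised_by(len, 5))                     # TypeError
# Right type, wrong value: the string is not a number.
print(raised_by(int, "abc"))                 # ValueError
# Out-of-range index and missing key.
print(raised_by([1, 2, 3].__getitem__, 10))  # IndexError
print(raised_by({}.__getitem__, "missing"))  # KeyError
# A string containing NUL characters used as a file system path.
# The reported class depends on the Python version (see the note above).
print(raised_by(os.stat, "\x00\x00\x00"))
```

None of these exceptions is usually listed in the documentation of the function being called, which is the convention the reply describes.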
On Sat, Mar 23, 2013 at 11:21 AM, Nick Coghlan wrote:
In this specific case, the error message is confusing-but-not-really-wrong, due to the "two-types-in-one" nature of Python 2.x strings - 8-bit strings are used as both text sequences (generally not containing NUL characters) and also as arbitrary binary data, including encoded text (quite likely to contain NUL bytes).
With your terminology, three types: char, non-NUL-text, encoded-text (e.g. what happens with ord('ab'))

That's pretty silly, considering that these are all one Python type, and TypeError is raised into Python code. Obviously it can't change, because of historical reasons, but documenting it would be straightforward and helpful. These are not errors you can just infer will happen, you need to see it via documentation, reading the source, or experimentation (and re experimentation, then you have to establish whether or not this was an accident or deliberate).

-- Devin
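One way a caller could make the implicit "no NUL characters" constraint explicit, rather than discovering it by experimentation, is to check for it before touching the file system. The sketch below is only an illustration under stated assumptions: stat_local is a hypothetical wrapper, not anything in urllib or proposed in this thread, and it assumes the path is a text string.

```python
import os


def stat_local(path):
    """Hypothetical wrapper around os.stat() that rejects NUL characters up front."""
    # Assumption: path is a text string, so the membership test below is valid.
    if "\x00" in path:
        raise ValueError("path must not contain NUL characters: %r" % (path,))
    return os.stat(path)


try:
    stat_local("\x00\x00\x00")
except ValueError as exc:
    # The caller now sees a message that names the actual constraint.
    print("rejected: %s" % exc)
```

Raising ValueError here is a design choice for the sketch: the argument has the right type but an unacceptable value, which is the distinction the message above says the existing TypeError obscures.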
On Sun, Mar 24, 2013 at 2:09 AM, Devin Jeanpierre wrote:
On Sat, Mar 23, 2013 at 11:21 AM, Nick Coghlan wrote:
In this specific case, the error message is confusing-but-not-really-wrong, due to the "two-types-in-one" nature of Python 2.x strings - 8-bit strings are used as both text sequences (generally not containing NUL characters) and also as arbitrary binary data, including encoded text (quite likely to contain NUL bytes).
With your terminology, three types: char, non-NUL-text, encoded-text (e.g. what happens with ord('ab'))
That's pretty silly, considering that these are all one Python type, and TypeError is raised into Python code. Obviously it can't change, because of historical reasons, but documenting it would be straightforward and helpful. These are not errors you can just infer will happen, you need to see it via documentation, reading the source, or experimentation (and re experimentation, then you have to establish whether or not this was an accident or deliberate).
Thanks for your answers, guys, and sorry for replying so late (had a research paper submission deadline in the meantime...).

Filing a bug report for this issue sounds like a good idea. I have just submitted http://bugs.python.org/issue17624

Stefan