[Python-Dev] Difference in RE between 3.2 and 3.3 (or Aaron Swartz memorial)
Xavier Morel
catch-all at masklinn.net
Thu Mar 7 11:31:03 CET 2013
On 2013-03-07, at 11:08 , Matej Cepl wrote:
> On 2013-03-06, 18:34 GMT, Victor Stinner wrote:
>> In short, Unicode was rewritten in Python 3.3 for the PEP 393. It's
>> not surprising that minor details like singleton differ. You should
>> not use "is" to compare strings in Python, or your program will fail
>> on other Python implementations (like PyPy, IronPython, or Jython) or
>> even on a different CPython version.
>
> I am sorry, I don't understand what you are saying. Even though
> this has been changed to
> https://github.com/mcepl/html2text/blob/fix_tests/html2text.py#L90
> the tests still fail.
>
> But, Amaury is right: the function doesn't make much sense.
> However, ...
>
> when I have “fixed it” from
> https://github.com/mcepl/html2text/blob/master/html2text.py#L95
>
> def onlywhite(line):
>     """Return true if the line does only consist of whitespace characters."""
>     for c in line:
>         if c is not ' ' and c is not '  ':
>             return c is ' '
>     return line
>
> to
> https://github.com/mcepl/html2text/blob/fix_tests/html2text.py#L90
>
> def onlywhite(line):
>     """Return true if the line does only consist of whitespace characters."""
>     for c in line:
>         if c != ' ' and c != '  ':
>             return c == ' '
>     return line
The second test looks like some kind of corruption: it's supposedly
iterating over the characters of a line, yet it tests for two spaces? Is it
possible that the original was a literal tab embedded in the source code
(instead of '\t') and that it got broken at some point?
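To illustrate why the two-space comparison looks suspicious, here is a minimal sketch (the sample string and variable names are made up for illustration): iterating over a str yields one-character strings, so none of them can ever compare equal to a two-character literal, whereas a '\t' literal can match.

```python
# Hypothetical sample line; any string would do.
line = 'a \tb'
chars = list(line)

# Iterating over a str yields strings of length 1.
assert all(len(c) == 1 for c in chars)

# A one-character string never equals the two-space literal,
# so "c != '  '" is always true inside such a loop.
two_spaces = '  '
never_equal = all(c != two_spaces for c in chars)

# A tab written as '\t' can match a character of the line.
tab_matches = [c == '\t' for c in chars]

print(never_equal, tab_matches)
```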
According to its name and docstring, the implementation of this function
should really be replaced by `return line and line.isspace()` (the first
part being to handle the case of an empty line: in the current
implementation the line is returned directly if no non-space character is
found, which is "negative" for an empty line, and ''.isspace() ->
False). Does that fix the failing tests?
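A rough side-by-side sketch of the two behaviours, in case it helps. The names onlywhite_original and onlywhite_suggested are mine, and I am assuming the second literal in the '!=' version really is two spaces. Note that the two differ on tabs and newlines: the existing loop only accepts plain spaces, while isspace() accepts any whitespace, which is what the docstring seems to ask for.

```python
def onlywhite_original(line):
    """The '!=' version, assuming the second literal is two spaces."""
    for c in line:
        if c != ' ' and c != '  ':
            # Only reached for a non-space character, so this is False.
            return c == ' '
    # Reached for '' or a line made only of spaces.
    return line


def onlywhite_suggested(line):
    """Suggested replacement: '' for an empty line, else line.isspace()."""
    return line and line.isspace()


samples = ['', '   ', 'a b', ' \t ', '\n']
original = [bool(onlywhite_original(s)) for s in samples]
suggested = [bool(onlywhite_suggested(s)) for s in samples]

for s, o, n in zip(samples, original, suggested):
    print(repr(s), o, n)
```

If the callers rely on tabs or newlines being treated as non-blank, the replacement would change behaviour there, so it is worth checking against the tests.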