[Python-Dev] Difference in RE between 3.2 and 3.3 (or Aaron Swartz memorial)

Xavier Morel catch-all at masklinn.net
Thu Mar 7 11:31:03 CET 2013


On 2013-03-07, at 11:08 , Matej Cepl wrote:

> On 2013-03-06, 18:34 GMT, Victor Stinner wrote:
>> In short, Unicode was rewritten in Python 3.3 for the PEP 393. It's
>> not surprising that minor details like singleton differ. You should
>> not use "is" to compare strings in Python, or your program will fail
>> on other Python implementations (like PyPy, IronPython, or Jython) or
>> even on a different CPython version.
> 
> I am sorry, I don't understand what you are saying. Even though 
> this has been changed to 
> https://github.com/mcepl/html2text/blob/fix_tests/html2text.py#L90 
> the tests still fail.
> 
> But, Amaury is right: the function doesn't make much sense. 
> However, ...
> 
> when I have “fixed it” from
> https://github.com/mcepl/html2text/blob/master/html2text.py#L95
> 
> def onlywhite(line):
>     """Return true if the line does only consist of whitespace characters."""
>     for c in line:
>         if c is not ' ' and c is not '  ':
>             return c is ' '
>     return line
> 
> to 
> https://github.com/mcepl/html2text/blob/fix_tests/html2text.py#L90
> 
> def onlywhite(line):
>     """Return true if the line does only consist of whitespace 
>     characters."""
>         for c in line:
>            if c != ' ' and c != ' ':
>               return c == ' '
>         return line

The second test looks like some kind of corruption: it is supposedly
iterating over the characters of a line, yet it tests for two spaces? Is
it possible that the original was a literal tab embedded in the source
code (instead of '\t') and that it got broken at some point?

According to its name and docstring, the implementation of this function
should really be replaced by `return line and line.isspace()`. The first
part handles the case of an empty line: in the current implementation the
line is returned directly if no non-whitespace character is found, which
is "negative" for an empty line, and ''.isspace() -> False, so both give
a false value there. Does that fix the failing tests?
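
To make the comparison concrete, here is a rough sketch of both versions
side by side. It assumes my guess above is right and that the second
literal was meant to be a tab; `onlywhite_loop` is just a name I made up
for the reconstructed loop, it is not in html2text.

```python
def onlywhite_loop(line):
    """Loop version, ASSUMING the second literal was meant to be '\\t'."""
    for c in line:
        if c != ' ' and c != '\t':
            # First character that is neither space nor tab: the original
            # `return c == ' '` can only ever be False at this point.
            return False
    # No other character found: '' (false) for an empty line,
    # the line itself (true) otherwise.
    return line


def onlywhite(line):
    """Return a true value if the line consists only of whitespace."""
    return line and line.isspace()


# Both versions agree on truthiness for space/tab-only input.
for sample in ['', ' ', ' \t ', '\t', 'a', ' a ']:
    assert bool(onlywhite(sample)) == bool(onlywhite_loop(sample))
```

One difference to keep in mind: str.isspace() also accepts other
whitespace such as '\n' and '\r', so the two can disagree on lines that
still carry their line ending. Also, the loop returns the line itself for
a whitespace-only line while isspace() returns True, which only matters
if a caller relies on the exact return value rather than its truthiness.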

