[Python-Dev] Draft Guide for code migration and modernization

Walter Dörwald walter@livinglogic.de
Tue, 04 Jun 2002 16:06:29 +0200


Guido van Rossum wrote:

>>string.zfill has a "decadent feature": It also works for
>>non-string objects by calling repr before formatting.
> 
> 
> Hm, but repr() was the wrong thing to call here anyway. :-(

The old code used `x` (backquotes, i.e. repr(x)). Should we change it to use str()?
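To make the repr()/str() question concrete, here is a minimal Python 3 sketch of the behaviour (string.zfill itself no longer exists there). The name zfill_compat and its convert parameter are hypothetical and not part of any library. Non-strings are converted first and then padded with str.zfill. Decimal shows a case where repr() gives the wrong answer:

```python
from decimal import Decimal

def zfill_compat(x, width, convert=repr):
    """Hypothetical stand-in for the old string.zfill(x, width).

    Non-string arguments are converted with `convert` (repr, like the
    old backquote code, or str as proposed) before zero padding.
    """
    if not isinstance(x, str):
        x = convert(x)
    # str.zfill keeps a leading sign in front of the padding zeros.
    return x.zfill(width)

print(zfill_compat(-42, 6))                  # -00042
print(zfill_compat(Decimal("1.5"), 8))       # Decimal('1.5')  (repr is already wider than 8)
print(zfill_compat(Decimal("1.5"), 8, str))  # 000001.5
```

For ints the two conversions agree. They only diverge for objects whose repr() is not their plain textual value.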

>>>          c in string.whitespace --> c.isspace()
>>
>>This changes the meaning slightly for unicode characters, because
>>chr(i).isspace() != unichr(i).isspace()
>>for i in { 0x1c, 0x1d, 0x1e, 0x1f, 0x85, 0xa0 }
> 
> 
> That's unfortunate, because I'd like unicode to be an extension of
> ASCII also in this kind of functionality.  What are these and why are
> they considered spaces?

http://www.unicode.org/Public/UNIDATA/NamesList.txt says:
001C	<control>
	= INFORMATION SEPARATOR FOUR
	= file separator (FS)
001D	<control>
	= INFORMATION SEPARATOR THREE
	= group separator (GS)
001E	<control>
	= INFORMATION SEPARATOR TWO
	= record separator (RS)
001F	<control>
	= INFORMATION SEPARATOR ONE
	= unit separator (US)
0085	<control>
	= NEXT LINE (NEL)
00A0	NO-BREAK SPACE
	x (space - 0020)
	x (figure space - 2007)
	x (narrow no-break space - 202F)
	x (word joiner - 2060)
	x (zero width no-break space - FEFF)
	# <noBreak> 0020
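The difference can be reproduced with a short Python 3 sketch. There str is already Unicode, so comparing the ASCII set string.whitespace against str.isspace() takes the place of the chr()/unichr() comparison above. This assumes Python 3 semantics rather than the 2.2 interpreter under discussion:

```python
import string

# string.whitespace is the ASCII set ' \t\n\r\x0b\x0c';
# str.isspace() consults the Unicode database instead.
differences = [
    i for i in range(256)
    if (chr(i) in string.whitespace) != chr(i).isspace()
]
print([hex(i) for i in differences])
# ['0x1c', '0x1d', '0x1e', '0x1f', '0x85', '0xa0']
```

The same split shows up between bytes and str: b"\x1c".isspace() is False, while "\x1c".isspace() is True.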

> Would it hurt to make them spaces in ASCII
> too?

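stringobject.c::string_isspace() currently uses the isspace()
function from <ctype.h>, whose result depends on the current
locale; in the "C" locale only space, \t, \n, \v, \f and \r
count as whitespace.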

>>New ones:
>>
>>Pattern:  "foobar"[:3] == "foo" -> "foobar".startswith("foo")
>>           "foobar"[-3:] == "bar" -> "foobar".endswith("bar")
>>Version:  ??? (It was added on the string_methods branch)
> 
> 
> 2.0.
> 
> 
>>Benefits: Faster because no slice has to be created.
>>           No danger of miscounting.
>>Locating: grep "\[\w*-[0-9]*\w*:\w*\]" | grep "=="
>>           grep "\[\w*:\w*[0-9]*\w*\]" | grep "=="
> 
> 
> Are these regexes really worth making part of the migration guide?
> \w* isn't a good pattern to catch an arbitrary expression, it only
> catches simple identifiers!

Ouch, that was meant to be

grep "\[[[:space:]]*-[[:digit:]]*[[:space:]]*:[[:space:]]*\]" | grep "=="
grep "\[[[:space:]]*:[[:space:]]*[[:digit:]]*[[:space:]]*\]" | grep "=="

This only finds constant slice indices; it doesn't find
"foobar"[-len("bar"):]=="bar".

But at least it's a little better than vgrep. ;)
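The search could also be done with Python's ast module, which sees whole expressions rather than characters. That way slice bounds like -len("bar") are caught too. This is a minimal sketch, assuming Python 3.8+, where both -3 and -len(...) parse as ast.UnaryOp. find_slice_comparisons is a hypothetical helper, not part of any existing tool:

```python
import ast

def _is_negated(node):
    """True for expressions written as -expr, e.g. -3 or -len("bar")."""
    return isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub)

def find_slice_comparisons(source):
    """Return sorted (lineno, suggestion) pairs for s[:N] == ... (startswith)
    and s[-N:] == ... (endswith), in either operand order.

    Hits are candidates for manual review only: nothing checks that N
    matches the length of the other operand.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only simple, unchained == / != comparisons.
        if not (isinstance(node, ast.Compare) and len(node.ops) == 1
                and isinstance(node.ops[0], (ast.Eq, ast.NotEq))):
            continue
        for side in (node.left, node.comparators[0]):
            if not (isinstance(side, ast.Subscript)
                    and isinstance(side.slice, ast.Slice)
                    and side.slice.step is None):
                continue
            lower, upper = side.slice.lower, side.slice.upper
            if lower is None and upper is not None and not _is_negated(upper):
                hits.append((node.lineno, "startswith"))  # s[:N]
                break
            if upper is None and _is_negated(lower):
                hits.append((node.lineno, "endswith"))    # s[-N:]
                break
    return sorted(hits)

sample = '''\
if s[:3] == "foo": pass
if s[-len("bar"):] == "bar": pass
if "baz" == t[-3:]: pass
if s[1:3] == "oo": pass
'''
print(find_slice_comparisons(sample))
# [(1, 'startswith'), (2, 'endswith'), (3, 'endswith')]
```

Slices such as s[:-3] are deliberately skipped, because they are not prefix tests. Each hit still needs a human check before it is rewritten to startswith()/endswith().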

Bye,
    Walter Dörwald