Clever hack or code abomination?

Fri Dec 2 00:34:16 EST 2011

On Fri, 02 Dec 2011 13:07:57 +1100, Chris Angelico wrote:

> On Fri, Dec 2, 2011 at 11:15 AM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> Try this on for size.
>>
>>
>>            f = type(q)(c[c.index(chr(45))+1:])+type(q)(1)
>>            c = str.join('\n', list(map(chr, (45, 48))) + [c])[::2]
>>        c = (lambda a,b: a+b)(c[:c.index(chr(45))+1], type(c)(f))
> 
> I would consider integer representations of ASCII to be code smell. It's
> not instantly obvious that 45 means '-', even if you happen to know the
> ASCII table by heart (which most people won't). This is one thing that I
> like about C's quote handling; double quotes for a string, or single
> quotes for an integer with that character's value. It's clearer than the
> Python (and other) requirement to have an actual function call:
> 
> for (int i=0;i<10;++i) {
>     digit[i]='0'+i;
>     letter[i]='A'+i;
> }

I would disagree that this is clear at all. You're adding what looks like 
a character, but is actually an integer, with an integer. And then just 
to add insult to injury, you're storing integers into arrays that are 
named as if they were characters. In what mad universe would you describe 
65 as a letter?

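To say nothing of the fact that C's trick only works (for some definition 
of works) for ASCII. The C standard guarantees that '0' to '9' are 
contiguous, but promises nothing of the sort for letters. Take for 
example one of the many EBCDIC encodings, 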
cp500. If you expect 'I' + 1 to equal 'J', you will be sorely 
disappointed:

py> u'I'.encode('cp500')
'\xc9'
py> u'J'.encode('cp500')
'\xd1'
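For readers on Python 3, where `str.encode` returns `bytes` and indexing `bytes` yields an integer, here is a small sketch of the same check. It uses only the cp500 codec that ships with Python; the variable names are illustrative:

```python
# C-style "next letter" arithmetic applied to cp500 (EBCDIC) byte values.
i_byte = 'I'.encode('cp500')[0]   # 0xc9
j_byte = 'J'.encode('cp500')[0]   # 0xd1
next_byte = i_byte + 1            # 0xca: one past 'I'

print(hex(i_byte), hex(j_byte), hex(next_byte))  # 0xc9 0xd1 0xca

# The EBCDIC letters are not contiguous: 'I' and 'J' are 8 values apart,
# so "'I' + 1" lands on a byte that is not 'J'.
print(j_byte - i_byte)        # 8
print(next_byte == j_byte)    # False
```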

Characters are not integers, and C conflates them, to the disservice of 
all. If fewer people learned C, fewer people would have such trouble 
understanding Unicode.

Anyone unfamiliar with C's model would have trouble guessing what 'A' + 1 
should mean. Should it be:

-  an error
-  'B'
-  'A1'
-  the numeric value of variable A plus 1
-  66  (assuming ASCII encoding)
-  194  (assuming cp500 encoding)
-  some other number
-  something else?

How about 1000 + 'A'?
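Python declines to guess. A minimal Python 3 sketch showing that both expressions raise `TypeError`, and that each of the other readings in the list has to be spelled out explicitly (the helper `try_add` is hypothetical, written only for this illustration):

```python
def try_add(a, b):
    """Return a + b, or the name of the exception Python raises instead."""
    try:
        return a + b
    except TypeError as exc:
        return type(exc).__name__

print(try_add('A', 1))      # TypeError
print(try_add(1000, 'A'))   # TypeError

# Each other reading must be requested explicitly.
print(chr(ord('A') + 1))              # B    (next code point)
print('A' + str(1))                   # A1   (string concatenation)
print(ord('A') + 1)                   # 66   (ordinal plus 1)
print('A'.encode('cp500')[0] + 1)     # 194  (cp500 byte value plus 1)
```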

> versus
> 
> for i in range(10):
>     digit[i]=chr(ord('0')+i)
>     letter[i]=chr(ord('A')+i)

It's a tad more verbose, but it's explicit about what is being done: take 
the character '0', find out what ordinal value it encodes to, add i to 
that value, and convert the result back to a character. That's exactly 
what C does, only here it is done explicitly.

Note that this still doesn't work the way we might like in EBCDIC, but 
the very fact that you are forced to think about explicit conversion 
steps means you are less likely to make unwarranted assumptions about 
what characters convert to.

Better than both, I would say, would be for string objects to have 
successor and predecessor methods that skip ahead (or back) the 
specified number of code points (defaulting to 1):

'A'.succ()  => 'B'
'A'.succ(5)  => 'F'

with appropriate exceptions if you try to go below 0 or above the largest 
code point.
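Python's `str` has no such methods, and built-in types cannot be given new ones, so here is a minimal sketch of the proposal as two plain functions. The names `succ` and `pred`, the restriction to single-character strings, and the choice of `ValueError` for out-of-range results are all assumptions of this sketch:

```python
import sys

# Hypothetical stand-ins for the proposed str.succ() / str.pred() methods.
# Assumption: they operate on single-character strings only.

def succ(ch: str, n: int = 1) -> str:
    """Return the character n code points after ch (n may be negative)."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    target = ord(ch) + n
    # Valid code points run from 0 to sys.maxunicode (0x10FFFF).
    if not 0 <= target <= sys.maxunicode:
        raise ValueError(f"code point {target} is out of range")
    return chr(target)


def pred(ch: str, n: int = 1) -> str:
    """Return the character n code points before ch."""
    return succ(ch, -n)


print(succ('A'))      # B
print(succ('A', 5))   # F
print(pred('B'))      # A
```

Because `chr` accepts any value up to `sys.maxunicode`, this sketch will step into unassigned code points and the surrogate range without complaint; whether those should also raise is a design choice the proposal leaves open.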

-- 
Steven