Clever hack or code abomination?
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Fri Dec 2 00:34:16 EST 2011
On Fri, 02 Dec 2011 13:07:57 +1100, Chris Angelico wrote:
> On Fri, Dec 2, 2011 at 11:15 AM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> Try this on for size.
>>
>>
>> f = type(q)(c[c.index(chr(45))+1:])+type(q)(1)
>> c = str.join('\n', list(map(chr, (45, 48))) + [c])[::2]
>> c = (lambda a,b: a+b)(c[:c.index(chr(45))+1], type(c)(f))
>
> I would consider integer representations of ASCII to be code smell. It's
> not instantly obvious that 45 means '-', even if you happen to know the
> ASCII table by heart (which most people won't). This is one thing that I
> like about C's quote handling; double quotes for a string, or single
> quotes for an integer with that character's value. It's clearer than the
> Python (and other) requirement to have an actual function call:
>
> for (int i=0;i<10;++i) {
>     digit[i]='0'+i;
>     letter[i]='A'+i;
> }
I would disagree that this is clear at all. You're adding what looks like
a character, but is actually an integer, with an integer. And then just
to add insult to injury, you're storing integers into arrays that are
named as if they were characters. In what mad universe would you describe
65 as a letter?
To say nothing of the fact that C's trick only works (for some definition
of works) for ASCII. Take for example one of the many EBCDIC encodings,
cp500. If you expect 'I' + 1 to equal 'J', you will be sorely
disappointed:
py> u'I'.encode('cp500')
'\xc9'
py> u'J'.encode('cp500')
'\xd1'
Characters are not integers, and C conflates them, to the disservice of
all. If fewer people learned C, fewer people would have such trouble
understanding Unicode.
Anyone unfamiliar with C's model would have trouble guessing what 'A' + 1
should mean. Should it be:
- an error
- 'B'
- 'A1'
- the numeric value of variable A plus 1
- 66 (assuming ASCII encoding)
- 194 (assuming cp500 encoding)
- some other number
- something else?
How about 1000 + 'A'?
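For what it's worth, Python itself picks the first option on that list. A minimal sketch, assuming Python 3; try_add is a made-up helper used only for this illustration:

```python
def try_add(left, right):
    """Return left + right, or the name of the exception the addition raises."""
    try:
        return left + right
    except TypeError as exc:
        return type(exc).__name__

# Mixing a string and an integer is refused in either order.
print(try_add('A', 1))      # TypeError
print(try_add(1000, 'A'))   # TypeError

# The conversion has to be spelled out instead.
print(chr(ord('A') + 1))    # B
```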
> versus
>
> for i in range(10):
>     digit[i]=chr(ord('0')+i)
>     letter[i]=chr(ord('A')+i)
It's a tad more verbose, but it's explicit about what is being done. Take
the character '0', find out what ordinal value it encodes to, add i to
that value, and convert the result back to a character. That's exactly
what C does, only here it is done explicitly.
Note that this still doesn't work the way we might like in EBCDIC, but
the very fact that you are forced to think about explicit conversion
steps means you are less likely to make unwarranted assumptions about
what characters convert to.
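As a rough sketch of both points, assuming Python 3 (where ord and chr work on Unicode code points and encode returns bytes); shift_char is a hypothetical helper name, not anything from the standard library:

```python
def shift_char(ch, offset):
    """Return the character `offset` code points after `ch`."""
    return chr(ord(ch) + offset)

# The explicit version of the quoted loop.
digits = [shift_char('0', i) for i in range(10)]
letters = [shift_char('A', i) for i in range(10)]
print(''.join(digits))   # 0123456789
print(''.join(letters))  # ABCDEFGHIJ

# In cp500 the encoded bytes for 'I' and 'J' are not adjacent, so
# "add one to the byte" does not give the next letter.
i_byte = 'I'.encode('cp500')[0]
j_byte = 'J'.encode('cp500')[0]
print(hex(i_byte), hex(j_byte))  # 0xc9 0xd1
next_after_i = bytes([i_byte + 1]).decode('cp500')
print(next_after_i == 'J')       # False
```

The ord/chr round trip is defined in terms of code points, so it gives the same answer whatever encoding the text is later written out in. The surprise only appears when the arithmetic is done on encoded bytes.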
Better than both, I would say, would be for string objects to have
successor and predecessor methods, that skip ahead (or back) the
specified number of code points (defaulting to 1):
'A'.succ() => 'B'
'A'.succ(5) => 'F'
with appropriate exceptions if you try to go below 0 or above the largest
code point.
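Python strings have no such methods, so the following is only a sketch of how the idea might look as plain functions, assuming Python 3. The names succ and pred, and the choice of ValueError for out-of-range results, are my own assumptions:

```python
import sys

def succ(ch, n=1):
    """Return the character n code points after ch (default 1)."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    code = ord(ch) + n
    # Refuse to go below 0 or above the largest code point.
    if not 0 <= code <= sys.maxunicode:
        raise ValueError("code point out of range: %d" % code)
    return chr(code)

def pred(ch, n=1):
    """Return the character n code points before ch (default 1)."""
    return succ(ch, -n)

print(succ('A'))     # B
print(succ('A', 5))  # F
print(pred('B'))     # A
```

This sketch makes no attempt to skip surrogates or unassigned code points; a real design would have to decide what to do about those.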
--
Steven