Clever hack or code abomination?

Fri Dec 2 03:54:24 EST 2011

On Fri, 02 Dec 2011 17:02:01 +1100, Chris Angelico wrote:

> On Fri, Dec 2, 2011 at 4:34 PM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> On Fri, 02 Dec 2011 13:07:57 +1100, Chris Angelico wrote:
>>> I would consider integer representations of ASCII to be code smell.
>>> It's not instantly obvious that 45 means '-', even if you happen to
>>> know the ASCII table by heart (which most people won't).
> 
> Note, I'm not saying that C's way is perfect; merely that using the
> integer 45 to represent a hyphen is worse.

Dude, it was deliberately obfuscated code. I even named the function 
"obfuscated_prefixes". I thought that would have been a hint <wink>

It's kinda scary that of all the sins against readability committed in my 
function, including isinstance(type(c), type(type)) which I was 
particularly proud of, the only criticism you came up with was that 
chr(45) is hard to read. I'm impressed <grins like a mad thing>

[...]
>> Note that this still doesn't work the way we might like in EBCDIC, but
>> the very fact that you are forced to think about explicit conversion
>> steps means you are less likely to make unwarranted assumptions about
>> what characters convert to.
> 
> I don't know about that. Anyone brought up on ASCII and moving to EBCDIC
> will likely have trouble with this, no matter how many function calls it
> takes.

Of course you will, because EBCDIC is a pile of festering garbage :)

But IMAO you're less likely to have trouble with Unicode if you 
haven't been trained to treat characters as synonymous with integers.

And besides, given how rare such byte-manipulations on ASCII characters 
are in Python, it would be a shame to lose the ability to use '' and "" 
for strings just to avoid calling ord and chr functions.
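
For anyone following along, here is a minimal sketch of what those explicit 
conversions look like, using only the built-ins ord and chr. The helper 
name shift_char is made up for illustration and is not from the original 
function.

```python
# ord() maps a one-character string to its integer code point;
# chr() maps an integer code point back to a one-character string.
hyphen = chr(45)
assert hyphen == '-'
assert ord('-') == 45


def shift_char(c, offset):
    """Hypothetical helper: move `c` by `offset` code points.

    The conversion steps are explicit, so the reader can see exactly
    where a character is being treated as a number.
    """
    return chr(ord(c) + offset)


print(shift_char('A', 1))
```

The literal '-' is still the readable way to spell a hyphen; the integer 
only appears when you ask for it.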

>> Better than both, I would say, would be for string objects to have
>> successor and predecessor methods, that skip ahead (or back) the
>> specified number of code points (defaulting to 1):
>>
>> 'A'.succ()  => 'B'
>> 'A'.succ(5)  => 'F'
>>
>> with appropriate exceptions if you try to go below 0 or above the
>> largest code point.
> 
> ... and this still has that same issue. Arithmetic on codepoints depends
> on that.

We shouldn't be doing arithmetic on code points. Or at least we shouldn't 
unless we are writing a Unicode library that *needs* to care about the 
implementation. We should only care about the interface, that the 
character after 'A' is 'B'. Implementation-wise, we shouldn't care 
whether A and B are represented in memory by 0x0041 and 0x0042, or by 
0x14AF and 0x9B30. All we really need to know is that B comes immediately 
after A. Everything else is implementation.
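
A rough sketch of that interface, written as plain functions because the 
built-in str type can't be given new methods. The names succ and pred 
follow the proposal quoted above; the choice of exceptions and the use of 
sys.maxunicode as the upper bound are my assumptions, and the body 
necessarily uses ord and chr, since that is the implementation the caller 
is meant not to care about.

```python
import sys


def succ(char, count=1):
    """Return the character `count` code points after `char`.

    Raises ValueError if the result would fall below 0 or above the
    largest code point (assumed here to be sys.maxunicode).
    """
    if not isinstance(char, str) or len(char) != 1:
        raise TypeError("expected a single character")
    code = ord(char) + count
    if not 0 <= code <= sys.maxunicode:
        raise ValueError("code point out of range")
    return chr(code)


def pred(char, count=1):
    """Return the character `count` code points before `char`."""
    return succ(char, -count)


print(succ('A'))
print(succ('A', 5))
```

Callers only ever see characters going in and characters coming out.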

But I fear that the idea of working with chr and ord is far too ingrained 
now to get rid of it.

-- 
Steven