Clever hack or code abomination?
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Fri Dec 2 03:54:24 EST 2011
On Fri, 02 Dec 2011 17:02:01 +1100, Chris Angelico wrote:
> On Fri, Dec 2, 2011 at 4:34 PM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> On Fri, 02 Dec 2011 13:07:57 +1100, Chris Angelico wrote:
>>> I would consider integer representations of ASCII to be code smell.
>>> It's not instantly obvious that 45 means '-', even if you happen to
>>> know the ASCII table by heart (which most people won't).
>
> Note, I'm not saying that C's way is perfect; merely that using the
> integer 45 to represent a hyphen is worse.
Dude, it was deliberately obfuscated code. I even named the function
"obfuscated_prefixes". I thought that would have been a hint <wink>
It's kinda scary that of all the sins against readability committed in my
function, including isinstance(type(c), type(type)) which I was
particularly proud of, the only criticism you came up with was that
chr(45) is hard to read. I'm impressed <grins like a mad thing>
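For readers who want to check the joke, here is a minimal sketch that unpacks the two expressions mentioned above. It does not reproduce the obfuscated_prefixes function itself, which isn't quoted in this message. It only shows that chr(45) is a hyphen, and that because type(type) is just type, the isinstance test holds for any object at all.

```python
# chr(45) is the hyphen-minus; the integer hides that from the reader.
assert chr(45) == '-'
assert ord('-') == 45

# type(type) is simply type, so isinstance(type(c), type(type)) is the same
# as isinstance(type(c), type). Every object's type is a class, and every
# class is an instance of type, so the check is true for any c whatsoever.
assert type(type) is type

samples = ['x', 42, None, 3.5, object(), [1, 2], type]
results = [isinstance(type(c), type(type)) for c in samples]
print(results)  # a list of True values, one per sample
```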
[...]
>> Note that this still doesn't work the way we might like in EBCDIC, but
>> the very fact that you are forced to think about explicit conversion
>> steps means you are less likely to make unwarranted assumptions about
>> what characters convert to.
>
> I don't know about that. Anyone brought up on ASCII and moving to EBCDIC
> will likely have trouble with this, no matter how many function calls it
> takes.
Of course you will, because EBCDIC is a pile of festering garbage :)
But IMAO you're less likely to have trouble with Unicode if you
haven't been trained to treat characters as synonymous with integers.
And besides, given how rare such byte-manipulations on ASCII characters
are in Python, it would be a shame to lose the ability to use '' and ""
for strings just to avoid calling ord and chr functions.
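To make the EBCDIC complaint concrete, the sketch below compares code-point arithmetic under Unicode with the byte values produced by Python's cp037 codec (EBCDIC US/Canada), which ships with the standard library. The helper ebcdic_byte is hypothetical and exists only for this illustration. The point is that "45 means '-'" and "'J' comes one step after 'I'" are properties of ASCII/Unicode, not of characters in general.

```python
def ebcdic_byte(ch: str) -> int:
    """Hypothetical helper: return the single cp037 (EBCDIC) byte for ch."""
    (b,) = ch.encode('cp037')  # cp037 is a standard Python codec
    return b

# In ASCII/Unicode, '-' is 45 and the capital letters are contiguous.
print(ord('-'), ord('J') - ord('I'))  # 45 1

# In EBCDIC (cp037), '-' is 0x60 and there is a gap between 'I' and 'J',
# so "add one to the code of 'I'" does not land on 'J'.
print(hex(ebcdic_byte('-')))                    # 0x60
print(hex(ebcdic_byte('I')), hex(ebcdic_byte('J')))  # 0xc9 0xd1
print(ebcdic_byte('J') - ebcdic_byte('I'))      # 8
```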
>> Better than both, I would say, would be for string objects to have
>> successor and predecessor methods, that skip ahead (or back) the
>> specified number of code points (defaulting to 1):
>>
>> 'A'.succ() => 'B'
>> 'A'.succ(5) => 'F'
>>
>> with appropriate exceptions if you try to go below 0 or above the
>> largest code point.
>
> ... and this still has that same issue. Arithmetic on codepoints depends
> on that.
We shouldn't be doing arithmetic on code points. Or at least we shouldn't
unless we are writing a Unicode library that *needs* to care about the
implementation. We should only care about the interface, that the
character after 'A' is 'B'. Implementation-wise, we shouldn't care
whether A and B are represented in memory by 0x0041 and 0x0042, or by
0x14AF and 0x9B30. All we really need to know is that B comes immediately
after A. Everything else is implementation.
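Python's str has no succ or pred methods, so here is a minimal sketch of the proposed interface written as plain functions. The names succ and pred, the default step of 1, and the choice of ValueError are assumptions made for illustration. Note that the implementation still uses ord and chr internally, which fits the argument above: the library cares about the implementation so that callers only deal with "the character after 'A'".

```python
import sys


def succ(c: str, n: int = 1) -> str:
    """Hypothetical helper: the character n code points after c."""
    if len(c) != 1:
        raise TypeError("succ() expects a single character")
    target = ord(c) + n
    # Refuse to step below code point 0 or above the largest one.
    # (Lone surrogates such as U+D800 are not filtered out here.)
    if not 0 <= target <= sys.maxunicode:
        raise ValueError(f"no code point {n} step(s) from {c!r}")
    return chr(target)


def pred(c: str, n: int = 1) -> str:
    """Hypothetical helper: the character n code points before c."""
    return succ(c, -n)


print(succ('A'), succ('A', 5), pred('B'))  # B F A
```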
But I fear that the idea of working with chr and ord is far too ingrained
now to get rid of it.
--
Steven