[I18n-sig] Re: [Python-Dev] Pre-PEP: Python Character Model

Paul Prescod paulp@ActiveState.com
Tue, 06 Feb 2001 15:05:29 -0800

"Martin v. Loewis" wrote:
> > If we simply allowed string objects to support higher character
> > numbers I *cannot see* how that could break existing code.
> To take a specific example: What would you change about imp and
> py_compile.py? What is the type of imp.get_magic()? If character
> string, what about this fragment?
> ...
> Would that continue to write the same file that the current version
> writes?

Yes. Why wouldn't it?

You haven't specified an encoding for the file write so it would default
to what it does today. You aren't using any large characters so there is
no need for multi-byte encoding. Below is some code that may further
illuminate my idea.

wr_long is basically your code, but it shows that chr and unichr are
interchangeable by allowing you to pass in "func". magic is also passed
in as a string or unicode string with no ill effects.

I had to define unicode() and oldstr() functions to work around a bug
in the way Python does default conversions between Unicode strings and
ordinary strings. It should just map equivalent ordinals, as my functions
do.

import imp

def wr_long(f, x, func, magic):
    """Internal; write a 32-bit int to a file in little-endian order."""
    f.write(func( x        & 0xff))
    f.write(func((x >> 8)  & 0xff))
    f.write(func((x >> 16) & 0xff))
    f.write(func((x >> 24) & 0xff))

def unicode(string):
    return u"".join([unichr(ord(char)) for char in string])

def oldstr(string):
    return "".join([chr(ord(char)) for char in string])

wr_long(open("out1.txt","wb"), 5, chr, str(imp.get_magic()))
wr_long(open("out2.txt","wb"), 5, chr, str(imp.get_magic()))
wr_long(open("out3.txt","wb"), 5, unichr, unicode(imp.get_magic()))
wr_long(open("out4.txt","wb"), 5, unichr, str(imp.get_magic()))

assert( open("out1.txt").read() ==
        open("out2.txt").read() ==
        open("out3.txt").read() ==
        open("out4.txt").read() )
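
The "map equivalent ordinals" idea can also be sketched in isolation. The
block below is only an illustration, and it assumes Python 3, where
unichr no longer exists and chr covers the whole range. The names widen,
narrow and wr_long_bytes are hypothetical and are not part of the code
above or of any library.

```python
import struct

def widen(data):
    """Map each byte value to the character with the same ordinal."""
    return "".join(chr(b) for b in data)

def narrow(text):
    """Map each character back to the byte with the same ordinal.

    bytes() raises ValueError if any ordinal is above 0xff.
    """
    return bytes(ord(ch) for ch in text)

def wr_long_bytes(x):
    """Return a 32-bit int as four little-endian bytes (hypothetical helper)."""
    return bytes([x & 0xff, (x >> 8) & 0xff, (x >> 16) & 0xff, (x >> 24) & 0xff])

sample = wr_long_bytes(5)

# Widening and narrowing by ordinal round-trips without loss.
assert narrow(widen(sample)) == sample

# The ordinal-for-ordinal mapping coincides with the latin-1 codec.
assert widen(sample) == sample.decode("latin-1")

# The hand-rolled byte layout matches struct's little-endian 32-bit packing.
assert sample == struct.pack("<L", 5)
```

As long as no character exceeds 0xff, narrowing loses nothing, which is
the property the four-file comparison above relies on.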

 Paul Prescod