python tr equivalent (non-ascii)
Fredrik Lundh
fredrik at pythonware.com
Wed Aug 13 04:33:13 EDT 2008
kettle wrote:
> I was wondering how I ought to be handling character range
> translations in python.
>
> What I want to do is translate fullwidth numbers and roman alphabet
> characters into their halfwidth ascii equivalents.
> In perl I can do this pretty easily with tr:
>
> tr/\x{ff00}-\x{ff5e}/\x{0020}-\x{007e}/;
>
> and I think the string.translate method is what I need to use to
> achieve the equivalent in python. Unfortunately the maketrans method
> doesn't seem to accept character ranges and I'm also having trouble
> with its interpretation of length. What I came up with was to first
> fudge the ranges:
>
> my_test_string = u"ABCDEFG"
> f_range = "".join([unichr(x) for x in
> range(ord(u"\uff00"),ord(u"\uff5e"))])
> t_range = "".join([unichr(x) for x in
> range(ord(u"\u0020"),ord(u"\u007e"))])
>
> then use these as input to maketrans:
> my_trans_string =
> my_test_string.translate(string.maketrans(f_range,t_range))
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode characters in position
> 0-93: ordinal not in range(128)
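maketrans only works for byte strings, so the unicode arguments get implicitly encoded as ASCII, which is what raises the UnicodeEncodeError.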
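That describes Python 2's string.maketrans. For readers on Python 3 (an assumption; the thread itself uses Python 2), str.maketrans does accept Unicode strings and returns a dict keyed by ordinals, so a tr-style range approach works once the ranges are built inclusively. The quoted code also has an off-by-one: range() excludes its end point, so U+FF5E and U+007E were dropped (hence the 94 characters in the error message instead of 95). A minimal sketch; `char_range` and `FULL_TO_HALF` are hypothetical names:

```python
# Python 3 sketch: str.maketrans accepts Unicode strings, unlike
# Python 2's string.maketrans, which only handled byte strings.

def char_range(first, last):
    """Return every character from first to last, inclusive (like a-z in perl tr)."""
    # The + 1 makes the end point inclusive, matching perl's tr ranges.
    return "".join(chr(cp) for cp in range(ord(first), ord(last) + 1))

# Equivalent of perl's tr/\x{ff00}-\x{ff5e}/\x{0020}-\x{007e}/
# Both ranges hold 95 characters, as str.maketrans requires equal lengths.
FULL_TO_HALF = str.maketrans(char_range("\uff00", "\uff5e"),
                             char_range("\u0020", "\u007e"))

print("\uff21\uff22\uff23\uff11\uff12\uff13".translate(FULL_TO_HALF))  # ABC123
```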
as for translate itself, it has different signatures for byte strings
and unicode strings; in the former case, it takes a lookup table
represented as a 256-byte string (e.g. created by maketrans); in the
latter case, it takes a dictionary mapping from ordinals to ordinals or
unicode strings.
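To illustrate the unicode form of the table, here is a small sketch in Python 3 syntax (where str plays the role of Python 2's unicode). The keys are code points, and each value can be an ordinal, a replacement string of any length, or None, which deletes the character; the None case goes slightly beyond the description above. The particular characters chosen are just examples:

```python
# str.translate (unicode.translate in Python 2) takes a mapping
# keyed by code point (an int), not by character.
table = {
    ord("\uff01"): ord("!"),    # ordinal -> ordinal (fullwidth '!')
    ord("\u3000"): " ",         # ordinal -> string (ideographic space)
    ord("\u2026"): "...",       # a replacement may be longer than one character
    ord("\u00ad"): None,        # None deletes the character (soft hyphen)
}

text = "Hi\uff01\u3000wait\u2026ex\u00adample"
print(text.translate(table))  # Hi! wait...example
```

One easy slip is keying the dict by characters instead of ordinals: translate looks entries up by integer, so such entries are silently ignored and those characters pass through unchanged.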
something like
lut = dict((0xff00 + ch, 0x0020 + ch) for ch in range(0x5f))  # U+FF00..U+FF5E, as in the perl tr
new_string = old_string.translate(lut)
could work (untested).
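A tested variant of the same lookup-table idea, written for Python 3. Note the table size: the perl range U+FF00..U+FF5E is inclusive, so it has 0x5f entries. A wider range such as range(0x80) would also catch U+FF5F..U+FF7F (fullwidth white parentheses, halfwidth CJK punctuation, halfwidth katakana) and turn them into DEL and C1 control characters. `to_halfwidth` is a hypothetical helper name:

```python
# Map U+FF00..U+FF5E onto U+0020..U+007E, mirroring the perl tr ranges.
# range(0x5f) yields 0..0x5e, so both ranges include their end points.
lut = {0xff00 + i: 0x0020 + i for i in range(0x5f)}

def to_halfwidth(s):
    """Fold fullwidth ASCII variants to plain ASCII; leave everything else alone."""
    return s.translate(lut)

print(to_halfwidth("\uff21\uff22\uff23\uff24\uff25\uff26\uff27"))  # ABCDEFG
print(to_halfwidth("\uff11\uff10\uff05 \u65e5\u672c"))  # 10% 日本
```

Characters outside the table, such as the kanji in the second example or halfwidth katakana, are not keys in the dict, so translate leaves them as they are.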
</F>
More information about the Python-list mailing list