python tr equivalent (non-ascii)
kettle
Josef.Robert.Novak at gmail.com
Wed Aug 13 06:12:48 EDT 2008
On Aug 13, 5:33 pm, Fredrik Lundh <fred... at pythonware.com> wrote:
> kettle wrote:
> > I was wondering how I ought to be handling character range
> > translations in python.
>
> > What I want to do is translate fullwidth numbers and roman alphabet
> > characters into their halfwidth ascii equivalents.
> > In perl I can do this pretty easily with tr:
>
> > tr/\x{ff00}-\x{ff5e}/\x{0020}-\x{007e}/;
>
> > and I think the string.translate method is what I need to use to
> > achieve the equivalent in python. Unfortunately the maketrans function
> > doesn't seem to accept character ranges, and I'm also having trouble
> > with its interpretation of length. What I came up with was to first
> > fudge the ranges:
>
> > my_test_string = u"ABCDEFG"
> > f_range = "".join([unichr(x) for x in range(ord(u"\uff00"),ord(u"\uff5e"))])
> > t_range = "".join([unichr(x) for x in range(ord(u"\u0020"),ord(u"\u007e"))])
>
> > then use these as input to maketrans:
> > my_trans_string = my_test_string.translate(string.maketrans(f_range,t_range))
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> > UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-93: ordinal not in range(128)
>
> maketrans only works for byte strings.
>
> as for translate itself, it has different signatures for byte strings
> and unicode strings; in the former case, it takes a lookup table
> represented as a 256-byte string (e.g. created by maketrans), in the
> latter case, it takes a dictionary mapping from ordinals to ordinals or
> unicode strings.
>
> something like
>
> lut = dict((0xff00 + ch, 0x0020 + ch) for ch in range(0x80))
>
> new_string = old_string.translate(lut)
>
> could work (untested).
>
> </F>
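The dict-based lookup suggested above can be sketched in Python 3 as follows (the thread itself is Python 2, where the same idea uses `unichr` and `u""` literals). One adjustment, stated as such: Perl's `tr` ranges include both endpoints, so `\x{ff00}-\x{ff5e}` spans 0x5f code points. `range(0x80)` would also remap U+FF5F through U+FF7F (fullwidth white parentheses, halfwidth CJK punctuation and halfwidth katakana) onto U+007F through U+009F (DEL and the C1 control characters). The sketch also shows that Python 3's `str.maketrans` accepts two equal-length Unicode strings, which is close to what the original attempt needed. The sample string is made up for illustration.

```python
# Python 3 sketch (assumption: the thread's code is Python 2). In Python 3
# str is Unicode and str.translate() takes a mapping keyed by code point.

# Perl's tr/\x{ff00}-\x{ff5e}/\x{0020}-\x{007e}/ includes both endpoints,
# so each range holds 0xff5e - 0xff00 + 1 == 0x5f code points.
RANGE_LEN = 0xFF5E - 0xFF00 + 1

# Approach 1: build the ordinal-to-ordinal dict directly, as suggested above.
fullwidth_to_ascii = {0xFF00 + i: 0x0020 + i for i in range(RANGE_LEN)}

# Approach 2: expand both ranges into strings and let str.maketrans pair
# them position by position (both strings must have the same length).
from_chars = "".join(chr(cp) for cp in range(0xFF00, 0xFF5E + 1))
to_chars = "".join(chr(cp) for cp in range(0x0020, 0x007E + 1))
table = str.maketrans(from_chars, to_chars)

# str.maketrans stores ordinals as values, so the two tables are identical.
assert table == fullwidth_to_ascii

sample = "\uff21\uff22\uff23\uff11\uff12\uff13\uff01"  # fullwidth "ABC123!"
print(sample.translate(fullwidth_to_ascii))  # ABC123!
```

The ideographic space U+3000, the usual fullwidth space, lies outside this range and is left unchanged by either table.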
Excellent. I didn't realize from the docs that I could do that. Thanks.
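The two `translate` signatures described in the reply carry over to Python 3, where byte strings are `bytes` and Unicode strings are `str`. A minimal Python 3 sketch of each (an assumption relative to the Python 2 thread); the tables and sample strings are hypothetical, chosen only to show the shapes involved.

```python
# Python 3 sketch of the two translate() signatures: bytes use a 256-byte
# lookup table, str uses a dict keyed by code point.

# bytes: bytes.maketrans() builds the 256-byte table; both arguments must
# be bytes-like objects of equal length.
byte_table = bytes.maketrans(b"abc", b"xyz")
assert len(byte_table) == 256
print(b"aabbcc".translate(byte_table))  # b'xxyyzz'

# str: the table maps ordinals to ordinals, to replacement strings, or to
# None (which deletes the character).
str_table = {ord("a"): ord("x"), ord("b"): "yy", ord("c"): None}
print("abc".translate(str_table))  # xyy
```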
More information about the Python-list mailing list