[Patches] string.translate behaviour

M.-A. Lemburg mal@lemburg.com
Tue, 30 May 2000 16:42:03 +0200


Guido van Rossum wrote:
> 
> > From: Peter Schneider-Kamp <petersc@stud.ntnu.no>
> >
> > "M.-A. Lemburg" wrote:
> > >
> > > Note that Unicode uses a new approach here (which I find much
> > > more useful, BTW):
> > >
> > > """
> > > S.translate(table) -> unicode
> > >
> > > Return a copy of the string S, where all characters have been mapped
> > > through the given translation table, which must be a mapping of
> > > Unicode ordinals to Unicode ordinals or None. Unmapped characters
> > > are left untouched. Characters mapped to None are deleted.
> > > """
> >
> > Okay, maybe I am missing the point, but if I want to change
> > a to b, b to c and c to a I would like to write:
> >
> > s.translate("abc","bca")
> >
> > But as far as I can see I have to write something like this:
> >
> > s.translate(range(97)+[98,99,97])
> >
> > to get this behaviour. Not exactly intuitive.
> 
> Actually, you would have to write
> 
> s.translate({ord('a'):ord('b'), ord('b'):ord('c'), ord('c'):ord('a')})
> 
> I think it would make more sense to change the API so that the keys
> and values can be either ordinals or characters, so you can write
> 
> s.translate({'a':'b', 'b':'c', 'c':'a'})
> 
> The Unicode version should support this too.

Note that I chose ordinals as keys because someone may want to
use sequences for the lookups (much faster!): a sequence can be
indexed directly by ordinal.

If you want either ordinals or characters, then the algorithm
would have to do the following:

1. look up ord(c)
2. look up c
3. copy c as is

Since the full path 1-2-3 would be the most common case (you
usually only want to change a few characters out of the whole set,
so most characters fail both lookups), this would slow down the
method considerably.
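
To make the cost concrete, here is a minimal sketch of that three-step
lookup in pure Python. The function name translate_mixed is hypothetical
and this is not how the built-in method is implemented; it only
illustrates that every unmapped character pays for two failed lookups
before it is copied.

```python
def translate_mixed(s, table):
    """Hypothetical translate that accepts ordinal or character keys."""
    out = []
    for c in s:
        try:
            # 1. look up ord(c)
            value = table[ord(c)]
        except LookupError:
            try:
                # 2. look up c (a sequence raises TypeError for a str index)
                value = table[c]
            except (LookupError, TypeError):
                # 3. copy c as is
                out.append(c)
                continue
        if value is None:
            # characters mapped to None are deleted
            continue
        # values may be ordinals or characters
        out.append(chr(value) if isinstance(value, int) else value)
    return "".join(out)
```

With this sketch, translate_mixed("abcd", {'a': 'b', 'b': 'c', 'c': 'a'})
rotates a, b and c and leaves d alone, but d goes through both failed
lookups first.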

I'd vote for some elegant translation table constructor which
implements all the generalizations and then returns either a
dictionary or, if the user wants this, a sequence, which can
then be used as input for the method.
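
A rough sketch of what such a constructor could look like follows. The
name make_translate_table and its parameters (source, target, delete,
as_sequence) are assumptions made for illustration, not an existing or
proposed API, and it only covers the simple ("abc", "bca") case, not
range generalizations like "a-z".

```python
def make_translate_table(source, target, delete="", as_sequence=False):
    """Hypothetical constructor: build an ordinal-keyed translation table."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    # Map each source ordinal to the corresponding target ordinal.
    mapping = {ord(a): ord(b) for a, b in zip(source, target)}
    # Characters to delete are mapped to None.
    for c in delete:
        mapping[ord(c)] = None
    if not as_sequence:
        return mapping
    # Sequence form: index i holds the replacement for ordinal i;
    # unmapped ordinals below the highest key map to themselves.
    size = max(mapping) + 1 if mapping else 0
    return [mapping.get(i, i) for i in range(size)]


table = make_translate_table("abc", "bca")
rotated = "abcd".translate(table)
```

The method itself then only ever does the single ordinal lookup, and all
the convenience lives in the constructor.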

> (I'm not too keen on the ("abc", "bca") API, because in practice
> people will soon want to be able to use generalizations like
> ("a-z", "n-za-m") which cause way too much trouble to parse.)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/