[Tutor] clean text
spir
denis.spir at free.fr
Tue May 19 20:22:17 CEST 2009
On Tue, 19 May 2009 10:49:15 -0700,
Emile van Sebille <emile at fenx.com> wrote:
> On 5/19/2009 10:19 AM spir said...
> > On Tue, 19 May 2009 11:36:17 +0200,
> > spir <denis.spir at free.fr> wrote:
> >
> > [...]
> >
> > Thank you Albert, Kent, Sanders, Lie, Malcolm.
> >
> > This time regex wins! I thought it wouldn't because of the additional func
> > call (too bad we cannot pass a mapping to re.sub). Actually the difference
> > is very small ;-) The relevant change is indeed using a dict. Replacing
> > string concat with ''.join() is slower (also tested with strings 10 times
> > and 100 times bigger). Strange... A membership test in a set is only
> > very slightly faster than in dict keys.
>
> Hmm... this seems faster assuming it does the same thing...
>
> xlate = dict( (chr(c),chr(c)) for c in range(256))
> xlate.update(control_char_map)
>
> def cleanRepr5(text):
>     return "".join([ xlate[c] for c in text ])
>
>
> Emile
Thank you, Emile.
I thought of this solution (having a dict for all chars), but I cannot use it because I will later extend the app to cope with Unicode (~100,000 chars). So I really need to filter which chars have to be converted.
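To illustrate the filtering idea, here is a minimal sketch in which the dict holds only the chars that need converting and everything else passes through unchanged. It assumes Python 3 strings, and the contents of control_char_map are hypothetical since the real map is not shown in this thread.

```python
# Hypothetical sparse map: only the characters that must be converted.
# The real control_char_map from the thread is not shown; these entries are assumptions.
control_char_map = {"\n": "\\n", "\t": "\\t", "\r": "\\r"}

def clean_repr_sparse(text):
    # dict.get(c, c) falls back to the character itself when it is not mapped,
    # so the dict never has to cover the whole Unicode range.
    return "".join(control_char_map.get(c, c) for c in text)

# str.translate works from a table keyed by code point;
# characters missing from the table are left as they are.
table = str.maketrans(control_char_map)

def clean_repr_translate(text):
    return text.translate(table)
```

Both functions should give the same result; which one is faster would have to be timed on real data.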
I guess a useful aid would be a builtin func that returns the conventional char/string repr without the surrounding quotes ('...').
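Lacking such a builtin, two workarounds can be sketched, assuming Python 3. The function names bare_repr and escaped are made up for this example.

```python
def bare_repr(text):
    # repr() wraps its result in exactly one pair of quotes; slice them off.
    # Caveat: if text contains both ' and ", repr() backslash-escapes the
    # single quote, and that escape remains in the result.
    return repr(text)[1:-1]

def escaped(text):
    # The "unicode_escape" codec produces backslash escapes with no quotes.
    # Unlike repr(), it also escapes every non-ASCII character.
    return text.encode("unicode_escape").decode("ascii")
```

For plain ASCII text with control chars the two agree; they differ on non-ASCII input, so the choice depends on how the Unicode chars should be displayed.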
Denis
PS
By the way, you no longer need to build a list comprehension for an outer func that walks through a sequence:
"".join( xlate[c] for c in text )
is a shortcut for
"".join( (xlate[c] for c in text) )
[A generator expression that is the sole argument of a call needs no additional parens; they are required only when the call takes other args. See PEP 289: http://www.python.org/dev/peps/pep-0289/]
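A small sketch of the parens rule, assuming Python 3. The xlate mapping here is a hypothetical one-entry stand-in, and .get(c, c) is used so that unmapped chars pass through.

```python
xlate = {"\n": "\\n"}  # hypothetical stand-in for the real mapping
text = "a\nb"

# Sole argument: the call's own parentheses are enough.
joined = "".join(xlate.get(c, c) for c in text)

# With a second argument (the start value of sum), the generator
# expression must be wrapped in its own parentheses.
total = sum((len(xlate.get(c, c)) for c in text), 10)
```

Leaving out the inner parens in the second call is a syntax error.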
------
la vita e estrany