[Tutor] clean text

A.T.Hofkamp a.t.hofkamp at tue.nl
Tue May 19 13:28:09 CEST 2009


spir wrote:
> def _cleanRepr(text):
> 	''' text with control chars replaced by repr() equivalent '''
> 	chars = []
> 	for char in text:
> 		n = ord(char)
> 		if (n < 32) or (n > 126 and n < 160):
> 			char = repr(char)[1:-1]
> 		chars.append(char)
> 	return ''.join(chars)
> 
> But what else can I do?

You seem to break down the string to single characters, replace a few of them, 
and then build the whole string back.

Maybe you can instead append larger chunks of text that do not need 
modification, i.e. something like

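def _cleanRepr(text):
    chars = []
    start = 0
    for idx, char in enumerate(text):
        n = ord(char)
        if n < 32 or 126 < n < 160:
            # copy the untouched chunk before this char, then its escape
            chars.append(text[start:idx])
            chars.append(repr(char)[1:-1])
            start = idx + 1
    chars.append(text[start:])  # remaining tail after the last replacement
    return ''.join(chars)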


An alternative to the above is to keep track of the first occurrence of each 
of the chars you want to split on (after some 'start' position), and compute 
the next point to break the string as the min of all those positions instead 
of slowly 'walking' to it by testing each character separately.

That would reduce the number of iterations you do in the loop, at the cost of 
maintaining a large number of positions of the next breaking point.
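
A rough sketch of that alternative, in case it helps. It is only one way to 
do it: the function name and the 'next_pos' dictionary are made up for this 
example, and it assumes Python 3 strings. It uses str.find() to jump to the 
next occurrence of each special char rather than testing every character.

```python
def clean_repr_by_positions(text):
    """Replace control chars by their repr() escape, jumping between hits."""
    # Same character ranges as the test in the loop version: 0-31 and 127-159.
    specials = [chr(n) for n in list(range(32)) + list(range(127, 160))]

    # Position of the next occurrence of each special char (hypothetical
    # bookkeeping structure); chars that never occur are not tracked at all.
    next_pos = {}
    for ch in specials:
        pos = text.find(ch)
        if pos != -1:
            next_pos[ch] = pos

    chunks = []
    start = 0
    while next_pos:
        # The next break point is the smallest of the tracked positions.
        ch = min(next_pos, key=next_pos.get)
        idx = next_pos[ch]
        chunks.append(text[start:idx])   # untouched chunk before the hit
        chunks.append(repr(ch)[1:-1])    # escaped form of the special char
        start = idx + 1
        # Look up the following occurrence of this char, or stop tracking it.
        pos = text.find(ch, start)
        if pos == -1:
            del next_pos[ch]
        else:
            next_pos[ch] = pos
    chunks.append(text[start:])          # remaining tail
    return ''.join(chunks)
```

Whether this is actually faster depends on the text: with few special chars 
in a long string the find() calls do most of the work, but with many 
different special chars the min() over all tracked positions may well cost 
more than the simple loop.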


Albert

