[Tutor] clean text

Tue May 19 17:59:00 CEST 2009

spir wrote:
> Hello,
> 
> This is a follow-up to the post on performance issues.
> Using a profiler, I realized that inside error message creation, most of the time was spent in a tool func used to clean up source text output.
> The issue is that when the source text holds control chars such as \n, the error message is hardly readable. My solution is to replace such chars with their repr():
> 
> def _cleanRepr(text):
> 	''' text with control chars replaced by repr() equivalent '''
> 	result = ""
> 	for char in text:
> 		n = ord(char)
> 		if (n < 32) or (n > 126 and n < 160):
> 			char = repr(char)[1:-1]
> 		result += char
> 	return result
> 
> For some reason, this func is extremely slow. While the rest of error message creation looks very complicated, this seemingly innocent func consumes > 90% of the time. The issue is that I cannot use repr(text), because repr will replace all non-ASCII characters. I need to replace only control characters.
> How else could I do that?
> 
> Denis
> ------
> la vita e estrany

If you're using Python 3.x, how about using str.translate()?

In Python 3.x, str.translate() accepts a dictionary argument that maps
code points to replacement strings, so it can do a single-char to
multi-char replacement.

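# code points of the control chars: 0-31 and 127-159
controls = list(range(0, 32)) + list(range(127, 160))
# map each code point to its repr() without the surrounding quotes
table = {char: repr(chr(char))[1:-1] for char in controls}

def _cleanRepr(text):
    return text.translate(table)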
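
To check that this leaves non-ASCII characters alone, here is a small self-contained sketch. It rebuilds the same table and, for comparison, adds a regex-based variant that may be an option where a dict-based translate() is not convenient. The names clean_repr_translate, clean_repr_regex and sample are my own for illustration, and I have not timed either version against the original loop.

```python
import re

# Same table as in the reply: code points 0-31 and 127-159 are mapped
# to their repr() with the surrounding quotes stripped.
controls = list(range(0, 32)) + list(range(127, 160))
table = {code: repr(chr(code))[1:-1] for code in controls}

def clean_repr_translate(text):
    """Replace control chars via str.translate() (Python 3)."""
    return text.translate(table)

# Hypothetical alternative: one precompiled character class covering
# the same ranges, with a callback that builds the replacement.
_control_re = re.compile(r'[\x00-\x1f\x7f-\x9f]')

def clean_repr_regex(text):
    """Replace control chars via re.sub() with a replacement function."""
    return _control_re.sub(lambda m: repr(m.group())[1:-1], text)

sample = "café\tnaïve\nend\x7f"
print(clean_repr_translate(sample))
print(clean_repr_regex(sample))
```

Both calls should print the text on a single line, with the tab, the newline and the DEL char shown as escapes and the accented letters left untouched.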