String comparison question

Michael Spencer mahs at telcopartners.com
Sun Mar 19 22:15:50 EST 2006


Olivier Langlois wrote:
> Hi Michael!
> 
> Your suggestion is fantastic and is doing exactly what I was looking
> for! Thank you very much.
> There is something I'm wondering, though. Why wouldn't the solution you
> proposed work with Unicode strings?
> 
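Simply because the two-argument form of str.translate (a table plus a string of
characters to delete) isn't implemented for unicode strings: unicode.translate
accepts only a single mapping argument.  I don't know the underlying reason, or
how hard it would be to change.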
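That single mapping argument can still delete characters, though: it maps code
points to replacement code points, strings, or None, and characters mapped to
None are dropped.  Here is a minimal sketch assuming that mapping form (it is
also how str.translate behaves in Python 3); compare_translate is a
hypothetical name.  Note that string.whitespace covers ASCII whitespace only.

```python
import string

def compare_translate(a, b):
    """Compare two strings, disregarding ASCII whitespace -> bool"""
    # Map the code point of each ASCII whitespace character to None;
    # translate() deletes every character whose mapping is None.
    table = dict.fromkeys(ord(c) for c in string.whitespace)
    return a.translate(table) == b.translate(table)

print(compare_translate(u"a b\tc\n", u"abc"))  # True
print(compare_translate(u"ab", u"ac"))         # False
```

Because the table is limited to string.whitespace, a non-breaking space
(U+00A0) would survive the translation and make otherwise equal strings compare
unequal.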

If you do need the comparison functionality for unicode strings, you'll have
to go with a different approach.  For example, using regular expressions:

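import re

def compare2(a, b):
    """Compare two basestrings, disregarding whitespace -> bool"""
    # Remove every run of whitespace; re.UNICODE lets \s also match
    # non-ASCII whitespace in unicode strings under Python 2.
    ws = re.compile(r"\s+", re.UNICODE)
    return ws.sub("", a) == ws.sub("", b)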

This is slower than the str.translate approach, though it has the advantage that
you could easily modify it to normalize, rather than eliminate, whitespace, which
is the more useful comparison in many cases.

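def compare3(a, b):
    """Compare two basestrings, normalizing whitespace -> bool"""
    # Use \s+ rather than \s*: \s* also matches the empty string and
    # would insert a space between every pair of characters.
    ws = re.compile(r"\s+", re.UNICODE)
    return ws.sub(" ", a) == ws.sub(" ", b)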
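To see why normalizing is often the better comparison, here is a small sketch
using a hypothetical helper, squash_ws, that replaces each whitespace run with
a given string.  It also shows one edge case: normalizing still distinguishes
leading and trailing whitespace, so you may want to strip() as well.

```python
import re

_WS = re.compile(r"\s+")

def squash_ws(s, repl):
    """Replace each run of whitespace in s with repl (hypothetical helper)."""
    return _WS.sub(repl, s)

# Eliminating whitespace conflates different word boundaries...
print(squash_ws("ab c", "") == squash_ws("a bc", ""))      # True
# ...while normalizing keeps them distinct.
print(squash_ws("ab c", " ") == squash_ws("a bc", " "))    # False
# Leading/trailing whitespace still matters unless you strip() it.
print(squash_ws(" ab c", " ") == squash_ws("ab c", " "))   # False
print(squash_ws(" ab c", " ").strip() == squash_ws("ab c", " ").strip())  # True
```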

Continuing the disclaimers: none of these approaches makes any attempt to deal
specially with quoted whitespace or any other sort of escape.
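If your strings happen to follow shell-like quoting rules, one possible approach
(an assumption about your data, not something the functions above do) is to
compare the word lists produced by the standard library's shlex.split, which
collapses unquoted whitespace but keeps whitespace inside quotes or after a
backslash.  compare_quoted is a hypothetical name.

```python
import shlex

def compare_quoted(a, b):
    """Compare two strings word by word, keeping quoted whitespace intact."""
    # shlex.split applies POSIX shell rules: unquoted whitespace separates
    # words; quoted or backslash-escaped whitespace stays inside a word.
    return shlex.split(a) == shlex.split(b)

print(compare_quoted("ls   'my  file'", "ls 'my  file'"))  # True
print(compare_quoted("ls 'my  file'", "ls 'my file'"))    # False
```

Bear in mind that shlex.split drops the quote characters themselves and raises
ValueError on an unbalanced quote, so it only fits inputs that really are
shell-like.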

Michael
