Delete all not allowed characters..

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Thu Oct 25 14:27:17 EDT 2007


On Thu, 25 Oct 2007 07:52:36 -0700, Abandoned wrote:

> Hi..
> I want to delete all not allowed characters in my text. I use this
> function:
> 
> def clear(s1=""):
>     if s1:
>         allowed =
> [u'+',u'0',u'1',u'2',u'3',u'4',u'5',u'6',u'7',u'8',u'9',u' ', u'Ş',
> u'ş', u'Ö', u'ö', u'Ü', u'ü', u'Ç', u'ç', u'İ', u'ı', u'Ğ', u'ğ', 'A',
> 'C', 'B', 'E', 'D', 'G', 'F', 'I', 'H', 'K', 'J', 'M', 'L', 'O', 'N',
> 'Q', 'P', 'S', 'R', 'U', 'T', 'W', 'V', 'Y', 'X', 'Z', 'a', 'c', 'b',
> 'e', 'd', 'g', 'f', 'i', 'h', 'k', 'j', 'm', 'l', 'o', 'n', 'q', 'p',
> 's', 'r', 'u', 't', 'w', 'v', 'y', 'x', 'z']
>         s1 = "".join(ch for ch in s1 if ch in allowed)
>     return s1


You don't need to make allowed a list. Make it a string instead: the "in" 
test works the same way on a string, and it is easier to read.

allowed = u'+0123456789 ŞşÖöÜüÇçİıĞğ' \
u'ACBEDGFIHKJMLONQPSRUTWVYXZacbedgfihkjmlonqpsrutwvyxz'


> ....And my problem this function replace the character to "" but i want
> to " "
> for example:
> input: Exam%^^ple
> output: Exam   ple


I think the most obvious way is this:

def clear(s):
    allowed = u'+0123456789 ŞşÖöÜüÇçİıĞğ' \
    u'ACBEDGFIHKJMLONQPSRUTWVYXZacbedgfihkjmlonqpsrutwvyxz'
    L = []
    for ch in s:
        if ch in allowed: L.append(ch)
        else: L.append(" ")
    return ''.join(L)
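
As a quick check of this approach against the example from the question, here is a minimal sketch. It assumes Python 3 (where every str is Unicode and the u prefix is optional) and folds the loop into a conditional expression inside join; the names ALLOWED and clear_loop are only for illustration.

```python
# Python 3 sketch; ALLOWED and clear_loop are illustrative names.
ALLOWED = ('+0123456789 ŞşÖöÜüÇçİıĞğ'
           'ACBEDGFIHKJMLONQPSRUTWVYXZacbedgfihkjmlonqpsrutwvyxz')

def clear_loop(s):
    # Keep each allowed character; substitute a space for anything else.
    return ''.join(ch if ch in ALLOWED else ' ' for ch in s)

print(clear_loop('Exam%^^ple'))  # Exam   ple
```

Each rejected character becomes exactly one space, so the output always has the same length as the input.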


Perhaps a better way is to use a translation table:

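import sys

def clear(s):
    allowed = u'+0123456789 ŞşÖöÜüÇçİıĞğ' \
    u'ACBEDGFIHKJMLONQPSRUTWVYXZacbedgfihkjmlonqpsrutwvyxz'
    # Use sys.maxunicode + 1 rather than 0x110000: unichr() raises
    # ValueError above 0xFFFF on narrow Python builds.
    not_allowed = [i for i in range(sys.maxunicode + 1)
                   if unichr(i) not in allowed]
    table = dict(zip(not_allowed, u" "*len(not_allowed)))
    return s.translate(table)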
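
A possible alternative to a table with an entry for every disallowed code point is a sparse table that lists only the allowed characters and falls back to a space for everything else. This is a different technique from the one above, sketched under the assumption of Python 3, where str.translate looks characters up by code point and a dict subclass can supply a default through __missing__. The names SpaceForUnknown, SPARSE_TABLE and clear_sparse are made up for illustration.

```python
# Python 3 sketch of a sparse translation table (illustrative names).
ALLOWED = ('+0123456789 ŞşÖöÜüÇçİıĞğ'
           'ACBEDGFIHKJMLONQPSRUTWVYXZacbedgfihkjmlonqpsrutwvyxz')

class SpaceForUnknown(dict):
    """Mapping that answers ' ' for any code point it does not contain."""
    def __missing__(self, codepoint):
        # Called by dict lookup when the key is absent; nothing is stored.
        return ' '

# Allowed characters map to themselves; everything else hits __missing__.
SPARSE_TABLE = SpaceForUnknown((ord(ch), ch) for ch in ALLOWED)

def clear_sparse(s):
    return s.translate(SPARSE_TABLE)

print(clear_sparse('Exam%^^ple'))  # Exam   ple
```

The table holds one entry per allowed character, so it is cheap to build and there is no noticeable delay on the first call.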

Even better is to build the translation table once and cache it, so it is 
calculated only on the first call that needs it:

import sys

TABLE = None
def build_table():
    global TABLE
    if TABLE is None:
        allowed = u'+0123456789 ŞşÖöÜüÇçİıĞğ' \
        u'ACBEDGFIHKJMLONQPSRUTWVYXZacbedgfihkjmlonqpsrutwvyxz'
        # Use sys.maxunicode + 1 rather than 0x110000: unichr() raises
        # ValueError above 0xFFFF on narrow Python builds.
        not_allowed = [i for i in range(sys.maxunicode + 1)
                       if unichr(i) not in allowed]
        TABLE = dict(zip(not_allowed, u" "*len(not_allowed)))
    return TABLE

def clear(s):
    return s.translate(build_table())


The first time you call clear(), it will take a second or so to build the 
translation table, but then it will be very fast.
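
To see the difference between the first call and later ones, something like the following rough sketch can be used. It assumes Python 3 (chr in place of unichr) and the standard time.perf_counter timer; the actual timings depend on the machine, so none are shown here.

```python
# Python 3 sketch: time the first (table-building) call against a later one.
import sys
import time

ALLOWED = ('+0123456789 ŞşÖöÜüÇçİıĞğ'
           'ACBEDGFIHKJMLONQPSRUTWVYXZacbedgfihkjmlonqpsrutwvyxz')
TABLE = None  # built lazily on the first call

def build_table():
    global TABLE
    if TABLE is None:
        allowed = set(ALLOWED)
        # Map every code point that is not allowed to a space.
        TABLE = {i: ' ' for i in range(sys.maxunicode + 1)
                 if chr(i) not in allowed}
    return TABLE

def clear(s):
    return s.translate(build_table())

t0 = time.perf_counter()
first = clear('Exam%^^ple')    # builds the table
t1 = time.perf_counter()
second = clear('Exam%^^ple')   # reuses the cached table
t2 = time.perf_counter()

print(first)
print('first call: %.3f s, second call: %.6f s' % (t1 - t0, t2 - t1))
```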



-- 
Steven.


