Replacing "illegal characters" in html

Robert Brewer fumanchu at amor.org
Sun May 9 16:08:52 EDT 2004


BenO wrote:
> I'm new to python and need to write a function to replace certain
> characters in a string (html).
> 
> The characters I need to replace come from MS Word copy & paste and are:
> 
> ‘ (Left quote)
> ’ (Right quote)
> Double Left quotes
> Double Right quotes
> 
> Can anyone help me or point me in the right direction on an efficient
> way of doing this?

The two methods most often used are 1) the .replace method of strings,
and 2) regular expressions.

1) The .replace method:

>>> replacemap = {"\x93": '"', "\x94": '"', "\x91": "'", "\x92": "'"}
>>> map(ord, replacemap.keys())
[145, 147, 146, 148]
>>> test = "\x93hl\x94 \x91oh\x92"
>>> for k, v in replacemap.iteritems():
...     test = test.replace(k, v)
...
>>> test
'"hl" \'oh\''
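
For readers on Python 3, where strings are Unicode, the same one-to-one replacement can be sketched with str.translate instead of a loop of .replace calls. This is only an illustration, not part of the original session: the names SMART_QUOTES and straighten_quotes are made up here, and it assumes the pasted bytes are Windows-1252, where 0x91 to 0x94 are the curly quotes.

```python
# Hypothetical names: SMART_QUOTES and straighten_quotes are illustrative only.
SMART_QUOTES = str.maketrans({
    "\u2018": "'",  # left single quote  (byte 0x91 in Windows-1252)
    "\u2019": "'",  # right single quote (byte 0x92)
    "\u201c": '"',  # left double quote  (byte 0x93)
    "\u201d": '"',  # right double quote (byte 0x94)
})


def straighten_quotes(text):
    """Return text with curly quotes replaced by plain ASCII quotes."""
    return text.translate(SMART_QUOTES)


# Assumption: the raw input is Windows-1252 bytes, so decode before translating.
raw = b"\x93hl\x94 \x91oh\x92"
print(straighten_quotes(raw.decode("cp1252")))
```

Because translate makes a single pass over the string, it does not matter in which order the entries appear in the table.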

2) Regular Expressions:

>>> import re
>>> test = "\x93hl\x94 \x91oh\x92"
>>> test = re.sub("[\x93\x94]", '"', test)
>>> test = re.sub("['\x91\x92]", "'", test)
>>> test
'"hl" \'oh\''
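
The two re.sub calls can also be folded into one pass by giving re.sub a function as the replacement. The following is a rough Python 3 sketch under the assumption that the text has already been decoded to Unicode; QUOTE_MAP, QUOTE_RE and replace_quotes are hypothetical names chosen for this example.

```python
import re

# Hypothetical names: QUOTE_MAP, QUOTE_RE and replace_quotes are illustrative only.
QUOTE_MAP = {
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
}

# Build one character class that matches any key of the mapping.
QUOTE_RE = re.compile("[" + "".join(map(re.escape, QUOTE_MAP)) + "]")


def replace_quotes(text):
    """Replace every curly quote in one pass, looking each match up in QUOTE_MAP."""
    return QUOTE_RE.sub(lambda match: QUOTE_MAP[match.group(0)], text)


print(replace_quotes("\u201chl\u201d \u2018oh\u2019"))
```

Keeping the characters in a dict means a new replacement only needs a new entry, since the pattern is built from the keys.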


Hope that helps!


Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org



