How do I automate the removal of all non-ascii characters from my code?
stefan_ml at behnel.de
Mon Sep 12 04:43:51 EDT 2011
Alec Taylor, 12.09.2011 10:33:
> from creole import html2creole
> from BeautifulSoup import BeautifulSoup
> VALID_TAGS = ['strong', 'em', 'p', 'ul', 'li', 'br', 'b', 'i', 'a', 'h1', 'h2']
> def sanitize_html(value):
>     soup = BeautifulSoup(value)
>     for tag in soup.findAll(True):
>         if tag.name not in VALID_TAGS:
>             tag.hidden = True
>     return soup.renderContents()
> <p class="Standard"
> [more stuff here]
I'm not sure what you are trying to say with the above code, but if it's
the code that fails for you with the exception you posted, I would guess
that the problem is in the "[more stuff here]" part, which likely contains
a non-ASCII character. Note that you didn't declare the source file
encoding above. Do as Gary told you.
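
To make that concrete, here is a minimal sketch of the two pieces involved: a PEP 263 encoding declaration at the top of the file, and a small helper that reports where non-ASCII bytes occur so you can see what is actually in your source. The helper name find_non_ascii and the sample bytes are purely illustrative assumptions, not anything from your code, and UTF-8 is only a guess at the encoding your editor uses.

```python
# -*- coding: utf-8 -*-
# The line above is a PEP 263 encoding declaration. It has to be on the
# first or second line of the source file. UTF-8 is an assumption here;
# declare whatever encoding the file is really saved in.


def find_non_ascii(source_bytes):
    """Return (line_number, column) pairs, 1-based, for each non-ASCII byte.

    Hypothetical helper for illustration only.
    """
    hits = []
    for line_number, line in enumerate(source_bytes.splitlines(), 1):
        # bytearray yields integers on both Python 2 and Python 3.
        for column, byte in enumerate(bytearray(line), 1):
            if byte > 127:
                hits.append((line_number, column))
    return hits


# Illustrative input: the second line holds a two-byte UTF-8 sequence.
sample = b'x = 1\ns = "caf\xc3\xa9"\n'
print(find_non_ascii(sample))
```

To check a real file, read it in binary mode, e.g. open(path, 'rb').read(), and pass the result to the helper. Locating the offending bytes first is usually safer than stripping them blindly, since they may be meaningful text that just needs a declared encoding.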