How do I automate the removal of all non-ascii characters from my code?

Dave Angel davea at ieee.org
Mon Sep 12 08:09:59 EDT 2011


Steven D'Aprano wrote:
> On Mon, 12 Sep 2011 06:43 pm Stefan Behnel wrote:
>
>> I'm not sure what you are trying to say with the above code, but if it's
>> the code that fails for you with the exception you posted, I would guess
>> that the problem is in the "[more stuff here]" part, which likely contains
>> a non-ASCII character. Note that you didn't declare the source file
>> encoding above. Do as Gary told you.
>
> Even with a source code encoding, you will probably have problems with
> source files including \xe2 and other "bad" chars. Unless they happen to
> fall inside a quoted string literal, I would expect to get a SyntaxError.
>
> I have come across this myself. While I haven't really investigated in great
> detail, it appears to happen when copying and pasting code from a document
> (usually HTML) which uses non-breaking spaces instead of \x20 space
> characters. All it takes is just one to screw things up.
>
>

For me, "smart quotes" characters are more common than non-breaking
spaces.  In that case, one probably doesn't want to delete them, but
instead convert them into standard quotes.
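
As a rough illustration of converting rather than deleting, here is a
minimal sketch. It assumes Python 3 and source text that has already been
decoded to str. The names REPLACEMENTS, normalize_source and find_non_ascii
are hypothetical, and the mapping only covers the curly quotes and the
non-breaking space mentioned in this thread; it is not an exhaustive list.

```python
# Sketch only: the mapping below is an assumption for illustration and
# covers just the characters discussed in this thread.
REPLACEMENTS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u00a0": " ",   # no-break space
}

# str.maketrans accepts a dict whose keys are single characters.
_TABLE = str.maketrans(REPLACEMENTS)


def normalize_source(text):
    """Replace smart quotes and no-break spaces with ASCII equivalents."""
    return text.translate(_TABLE)


def find_non_ascii(text):
    """Return (line, column, character) for every non-ASCII character.

    Line and column numbers are 1-based.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, ch))
    return hits


if __name__ == "__main__":
    sample = "print(\u201chello\u201d)\u00a0# caf\u00e9\n"
    cleaned = normalize_source(sample)
    print(cleaned)
    # Anything still listed here needs a human decision, not blind deletion.
    print(find_non_ascii(cleaned))
```

Reporting whatever is left over, instead of stripping it, keeps legitimate
non-ASCII text (a name in a comment, say) from being silently damaged.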

DaveA


