How do I automate the removal of all non-ascii characters from my code?
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Tue Sep 13 04:15:39 EDT 2011
On Tue, 13 Sep 2011 05:49 pm jmfauth wrote:
> On 12 sep, 23:39, "Rhodri James" <rho... at wildebst.demon.co.uk> wrote:
>
>
>> Now read what Steven wrote again. The issue is that the program contains
>> characters that are syntactically illegal. The "engine" can be perfectly
>> correctly translating a character as a smart quote or a non breaking
>> space or an e-umlaut or whatever, but that doesn't make the character
>> legal!
>>
>
> Yes, you are right. I did not understand it that way.
>
> However, a small correction/clarification. Illegal characters
> do not exist. One can "only" have an ill-formed encoded code
> point or an illegal encoded code point representing a
> character/glyph.
You are wrong there. There are many ASCII characters which are illegal in
Python source code, at least outside of comments and string literals, and
possibly even there.
>>> code = "x = 1 + \b 2"  # all ASCII characters
>>> print(code)
x = 1 + 2
>>> exec(code)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    x = 1 + 2
            ^
SyntaxError: invalid syntax
Now, imagine that a \b ASCII backspace character somehow gets
introduced into your source file. When you go to run the file, or import
it, you will get a SyntaxError. Changing the encoding will not help.
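
If you want to track down such characters rather than guess at them, here
is a rough sketch of a scanner. The function name find_suspect_chars and
the set of "allowed" control characters are my own assumptions for
illustration, not anything built into Python, and it makes no attempt to
tell code apart from comments or string literals:

```python
import unicodedata

# Control characters that normally appear as whitespace in source files.
# (An assumption for this sketch; adjust to taste.)
ALLOWED_CONTROLS = {"\t", "\n", "\r", "\f"}

def find_suspect_chars(source):
    """Return (line, column, char) for each control character in `source`
    that is not in ALLOWED_CONTROLS. Line and column are 1-based."""
    hits = []
    # Split on "\n" only: str.splitlines() would also split on \f, \v, etc.
    for lineno, line in enumerate(source.split("\n"), start=1):
        for colno, ch in enumerate(line, start=1):
            # Category "Cc" covers ASCII and C1 control characters.
            if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS:
                hits.append((lineno, colno, ch))
    return hits

for lineno, colno, ch in find_suspect_chars("x = 1 + \b 2"):
    print("line %d, column %d: %r" % (lineno, colno, ch))
```

Run against the string above, it reports the backspace even though
print() hides it on most terminals.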
--
Steven