[Python-ideas] Error handling for unknown Unicode characters (was Re: allow `lambda' to be spelled λ)

Thu Jul 21 13:48:36 EDT 2016

On 7/21/2016 3:41 AM, Nick Coghlan wrote:
> On 21 July 2016 at 15:08, Rustom Mody <rustompmody at gmail.com> wrote:
>> My “wrongheaded” was (intended) quite narrow and technical:
>>
>> - The embargo on non-ASCII everywhere in the language except identifiers
>>   (strings and comments obviously don't count as “in” the language)
>> - The opening of identifiers to large swathes of Unicode hugely widens,
>>   as you say, the surface area of attack
>>
>> This was solely the contradiction I was pointing out.
>
> OK, thanks for the clarification, and my apologies for jumping on you.
> I can be a bit hypersensitive on this topic, as my day job sometimes
> includes encouraging commercial redistributors and end users to stop
> taking community volunteers for granted and instead help find ways to
> ensure their work is sustainable :)
>
> As it is, I think there are some possible checks that could be added
> to the code generator pipeline to help clarify matters:
>
> - for the "invalid character" error message, we should be able to
> always report both the printed symbol *and* the ASCII hex escape,
> rather than assuming the caret will point to the correct place
> - the caret positioning logic for syntax errors needs to be checked to
> see if it's currently counting encoded UTF-8 bytes instead of code
> points (as that will consistently do the wrong thing on a correctly
> configured UTF-8 terminal)
> - (more speculatively) when building the symbol table, we may be able
> to look for identifiers referenced in a namespace that are not NFKC
> equivalent, but nevertheless qualify as Unicode confusables, and emit
> a SyntaxWarning (this is speculative, as I'm not sure what degree of
> performance hit would be associated with it)
>
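
A minimal sketch of the three checks listed above, in plain Python rather than in the C tokenizer and compiler where they would actually live. The function names (describe_char, caret_column, confusable_groups) are hypothetical, and the three-entry confusables table is a toy stand-in for the real Unicode confusables data, which the standard library does not ship.

```python
import unicodedata


def describe_char(ch):
    """Return the printed symbol plus an ASCII-only escape and its Unicode name."""
    escape = ch.encode("ascii", "backslashreplace").decode("ascii")
    name = unicodedata.name(ch, "<unnamed>")
    return f"{ch!r} ({escape}, {name})"


def caret_column(line_bytes, byte_offset, encoding="utf-8"):
    """Convert a byte offset within an encoded line into a code point offset."""
    # Decode only the prefix; a cut through the middle of a multi-byte
    # sequence becomes one replacement character instead of raising.
    return len(line_bytes[:byte_offset].decode(encoding, "replace"))


# Toy table (assumption): Cyrillic a, e, o mapped to their Latin lookalikes.
_TOY_CONFUSABLES = str.maketrans({"\u0430": "a", "\u0435": "e", "\u043e": "o"})


def confusable_groups(names):
    """Group identifiers that differ after NFKC but share a lookalike skeleton."""
    groups = {}
    for name in names:
        normalized = unicodedata.normalize("NFKC", name)
        skeleton = normalized.translate(_TOY_CONFUSABLES)
        groups.setdefault(skeleton, set()).add(normalized)
    # Only skeletons reached by more than one distinct NFKC form are suspicious.
    return {key: forms for key, forms in groups.items() if len(forms) > 1}


line = 'x = "\u00e9\u20ac" + $'
byte_offset = line.encode("utf-8").index(b"$")
print(describe_char("\u20ac"))
print(byte_offset, caret_column(line.encode("utf-8"), byte_offset))
print(confusable_groups(["scope", "sc\u043epe", "\ufb01le", "file"]))
```

In the sample line the byte offset of the `$` is larger than its code point offset, because the two non-ASCII characters before it take five bytes in UTF-8. That difference is exactly how far the caret would drift if bytes were being counted. The ligature spelling of "file" is not reported, since NFKC already folds it into the plain spelling.
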
> As far as Danilo's observation regarding the CPython code generator
> always emitting SyntaxError and SyntaxWarning (regardless of which
> part of the code generation actually failed) goes, I wouldn't be
> opposed to our getting more precise about that by defining additional
> subclasses, but one of the requirements would be for documentation in
> https://docs.python.org/devguide/compiler.html or helper functions in
> the source to clearly define "when working on <this> part of the code
> generation pipeline, raise <that> kind of error if something goes
> wrong".
>
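
For the subclass idea, a rough sketch of what the "helper functions" option could look like, written in Python for readability. The subclass names, the stage labels and stage_error are all hypothetical; the only existing precedent relied on here is that IndentationError and TabError are already SyntaxError subclasses, so existing `except SyntaxError` handlers would keep working.

```python
class TokenizerSyntaxError(SyntaxError):
    """Hypothetical: the tokenizer rejected the source text."""


class SymbolTableSyntaxError(SyntaxError):
    """Hypothetical: the problem was found while building the symbol table."""


# Hypothetical mapping from pipeline stage to the error that stage should raise.
_STAGE_ERRORS = {
    "tokenizer": TokenizerSyntaxError,
    "symtable": SymbolTableSyntaxError,
}


def stage_error(stage, msg, filename, lineno, offset, text):
    """Build the error for a given stage, falling back to plain SyntaxError."""
    cls = _STAGE_ERRORS.get(stage, SyntaxError)
    # SyntaxError accepts (msg, (filename, lineno, offset, text)).
    return cls(msg, (filename, lineno, offset, text))


try:
    raise stage_error("tokenizer", "invalid character", "<demo>", 1, 5, "x = $\n")
except SyntaxError as err:
    print(type(err).__name__, err.lineno, err.offset)
```

Keeping the stage-to-class table in one place is one way to make the "when working on <this> part, raise <that>" rule discoverable from the source itself.
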
> Cheers,
> Nick.
>


-- 
Terry Jan Reedy