syntax difference

Bart bc at freeuk.com
Mon Jun 18 07:16:26 EDT 2018


On 18/06/2018 11:45, Chris Angelico wrote:
> On Mon, Jun 18, 2018 at 8:33 PM, Bart <bc at freeuk.com> wrote:


>> You're right in that neither task is that trivial.
>>
>> I can remove comments by writing a tokeniser which scans Python source and
>> re-outputs tokens one at a time. Such a tokeniser normally ignores comments.
>>
>> But to remove type hints, a deeper understanding of the input is needed. I
>> would need a parser rather than a tokeniser. So it is harder.
> 
> They would actually both end up the same. To properly recognize
> comments, you need to understand enough syntax to recognize them. To
> properly recognize type hints, you need to understand enough syntax to
> recognize them. And in both cases, you need to NOT discard important
> information like consecutive whitespace.

No. If syntax is defined on top of tokens, then at the token level you
don't need to know any syntax. The process that scans characters looking
for the next token will usually discard comments. Job done.
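A minimal sketch of comment removal at the token level, assuming Python's
standard tokenize module stands in for the tokeniser (the helper name
strip_comments is made up for illustration). No parser is involved: COMMENT
tokens are simply dropped and the rest is re-emitted.

```python
import io
import tokenize


def strip_comments(source: str) -> str:
    """Return `source` with every COMMENT token dropped.

    Works purely on the token stream; a '#' inside a string literal is
    part of a STRING token and is therefore left alone.
    """
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    kept = [tok for tok in tokens if tok.type != tokenize.COMMENT]
    # untokenize() uses the recorded positions, so the gap left by a
    # removed comment comes back as trailing spaces on that line.
    return tokenize.untokenize(kept)


if __name__ == "__main__":
    demo = 's = "# not a comment"  # real comment\nx = 1\n'
    print(strip_comments(demo))
```

The result can carry trailing whitespace where comments used to be; tidying
that up would be a separate pass.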

It is very different for type-hints as you will need to properly parse 
the source code.
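For contrast, a rough sketch of type-hint removal, assuming the standard ast
module (Python 3.9+ for ast.unparse) does the parsing. The names StripHints
and strip_type_hints are hypothetical. Note that going through the AST
discards comments and original formatting, so this only illustrates the
parsing requirement and is not a layout-preserving tool.

```python
import ast


class StripHints(ast.NodeTransformer):
    """Remove annotations from function signatures and annotated assignments."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # handle nested functions and body statements
        node.returns = None
        a = node.args
        params = a.posonlyargs + a.args + a.kwonlyargs
        params += [p for p in (a.vararg, a.kwarg) if p is not None]
        for p in params:
            p.annotation = None
        return node

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_AnnAssign(self, node):
        self.generic_visit(node)
        if node.value is None:
            # Bare declaration such as "x: int": replace it with "pass" so
            # the enclosing block never becomes empty.
            new = ast.Pass()
        else:
            new = ast.Assign(targets=[node.target], value=node.value,
                             type_comment=None)
        return ast.copy_location(new, node)


def strip_type_hints(source: str) -> str:
    tree = StripHints().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


if __name__ == "__main__":
    print(strip_type_hints("def f(a: int, b: str = 'x') -> bool:\n    return a > 0\n"))
```

Deciding that the ":" after a parameter introduces an annotation, while the
":" ending the def line does not, is exactly the kind of context that only
the parser has.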

As a simpler example, if the task was to eliminate the "+" symbol, that
would be one kind of token; it would just be skipped when encountered.
But if the requirement is to eliminate only unary "+", and leave binary
"+", then that can't be done at tokeniser level; it will not know the
context.
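The "+" case can be sketched with the standard tokenize and ast modules
(the helper names plus_tokens and drop_unary_plus are invented here, and
ast.unparse needs Python 3.9+). The tokeniser reports both plus signs as the
same PLUS token, while the parse tree separates UnaryOp/UAdd from BinOp/Add.

```python
import ast
import io
import tokenize


def plus_tokens(source: str):
    """Return (text, exact token name) for every '+' the tokeniser sees."""
    toks = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(t.string, tokenize.tok_name[t.exact_type])
            for t in toks if t.string == "+"]


class DropUnaryPlus(ast.NodeTransformer):
    """Replace each unary '+' expression with its operand."""

    def visit_UnaryOp(self, node):
        self.generic_visit(node)  # handle nested cases such as "+ +x"
        if isinstance(node.op, ast.UAdd):
            return node.operand
        return node


def drop_unary_plus(source: str) -> str:
    tree = DropUnaryPlus().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


if __name__ == "__main__":
    print(plus_tokens("a = +b + c\n"))    # both entries are ('+', 'PLUS')
    print(drop_unary_plus("a = +b + c"))  # a = b + c
```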

(The matter of leading white space sometimes being important is a minor
detail. It just becomes a token of its own.)
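Python's own tokenize module happens to work this way: significant leading
white space is reported as INDENT and DEDENT tokens. A small sketch (the
function name indentation_tokens is hypothetical):

```python
import io
import tokenize


def indentation_tokens(source: str):
    """List the INDENT/DEDENT tokens, in order, found in `source`."""
    toks = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tokenize.tok_name[t.type] for t in toks
            if t.type in (tokenize.INDENT, tokenize.DEDENT)]


if __name__ == "__main__":
    print(indentation_tokens("if x:\n    y = 1\nz = 2\n"))  # ['INDENT', 'DEDENT']
```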

> So in both cases, you would probably end up with something like 2to3.
> The effective work is going to be virtually identical. And.... there's
> another complication, if you want any form of generic tool. You have
> to ONLY remove certain comments, not others. For instance, you
> probably should NOT remove copyright/license comments.

What will those look like? If copyright/licence comments have their own 
specific syntax, then they just become another token which has to be 
recognised.

The main complication I can see is that, if this is really a one-time 
source-to-source translator so that you will be working with the result, 
then usually you will want to keep the comments.

Then it is a question of more precisely defining the task that such a 
translator is to perform.

-- 
bart


