[Python-Dev] Reading Python source file

M.-A. Lemburg mal at egenix.com
Tue Nov 17 04:59:06 EST 2015


On 17.11.2015 02:53, Serhiy Storchaka wrote:
> I'm working on rewriting the Python tokenizer (in particular the part that reads and decodes the
> Python source file). The code is complicated. Currently it handles these cases:
> 
> * Reading from a string in memory.
> * Interactive reading from a file.
> * Reading from a file:
>   - Raw reading, ignoring the encoding, in the parser generator.
>   - Raw reading of a UTF-8 encoded file.
>   - Reading and recoding to UTF-8.
> 
> The file is read line by line. This makes it hard to check the correctness of the first line if the
> encoding is specified in the second line. It also causes very hard problems with null bytes and with
> desynchronized buffered C and Python files. All these problems can be easily solved by reading the
> whole Python source file into memory and then parsing it as a string. This would allow dropping a
> large, complex and buggy part of the code.
> 
> Are there disadvantages to this solution? As for memory consumption, the source text itself will
> consume only a small part of the memory consumed by the AST and other structures. As for performance,
> reading and decoding the whole file can be faster than doing so line by line.

A problem with this approach is that you can no
longer fail early and detect indentation errors et al. while
parsing the data (which may well come from a pipe).

Another related problem is that you have to wait for the full
input data before you can start compiling the code.

I don't think these situations are all that common, though,
so reading in the full source code before compiling it
sounds like a reasonable approach.
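
To make the trade-off concrete, here is a minimal sketch of the
read-everything-first approach at the Python level. It is only an
illustration, not the tokenizer's actual C code; the helper name
compile_whole_file is hypothetical, and it leans on the standard
library's tokenize.detect_encoding() to find a BOM or coding cookie.

```python
import io
import tokenize


def compile_whole_file(path):
    """Read the entire file, decode it once, then compile the string.

    Hypothetical helper for illustration; not CPython's implementation.
    """
    with open(path, "rb") as f:
        # Blocks until EOF: nothing is parsed (and no syntax error can be
        # reported) before the complete input has arrived.
        raw = f.read()

    # detect_encoding() looks at no more than the first two lines, so a
    # coding cookie on line 2 is known before line 1 is decoded.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)

    # Decode the whole buffer in one step, with universal newlines.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding).read()

    return compile(text, path, "exec")
```

With input from a pipe, the f.read() call is where the waiting
happens: an indentation error on the first line is only reported
once the writer has closed its end.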

We use the same simplification in eGenix PyRun's emulation of
the Python command line interface and it has so far not
caused any problems.

> [1] http://bugs.python.org/issue25643

-- 
Marc-Andre Lemburg
eGenix.com



