[Python-Dev] Reading Python source file

Guido van Rossum guido at python.org
Mon Nov 16 22:00:39 EST 2015


If you free the memory used for the source buffer before starting code
generation you should be good.
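
A minimal Python-level sketch of that idea (an illustration only, not the C
tokenizer code; the helper name compile_from_memory is hypothetical): read
the whole file as bytes, let tokenize.detect_encoding find a coding cookie on
line 1 or 2, decode once, parse to an AST, and drop the source buffers before
generating code.

```python
import ast
import io
import tokenize


def compile_from_memory(raw: bytes, filename: str = "<memory>"):
    """Hypothetical helper: parse a whole source buffer held in memory."""
    # detect_encoding reads at most two lines, so a coding cookie on the
    # second line is found before anything is decoded.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    text = raw.decode(encoding)
    tree = ast.parse(text, filename=filename)
    # Drop this function's references to the source buffers before code
    # generation; the memory is released only if no caller still holds them.
    del raw, text
    return compile(tree, filename, "exec")


# Encoding declared on the second line, after a shebang.
source = b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"
namespace = {}
exec(compile_from_memory(source), namespace)
```

Here the only long-lived structure during code generation is the AST, which
is the point of freeing the buffer first.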

On Mon, Nov 16, 2015 at 5:53 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
> I'm working on rewriting the Python tokenizer (in particular the part that
> reads and decodes Python source files). The code is complicated. Currently
> it handles the following cases:
>
> * Reading from the string in memory.
> * Interactive reading from the file.
> * Reading from the file:
>   - Raw reading ignoring encoding in parser generator.
>   - Raw reading UTF-8 encoded file.
>   - Reading and recoding to UTF-8.
>
> The file is read line by line. This makes it hard to check the correctness
> of the first line if the encoding is specified on the second line. It also
> causes very hard problems with null bytes and with desynchronization between
> buffered C and Python files. All these problems can be easily solved by
> reading the whole Python source file into memory and then parsing it as a
> string. This would allow dropping a large, complex, and buggy part of the
> code.
>
> Are there disadvantages to this solution? As for memory consumption, the
> source text itself will consume only a small part of the memory consumed by
> the AST and other structures. As for performance, reading and decoding the
> whole file can be faster than doing so line by line.
>
> [1] http://bugs.python.org/issue25643
>



-- 
--Guido van Rossum (python.org/~guido)

