[Python-ideas] Move optional data out of pyc files

Petr Viktorin encukou at gmail.com
Wed Apr 11 04:26:15 EDT 2018


On 04/11/18 06:21, Chris Angelico wrote:
> On Wed, Apr 11, 2018 at 1:02 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>> On Wed, Apr 11, 2018 at 10:08:58AM +1000, Chris Angelico wrote:
>>
>>> File system limits aren't usually an issue; as you say, even FAT32 can
>>> store a metric ton of files in a single directory. I'm more interested
>>> in how long it takes to open a file, and whether doubling that time
>>> will have a measurable impact on Python startup time. Part of that
>>> cost can be reduced by using openat(), on platforms that support it,
>>> but even with a directory handle, there's still a definite non-zero
>>> cost to opening and reading an additional file.
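For concreteness, the openat() idea quoted above can be sketched from Python: os.open() accepts a dir_fd argument on platforms that support it. This is only an illustration, not a measurement; the file names "mod.pyc" and "mod.doc" are made up, and the fallback branch is an assumption about what one would do where dir_fd is unavailable.

```python
import os
import tempfile

names = ("mod.pyc", "mod.doc")  # hypothetical file names, for illustration
data = {}

with tempfile.TemporaryDirectory() as d:
    # Create two small files standing in for a pyc and its side file.
    for name in names:
        with open(os.path.join(d, name), "wb") as f:
            f.write(name.encode())

    if os.open in os.supports_dir_fd:
        # Open the directory once, then open each file relative to it
        # (this is what openat() does at the C level).
        dir_fd = os.open(d, os.O_RDONLY)
        try:
            for name in names:
                fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
                with os.fdopen(fd, "rb") as f:
                    data[name] = f.read()
        finally:
            os.close(dir_fd)
    else:
        # No dir_fd support: fall back to full-path opens.
        for name in names:
            with open(os.path.join(d, name), "rb") as f:
                data[name] = f.read()
```

Either way there is still one open and one read per file; the directory handle only saves the repeated path lookup.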
>>
>> Yes, it will double the number of files. Actually quadruple it, if the
>> annotations and line numbers are in separate files too. But if most of
>> those extra files never need to be opened, then there's no cost to them.
>> And whatever extra cost there is, is amortized over the lifetime of the
>> interpreter.
> 
> Yes, if they are actually not needed. My question was about whether
> that is truly valid. Consider a very common use-case: an OS-provided
> Python interpreter whose files are all owned by 'root'. Those will be
> distributed with .pyc files for performance, but you don't want to
> deprive the users of help() and anything else that needs docstrings
> etc.

Currently in Fedora, we ship *both* optimized and non-optimized pycs to 
make sure that running with and without -O both work nicely without root 
privileges. So splitting the docstrings into a separate file would be, 
for us, a benefit in terms of file size.
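
As a rough illustration of why several pycs exist per module, importlib.util.cache_from_source() shows the file name used for each optimization level. The source path below is hypothetical, and which levels a given distribution actually ships is not something this sketch claims.

```python
import importlib.util

# Hypothetical source path; the file does not need to exist.
source = "/usr/lib64/python3.6/os.py"

# Cache file used by a plain "python" run.
plain = importlib.util.cache_from_source(source, optimization="")
# Cache file used with -O (asserts removed).
opt1 = importlib.util.cache_from_source(source, optimization=1)
# Cache file used with -OO (asserts and docstrings removed).
opt2 = importlib.util.cache_from_source(source, optimization=2)

for path in (plain, opt1, opt2):
    print(path)
```

Each name is a separate file on disk, so content shared between levels is stored more than once.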


> So... are the docstrings lazily loaded or eagerly loaded? If
> eagerly, you've doubled the number of file-open calls to initialize
> the interpreter. (Or quadrupled, if you need annotations and line
> numbers and they're all separate.) If lazily, things are a lot more
> complicated than the original description suggested, and there'd need
> to be some semantic changes here.
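
To make the eager/lazy distinction concrete, here is a minimal sketch of lazy loading. It is not the proposed implementation: the LazyDocs class, the JSON format, and the side-file name are all invented for illustration. The point is only that the extra file is opened on first lookup rather than at import time.

```python
import json
import os
import tempfile


class LazyDocs:
    """Read a (hypothetical) docstring side file on first lookup only."""

    def __init__(self, path):
        self.path = path
        self._docs = None
        self.opens = 0  # how many times the side file was opened

    def get(self, name):
        if self._docs is None:
            # First access: pay the open/read cost now, then cache.
            with open(self.path, encoding="utf-8") as f:
                self._docs = json.load(f)
            self.opens += 1
        return self._docs.get(name)


with tempfile.TemporaryDirectory() as tmp:
    side_file = os.path.join(tmp, "mod.docstrings.json")  # made-up name
    with open(side_file, "w", encoding="utf-8") as f:
        json.dump({"spam": "Return spam."}, f)

    docs = LazyDocs(side_file)
    opens_after_import = docs.opens  # nothing has been read yet
    first = docs.get("spam")         # triggers the single open
    second = docs.get("spam")        # served from the cache
    opens_after_help = docs.opens
```

An eager variant would do the open in __init__, which is where the doubled file-open count comes from.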
> 
>> Serhiy is experienced enough that I think we should assume he's not
>> going to push this optimization into production unless it actually does
>> reduce startup time. He has proven himself enough that we should assume
>> competence rather than incompetence :-)
> 
> Oh, I'm definitely assuming that he knows what he's doing :-) Doesn't
> mean I can't ask the question though.
> 
> ChrisA

