On 20.07.2020 20:58, Paul Ganssle wrote:

Hi all,

I was hoping to get some feedback on a proposed refactoring of the datetime module that should dramatically improve import performance.

The datetime module is implemented more or less in full both in pure Python and in C. Currently, the pure Python implementation is defined in datetime.py and the C implementation is in _datetime. After the full Python version is defined, the C version is star-imported, so any symbols defined in both versions are taken from the C version. If the C version is used, any private symbols used only in the pure Python implementation are manually deleted (see the end of the file).
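For readers who have not looked at the end of datetime.py, here is a minimal sketch of that define-then-overwrite pattern. It does not touch the real modules: `_accel_demo`, `add_days` and `_check_int` are hypothetical names invented for the illustration, and the module body is run with exec() only so that the example fits in one file.

```python
import sys
import types

# Hypothetical stand-in for the C accelerator (the real one is _datetime).
accel = types.ModuleType("_accel_demo")
accel.add_days = lambda day, n: day + n
sys.modules["_accel_demo"] = accel

# Stand-in for the body of a module laid out the way datetime.py is today.
module_source = """
def _check_int(value):          # private helper, needed by pure Python only
    if not isinstance(value, int):
        raise TypeError("expected int")
    return value

def add_days(day, n):           # pure Python version, always defined first
    return _check_int(day) + _check_int(n)

try:
    from _accel_demo import *   # overwrites add_days with the fast version
except ImportError:
    pass
else:
    del _check_int              # manual cleanup of the now-unused helper
"""

namespace = {"__name__": "demo_module"}
exec(module_source, namespace)

print(namespace["add_days"] is accel.add_days)  # True
print("_check_int" in namespace)                # False
```

Note that the pure Python definitions are always executed, even though they are thrown away when the accelerator is available.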

This adds a lot of unnecessary overhead, both to define a bunch of unused classes and functions and to import modules that are required for the pure Python implementation but not for the C implementation. In the issue he created about this, Victor Stinner demonstrated that moving the pure Python implementation to its own module would speed up the import of datetime by a factor of 4.
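One way to look at the import cost on your own machine is CPython's `-X importtime` option (available since 3.7). The sketch below runs it in a subprocess and keeps the report lines; the helper name `import_report` is made up for this example, and the timings will vary by machine and Python version, so no numbers are shown.

```python
import subprocess
import sys

def import_report(module_name):
    """Return the -X importtime report lines for importing module_name."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
        check=True,
    )
    # -X importtime writes its report to stderr, one line per imported module.
    return [line for line in result.stderr.splitlines()
            if line.startswith("import time:")]

report = import_report("datetime")
for line in report:
    if "datetime" in line:
        print(line)
```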

I think that we should indeed move the pure Python implementation into its own module, despite the fact that this is almost guaranteed to break some people either relying on implementation details or doing something funky with the import system — I don't think it should break anyone relying on the guaranteed public interface. The issue at hand is that we have two options available for the refactoring: either move the pure Python implementation to its own private top-level module (single file) such as `_pydatetime`, or make `datetime` a folder with an `__init__.py` and move the pure Python implementation to `datetime._pydatetime` or something of that nature.

What's the problem with

try:
    from _datetime import *
except ImportError:
    <everything else>



try:
    from _datetime import *
except ImportError:
    from _pydatetime import *

Would be more maintainable I guess.
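A runnable sketch of that fallback, using made-up module names (`_cimpl_demo`, `_pyimpl_demo`) rather than the real ones. It assumes the accelerator is missing, which is simulated by setting its sys.modules entry to None so that the import raises ImportError:

```python
import sys
import types

# Hypothetical pure Python implementation module.
py_impl = types.ModuleType("_pyimpl_demo")
py_impl.add_days = lambda day, n: day + n
py_impl.IMPLEMENTATION = "python"
sys.modules["_pyimpl_demo"] = py_impl

# A None entry in sys.modules makes "import _cimpl_demo" raise ImportError,
# which simulates a build without the C accelerator.
sys.modules["_cimpl_demo"] = None

facade_source = """
try:
    from _cimpl_demo import *
except ImportError:
    from _pyimpl_demo import *
"""

facade = {"__name__": "facade_demo"}
exec(facade_source, facade)

print(facade["IMPLEMENTATION"])   # python
print(facade["add_days"](1, 2))   # 3
```

With this layout the pure Python code is only loaded when the accelerator is unavailable.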

The same goes for `pickle` then.

The decimal and zoneinfo modules both have this same issue; the decimal module uses the first strategy with _pydecimal and decimal, the zoneinfo module uses a folder with a zoneinfo._zoneinfo submodule. Assuming we go forward with this, we need to decide which strategy to adopt for datetime.
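To see which layout a given stdlib module uses without importing it, something like the following should work. The helper name `layout` is just for illustration, and zoneinfo only exists on Python 3.9+, so the sketch reports "missing" on older versions.

```python
import importlib.util

def layout(module_name):
    """Report whether a top-level module is a package (folder) or a plain module."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return "missing"
    # Packages have a list of submodule search locations; plain modules have None.
    if spec.submodule_search_locations is not None:
        return "package"
    return "module"

for name in ("decimal", "_pydecimal", "zoneinfo"):
    print(name, layout(name))
```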

In favor of using a datetime/ folder, I'd say it's cleaner to put the pure Python implementation of datetime under the datetime namespace, and also it gives us more freedom to play with the module's structure in the future, since we could have lazily-imported sub-components, or we could implement some logic common to both implementations in Python and import it from a `datetime._common` module without requiring the C version to import the entire Python version, similar to the way zoneinfo has the zoneinfo._common module.
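As a rough illustration of what "lazily-imported sub-components" could look like in a folder layout, here is a sketch using a module-level `__getattr__` (PEP 562). This is not a proposal for datetime's actual structure: the package `lazydemo`, its `_parser` submodule and `parse_iso` are invented for the example, and the package is written to a temporary directory so the snippet is self-contained.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "lazydemo"
pkg.mkdir()

# __init__.py: the submodule is imported only on first attribute access.
(pkg / "__init__.py").write_text(textwrap.dedent('''
    import importlib

    _LAZY = {"parse_iso": "._parser"}   # attribute name -> submodule

    def __getattr__(name):
        # PEP 562: called only when normal attribute lookup fails.
        if name in _LAZY:
            module = importlib.import_module(_LAZY[name], __name__)
            value = getattr(module, name)
            globals()[name] = value     # cache for later lookups
            return value
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''))

(pkg / "_parser.py").write_text(textwrap.dedent('''
    def parse_iso(text):
        year, month, day = (int(part) for part in text.split("-"))
        return year, month, day
'''))

sys.path.insert(0, str(root))
importlib.invalidate_caches()
import lazydemo

loaded_before = "lazydemo._parser" in sys.modules
result = lazydemo.parse_iso("2020-07-20")
loaded_after = "lazydemo._parser" in sys.modules

print(loaded_before, result, loaded_after)
```

The submodule only shows up in sys.modules after the first use of `parse_iso`, which is the kind of freedom a single-file layout cannot offer.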

The downside of the folder method is that it complicates the way datetime is imported, especially if we add additional structure to the module or add any logic into the __init__.py. Two single-file modules side by side, one imported by the other, don't change anything about how the datetime module is imported, and are much less likely to break anything.

Anyone have thoughts or strong preferences here? Anyone have use cases where one approach or the other is likely to cause undue hardship? I'd like to avoid moving this more than once.


P.S. Victor's PR moving this code to _pydatetime is currently done in such a way that the ability to backport changes from post-refactoring to pre-refactoring branches is preserved; I have not checked but I think we should be able to do the same thing with the other strategy as well.

Python-Dev mailing list -- python-dev@python.org
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/CCI7PDAL6G67XVVRKPP2FAYJ5YZYHTK3/