SEEK_* constants in io and os

Hello,
I was looking at the possibility of replacing the SEEK_* constants by IntEnums, and the first thing that catches attention is that these constants are defined in both Lib/os.py and Lib/io.py; both places also recently started supporting SEEK_HOLE and SEEK_DATA (though here io refers to os.SEEK_HOLE and os.SEEK_DATA).
Additional data points: other modules take these constants as arguments:
* mmap: directs to use os.SEEK_*
* chunk and fcntl: spell out the numeric values.
os seems to import io in some functions; can this be done always? If yes, we can just define the constants once and have os.SEEK_* alias io.SEEK_*. The other way (io taking from os) is also a possibility (maybe the preferred one because io already refers to os.SEEK_HOLE/DATA, at least in the documentation).
Any ideas and suggestions are welcome,
Eli
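A minimal sketch of the idea in this message, not the actual patch. The class name `Seek` is hypothetical, and the values 0, 1 and 2 are assumed to match what os and io expose today. Because IntEnum members are ints, code that passes or compares plain numbers should keep working.

```python
import io
import os
from enum import IntEnum


class Seek(IntEnum):
    """Hypothetical replacement for the SEEK_* module constants."""
    SEEK_SET = 0
    SEEK_CUR = 1
    SEEK_END = 2


# Members compare equal to the plain integers used today.
assert Seek.SEEK_SET == os.SEEK_SET == io.SEEK_SET == 0
assert Seek.SEEK_END == os.SEEK_END == io.SEEK_END == 2

# They can be passed wherever a whence argument is expected.
buf = io.BytesIO(b"hello world")
buf.seek(-5, Seek.SEEK_END)
tail = buf.read()  # the last five bytes of the buffer

# Unlike a bare int, a member carries a readable name.
label = Seek(1).name
```

The practical gain is in debugging output: a member's repr shows its name, whereas the current constants print as bare numbers.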

On Sun, 1 Sep 2013 18:02:30 -0700, Eli Bendersky <eliben@gmail.com> wrote:
Hello,
I was looking at the possibility of replacing the SEEK_* constants by IntEnums, and the first thing that catches attention is that these constants are defined in both Lib/os.py and Lib/io.py; both places also recently started supporting SEEK_HOLE and SEEK_DATA (though here io refers to os.SEEK_HOLE and os.SEEK_DATA).
What is the runtime cost of doing so? os is a fundamental module that is imported by almost every Python program.
Regards
Antoine.

On Mon, Sep 2, 2013 at 1:24 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
What is the runtime cost of doing so? os is a fundamental module that is imported by almost every Python program.
Theoretically, it should be very low given that we just need to add an import and define one class. os already does a number of things in its toplevel (mostly a few imports which transitively do other things). Combined with import caching, since this is done just once per run, it doesn't seem like a problem.
Empirically, I tried measuring it but I can't discern a difference with/without translating SEEK_* to enums. There's a fluctuation of ~1usec which I can't distinguish from noise. Let me know if you have a good methodology for benchmarking these things.
Eli

On Mon, 2 Sep 2013 06:18:31 -0700, Eli Bendersky <eliben@gmail.com> wrote:
Empirically, I tried measuring it but I can't discern a difference with/without translating SEEK_* to enums. There's a fluctuation of ~1usec which I can't distinguish from noise. Let me know if you have a good methodology of benchmarking these things
How did you get that result? You have to remove "os" from sys.modules before importing it again, otherwise "import os" will simply return the already imported module.
Regards
Antoine.

On Mon, 2 Sep 2013 15:45:22 +0200, Antoine Pitrou <solipsis@pitrou.net> wrote:
How did you get that result? You have to remove "os" from sys.modules before importing it again, otherwise "import os" will simply return the already imported module.
Oh and remove "enum" too...
Regards
Antoine.
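The advice above can be sketched as follows. This is one possible way to measure a fresh import inside a running interpreter, not the script used in the thread. The helper name `time_fresh_import` is hypothetical, and the timing still benefits from cached bytecode and a warm filesystem cache.

```python
import importlib
import sys
import time


def time_fresh_import(name, also_drop=()):
    """Time one import of `name` with it (and `also_drop`) evicted from the cache.

    The evicted sys.modules entries are restored afterwards so the rest of
    the process keeps using the original module objects.
    """
    names = (name, *also_drop)
    saved = {n: sys.modules.pop(n) for n in names if n in sys.modules}
    try:
        start = time.perf_counter()
        importlib.import_module(name)
        return time.perf_counter() - start
    finally:
        # Discard whatever the fresh import registered, then put the
        # original entries back.
        for n in names:
            sys.modules.pop(n, None)
        sys.modules.update(saved)


# Cache hit: import_module just returns the module already in sys.modules.
start = time.perf_counter()
importlib.import_module("os")
cached = time.perf_counter() - start

# Cache miss: os (and enum, if loaded) must be executed again.
fresh = time_fresh_import("os", also_drop=("enum",))
print(f"cached: {cached * 1e6:.1f} usec, fresh: {fresh * 1e6:.1f} usec")
```

Without the eviction step the measurement only times a dictionary lookup, which would explain a difference that disappears into ~1 usec of noise.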

On Mon, Sep 2, 2013 at 6:51 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
How did you get that result? You have to remove "os" from sys.modules before importing it again, otherwise "import os" will simply return the already imported module.
Oh and remove "enum" too...
Yes, now I see a 500 usec difference timed within the Python script. When timing the whole execution of Python:
$ sudo nice -n -20 perf stat -r 100 python -c 'import os'
It still gets lost in the noise, though. I usually get around 34 ms per run with 1 - 1.5% jitter. Since 1.5% of 34 ms is ~0.5 ms, it's difficult to distinguish the runs clearly. That 0.5 ms seems to be a one-time penalty for the importing of enum (most time goes to importing, not defining the new class based on IntEnum), no matter where/how it's used in the stdlib and the user code.
Eli
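For readers without perf, a rough stand-in for the whole-process measurement could look like the sketch below. It is an assumption-laden substitute, not what was run in the thread: the helper name `time_interpreter` is hypothetical, and the timings include the cost of spawning the child from Python, so absolute numbers will differ from perf stat.

```python
import statistics
import subprocess
import sys
import time


def time_interpreter(code, runs=10):
    """Return wall-clock durations (seconds) of `runs` fresh interpreter starts."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        durations.append(time.perf_counter() - start)
    return durations


samples = time_interpreter("import os", runs=10)
mean_ms = statistics.mean(samples) * 1e3
jitter_pct = statistics.stdev(samples) / statistics.mean(samples) * 100
print(f"{mean_ms:.1f} ms per run, {jitter_pct:.1f}% jitter")

# The arithmetic from the message: 1.5% jitter on a 34 ms run is about 0.5 ms,
# which is the same size as the effect being measured.
noise_ms = 34 * 0.015
```

When the expected effect and the run-to-run spread are the same size, more repetitions or a quieter machine are needed before the two builds can be told apart.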

2013/9/2 Eli Bendersky <eliben@gmail.com>:
Yes, now I see a 500 usec difference timed within the Python script. When timing the whole execution of Python: (...)
Can you please provide the list of imported modules by:
python -c 'import sys; print(sys.modules)'
For python with default options and for python with -S (no site module) options? And also with your patch?
Python should be fast to write "hello world" (or "python -c pass"), it's a dummy but common benchmark (to compare Python to other VMs / other programming languages).
Victor
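The comparison requested here can be scripted. The sketch below is one way to do it under the assumption that the interpreter of interest is `sys.executable`; the helper name `startup_modules` is hypothetical. The child prints one module name per line so that no extra imports pollute its own sys.modules.

```python
import subprocess
import sys


def startup_modules(*flags):
    """Return the set of module names loaded by a bare interpreter start."""
    code = "import sys; print('\\n'.join(sorted(sys.modules)))"
    result = subprocess.run(
        [sys.executable, *flags, "-c", code],
        check=True, capture_output=True, text=True,
    )
    return set(result.stdout.split())


default = startup_modules()
no_site = startup_modules("-S")
print(len(default), "modules by default,", len(no_site), "with -S")
print("only without -S:", sorted(default - no_site))
print("enum loaded at startup:", "enum" in default)
```

Running the same script against a patched build and diffing the two sets would show exactly which modules the change pulls in at startup.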

On Mon, Sep 2, 2013 at 8:48 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
Can you please provide the list of imported modules by: python -c 'import sys; print(sys.modules)'
For python with default options and for python with -S (no site module) options? And also with your patch?
Python should be fast to write "hello world" (or "python -c pass"), it's a dummy but common benchmark (to compare Python to other VMs / other programming languages).
The sorted list for both default and -S (they're identical) is http://pastebin.com/4vzSMCu7 - there are 55 entries there, including things like itertools, heapq, functools and collections. With my patch, enum is also added (which makes sense since os is in the list). So the 0.5ms increase for the 34ms runtime kind-of makes sense.
Eli

This question is still kind-of open. I haven't received additional feedback - only Antoine's and Victor's concerns wrt. runtime cost, which I believe I addressed.
Note that the runtime cost is globally one-time in the sense that if additional modules (whether used at start-up or not) use enum, this is a price only paid once.
So, is the extra 0.5ms of start-up time worth it, or not?
Eli

On 09/01/2013 06:02 PM, Eli Bendersky wrote:
os seems to import io in some functions; can this be done always? If yes, we can just define the constants once and os.SEEK_* will alias io.SEEK_*? The other way (io taking from os) is also a possibility (maybe the preferred one because io already refers to os.SEEK_HOLE/DATA, at least in the documentation).
Any ideas and suggestions are welcome,
Ideally we should only define them once. If these are values that could change per O/S then they should be in the os module. Since they /can/ change per O/S (even if they don't currently), we should put them in os.
I'd say it's worth the extra 0.5ms startup time.
--
~Ethan~
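The "define once, alias elsewhere" arrangement can be sketched as below. This is an illustration only: `Seek`, `fake_os` and `fake_io` are hypothetical stand-ins, used so the example does not touch the real os and io modules.

```python
from enum import IntEnum
from types import SimpleNamespace


class Seek(IntEnum):
    """Hypothetical single definition, imagined as living in os."""
    SEEK_SET = 0
    SEEK_CUR = 1
    SEEK_END = 2


# Stand-in for os: exports every member under its own name.
fake_os = SimpleNamespace(**Seek.__members__)

# Stand-in for io: defines nothing itself, only re-exports from fake_os.
fake_io = SimpleNamespace(
    SEEK_SET=fake_os.SEEK_SET,
    SEEK_CUR=fake_os.SEEK_CUR,
    SEEK_END=fake_os.SEEK_END,
)

# Both namespaces refer to the very same member objects.
same_object = fake_io.SEEK_END is fake_os.SEEK_END
```

With this layout a platform-specific value would only need to change in one place, which is the argument made above for keeping the definition in os.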
participants (4)
- Antoine Pitrou
- Eli Bendersky
- Ethan Furman
- Victor Stinner