a new type for sys.implementation

The implementation for sys.implementation is going to use a new (but "private") type[1]. It's basically equivalent to the following:

    class namespace:
        def __init__(self, **kwargs):
            self.__dict__.update(kwargs)

        def __repr__(self):
            keys = (k for k in self.__dict__ if not k.startswith('_'))
            pairs = ("{}={!r}".format(k, self.__dict__[k]) for k in sorted(keys))
            return "{}({})".format(type(self).__name__, ", ".join(pairs))

There were other options for the type, but the new type was the best fit and not hard to do. Neither the type nor its API is directly exposed publicly, but it is still accessible through "type(sys.implementation)".

This brings me to a couple of questions:

1. should we make the new type un-instantiable (null out tp_new and tp_init)?
2. would it be feasible to officially add the type (or something like it) in 3.3 or 3.4?

I've had quite a bit of positive feedback on the type (otherwise I wouldn't bother bringing it up). But, if we don't add a type like this to Python, I'd rather close the loophole and call it good (i.e. *not* introduce a new type by stealth).

My preference is for the type (or equivalent) to be blessed in the language. Regardless of the specific details of such a type, my more immediate concern is with the impact on sys.implementation of python-dev's general sentiment in this space.

-eric

[1] http://bugs.python.org/issue14673
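
To make the "loophole" concrete, here is a minimal sketch, assuming an interpreter that provides sys.implementation (CPython 3.3 or later) and has not nulled out tp_new/tp_init. The name Namespace is only a local alias chosen for illustration.

```python
import sys

# The type has no public name in this sketch, but it is still reachable.
Namespace = type(sys.implementation)

# As long as the type stays instantiable, it behaves like any other class.
ns = Namespace(name="example", version=(0, 1))
ns.extra = "attributes can be added later"

print(ns.name, ns.version)
print(sorted(vars(ns)))
```

If the type were made un-instantiable, the call to Namespace(...) above is the line that would start failing.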

On Thu, May 31, 2012 at 01:21:36AM -0600, Eric Snow wrote:
1. should we make the new type un-instantiable (null out tp_new and tp_init)?
Please don't. "Consenting adults" and all that. There are few things more frustrating than having a working type that does exactly what you want, except that some B&D coder has made it un-instantiable. Leave it undocumented and/or a single underscore name for the time being, with an aim to make it public in 3.4 if it is useful and there are no major objections. -- Steven

On Thu, May 31, 2012 at 8:26 PM, Mark Shannon <mark@hotpy.org> wrote:
Yes, because we want to use it in the sys module. As you get lower down in the interpreter stack, implementing things in Python actually starts getting painful because of bootstrapping issues (e.g. that's why both _structseq and collections.namedtuple exist). Personally, I suggest we just expose the new type as types.SimpleNamespace (implemented in Lib/types.py as "SimpleNamespace = type(sys.implementation)") and call it done. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, May 31, 2012 at 6:56 PM, Benjamin Peterson <benjamin@python.org> wrote:
Yes, why not do that instead of a new thing in C? I don't care about PyPy actually (since we kind of have to implement sys.implementation in python/RPython anyway, since it'll be different); my concern is rather that more code in C usually means more trouble. Another question (might be off topic here). What we do in PyPy to avoid bootstrapping issues (since we have quite a bit implemented in Python, rather than RPython) is to "freeze" the bytecode at compile time (or make time) and put it in the executable. This avoids all sorts of filesystem access issues, but might be slightly too complicated. Cheers, fijal

On Fri, Jun 1, 2012 at 7:16 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
The idea is that sys.implementation is the way some interpreter internal details are exposed to the Python layer, thus it needs to be handled in the implementation language, and explicitly *not* in Python (if it's in Python, then the implementation has to come up with some *other* API for accessing those internals from Python code, thus missing a large part of the point of the exercise).
Yeah, we're already doing that for importlib._bootstrap. It's a PITA (especially when changing the compiler), and certainly not easier than just writing some C code for a component that's explicitly defined as being implementation specific. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan wrote:
Why? What is wrong with something like the following (for CPython)?

    import sys

    class SysImplementation:
        "Define __repr__(), etc here"
        ...

    sys.implementation = SysImplementation()
    sys.implementation.name = 'cpython'
    sys.implementation.version = (3, 3, 0, 'alpha', 4)
    sys.implementation.hexversion = 0x30300a4
    sys.implementation.cache_tag = 'cpython-33'

Also, should the build/machine info be removed from sys.version and moved to sys.implementation? Cheers, Mark.

On Fri, Jun 1, 2012 at 9:49 PM, Mark Shannon <mark@hotpy.org> wrote:
Because now you're double keying data in a completely unnecessary fashion. The sys module initialisation code already has access to the info needed to fill out sys.implementation correctly; moving that code somewhere else purely for the sake of getting to write it in Python instead of C would be foolish. Some things are best written in Python, some make sense to write in the underlying implementation language. This is one of the latter because it's all about implementation details.
Also, should the build/machine info be removed from sys.version and moved to sys.implementation?
No, as the contents of sys.version are already defined as implementation dependent. It remains as the easy to print version, while sys.implementation provides a programmatic interface. There may be other CPython-specific fields currently in sys.version that it makes sense to also include in sys.implementation, but:

1. That's *as well as*, not *instead of*
2. It's something that can be looked at *after* the initial implementation of the PEP has been checked in (and should only be done with a concrete use case, such as eliminating sys.version introspection in other parts of the stdlib or in third party code)

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
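
The split between "easy to print" and "programmatic" can be sketched as follows, assuming an interpreter with sys.implementation available. The helper is_cpython is hypothetical and exists only to show a branch that does not parse sys.version.

```python
import sys

# sys.version: free-form, implementation-dependent text meant for display.
print(sys.version)

# sys.implementation: named attributes meant to be read by code.
impl = sys.implementation
print(impl.name)
print(impl.cache_tag)


def is_cpython():
    """Hypothetical helper: branch on the implementation without parsing sys.version."""
    return sys.implementation.name == "cpython"


print(is_cpython())
```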

Nick Coghlan wrote:
Previously you said that "it needs to be handled in the implementation language, and explicitly *not* in Python". I asked why that was. Now you seem to be suggesting that Python code would break the DRY rule, but the C code would not. If the C code can avoid duplication, then so can the Python code, as follows:

    import imp
    import sys

    class SysImplementation:
        "Define __repr__(), etc here"
        ...

    tag = imp.get_tag()
    sys.implementation = SysImplementation()
    sys.implementation.name = tag[:tag.index('-')]
    sys.implementation.version = sys.version_info
    sys.implementation.hexversion = sys.hexversion
    sys.implementation.cache_tag = tag

Cheers, Mark.

On Fri, Jun 1, 2012 at 11:17 PM, Mark Shannon <mark@hotpy.org> wrote:
This is wrong. sys.version_info is the language version, while sys.implementation.version is the implementation version. They happen to be the same for CPython because it's the reference interpreter, but splitting the definition like this allows (for example) a 0.1 release of a new implementation to target Python 3.3 and clearly indicate the difference between the two version numbers.

As the PEP's rationale section says: "The status quo for implementation-specific information gives us that information in a more fragile, harder to maintain way. It is spread out over different modules or inferred from other information, as we see with platform.python_implementation(). This PEP is the main alternative to that approach. It consolidates the implementation-specific information into a single namespace and makes explicit that which was implicit."

The idea of the PEP is to provide a standard channel from the implementation specific parts of the interpreter (i.e. written in the implementation language) through to the implementation independent code in the standard library (i.e. written in Python). It's intended to *replace* the legacy APIs in the long run, not rely on them. While we're unlikely to bother actually deprecating legacy APIs like imp.get_tag() (as it isn't worth the hassle), PEP 421 means we can at least avoid *adding* to them. To achieve the aims of the PEP without double-keying data it *has* to be written in C.

The long term goal here is that all the code in the standard library should be implementation independent - PyPy, Jython, IronPython, et al should be able to grab it and just run it. That means the implementation specific stuff needs to migrate into the C code and get exposed through standard APIs. PEP 421 is one step along that road.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
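
The distinction between the two version numbers can be inspected directly. This is a small sketch, assuming an interpreter that provides sys.implementation; the variable names are illustrative only.

```python
import sys

# Version of the Python language this interpreter targets.
language_version = tuple(sys.version_info[:3])

# Version of the interpreter itself, as reported by the implementation.
implementation_version = tuple(sys.implementation.version[:3])

print("language:", language_version)
print("implementation:", sys.implementation.name, implementation_version)

# Expected to be equal on CPython; other implementations may report a different value.
versions_match = language_version == implementation_version
print("same version numbers:", versions_match)
```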

On Jun 01, 2012, at 11:49 PM, Nick Coghlan wrote:
Exactly. Or to put it another way, if you implemented sys.implementation in some stdlib Python module, you wouldn't be able to share that module between the various Python implementations. I think the stdlib should strive for *more* commonality across Python implementations, not less. Yes, you could conditionalize your way around that, but why do it when writing the code in the interpreter implementation language is easy enough? Plus, who wants to maintain the ugly mass of if-statements that would probably require? Eric's C code is easily auditable to anyone who knows the C API well enough, and I can't imagine it wouldn't be pretty trivial to write it in Java, RPython, or C#. Cheers, -Barry
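
A rough sketch of the "mass of if-statements" being described, next to the alternative. The checks in the first function are assumptions, loosely modelled on inspecting sys.version and sys.platform; both function names are hypothetical.

```python
import sys


def implementation_name_by_guessing():
    """Hypothetical stdlib-side detection: one branch per known implementation."""
    version_text = sys.version
    if "PyPy" in version_text:
        return "pypy"
    if "IronPython" in version_text:
        return "ironpython"
    if sys.platform.startswith("java"):
        return "jython"
    # Anything unrecognised falls through; a new implementation needs a new branch.
    return "cpython"


def implementation_name_from_interpreter():
    """The interpreter reports its own name, so no per-implementation branches are needed."""
    return sys.implementation.name


print(implementation_name_by_guessing())
print(implementation_name_from_interpreter())
```

The first function has to be edited whenever a new implementation appears, while the second only requires that implementation to fill in its own sys.implementation.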

On Thu, May 31, 2012 at 9:08 PM, Barry Warsaw <barry@python.org> wrote:
Not only that, but any new/experimental/etc. implementation would either have to be blessed in that module by Python committers (a la the platform module*) or would have to use a fork of the standard library. * I don't mean to put down the platform module, which has and will continue to serve us well. Rather, just pointing out that a small part of it demonstrates a limitation in the stdlib relative to alternate implementations.
And I'm by no means a C veteran. :) -eric

Nick Coghlan wrote:
I thought this list was for CPython, not other implementations ;)
I'm not arguing with the PEP, just discussing how to implement it.
Could you justify that last sentence? What is special about C that means that information does not have to be duplicated, yet it must be in Python? I've already provided two implementations. The second derives all the information it needs from other sources, thus conforming to DRY. I just picked imp.get_tag() because it has the relevant info. If the use of imp bothers you, would:

    sys.implementation.cache_tag = (sys.implementation.name + '-' + str(sys.version_info[0]) + str(sys.version_info[1]))

be acceptable?
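
The derivation proposed here can be written as a small function. This is only a sketch: derive_cache_tag is a hypothetical helper, and it assumes the tag follows the "name-majorminor" pattern seen in 'cpython-33', which other implementations are not required to follow.

```python
def derive_cache_tag(name, version_info):
    """Hypothetical helper: build a tag such as 'cpython-33' from a name and a version tuple."""
    major, minor = version_info[0], version_info[1]
    return "{}-{}{}".format(name, major, minor)


print(derive_cache_tag("cpython", (3, 3, 0, "alpha", 4)))  # cpython-33
```

Passing sys.version_info here would tie the tag to the language version rather than the implementation version, which is the distinction raised earlier in the thread.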
I don't see how that is relevant. sys.implementation can never be part of the shared stdlib. That does not mean it has to be implemented in C. Cheers, Mark

On Jun 01, 2012, at 03:22 PM, Mark Shannon wrote:
I thought this list was for CPython, not other implementations ;)
This list serves a dual purpose. Its primary purpose is to discuss development of Python-the-language. It's also where discussions about CPython-the-implementation occur, but that's because CPython is the current reference implementation of the language. While python-dev is not the primary forum for discussing implementation details of alternative implementations, I hope that those are not off-limits for this list, and should be especially welcome for issues that pertain to Python-the-language. Remember too that PEPs drive language changes, PEPs (generally) apply to all implementations of the language, and python-dev is where PEPs get discussed. Cheers, -Barry

You have the burden of proof the wrong way around. sys is a builtin module. C is the default language, absent a compelling reason to use Python instead. The code is simple enough that there is no such reason, thus the implementation will be in C. -- Sent from my phone, thus the relative brevity :) On Jun 2, 2012 12:30 AM, "Mark Shannon" <mark@hotpy.org> wrote:

On Fri, Jun 1, 2012 at 7:17 AM, Mark Shannon <mark@hotpy.org> wrote:
This was actually the big motivator for PEP 421. Once PEP 421 is final, imp.get_tag() will get its value from sys.implementation, rather than the other way around. The point is to pull the implementation-specific values into one place (as much as is reasonable). -eric
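
The direction of the dependency described above can be sketched like this, assuming an interpreter with sys.implementation. The function get_tag below is a hypothetical stand-in, not the actual imp module code.

```python
import sys


def get_tag():
    """Hypothetical stand-in for imp.get_tag(): read the tag instead of computing it."""
    return sys.implementation.cache_tag


print(get_tag())
```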

On Fri, Jun 1, 2012 at 6:07 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Precisely. The PEP addresses this point directly: http://www.python.org/dev/peps/pep-0421/#adding-new-required-attributes -eric

On May 31, 2012, at 01:21 AM, Eric Snow wrote:
I did the initial review of the four patches that Eric uploaded and I agreed with him that this was the best fit for sys.implementation. (I need to review his updated patch, which I'll try to get to later today.)
This brings me to a couple of questions:
1. should we make the new type un-instantiable (null out tp_new and tp_init)?
I don't think this is necessary.
2. would it be feasible to officially add the type (or something like it) in 3.3 or 3.4?
I wouldn't be against it, but the implementation above (or really, the C equivalent in Eric's patch) isn't quite appropriate to be that type. Specifically, while I think that filtering out _names in the repr is fine for sys.implementation, it would not be appropriate for a generalized, public type. OTOH, I'd have no problem just dropping that detail from sys.implementation too. (Note of course that even with that, you can get the full repr via sys.implementation.__dict__.) Cheers, -Barry
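
The repr question can be illustrated with two small classes. Both are sketches written for this example: FilteredNamespace follows the Python equivalent from the first message, and UnfilteredNamespace is a hypothetical variant of what a general-purpose type might do instead.

```python
class FilteredNamespace:
    """Follows the repr from the first message: names starting with '_' are left out."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def __repr__(self):
        keys = sorted(k for k in self.__dict__ if not k.startswith("_"))
        pairs = ", ".join("{}={!r}".format(k, self.__dict__[k]) for k in keys)
        return "{}({})".format(type(self).__name__, pairs)


class UnfilteredNamespace(FilteredNamespace):
    """Hypothetical general-purpose variant: every attribute appears in the repr."""

    def __repr__(self):
        items = sorted(self.__dict__.items())
        pairs = ", ".join("{}={!r}".format(k, v) for k, v in items)
        return "{}({})".format(type(self).__name__, pairs)


filtered = FilteredNamespace(name="cpython", _hidden=1)
unfiltered = UnfilteredNamespace(name="cpython", _hidden=1)

print(repr(filtered))    # FilteredNamespace(name='cpython')
print(repr(unfiltered))  # UnfilteredNamespace(_hidden=1, name='cpython')
print(vars(filtered))    # the hidden entry is still present in __dict__
```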

participants (7)
- Barry Warsaw
- Benjamin Peterson
- Eric Snow
- Maciej Fijalkowski
- Mark Shannon
- Nick Coghlan
- Steven D'Aprano