Python equivalents in stdlib (Was: Include datetime.py in stdlib or not?)
On Tue, Jul 6, 2010 at 11:54 PM, Terry Reedy <tjreedy@udel.edu> wrote:
On 7/6/2010 3:59 PM, Alexander Belopolsky wrote:
I am more interested in Brett's overall vision than this particular module. I understand that to be one of a stdlib that is separate from CPython and is indeed the standard Python library.
I am also very much interested in the overall vision, but I would like to keep the datetime.py thread focused, so I am going to reply to broad questions under a separate subject.
Questions:
1. Would the other distributions use a standard stdlib rather than current individual versions?
I certainly hope they will. In the ideal world, passing test.regrtest with an unmodified Lib should be *the* definition of what is called Python. I understand that there is already some work underway in this direction, such as marking implementation-specific tests with appropriate decorators.
2. Would the other distributions pool their currently separate stdlib efforts to help maintain one standard stdlib?
I believe that making the stdlib and test.regrtest more friendly to alternative implementations will go a long way towards this goal. It will, of course, be a decision that each project will have to make.
3. What version of Python would be allowed for use in the stdlib? I would like the stdlib for 3.x to be able to use 3.x code. This would be only a minor concern for CPython as long as 2.7 is maintained, but a major concern for the other implementation currently 'stuck' in 2.x only. A good 3to2 would be needed.
Availability of Python equivalents will hopefully help the "other implementation currently 'stuck' in 2.x only" to get "unstuck" and move to 3.x. I understand that this is a somewhat sensitive issue at the moment, but I believe a decision has been made that supporting new features for 2.x is outside of python-dev's focus.
4. Does not ctypes make it possible to replace a method of a Python-coded class with a faster C version, with something like:

    try:
        connect to methods.dll
        check that function xyz exists
        replace Someclass.xyz with ctypes wrapper
    except:
        pass

For instance, the SequenceMatcher heuristic was added to speed up the matching process, which I believe is encapsulated in one O(n**2) or so bottleneck method. I believe most everything else is O(n) bookkeeping.
The ctypes module is very CPython-centric as far as I know. For new modules, this may be a valid way to rapidly develop accelerated versions. For modules that are already written in C, I don't see much benefit in replacing them with ctypes wrappers.
[.. datetime-specific discussion skipped ..] From scanning that and the posts here, it seems like a PEP or other doc on dual-version modules would be a good idea. It should at least document how to code the switch from the Python version to the C-coded version and how to test both, as discussed.
I am certainly not ready to write such a PEP. I may be in a better position to contribute to it after I gain more experience with test_datetime.py. At the moment I have more questions than answers. For example, the established practice appears to be, in modulename.py:

    # Python code

    try:
        from _modulename import *
    except:
        pass

This is supposed to generate a .pyc file with no Python definitions in it if _modulename is available. The problem with datetime.py is that it has several helper functions like _ymd2ord() that will still stay in the module. Should an "else:" clause be added to clean these up? Should these helpers become class or static methods as appropriate?

The established practice for testing is:

    py_module = support.import_fresh_module('modulename', blocked=['_modulename'])
    c_module = support.import_fresh_module('modulename', fresh=['_modulename'])

    class TestDefinitions:  # not a unittest.TestCase subclass
        def test_foo(self):
            self.module.foo(...)
        ...

    class C_Test(TestDefinitions, unittest.TestCase):
        module = c_module

    class Py_Test(TestDefinitions, unittest.TestCase):
        module = py_module

For datetime.py this approach presents several problems:

1. Replacing datetime with self.module.datetime everywhere can get messy quickly.

2. There are test classes defined at the test_datetime module level that subclass from datetime classes. The self.module attribute is not available at the module level. These should probably be moved to setUp() methods and attached to the test case via self.

3. If #2 is resolved by moving definitions inside functions, the classes will become unpicklable and pickle tests will break. Some hackery involving injecting these classes into __main__ or module globals may be required.

These challenges make datetime.py an interesting showcase for other modules, so rather than writing a PEP based on abstract ideas, I think it is better to get datetime.py integrated first and try to establish best practices along the way.
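[Editor's note: a minimal, runnable sketch of the dual-implementation pattern discussed above, including the "else:" cleanup clause Alexander asks about. _modulename is a hypothetical accelerator module; since it does not exist here, the ImportError branch is taken and the pure-Python code survives.]

```python
# Sketch of the datetime.py-style pattern: pure Python definitions
# first, then an optional C accelerator that overrides them.

def _helper(n):          # private helper, analogous to _ymd2ord()
    return n * 2

def public_api(n):       # public function the accelerator would shadow
    return _helper(n) + 1

try:
    from _modulename import *    # hypothetical C versions shadow these
except ImportError:
    pass                         # no accelerator: keep the Python code
else:
    del _helper                  # cleanup of now-unused private helpers
```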
On 07/07/2010 16:29, Alexander Belopolsky wrote:
[snip...]
4. Does not ctypes make it possible to replace a method of a Python-coded class with a faster C version, with something like:

    try:
        connect to methods.dll
        check that function xyz exists
        replace Someclass.xyz with ctypes wrapper
    except:
        pass

For instance, the SequenceMatcher heuristic was added to speed up the matching process, which I believe is encapsulated in one O(n**2) or so bottleneck method. I believe most everything else is O(n) bookkeeping.
The ctypes module is very CPython-centric as far as I know. For new modules, this may be a valid way to rapidly develop accelerated versions. For modules that are already written in C, I don't see much benefit in replacing them with ctypes wrappers.
Nope, both IronPython and PyPy have ctypes implementations and Jython is in the process of "growing" one. Using ctypes for C extensions is the most portable way of providing C extensions for Python (other than providing a pure-Python implementation of course). Michael -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (”BOGUS AGREEMENTS”) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.
On Wed, 07 Jul 2010 16:39:38 +0100 Michael Foord <fuzzyman@voidspace.org.uk> wrote:
On 07/07/2010 16:29, Alexander Belopolsky wrote:
[snip...]
4. Does not ctypes make it possible to replace a method of a Python-coded class with a faster C version, with something like:

    try:
        connect to methods.dll
        check that function xyz exists
        replace Someclass.xyz with ctypes wrapper
    except:
        pass

For instance, the SequenceMatcher heuristic was added to speed up the matching process, which I believe is encapsulated in one O(n**2) or so bottleneck method. I believe most everything else is O(n) bookkeeping.
The ctypes module is very CPython-centric as far as I know. For new modules, this may be a valid way to rapidly develop accelerated versions. For modules that are already written in C, I don't see much benefit in replacing them with ctypes wrappers.
Nope, both IronPython and PyPy have ctypes implementations and Jython is in the process of "growing" one. Using ctypes for C extensions is the most portable way of providing C extensions for Python (other than providing a pure-Python implementation of course).
Except that ctypes doesn't help provide C extensions at all. It only helps provide wrappers around existing C libraries, which is quite a different thing. Which, in the end, makes the original suggestion meaningless. Regards Antoine.
On Wed, Jul 7, 2010 at 2:42 PM, Antoine Pitrou <solipsis@pitrou.net> wrote: ..
Except that ctypes doesn't help provide C extensions at all. It only helps provide wrappers around existing C libraries, which is quite a different thing.
Yet it may allow writing an equivalent of a C extension in pure Python. For example, the posix or time modules could easily be reimplemented that way if libc were less platform-dependent. Such a reimplementation, however, is unlikely to be very useful.
Which, in the end, makes the original suggestion meaningless.
It is not meaningless, but it would require effectively exposing pyport.h in pure Python and using it to select the correct signature for a given library function. That would be a tremendous effort and is hardly justified.
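[Editor's note: a hedged sketch of the kind of wrapper Alexander describes: calling a libc function through ctypes instead of writing a C extension. This assumes a POSIX system where CDLL(None) exposes the process's libc symbols; the signature must be declared by hand, which is exactly the platform-dependence problem mentioned above.]

```python
import ctypes

# Load the symbols already linked into the process (POSIX-only trick).
libc = ctypes.CDLL(None)

# Declare the C signature manually: time_t time(time_t *tloc)
libc.time.restype = ctypes.c_long
libc.time.argtypes = [ctypes.c_void_p]

def c_time():
    """Whole-second epoch time via libc time(2), like int(time.time())."""
    return libc.time(None)
```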
On 7/7/2010 2:42 PM, Antoine Pitrou wrote: I wrote
4. Does not ctypes make it possible to replace a method of a Python-coded class with a faster C version, with something like:

    try:
        connect to methods.dll
methods.dll to be written
        check that function xyz exists
        replace Someclass.xyz with ctypes wrapper
    except:
        pass

For instance, the SequenceMatcher heuristic was added to speed up the matching process, which I believe is encapsulated in one O(n**2) or so bottleneck method. I believe most everything else is O(n) bookkeeping.
Except that ctypes doesn't help provide C extensions at all. It only helps provide wrappers around existing C libraries, which is quite a different thing. Which, in the end, makes the original suggestion meaningless.
To you, so let me restate it. It would be easier for many people to rewrite only, for instance, difflib.SequenceMatcher.get_longest_matching in C than to rewrite the whole SequenceMatcher class, let alone the whole difflib module.

I got the impression from the datetime issue tracker discussion that it is not possible to replace a single method of a Python-coded class with a C version. I got this from a statement that seems to say that having parallel Python and C versions is a nuisance because one must replace large chunks of Python, at least a class if not the whole module. If that impression is wrong, and I hope it is, the suggestion is unnecessary.

If it is right, then replacing the Python-coded function with a Python-coded wrapper for a function in a miscellaneous shared library might be both possible and useful. But again, if the premise is wrong, skip the conclusion. -- Terry Jan Reedy
On 08/07/2010 02:45, Terry Reedy wrote:
On 7/7/2010 2:42 PM, Antoine Pitrou wrote:
I wrote
4. Does not ctypes make it possible to replace a method of a Python-coded class with a faster C version, with something like:

    try:
        connect to methods.dll
methods.dll to be written
        check that function xyz exists
        replace Someclass.xyz with ctypes wrapper
    except:
        pass

For instance, the SequenceMatcher heuristic was added to speed up the matching process, which I believe is encapsulated in one O(n**2) or so bottleneck method. I believe most everything else is O(n) bookkeeping.
Except that ctypes doesn't help provide C extensions at all. It only helps provide wrappers around existing C libraries, which is quite a different thing. Which, in the end, makes the original suggestion meaningless.
To you, so let me restate it. It would be easier for many people to rewrite only, for instance, difflib.SequenceMatcher.get_longest_matching in C than to rewrite the whole SequenceMatcher class, let alone the whole difflib module.
I got the impression from the datetime issue tracker discussion that it is not possible to replace a single method of a Python-coded class with a C version. I got this from a statement that seems to say that having parallel Python and C versions is a nuisance because one must replace large chunks of Python, at least a class if not the whole module. If that impression is wrong, and I hope it is, the suggestion is unnecessary.
If it is right, then replacing the Python-coded function with a Python-coded wrapper for a function in a miscellaneous shared library might be both possible and useful. But again, if the premise is wrong, skip the conclusion.
Would it be possible to provide a single method in C by providing a C base class with a single method, and have the full implementation inherit from the C base class if it is available, or otherwise from a pure-Python base class? Michael -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog
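[Editor's note: a minimal sketch of Michael's suggestion: keep the one hot method in a base class that an optional C extension can supply. _speedups and its MatcherBase are hypothetical names; since no such extension exists here, the pure-Python fallback base class is used.]

```python
try:
    from _speedups import MatcherBase        # hypothetical C base class
except ImportError:
    class MatcherBase:                       # pure-Python fallback base
        def hot_method(self, data):
            return sum(data)                 # stand-in for the O(n**2) loop

class Matcher(MatcherBase):
    """All the O(n) bookkeeping stays in Python, whichever base is used."""
    def run(self, data):
        return self.hot_method(data)
```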
On Wed, 07 Jul 2010 21:45:30 -0400 Terry Reedy <tjreedy@udel.edu> wrote:
Except that ctypes doesn't help provide C extensions at all. It only helps provide wrappers around existing C libraries, which is quite a different thing. Which, in the end, makes the original suggestion meaningless.
To you, so let me restate it. It would be easier for many people to rewrite only, for instance, difflib.SequenceMatcher.get_longest_matching in C than to rewrite the whole SequenceMatcher class, let alone the whole difflib module.
And you still haven't understood my point. ctypes doesn't allow you to write any C code, only to interface with existing C code. So, regardless of whether get_longest_matching() is a function or a method, it would have to be written in C manually, and that would certainly be in an extension module. (Admittedly, you can instead make a pure C library with such a function and then wrap it with ctypes, but I don't see the point: you still have to write most of the C code yourself.)
I got the impression from the datetime issue tracker discussion that it is not possible to replace a single method of a Python-coded class with a C version.
And that's a wrong impression. Inheritance allows you to do that (see Michael's answer). Besides, you can also code that method as a helper function. It is not difficult to graft a function from a module into another module. Antoine.
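[Editor's note: a sketch of the "grafting" Antoine mentions: replacing a single method on a Python-coded class after the fact. _fastsquare is a hypothetical accelerator module; since it is absent here, the Python method stays in place.]

```python
class Calculator:
    def square(self, x):         # pure-Python version
        return x * x

try:
    from _fastsquare import square as _c_square   # hypothetical C function
except ImportError:
    pass                         # accelerator absent: keep Python method
else:
    Calculator.square = _c_square    # graft the C function in as a method
```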
On Wed, Jul 7, 2010 at 11:29 AM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
On Tue, Jul 6, 2010 at 11:54 PM, Terry Reedy <tjreedy@udel.edu> wrote:
On 7/6/2010 3:59 PM, Alexander Belopolsky wrote:
I am more interested in Brett's overall vision than this particular module. I understand that to be one of a stdlib that is separate from CPython and is indeed the standard Python library.
I am also very much interested in the overall vision, but I would like to keep the datetime.py thread focused, so I am going to reply to broad questions under a separate subject.
Questions:
1. Would the other distributions use a standard stdlib rather than current individual versions?
I certainly hope they will. In the ideal world, passing test.regrtest with an unmodified Lib should be *the* definition of what is called Python. I understand that there is already some work underway in this direction, such as marking implementation-specific tests with appropriate decorators.
2. Would the other distributions pool their currently separate stdlib efforts to help maintain one standard stdlib?
I believe that making the stdlib and test.regrtest more friendly to alternative implementations will go a long way towards this goal. It will, of course, be a decision that each project will have to make.
3. What version of Python would be allowed for use in the stdlib? I would like the stdlib for 3.x to be able to use 3.x code. This would be only a minor concern for CPython as long as 2.7 is maintained, but a major concern for the other implementation currently 'stuck' in 2.x only. A good 3to2 would be needed.
Availability of Python equivalents will hopefully help the "other implementation currently 'stuck' in 2.x only" to get "unstuck" and move to 3.x. I understand that this is a somewhat sensitive issue at the moment, but I believe a decision has been made that supporting new features for 2.x is outside of python-dev's focus.
[the rest snipped for now]

I agree with Alexander's responses. Brett can chime in here too, and so can Frank W. or any of the other people who were involved in the conversation. Essentially, many of us agreed that "one stdlib to bind them", from a canonical repository, would help everyone involved. Any modules which were specific to an implementation, such as multiprocessing, would either be flagged as such or not included in the shared repo (TBD).

This effort has been on hold largely due to the fact that we're waiting on the Mercurial migration. It's not something I think any of us would want to do prior to that, and it requires a fair amount of scaffolding / build tools / etc. to make it a net win. Below you will find the partially completed draft PEP (from a private Mercurial repo) that Brett, Frank and I had worked on (but again, paused due to Mercurial, etc.). Now that we're edging closer to 3.2 (this would not happen before then) and Mercurial, I think we might need to find the time to finish the PEP:

PEP: XXXX
Title: Making the Standard Library a Separate Project
Version: $Revision: 65628 $
Last-Modified: $Date: 2008-08-10 06:59:20 -0700 (Sun, 10 Aug 2008) $
Author: XXX
Status: Draft
Type: Process
Content-Type: text/x-rst
Created: 14-Aug-2009
Post-History:

.. warning:: This PEP will not be submitted until the migration of
   CPython to Mercurial occurs.

Abstract
========

XXX

Rationale
=========

Although the C implementation of Python (CPython) is the original and reference implementation of the Python language, there are now a number of additional implementations that are widely used and reasonably complete. Among these implementations are Jython_, IronPython_, and PyPy_. At `PyCon 2009`_, representatives of multiple implementations of Python agreed that it would be a good idea to divide the Python standard library into two logical components, the first being a shared library that is essential for an implementation of Python to be considered a full implementation.
All Python implementations would share this library on equal terms. The second library would be an implementation-specific standard library for things that are either implementation details for a specific VM or that depend on internals of each implementation (for example, if part of the implementation must be written in C for CPython or written in Java for Jython).

The test suite should be similarly exposed and shared between all implementations on equal terms: one set of tests that must pass to be considered a full implementation, and one set of implementation-specific tests layered on top of the shared test suite (think garbage collection vs. refcounting, etc.). The same pattern should apply to documentation as well. The idea is to put CPython on a more equal footing with the other implementations, and to remove the need to have Jython-, IronPython- or PyPy-specific cases in the CPython standard library.

Criteria for Inclusion/Exclusion of Code
========================================

To be included in the shared library, a module must have a pure Python implementation. If the module also has a native implementation, the identical unit tests must pass in both the pure and native versions. The modules must not use any features that are considered implementation-dependent, and must only depend on other modules within the shared library unless specifically listed below (whose tests are included in the shared library):

XXX: need to specify the subsets of the sys and os modules that are required.

XXX: also include modules that have no pure Python implementation but are still expected to be included (e.g. datetime)

* sys
* os

Modules
-------

XXX: maybe we shouldn't list all the modules to include but instead only the modules to *exclude*; the list is rather long, and listing only what to remove gets the point across that most things will make the transition

Modules to Exclude
''''''''''''''''''

XXX Python 2.6 or 2.7?
Intra-Module Objects to Exclude
'''''''''''''''''''''''''''''''

XXX E.g. sys._getframe()

Documentation Notation
----------------------

XXX How to document that a module is CPython-specific in Sphinx

Documentation
=============

XXX Language docs, shared library, PEPs

Code Layout
===========

XXX

Copyright
=========

This document has been placed in the public domain.

.. _Jython: http://www.jython.org./
.. _IronPython: http://www.codeplex.com/IronPython
.. _PyPy: http://codespeak.net/pypy/
.. _PyCon 2009: http://us.pycon.org/2009/about/

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

jesse
On 7/7/2010 11:43 AM, Jesse Noller wrote:
The idea is to put CPython on a more equal footing with the other implementations,
I would reverse this to "The idea is to put the other implementations on a more equal footing with CPython." The subtle difference is the implication of whether the idea is to pull CPython down (the former) or raise the others up (the latter) ;-). -- Terry Jan Reedy
On Wed, Jul 7, 2010 at 08:29, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
On Tue, Jul 6, 2010 at 11:54 PM, Terry Reedy <tjreedy@udel.edu> wrote:
On 7/6/2010 3:59 PM, Alexander Belopolsky wrote:
I am more interested in Brett's overall vision than this particular module. I understand that to be one of a stdlib that is separate from CPython and is indeed the standard Python library.
I am also very much interested in the overall vision, but I would like to keep the datetime.py thread focused, so I am going to reply to broad questions under a separate subject.
Questions:
1. Would the other distributions use a standard stdlib rather than current individual versions?
I certainly hope they will. In the ideal world, passing test.regrtest with an unmodified Lib should be *the* definition of what is called Python. I understand that there is already some work underway in this direction, such as marking implementation-specific tests with appropriate decorators.
2. Would the other distributions pool their currently separate stdlib efforts to help maintain one standard stdlib?
I believe that making the stdlib and test.regrtest more friendly to alternative implementations will go a long way towards this goal. It will, of course, be a decision that each project will have to make.
3. What version of Python would be allowed for use in the stdlib? I would like the stdlib for 3.x to be able to use 3.x code. This would be only a minor concern for CPython as long as 2.7 is maintained, but a major concern for the other implementation currently 'stuck' in 2.x only. A good 3to2 would be needed.
Availability of Python equivalents will hopefully help the "other implementation currently 'stuck' in 2.x only" to get "unstuck" and move to 3.x. I understand that this is a somewhat sensitive issue at the moment, but I believe a decision has been made that supporting new features for 2.x is outside of python-dev's focus.
4. Does not ctypes make it possible to replace a method of a Python-coded class with a faster C version, with something like:

    try:
        connect to methods.dll
        check that function xyz exists
        replace Someclass.xyz with ctypes wrapper
    except:
        pass

For instance, the SequenceMatcher heuristic was added to speed up the matching process, which I believe is encapsulated in one O(n**2) or so bottleneck method. I believe most everything else is O(n) bookkeeping.
The ctypes module is very CPython-centric as far as I know. For new modules, this may be a valid way to rapidly develop accelerated versions. For modules that are already written in C, I don't see much benefit in replacing them with ctypes wrappers.
[.. datetime-specific discussion skipped ..] From scanning that and the posts here, it seems like a PEP or other doc on dual-version modules would be a good idea. It should at least document how to code the switch from the Python version to the C-coded version and how to test both, as discussed.
I am certainly not ready to write such a PEP. I may be in a better position to contribute to it after I gain more experience with test_datetime.py. At the moment I have more questions than answers.
For example, the established practice appears to be:
In modulename.py:

    # Python code

    try:
        from _modulename import *
    except:
        pass
This is supposed to generate a .pyc file with no Python definitions in it if _modulename is available. The problem with datetime.py is that it has several helper functions like _ymd2ord() that will still stay in the module. Should an "else:" clause be added to clean these up? Should these helpers become class or static methods as appropriate?
The established practice for testing is:

    py_module = support.import_fresh_module('modulename', blocked=['_modulename'])
    c_module = support.import_fresh_module('modulename', fresh=['_modulename'])

    class TestDefinitions:  # not a unittest.TestCase subclass
        def test_foo(self):
            self.module.foo(...)
        ...

    class C_Test(TestDefinitions, unittest.TestCase):
        module = c_module

    class Py_Test(TestDefinitions, unittest.TestCase):
        module = py_module
For datetime.py this approach presents several problems:
1. Replacing datetime with self.module.datetime everywhere can get messy quickly.

2. There are test classes defined at the test_datetime module level that subclass from datetime classes. The self.module attribute is not available at the module level. These should probably be moved to setUp() methods and attached to the test case via self.

3. If #2 is resolved by moving definitions inside functions, the classes will become unpicklable and pickle tests will break. Some hackery involving injecting these classes into __main__ or module globals may be required.
So I have been thinking about how to possibly make this standard test scaffolding a little cleaner. I think a class decorator might do the trick. If you had all test methods take a module argument, you could pass in the module that should be used for the test. Then you simply rename test_* to _test_*, create test_*_(c|py), and have those methods call their _test_* equivalents with the proper module to test. You could even make this generic by having the keyword arguments to the decorator be what the test suffix is named.

The benefit of this is you don't have to define one base class and then two subclasses; you define a single test class and simply add a decorator. This addresses #1. As for #3, that I can't answer; it might simply require restructuring those specific pickle tests. -Brett
These challenges make datetime.py an interesting showcase for other modules, so rather than writing a PEP based on abstract ideas, I think it is better to get datetime.py integrated first and try to establish best practices along the way.
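[Editor's note: a runnable sketch of Brett's class-decorator idea, not actual stdlib code. Every _test_* method taking a module argument is cloned into one test_*_<suffix> method per keyword argument, bound to that module; parameterize, impls, and the suffix names are all illustrative.]

```python
import unittest

def parameterize(**impls):
    """Class decorator: clone each _test_* method into test_*_<suffix>,
    one per keyword argument, with the module baked in as an argument."""
    def decorate(cls):
        for attr in list(vars(cls)):          # snapshot before mutating
            if not attr.startswith('_test_'):
                continue
            meth = getattr(cls, attr)
            for suffix, mod in impls.items():
                def bound(self, _meth=meth, _mod=mod):
                    _meth(self, _mod)         # call template with module
                name = '%s_%s' % (attr[1:], suffix)   # _test_foo -> test_foo_py
                bound.__name__ = name
                setattr(cls, name, bound)
        return cls
    return decorate
```

A test class then defines a single `_test_foo(self, module)` template and gets `test_foo_py` and `test_foo_c` generated for free.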
On Wed, Jul 7, 2010 at 3:45 PM, Brett Cannon <brett@python.org> wrote:
On Wed, Jul 7, 2010 at 08:29, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote: ..
For datetime.py this approach presents several problems:
1. Replacing datetime with self.module.datetime everywhere can get messy quickly.

2. There are test classes defined at the test_datetime module level that subclass from datetime classes. The self.module attribute is not available at the module level. These should probably be moved to setUp() methods and attached to the test case via self.

3. If #2 is resolved by moving definitions inside functions, the classes will become unpicklable and pickle tests will break. Some hackery involving injecting these classes into __main__ or module globals may be required.
So I have been thinking about how to possibly make this standard test scaffolding a little cleaner. I think a class decorator might do the trick. If you had all test methods take a module argument, you could pass in the module that should be used for the test. Then you simply rename test_* to _test_*, create test_*_(c|py), and have those methods call their _test_* equivalents with the proper module to test. You could even make this generic by having the keyword arguments to the decorator be what the test suffix is named.
Hmm, I've been playing with the idea of using a metaclass to do essentially the same thing, but a class decorator may be a simpler solution. I still don't see how this addresses #1, though. In the ideal world, I would like not to touch the body of the test_* methods. These methods, however, are written assuming "from datetime import date, time, datetime, tzinfo", etc., at the top of test_datetime.py. Even if the decorator calls _test_* with six additional arguments named date, time, datetime, tzinfo, etc., it will not work, because by the time the decorator (or even the metaclass machinery) gets to operate, these names are already resolved as globals.
The benefit of this is you don't have to define one base class and then two subclasses; you define a single test class and simply add a decorator.
I like this.
This addresses #1.
Except it does not. :-(
As for #3, that I can't answer; it might simply require restructuring those specific pickle tests.
What about #2?
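[Editor's note: a small illustration of Alexander's globals point. The body of test_today() resolves the name date in module globals at call time, so passing a substitute object to a wrapper cannot change what it sees; only a method that takes the name as a parameter, like _test_today(), can be retargeted. FakeDate is an illustrative stand-in.]

```python
from datetime import date

class Tests:
    def test_today(self):
        return date(2010, 7, 7).year     # 'date' is looked up in globals

    def _test_today(self, date):
        return date(2010, 7, 7).year     # parameter shadows the global

class FakeDate:
    """Stand-in implementation a decorator might want to substitute."""
    def __init__(self, y, m, d):
        self.year = -1
```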
On Wed, Jul 7, 2010 at 13:16, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
On Wed, Jul 7, 2010 at 3:45 PM, Brett Cannon <brett@python.org> wrote:
On Wed, Jul 7, 2010 at 08:29, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote: ..
For datetime.py this approach presents several problems:
1. Replacing datetime with self.module.datetime everywhere can get messy quickly.

2. There are test classes defined at the test_datetime module level that subclass from datetime classes. The self.module attribute is not available at the module level. These should probably be moved to setUp() methods and attached to the test case via self.

3. If #2 is resolved by moving definitions inside functions, the classes will become unpicklable and pickle tests will break. Some hackery involving injecting these classes into __main__ or module globals may be required.
So I have been thinking about how to possibly make this standard test scaffolding a little cleaner. I think a class decorator might do the trick. If you had all test methods take a module argument, you could pass in the module that should be used for the test. Then you simply rename test_* to _test_*, create test_*_(c|py), and have those methods call their _test_* equivalents with the proper module to test. You could even make this generic by having the keyword arguments to the decorator be what the test suffix is named.
Hmm, I've been playing with the idea of using a metaclass to do essentially the same thing, but a class decorator may be a simpler solution. I still don't see how this addresses #1, though. In the ideal world, I would like not to touch the body of the test_* methods. These methods, however, are written assuming "from datetime import date, time, datetime, tzinfo", etc., at the top of test_datetime.py. Even if the decorator calls _test_* with six additional arguments named date, time, datetime, tzinfo, etc., it will not work, because by the time the decorator (or even the metaclass machinery) gets to operate, these names are already resolved as globals.
Well, I personally would call it bad form to import those classes explicitly, but that's just me. You will simply need to make them work off of the module object. There is nothing wrong with "cleaning up" the tests as part of your work; the test code should not be enshrined as perfect.
The benefit of this is you don't have to define one base class and then two subclasses; you define a single test class and simply add a decorator.
I like this.
This addresses #1.
Except it does not. :-(
As for #3, that I can't answer; it might simply require restructuring those specific pickle tests.
What about #2?
Either define two different subclasses or write a function that returns the class using the superclass that you want.
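[Editor's note: a sketch of the second option Brett mentions: a factory function that builds the test helper class against whichever implementation module is under test. Here the stock datetime module stands in for the py_module/c_module pair from the earlier discussion; make_fixed_offset and FixedOffset are illustrative names.]

```python
import datetime

def make_fixed_offset(module):
    """Return a tzinfo subclass built from the given module's classes."""
    class FixedOffset(module.tzinfo):
        def __init__(self, minutes=0):
            self._offset = module.timedelta(minutes=minutes)
        def utcoffset(self, dt):
            return self._offset
        def dst(self, dt):
            return module.timedelta(0)
        def tzname(self, dt):
            return "Fixed"
    return FixedOffset
```

In the parameterized tests, each test case would call the factory with its own `self.module` rather than relying on a class defined at module level.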
On Wed, Jul 7, 2010 at 4:33 PM, Brett Cannon <brett@python.org> wrote:
2. There are test classes defined at the test_datetime module level that subclass from datetime classes. The self.module is not available at the module level. These should probably be moved to setUp() methods and attached to test case self. .. What about #2?
Either define two different subclasses or write a function that returns the class using the superclass that you want.
Selecting one of two globally defined subclasses will be ugly in parameterized tests. And in the other approach, the class definitions will have to be moved away from the module level and into a scope where the module variable is present. Yes, it looks like some refactoring is unavoidable.
On Wed, Jul 7, 2010 at 13:53, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
On Wed, Jul 7, 2010 at 4:33 PM, Brett Cannon <brett@python.org> wrote:
2. There are test classes defined at the test_datetime module level that subclass from datetime classes. The self.module is not available at the module level. These should probably be moved to setUp() methods and attached to test case self. .. What about #2?
Either define two different subclasses or write a function that returns the class using the superclass that you want.
Selecting one of two globally defined subclasses will be ugly in parameterized tests.
Didn't say it was a pretty solution. =)
And in the other approach, the class definitions will have to be moved away from the module level and into a scope where the module variable is present.
Yep, which is not a big deal.
Yes, it looks like some refactoring is unavoidable.
=)
On Thu, Jul 8, 2010 at 7:36 AM, Brett Cannon <brett@python.org> wrote:
Selecting one of two globally defined subclasses will be ugly in parameterized tests.
Didn't say it was a pretty solution. =)
And in the other approach, the class definitions will have to be moved away from the module level and into a scope where the module variable is present.
Yep, which is not a big deal.
Yes, it looks like some refactoring is unavoidable.
If you want to run the same module twice with different instances of an imported module (or any other parameterised globals), creative use of run_module() can provide module-level scoping without completely restructuring your tests.

1. Move the current tests aside into a new file that isn't automatically invoked by regrtest (e.g. _test_datetime_inner.py).

2. In that code, remove any imports from datetime (instead, assume datetime will be injected into the module's namespace).*

3. In test_datetime.py itself, use runpy.run_module() to import the renamed module twice, once with the Python version of datetime in init_globals and once with the C version.

* How the removals work: "import datetime" is dropped entirely; "from datetime import x, y, z" becomes "x, y, z = datetime.x, datetime.y, datetime.z".

There would be additional things to do to make the attribution of the test results clearer in order to make this effective in practice, though.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
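A rough sketch of what these steps amount to, with hypothetical file and module names (`inner_tests` stands in for the renamed test module, and plain `datetime` for the implementation under test; a real test_datetime.py would run the module once per implementation):

```python
import os
import runpy
import sys
import tempfile
import textwrap

import datetime  # stand-in for the Python or C implementation

# Create a stand-in for the renamed, non-regrtest test module (step 1).
# Note that it never imports datetime itself (step 2) - the name is injected.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'inner_tests.py'), 'w') as f:
    f.write(textwrap.dedent("""
        # 'datetime' arrives via init_globals, not via an import.
        MIN_YEAR = datetime.MINYEAR
    """))
sys.path.insert(0, tmpdir)

# Step 3: run the module with the implementation under test injected
# into its namespace before its code executes.
namespace = runpy.run_module('inner_tests',
                             init_globals={'datetime': datetime})
```

run_module() returns the resulting module globals, so the caller can inspect what the injected implementation produced.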
On Wed, Jul 7, 2010 at 6:27 PM, Nick Coghlan <ncoghlan@gmail.com> wrote: ..
If you want to run the same module twice with different instances of an imported module (or any other parameterised globals), creative use of run_module() can provide module level scoping without completely restructuring your tests.
This is what the current patch at http://bugs.python.org/file17848/issue7989.diff does, but at the expense of not exposing the test cases to unittest correctly.
1. Move the current tests aside into a new file that isn't automatically invoked by regrtest (e.g. _test_datetime_inner.py).
Yes, I already have datetimetester.py.
2. In that code, remove any imports from datetime (instead, assume datetime will be injected into the module's namespace)*
Hmm. That will make datetimetester not importable.
3. In test_datetime.py itself, use runpy.run_module() to import the renamed module twice, once with the Python version of datetime in init_globals and once with the C version.
* How the removals work: "import datetime" is dropped entirely; "from datetime import x, y, z" becomes "x, y, z = datetime.x, datetime.y, datetime.z".
I'll try that.
There would be additional things to do to make the attribution of the test results clearer in order to make this effective in practice though.
Thanks. I would really like to make it work first and improve later. I hope this will do the trick.
On 07/07/2010 21:33, Brett Cannon wrote:
On Wed, Jul 7, 2010 at 13:16, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
On Wed, Jul 7, 2010 at 3:45 PM, Brett Cannon<brett@python.org> wrote:
On Wed, Jul 7, 2010 at 08:29, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
..
For datetime.py this approach presents several problems:
1. Replacing datetime with self.module.datetime everywhere can get messy quickly.

2. There are test classes defined at the test_datetime module level that subclass the datetime classes. self.module is not available at the module level. These should probably be moved into setUp() methods and attached to the test case via self.

3. If #2 is resolved by moving the definitions inside functions, the classes will become unpicklable and the pickle tests will break. Some hackery involving injecting these classes into __main__ or the module globals may be required.
So I have been thinking about how to make this standard test scaffolding a little cleaner. I think a class decorator might do the trick. If you had all test methods take a module argument, you could pass in the module that should be used for the test. Then you simply rename test_* to _test_*, create test_*_(c|py) methods, and have those call their _test_* equivalents with the proper module to test. You could even make this generic by having the keyword arguments to the decorator be what the test suffix is named.
Hmm, I've been playing with the idea of using a metaclass to do essentially the same thing, but a class decorator may be a simpler solution. I still don't see how this addresses #1, though. In an ideal world, I would like not to touch the body of the test_* methods. These methods, however, are written assuming "from datetime import date, time, datetime, tzinfo", etc. at the top of test_datetime.py. Even if the decorator calls _test_* with six additional arguments named date, time, datetime, tzinfo, etc., it will not work, because by the time the decorator (or even the metaclass machinery) gets to operate, these names have already been resolved as globals.
Well, I personally would call it bad form to import those classes explicitly, but that's just me. You will simply need to make the tests work off of the module object. There is nothing wrong with "cleaning up" the tests as part of your work; the test code should not be enshrined as perfect.
Yep - each test should take the module under test (either the C or the Python version) as the parameter and use the classes / functions as attributes of the module object. Using a class decorator to duplicate each _test_* into two test_* methods sounds like a good approach. Michael -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog
On Wed, Jul 7, 2010 at 5:56 PM, Michael Foord <fuzzyman@voidspace.org.uk> wrote: ..
Well, I personally would call that bad form to import those classes explicitly, but that's just me. You will simply need to make them work off of the module object. There is nothing wrong with "cleaning up" the tests as part of your work; the tests code should not be enshrined as perfect.
Yep - each test should take the module under test (either the C or the Python version) as the parameter and use the classes / functions as attributes of the module object.
This is somewhat uncharted territory. So far, test_* methods have had no parameters except self, and the module was attached to the TestCase subclass, to be accessed inside test_* methods as self.module. I think changing the test_* methods' signature is too high a price to pay for saving the self. prefix. I will still have to touch every date, time, datetime, etc. symbol throughout the test file.
On Thu, Jul 8, 2010 at 7:56 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
Using a class decorator to duplicate each _test_ into two test_* methods sounds like a good approach.
Note that parameterised methods have a problem similar to parameterised modules - unittest results are reported in terms of "testmodule.testclass.testfunction", so proper attribution of results in the test output will require additional work. The separate-subclasses approach doesn't share this issue, since it changes the value of the second item in accordance with the module under test. Cheers, Nick.
On Wed, Jul 7, 2010 at 15:31, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Thu, Jul 8, 2010 at 7:56 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
Using a class decorator to duplicate each _test_ into two test_* methods sounds like a good approach.
Note that parameterised methods have a similar problem to parameterised modules - unittest results are reported in terms of "testmodule.testclass.testfunction", so proper attribution of results in the test output will require additional work. The separate subclasses approach doesn't share this issue, since it changes the value of the second item in accordance with the module under test.
This is why a new method would need to be created with a special suffix to delineate which module the test was called with. So instead of testclass specifying what module was used, it would be testfunction. I guess it becomes a question of what boilerplate you prefer. One nice benefit of the class decorator I can think of is that it could handle the import trickery for you, so you wouldn't even need to worry about that issue. It could also allow the decorator to skip running the tests twice if the extension helper is not available. -Brett
2010/7/7 Nick Coghlan <ncoghlan@gmail.com>:
On Thu, Jul 8, 2010 at 7:56 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
Using a class decorator to duplicate each _test_ into two test_* methods sounds like a good approach.
Note that parameterised methods have a similar problem to parameterised modules - unittest results are reported in terms of "testmodule.testclass.testfunction", so proper attribution of results in the test output will require additional work. The separate subclasses approach doesn't share this issue, since it changes the value of the second item in accordance with the module under test.
A good parameterized implementation, though, gives the repr() of the parameters in failure output. -- Regards, Benjamin
On PyPI - testscenarios; it's been discussed on TIP before. It's a 'run a function to parameterise some tests' API; it changes the id() of the test to include the parameters, and it can be hooked in via load_tests quite trivially. Cheers, Rob
On Thu, Jul 8, 2010 at 9:13 AM, Benjamin Peterson <benjamin@python.org> wrote:
2010/7/7 Nick Coghlan <ncoghlan@gmail.com>:
On Thu, Jul 8, 2010 at 7:56 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
Using a class decorator to duplicate each _test_ into two test_* methods sounds like a good approach.
Note that parameterised methods have a similar problem to parameterised modules - unittest results are reported in terms of "testmodule.testclass.testfunction", so proper attribution of results in the test output will require additional work. The separate subclasses approach doesn't share this issue, since it changes the value of the second item in accordance with the module under test.
A good parameterized implementation, though, gives the repr() of the parameters in failure output.
That would qualify as "additional work" if your tests aren't already set up that way (and it doesn't cover the case of unexpected exceptions in a test, where the test method doesn't get to say much about how the error is reported).

I realised during the day that my suggested approach was more complicated than is actually necessary - once the existing tests have been moved to a separate module, *that test module* can itself be imported twice, once with the Python version of the module to be tested and once with the C version. You can then do some hackery to distinguish the test classes without having to modify the test code itself (note, the below code should work in theory, but isn't actually tested):

=============
py_module_tests = support.import_fresh_module('moduletester',
                                              fresh=['modulename'],
                                              blocked=['_modulename'])
c_module_tests = support.import_fresh_module('moduletester',
                                             fresh=['modulename', '_modulename'])

test_modules = [py_module_tests, c_module_tests]
suffixes = ["_Py", "_C"]

for module, suffix in zip(test_modules, suffixes):
    for obj in module.itervalues():
        if isinstance(obj, unittest,TestCase):
            obj.__name__ += suffix
            setattr(module, obj.__name__, obj)

def test_main():
    for module in test_modules:
        module.test_main()
=============

Cheers, Nick.
On Fri, Jul 9, 2010 at 12:59 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
for obj in module.itervalues():
    if isinstance(obj, unittest,TestCase):
Hmm, isn't there a never-quite-made-it-into-the-Zen line about "syntax shall not look like grit on Tim's monitor"? (s/,/./ in that second line) Cheers, Nick.
On Thu, Jul 8, 2010 at 07:59, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Thu, Jul 8, 2010 at 9:13 AM, Benjamin Peterson <benjamin@python.org> wrote:
2010/7/7 Nick Coghlan <ncoghlan@gmail.com>:
On Thu, Jul 8, 2010 at 7:56 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
Using a class decorator to duplicate each _test_ into two test_* methods sounds like a good approach.
Note that parameterised methods have a similar problem to parameterised modules - unittest results are reported in terms of "testmodule.testclass.testfunction", so proper attribution of results in the test output will require additional work. The separate subclasses approach doesn't share this issue, since it changes the value of the second item in accordance with the module under test.
A good parameterized implementation, though, gives the repr() of the parameters in failure output.
That would qualify as "additional work" if your tests aren't already set up that way (and doesn't cover the case of unexpected exceptions in a test, where the test method doesn't get to say much about the way the error is reported).
I realised during the day that my suggested approach was more complicated than is actually necessary - once the existing tests have been moved to a separate module, *that test module* can itself be imported twice, once with the python version of the module to be tested and once with the C version. You can then do some hackery to distinguish the test classes without having to modify the test code itself (note, the below code should work in theory, but isn't actually tested):
=============
py_module_tests = support.import_fresh_module('moduletester',
                                              fresh=['modulename'],
                                              blocked=['_modulename'])
c_module_tests = support.import_fresh_module('moduletester',
                                             fresh=['modulename', '_modulename'])

test_modules = [py_module_tests, c_module_tests]
suffixes = ["_Py", "_C"]

for module, suffix in zip(test_modules, suffixes):
    for obj in module.itervalues():
        if isinstance(obj, unittest,TestCase):
            obj.__name__ += suffix
            setattr(module, obj.__name__, obj)

def test_main():
    for module in test_modules:
        module.test_main()
=============
Very cool solution (assuming it works =))! One issue I see with this is deciding how to organize tests that are specific to one version of a module. For instance, test_warnings has some tests specific to _warnings because of the hoops it has to jump through to get overriding showwarning and friends to work. I guess I could try to make them generic enough that they don't require a specific module. Otherwise I would insert the module-specific tests into test_warnings and have that module also call the module-agnostic test_warnings to run the universal tests.
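One common way to organise such implementation-specific tests is to guard them with a skip, roughly as sketched below. The class name is hypothetical; `_warnings` is CPython's C accelerator for the warnings module and may be absent on other implementations:

```python
import unittest

# The C accelerator may not exist everywhere, so guard the import.
try:
    import _warnings
except ImportError:
    _warnings = None

class CWarningsOnlyTests(unittest.TestCase):
    """Tests that only make sense against the C accelerator."""

    @unittest.skipUnless(_warnings is not None,
                         "requires the _warnings extension")
    def test_warn_exists(self):
        # A trivial check against the accelerator's own namespace.
        self.assertTrue(hasattr(_warnings, 'warn'))
```

On an implementation without the accelerator, the test is reported as skipped rather than failing, which keeps the shared test module usable everywhere.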
On Thu, Jul 8, 2010 at 10:59 AM, Nick Coghlan <ncoghlan@gmail.com> wrote: ..
I realised during the day that my suggested approach was more complicated than is actually necessary - once the existing tests have been moved to a separate module, *that test module* can itself be imported twice, once with the python version of the module to be tested and once with the C version. You can then do some hackery to distinguish the test classes without having to modify the test code itself (note, the below code should work in theory, but isn't actually tested):
=============
py_module_tests = support.import_fresh_module('moduletester',
                                              fresh=['modulename'],
                                              blocked=['_modulename'])
c_module_tests = support.import_fresh_module('moduletester',
                                             fresh=['modulename', '_modulename'])

test_modules = [py_module_tests, c_module_tests]
suffixes = ["_Py", "_C"]

for module, suffix in zip(test_modules, suffixes):
    for obj in module.itervalues():
        if isinstance(obj, unittest,TestCase):
            obj.__name__ += suffix
            setattr(module, obj.__name__, obj)

def test_main():
    for module in test_modules:
        module.test_main()
=============
Yes, this is definitely an improvement over my current datetime patch [1]_, but it still requires a custom test_main() and does not make the test cases discoverable by alternative unittest runners. I think that can be fixed by injecting imported TestCase subclasses into the main test module globals. I'll try to implement that for datetime. Thanks, Nick - great idea! .. [1] http://bugs.python.org/file17848/issue7989.diff
On Fri, Jul 9, 2010 at 5:24 AM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Yes, this is definitely an improvement over my current datetime patch [1]_, but it still requires a custom test_main() and does not make the test cases discoverable by alternative unittest runners. I think that can be fixed by injecting imported TestCase subclasses into the main test module globals.
So include something along the lines of "globals()[obj.__name__] = obj" in the name-hacking loop to make the test classes more discoverable? Good idea. Including a comment in the main test module along the lines of your reply to Antoine would be good, too (i.e. this is acknowledged as being something of a hack, to make sure we don't break the datetime tests when updating them to apply to both the existing C module and the new pure Python equivalent). As Antoine says, using explicit subclasses is a *much* cleaner way of doing this kind of thing when the tests are being written from scratch to test multiple implementations within a single interpreter. Cheers, Nick.
On Thu, Jul 8, 2010 at 5:46 PM, Nick Coghlan <ncoghlan@gmail.com> wrote: ..
So include something along the lines of "globals()[obj.__name__] = obj" in the name hacking loop to make the test classes more discoverable? Good idea.
As often happens, a good idea turns quite ugly when facing real-world realities. I've uploaded a new patch at http://bugs.python.org/issue7989 and here is what I had to do to make this work for datetime:

==========
import unittest
import sys; sys.modules['_pickle'] = None
from test.support import import_fresh_module, run_unittest

TESTS = 'test.datetimetester'
pure_tests = import_fresh_module(TESTS,
                                 fresh=['datetime', '_strptime', 'time'],
                                 blocked=['_datetime'])
fast_tests = import_fresh_module(TESTS,
                                 fresh=['datetime', '_datetime',
                                        '_strptime', 'time'])
test_modules = [pure_tests, fast_tests]
test_suffixes = ["_Pure", "_Fast"]

globs = globals()
for module, suffix in zip(test_modules, test_suffixes):
    for name, cls in module.__dict__.items():
        if isinstance(cls, type) and issubclass(cls, unittest.TestCase):
            name += suffix
            cls.__name__ = name
            globs[name] = cls

            def setUp(self, module=module, setup=cls.setUp):
                self._save_sys_modules = sys.modules.copy()
                sys.modules[TESTS] = module
                sys.modules['datetime'] = module.datetime_module
                sys.modules['_strptime'] = module.datetime_module._strptime
                setup(self)

            def tearDown(self, teardown=cls.tearDown):
                teardown(self)
                sys.modules = self._save_sys_modules

            cls.setUp = setUp
            cls.tearDown = tearDown

def test_main():
    run_unittest(__name__)
==========

and it still requires that '_pickle' be disabled to pass the pickle tests.
On Thu, Jul 8, 2010 at 9:44 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote: ..
and it still requires that '_pickle' is disabled to pass pickle tests.
I have found the problem in test_datetime: restoring sys.modules has to be done in place. With this fix, test_datetime looks as follows:

=====
import unittest
import sys
from test.support import import_fresh_module, run_unittest

TESTS = 'test.datetimetester'
pure_tests = import_fresh_module(TESTS,
                                 fresh=['datetime', '_strptime'],
                                 blocked=['_datetime'])
fast_tests = import_fresh_module(TESTS,
                                 fresh=['datetime', '_datetime', '_strptime'])
test_modules = [pure_tests, fast_tests]
test_suffixes = ["_Pure", "_Fast"]

for module, suffix in zip(test_modules, test_suffixes):
    for name, cls in module.__dict__.items():
        if isinstance(cls, type) and issubclass(cls, unittest.TestCase):
            name += suffix
            cls.__name__ = name
            globals()[name] = cls

            def setUp(self, module=module, setup=cls.setUp):
                self._save_sys_modules = sys.modules.copy()
                sys.modules[TESTS] = module
                sys.modules['datetime'] = module.datetime_module
                sys.modules['_strptime'] = module.datetime_module._strptime
                setup(self)

            def tearDown(self, teardown=cls.tearDown):
                teardown(self)
                sys.modules.__init__(self._save_sys_modules)

            cls.setUp = setUp
            cls.tearDown = tearDown

def test_main():
    run_unittest(__name__)

if __name__ == "__main__":
    test_main()
=====

I think this is as good as it gets. I am going to update the patch at http://bugs.python.org/issue7989.
On Fri, 9 Jul 2010 00:59:02 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
py_module_tests = support.import_fresh_module('moduletester',
                                              fresh=['modulename'],
                                              blocked=['_modulename'])
c_module_tests = support.import_fresh_module('moduletester',
                                             fresh=['modulename', '_modulename'])
I don't really like the proliferation of module test helpers; it only makes things confusing and forces you to switch between more files in your editor. By contrast, the subclassing solution is simple, explicit and obvious. (I also wonder what problem this subthread is trying to solve at all. Just my 2 eurocents.) Regards Antoine.
On Thu, Jul 8, 2010 at 3:29 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Fri, 9 Jul 2010 00:59:02 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote: .. I don't really like the proliferation of module test helpers, it only makes things confusing and forces you to switch between more files in your editor. By contrast, the subclassing solution is simple, explicit and obvious.
And that would require a lot of tedious and error-prone work to retrofit onto existing tests. Since we don't have meta regression tests, there is no obvious way to ensure that the retrofitting does not change the tests. Note that test_pickle uses both the subclassing solution *and* a helper pickletester module, because this neatly separates the multi-implementation machinery from the actual test definitions.
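The test_pickle/pickletester pattern mentioned above can be sketched as follows: a mixin holds the implementation-agnostic tests, and each concrete subclass binds one module. The class names are hypothetical, and plain `datetime` stands in for both implementations:

```python
import unittest
import datetime  # stand-in: the subclasses would bind different modules

class MinYearTests:
    """Implementation-agnostic tests; not itself a TestCase, so the
    loader never runs it without a bound module."""
    module = None

    def test_min_year(self):
        self.assertEqual(self.module.MINYEAR, 1)

class PyMinYearTests(MinYearTests, unittest.TestCase):
    module = datetime          # would be the pure Python implementation

class CMinYearTests(MinYearTests, unittest.TestCase):
    module = datetime          # would be the C-backed implementation
```

Because each implementation gets its own class, failures are attributed correctly in the "testmodule.testclass.testfunction" output without any name hacking.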
(I also wonder what problem this subthread is trying to solve at all.
The problem is to find a simple solution that will allow running existing unit tests, written for a C extension, on both the original extension and the added pure Python equivalent. When the existing tests were developed over many years and have 100+ test cases, this is not as easy a task as it would be if you were writing your tests from scratch.
participants (9)
- Alexander Belopolsky
- Antoine Pitrou
- Benjamin Peterson
- Brett Cannon
- Jesse Noller
- Michael Foord
- Nick Coghlan
- Robert Collins
- Terry Reedy