Choosing a best practice solution for Python/extension modules
With io getting rewritten as an extension module, I think it's time to try to come up with a good best practice scenario for how to be able to control when a module uses a pure Python implementation and when it uses extension module optimizations. This is really only important for testing as if the extension is missing then the pure Python version is just flat-out used. As an example, let's just go with pickle and the Pickler class, with _pickle as the extension module. If you look at examples in the standard library, their seems to be two approaches. One is simply to blast the pure Python version: class Pickler: pass try: from _pickle import Pickler except ImportError: pass This is bad, though, as the only way to get a pure Python version for testing is to clear out pickle and _pickle from sys.modules, put None in for sys.modules['_pickle'] and then import pickle again. Yuck. The other option is to hide the pure Python version:: class _Pickler: pass try: from _pickle import Pickler # pickle actualy imports * except ImportError: Pickler = _Pickler Better, but it still means that you are mucking around with hidden names and it hard-codes what part of the module gets replaced (using import * gets around this, but it also blasts things like __doc__ which you probably don't want). Now, from what I can tell, Antoine is suggesting having _pyio and a _io and then io is simply: try: from _io import * except ImportError: from _pyio import * That works for testing as you can then have test classes have an attribute for the module to use and then create two subclasses which set what module to use (kind of like how test_warnings currently does it). But this only really works for complete module replacements, not modules like pickle where only key portions have been rewritten (which happens more often than the complete rewrite). So here is my crazy idea that I came up with late last night (i.e. might not make a lot of sense). 
First, the module with the pure Python code is the main version. At the end of that module, you make a function call: ``use_extension(__name__, '_pickle')``. That function then does some "magic"::

    def use_extension(py_name, ext_name):
        try:
            ext = importlib.import_module(ext_name)
        except ImportError:
            return
        py = sys.modules[py_name]
        swapped = {}
        for name in (x for x in dir(ext) if not x.startswith('__')):
            swapped[name] = getattr(py, name)
            setattr(py, name, getattr(ext, name))
        py.__extension__ = ext_name, swapped

You can also have an undo_extension('pickle') that will unroll what was changed. This makes choosing which version of a module to use very simple in tests, as it is a single function call in one direction or the other. Doing it this way also allows different VMs to choose different things to replace. For instance, IronPython might decide that most of pickle is fine and only want to change a single function with an extension; this solution lets them do that without it being hard-coded in the standard library. At worst, other VMs simply need to refactor the Python code so that there is a class or function that can be replaced.

So go ahead and tear this apart so that we can hopefully reach a consensus that makes sense, so that at least testing can easily be done.

-Brett
On Fri, Feb 20, 2009 at 1:44 PM, Brett Cannon
Now, from what I can tell, Antoine is suggesting having _pyio and a _io and then io is simply:
    try:
        from _io import *
    except ImportError:
        from _pyio import *
That works for testing as you can then have test classes have an attribute for the module to use and then create two subclasses which set what module to use (kind of like how test_warnings currently does it). But this only really works for complete module replacements, not modules like pickle where only key portions have been rewritten (which happens more often than the complete rewrite).
A slight change would make it work for modules where only key functions have been rewritten. For example, pickle.py could read::

    from _pypickle import *
    try:
        from _pickle import *
    except ImportError:
        pass

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC
http://stutzbachenterprises.com
On Fri, Feb 20, 2009 at 12:31, Daniel Stutzbach < daniel@stutzbachenterprises.com> wrote:
On Fri, Feb 20, 2009 at 1:44 PM, Brett Cannon
wrote: Now, from what I can tell, Antoine is suggesting having _pyio and a _io and then io is simply:
    try:
        from _io import *
    except ImportError:
        from _pyio import *
That works for testing as you can then have test classes have an attribute for the module to use and then create two subclasses which set what module to use (kind of like how test_warnings currently does it). But this only really works for complete module replacements, not modules like pickle where only key portions have been rewritten (which happens more often than the complete rewrite).
A slight change would make it work for modules where only key functions have been rewritten. For example, pickle.py could read:
    from _pypickle import *
    try:
        from _pickle import *
    except ImportError:
        pass
True, although that still suffers from the problem of overwriting things like __name__, __file__, etc. -Brett
On Fri, Feb 20, 2009 at 12:37, Brett Cannon
On Fri, Feb 20, 2009 at 12:31, Daniel Stutzbach < daniel@stutzbachenterprises.com> wrote:
On Fri, Feb 20, 2009 at 1:44 PM, Brett Cannon
wrote: Now, from what I can tell, Antoine is suggesting having _pyio and a _io and then io is simply:
    try:
        from _io import *
    except ImportError:
        from _pyio import *
That works for testing as you can then have test classes have an attribute for the module to use and then create two subclasses which set what module to use (kind of like how test_warnings currently does it). But this only really works for complete module replacements, not modules like pickle where only key portions have been rewritten (which happens more often than the complete rewrite).
A slight change would make it work for modules where only key functions have been rewritten. For example, pickle.py could read:
    from _pypickle import *
    try:
        from _pickle import *
    except ImportError:
        pass
True, although that still suffers from the problem of overwriting things like __name__, __file__, etc.
Actually, I take that back; the IMPORT_STAR opcode doesn't pull in anything starting with an underscore. So while this alleviates the worry above, it does mean that anything that gets rewritten needs to have a name that does not lead with an underscore for this to work. Is that really an acceptable compromise for a simple solution like this? -Brett
On Fri, Feb 20, 2009, Brett Cannon wrote:
On Fri, Feb 20, 2009 at 12:37, Brett Cannon
wrote: On Fri, Feb 20, 2009 at 12:31, Daniel Stutzbach < daniel@stutzbachenterprises.com> wrote:
A slight change would make it work for modules where only key functions have been rewritten. For example, pickle.py could read:
    from _pypickle import *
    try:
        from _pickle import *
    except ImportError:
        pass
True, although that still suffers from the problem of overwriting things like __name__, __file__, etc.
Actually, I take that back; the IMPORT_STAR opcode doesn't pull in anything starting with an underscore. So while this alleviates the worry above, it does mean that anything that gets rewritten needs to have a name that does not lead with an underscore for this to work. Is that really an acceptable compromise for a simple solution like this?
Doesn't __all__ control this?

--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

Weinberg's Second Law: If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization.
On Fri, Feb 20, 2009 at 12:53, Aahz
On Fri, Feb 20, 2009, Brett Cannon wrote:
On Fri, Feb 20, 2009 at 12:37, Brett Cannon
wrote: On Fri, Feb 20, 2009 at 12:31, Daniel Stutzbach < daniel@stutzbachenterprises.com> wrote:
A slight change would make it work for modules where only key functions have been rewritten. For example, pickle.py could read:
    from _pypickle import *
    try:
        from _pickle import *
    except ImportError:
        pass
True, although that still suffers from the problem of overwriting things like __name__, __file__, etc.
Actually, I take that back; the IMPORT_STAR opcode doesn't pull in anything starting with an underscore. So while this alleviates the worry above, it does mean that anything that gets rewritten needs to have a name that does not lead with an underscore for this to work. Is that really an acceptable compromise for a simple solution like this?
Doesn't __all__ control this?
If you define it, yes.

But there is another issue with this: the pure Python code will never call the extension code, because the globals will be bound to _pypickle and not _pickle. So if you have something like::

    # _pypickle
    def A():
        return _B()

    def _B():
        return -13

    # _pickle
    def _B():
        return 42

    # pickle
    from _pypickle import *
    try:
        from _pickle import *
    except ImportError:
        pass

If you import pickle and call pickle.A() you will get -13, which is not what you are after.

-Brett
Brett Cannon wrote:
But there is another issue with this: the pure Python code will never call the extension code because the globals will be bound to _pypickle and not _pickle. So if you have something like::
    # _pypickle
    def A():
        return _B()

    def _B():
        return -13

    # _pickle
    def _B():
        return 42

    # pickle
    from _pypickle import *
    try:
        from _pickle import *
    except ImportError:
        pass
If you import pickle and call pickle.A() you will get -13 which is not what you are after.
Ah, you may want to think about that a bit more. There's a reason globals are looked up when they're used rather than when their function is defined. Even in your own example, _B isn't defined at all when you define A.

There is a (real) related problem whereby the Python version will *use* its own globals if it actually tries to call any functions during the import, but that's a problem shared by any "overwrite at the end of import" approach to swapping in extension module versions of functions.

With appropriate __all__ definitions in the C extension modules, I don't see anything wrong with Daniel's suggested approach. Note also that with this approach _io.__all__ will give the details of what has been overridden by the extension module, so it even still provides a decent level of introspection support.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
On Fri, Feb 20, 2009 at 5:27 PM, Nick Coghlan
Brett Cannon wrote:
If you import pickle and call pickle.A() you will get -13 which is not what you are after.
Ah, you may want to think about that a bit more. There's a reason globals are looked up when they're used rather than when their function is defined. Even in your own example, _B isn't defined at all when you define A.
No, I'm afraid Brett is quite right. Globals are looked up when the function is executed, true, but they are looked up within the module that defined the function. Functions defined in _pypickle would only call the _pypickle version of functions.

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC
http://stutzbachenterprises.com
Daniel Stutzbach wrote:
On Fri, Feb 20, 2009 at 5:27 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

Brett Cannon wrote:
> If you import pickle and call pickle.A() you will get -13, which is not what you are after.
Ah, you may want to think about that a bit more. There's a reason globals are looked up when they're used rather than when their function is defined. Even in your own example, _B isn't defined at all when you define A.
No, I'm afraid Brett is quite right. Globals are looked up when the function is executed, true, but they are looked up within the module that defined the function. Functions defined in _pypickle would only call the _pypickle version of functions.
Oh, I see what you mean now. Looks like Brett's tracked substitution may be the way to go then.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
Daniel Stutzbach wrote:
No, I'm afraid Brett is quite right. Globals are looked up when the function is executed, true, but they are looked up within the module that defined the function.
I was thinking you could fix that by going over the imported functions and stuffing the current globals into their func_globals, but unfortunately it's read-only. :-(
    f.func_globals = g
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: readonly attribute
Is there a reason it couldn't be made writeable? -- Greg
On Fri, 20 Feb 2009 13:45:26 -0800, Brett Cannon
On Fri, Feb 20, 2009 at 12:53, Aahz
wrote: On Fri, Feb 20, 2009, Brett Cannon wrote:
On Fri, Feb 20, 2009 at 12:37, Brett Cannon
wrote: On Fri, Feb 20, 2009 at 12:31, Daniel Stutzbach < daniel@stutzbachenterprises.com> wrote:
A slight change would make it work for modules where only key functions have been rewritten. For example, pickle.py could read:
    from _pypickle import *
    try:
        from _pickle import *
    except ImportError:
        pass
True, although that still suffers from the problem of overwriting things like __name__, __file__, etc.
Actually, I take that back; the IMPORT_STAR opcode doesn't pull in anything starting with an underscore. So while this alleviates the worry above, it does mean that anything that gets rewritten needs to have a name that does not lead with an underscore for this to work. Is that really an acceptable compromise for a simple solution like this?
Doesn't __all__ control this?
If you define it, yes.
But there is another issue with this: the pure Python code will never call the extension code because the globals will be bound to _pypickle and not _pickle. So if you have something like::
    # _pypickle
    def A():
        return _B()

    def _B():
        return -13

    # _pickle
    def _B():
        return 42

    # pickle
    from _pypickle import *
    try:
        from _pickle import *
    except ImportError:
        pass
If you import pickle and call pickle.A() you will get -13 which is not what you are after.
If pickle and _pypickle are both Python modules, and _pypickle.A is intended to be used all the time, regardless of whether _pickle is available, then there's not really any reason to implement A in _pypickle. Just implement it in pickle. Then import whatever optionally fast thing it depends on from _pickle, if possible, and fall back to the less fast thing in _pypickle otherwise.

This is really the same as any other high-level/low-level library split. It doesn't matter that in this case, one low-level implementation is provided as an extension module. Importing the low-level APIs from another module and then using them to implement high-level APIs is a pretty common, simple, well-understood technique which is quite applicable here.

Jean-Paul
On Sat, Feb 21, 2009 at 09:17, Jean-Paul Calderone
On Fri, 20 Feb 2009 13:45:26 -0800, Brett Cannon
wrote: On Fri, Feb 20, 2009 at 12:53, Aahz
wrote: On Fri, Feb 20, 2009, Brett Cannon wrote:
On Fri, Feb 20, 2009 at 12:31, Daniel Stutzbach < daniel@stutzbachenterprises.com> wrote:
A slight change would make it work for modules where only key functions have been rewritten. For example, pickle.py could read::

    from _pypickle import *
    try:
        from _pickle import *
    except ImportError:
        pass
True, although that still suffers from the problem of overwriting things like __name__, __file__, etc.

On Fri, Feb 20, 2009 at 12:37, Brett Cannon wrote:
Actually, I take that back; the IMPORT_STAR opcode doesn't pull in anything starting with an underscore. So while this alleviates the worry above, it does mean that anything that gets rewritten needs to have a name that does not lead with an underscore for this to work. Is that really an acceptable compromise for a simple solution like this?
Doesn't __all__ control this?
If you define it, yes.
But there is another issue with this: the pure Python code will never call the extension code because the globals will be bound to _pypickle and not _pickle. So if you have something like::
    # _pypickle
    def A():
        return _B()

    def _B():
        return -13

    # _pickle
    def _B():
        return 42

    # pickle
    from _pypickle import *
    try:
        from _pickle import *
    except ImportError:
        pass
If you import pickle and call pickle.A() you will get -13 which is not what you are after.
If pickle and _pypickle are both Python modules, and _pypickle.A is intended to be used all the time, regardless of whether _pickle is available, then there's not really any reason to implement A in _pypickle. Just implement it in pickle. Then import whatever optionally fast thing it depends on from _pickle, if possible, and fall-back to the less fast thing in _pypickle otherwise.
This is really the same as any other high-level/low-level library split. It doesn't matter that in this case, one low-level implementation is provided as an extension module. Importing the low-level APIs from another module and then using them to implement high-level APIs is a pretty common, simple, well-understood technique which is quite applicable here.
But that doesn't provide a clear way, short of screwing with sys.modules, to get at just the pure Python implementation for testing when the extensions are also present. The key point in trying to figure this out is to facilitate testing, since the standard library already uses the import * trick in a couple of places.

-Brett
On Sat, 21 Feb 2009 11:07:07 -0800, Brett Cannon
On Sat, Feb 21, 2009 at 09:17, Jean-Paul Calderone
wrote: On Fri, 20 Feb 2009 13:45:26 -0800, Brett Cannon
wrote: On Fri, Feb 20, 2009 at 12:53, Aahz
wrote: On Fri, Feb 20, 2009, Brett Cannon wrote:
On Fri, Feb 20, 2009 at 12:31, Daniel Stutzbach <daniel@stutzbachenterprises.com> wrote:

> A slight change would make it work for modules where only key functions have been rewritten. For example, pickle.py could read::
>
>     from _pypickle import *
>     try:
>         from _pickle import *
>     except ImportError:
>         pass
True, although that still suffers from the problem of overwriting things like __name__, __file__, etc.

On Fri, Feb 20, 2009 at 12:37, Brett Cannon wrote:
Actually, I take that back; the IMPORT_STAR opcode doesn't pull in anything starting with an underscore. So while this alleviates the worry above, it does mean that anything that gets rewritten needs to have a name that does not lead with an underscore for this to work. Is that really an acceptable compromise for a simple solution like this?
Doesn't __all__ control this?
If you define it, yes.
But there is another issue with this: the pure Python code will never call the extension code because the globals will be bound to _pypickle and not _pickle. So if you have something like::
    # _pypickle
    def A():
        return _B()

    def _B():
        return -13

    # _pickle
    def _B():
        return 42

    # pickle
    from _pypickle import *
    try:
        from _pickle import *
    except ImportError:
        pass
If you import pickle and call pickle.A() you will get -13 which is not what you are after.
If pickle and _pypickle are both Python modules, and _pypickle.A is intended to be used all the time, regardless of whether _pickle is available, then there's not really any reason to implement A in _pypickle. Just implement it in pickle. Then import whatever optionally fast thing it depends on from _pickle, if possible, and fall-back to the less fast thing in _pypickle otherwise.
This is really the same as any other high-level/low-level library split. It doesn't matter that in this case, one low-level implementation is provided as an extension module. Importing the low-level APIs from another module and then using them to implement high-level APIs is a pretty common, simple, well-understood technique which is quite applicable here.
But that doesn't provide a clear way, short of screwing with sys.modules, to get at just the pure Python implementation for testing when the extensions are also present. The key point in trying to figure this out is to facilitate testing since the standard library already uses the import * trick in a couple of places.
"screwing with sys.modules" isn't a goal. It's a means of achieving a goal, and not a particularly good one. I guess I overedited my message, sorry about that. Originally I included an example of how to parameterize the high-level API to make it easier to test (or use) with any implementation one wants. It went something like this: try: import _pickle as _lowlevel except ImportError: import _pypickle as _lowlevel class Pickler: def __init__(self, implementation=None): if implementation is None: implementation = _lowlevel self.dump = implementation.dump self.load = implementation.load ... Perhaps this isn't /exactly/ how pickle wants to work - I haven't looked at how the C extension and the Python code fit together - but the general idea should apply regardless of those details. Jean-Paul
On Sat, Feb 21, 2009 at 11:32, Jean-Paul Calderone
On Sat, 21 Feb 2009 11:07:07 -0800, Brett Cannon
wrote: On Sat, Feb 21, 2009 at 09:17, Jean-Paul Calderone
wrote:
On Fri, 20 Feb 2009 13:45:26 -0800, Brett Cannon
wrote:
On Fri, Feb 20, 2009 at 12:53, Aahz
wrote: On Fri, Feb 20, 2009, Brett Cannon wrote:
On Fri, Feb 20, 2009 at 12:37, Brett Cannon
wrote:

> On Fri, Feb 20, 2009 at 12:31, Daniel Stutzbach <daniel@stutzbachenterprises.com> wrote:
>> A slight change would make it work for modules where only key functions have been rewritten. For example, pickle.py could read::
>>
>>     from _pypickle import *
>>     try:
>>         from _pickle import *
>>     except ImportError:
>>         pass
>
> True, although that still suffers from the problem of overwriting things like __name__, __file__, etc.

Actually, I take that back; the IMPORT_STAR opcode doesn't pull in anything starting with an underscore. So while this alleviates the worry above, it does mean that anything that gets rewritten needs to have a name that does not lead with an underscore for this to work. Is that really an acceptable compromise for a simple solution like this?
Doesn't __all__ control this?
If you define it, yes.
But there is another issue with this: the pure Python code will never call the extension code because the globals will be bound to _pypickle and not _pickle. So if you have something like::
    # _pypickle
    def A():
        return _B()

    def _B():
        return -13

    # _pickle
    def _B():
        return 42

    # pickle
    from _pypickle import *
    try:
        from _pickle import *
    except ImportError:
        pass
If you import pickle and call pickle.A() you will get -13 which is not what you are after.
If pickle and _pypickle are both Python modules, and _pypickle.A is intended to be used all the time, regardless of whether _pickle is available, then there's not really any reason to implement A in _pypickle. Just implement it in pickle. Then import whatever optionally fast thing it depends on from _pickle, if possible, and fall-back to the less fast thing in _pypickle otherwise.
This is really the same as any other high-level/low-level library split. It doesn't matter that in this case, one low-level implementation is provided as an extension module. Importing the low-level APIs from another module and then using them to implement high-level APIs is a pretty common, simple, well-understood technique which is quite applicable here.
But that doesn't provide a clear way, short of screwing with sys.modules, to get at just the pure Python implementation for testing when the extensions are also present. The key point in trying to figure this out is to facilitate testing since the standard library already uses the import * trick in a couple of places.
"screwing with sys.modules" isn't a goal. It's a means of achieving a goal, and not a particularly good one.
I guess I overedited my message, sorry about that. Originally I included an example of how to parameterize the high-level API to make it easier to test (or use) with any implementation one wants. It went something like this:
    try:
        import _pickle as _lowlevel
    except ImportError:
        import _pypickle as _lowlevel

    class Pickler:
        def __init__(self, implementation=None):
            if implementation is None:
                implementation = _lowlevel
            self.dump = implementation.dump
            self.load = implementation.load
            ...
Perhaps this isn't /exactly/ how pickle wants to work - I haven't looked at how the C extension and the Python code fit together - but the general idea should apply regardless of those details.
But this requires all VMs to either implement the same thing as an extension, or nothing at all. What if Jython only wants to re-implement 'load' and not 'dump'?

-Brett
On 07:07 pm, brett@python.org wrote:
On Sat, Feb 21, 2009 at 09:17, Jean-Paul Calderone
wrote:
But there is another issue with this: the pure Python code will never call the extension code because the globals will be bound to _pypickle and not _pickle. So if you have something like::
    # _pypickle
    def A():
        return _B()

    def _B():
        return -13

    # _pickle
    def _B():
        return 42

    # pickle
    from _pypickle import *
    try:
        from _pickle import *
    except ImportError:
        pass
This is really the same as any other high-level/low-level library split. It doesn't matter that in this case, one low-level implementation is provided as an extension module. Importing the low-level APIs from another module and then using them to implement high-level APIs is a pretty common, simple, well-understood technique which is quite applicable here.
But that doesn't provide a clear way, short of screwing with sys.modules, to get at just the pure Python implementation for testing when the extensions are also present. The key point in trying to figure this out is to facilitate testing since the standard library already uses the import * trick in a couple of places.
You don't have to screw with sys.modules. The way I would deal with testing this particular interaction would be a setUp that replaces pickle._A with _pypickle._A, and a tearDown that restores the original one.

Twisted's TestCase has specific support for this. You would spell it like this::

    import _pypickle
    # ...
    testCase.patch(pickle, '_A', _pypickle._A)

You can read more about this method here:

http://python.net/crew/mwh/apidocs/twisted.trial.unittest.TestCase.html#patc...
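[For readers not using Twisted, the same swap can be spelled with plain unittest setUp/tearDown; the stand-in modules and the `_A` attribute below are assumptions made so the example is self-contained.]

```python
import types
import unittest

# Stand-ins for the real pickle/_pypickle modules (assumed names):
pickle = types.ModuleType('pickle')
pickle._A = lambda: 'extension'
_pypickle = types.ModuleType('_pypickle')
_pypickle._A = lambda: 'pure'


class PurePythonPickleTest(unittest.TestCase):
    def setUp(self):
        # Swap in the pure Python implementation, remembering the original.
        self._saved = pickle._A
        pickle._A = _pypickle._A

    def tearDown(self):
        # Restore the (possibly extension-backed) original.
        pickle._A = self._saved

    def test_uses_pure_python(self):
        self.assertEqual(pickle._A(), 'pure')
```

Twisted's testCase.patch is essentially this bookkeeping done for you, with the restore guaranteed even if the test fails.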
On Sat, Feb 21, 2009 at 11:43,
On 07:07 pm, brett@python.org wrote:
On Sat, Feb 21, 2009 at 09:17, Jean-Paul Calderone
wrote:
But there is another issue with this: the pure Python code will never call the extension code, because the globals will be bound to _pypickle and not _pickle. So if you have something like::

    # _pypickle
    def A():
        return _B()

    def _B():
        return -13

    # _pickle
    def _B():
        return 42

    # pickle
    from _pypickle import *
    try:
        from _pickle import *
    except ImportError:
        pass

This is really the same as any other high-level/low-level library split. It doesn't matter that in this case, one low-level implementation is provided as an extension module. Importing the low-level APIs from another module and then using them to implement high-level APIs is a pretty common, simple, well-understood technique which is quite applicable here.

But that doesn't provide a clear way, short of screwing with sys.modules, to get at just the pure Python implementation for testing when the extensions are also present. The key point in trying to figure this out is to facilitate testing, since the standard library already uses the import * trick in a couple of places.
You don't have to screw with sys.modules. The way I would deal with testing this particular interaction would be a setUp that replaces pickle._A with _pypickle._A, and a tearDown that restores the original one.
Twisted's TestCase has specific support for this. You would spell it like this:
    import _pypickle
    # ...
    testCase.patch(pickle, '_A', _pypickle._A)
You can read more about this method here:
http://python.net/crew/mwh/apidocs/twisted.trial.unittest.TestCase.html#patc...
My worry with this approach is that while it works nicely if you are only overriding a single function, you would have to do this for every function and class to make sure you are testing the extension code with all the extension code instead of intermingled extension/Python code. So a function that did this automatically for the entire module would be needed, which is like what I proposed in my use_extension function.

I am seeing two approaches emerging. One is where pickle contains all Python code and then uses something like use_extension to make sure the original Python objects are still reachable at some point. This has the drawback that you have to use some function to make the extensions happen, and there is some extra object storage.

The other approach is having pickle contain code known not to be overridden by anyone, import _pypickle for stuff that may be overridden, and then import from _pickle whatever is available. This approach has the perk of using a standard practice for how to pull in a different implementation. But the drawback, thanks to how globals are bound, is that any code pulled in from _pickle/_pypickle will not be able to call into other optimized code; it's take it or leave it once the call chain enters one of those modules, as they will always call the implementations in the module they originate from.

-Brett
Brett Cannon wrote:
On Sat, Feb 21, 2009 at 11:43, <glyph@divmod.com> wrote:

On 07:07 pm, brett@python.org wrote:
On Sat, Feb 21, 2009 at 09:17, Jean-Paul Calderone <exarkun@divmod.com> wrote:

But there is another issue with this: the pure Python code will never call the extension code, because the globals will be bound to _pypickle and not _pickle. So if you have something like::
    # _pypickle
    def A():
        return _B()

    def _B():
        return -13

    # _pickle
    def _B():
        return 42

    # pickle
    from _pypickle import *
    try:
        from _pickle import *
    except ImportError:
        pass
This is really the same as any other high-level/low-level library split. It doesn't matter that in this case, one low-level implementation is provided as an extension module. Importing the low-level APIs from another module and then using them to implement high-level APIs is a pretty common, simple, well-understood technique which is quite applicable here.
But that doesn't provide a clear way, short of screwing with sys.modules, to get at just the pure Python implementation for testing when the extensions are also present. The key point in trying to figure this out is to facilitate testing since the standard library already uses the import * trick in a couple of places.
You don't have to screw with sys.modules. The way I would deal with testing this particular interaction would be a setUp that replaces pickle._A with _pypickle._A, and a tearDown that restores the original one.
Twisted's TestCase has specific support for this. You would spell it like this:
    import _pypickle
    # ...
    testCase.patch(pickle, '_A', _pypickle._A)
You can read more about this method here:
http://python.net/crew/mwh/apidocs/twisted.trial.unittest.TestCase.html#patc...
My worry with this approach is that while it works nicely if you are only overriding a single function, you would have to do this for every function and class to make sure you are testing the extension code with all the extension code instead of intermingled extension/Python code. So a function that did this automatically for the entire module would be needed, which is like what I proposed in my use_extension function.
I am seeing two approaches emerging. One is where pickle contains all Python code and then uses something like use_extension to make sure the original Python objects are still reachable at some point. This has the drawback that you have to use some function to make the extensions happen and there is some extra object storage.
The other approach is having pickle contain code known not to be overridden by anyone, import _pypickle for stuff that may be overridden, and then import from _pickle whatever is available. This approach has the perk of using a standard practice for how to pull in a different implementation. But the drawback, thanks to how globals are bound, is that any code pulled in from _pickle/_pypickle will not be able to call into other optimized code; it's take it or leave it once the call chain enters one of those modules, as they will always call the implementations in the module they originate from.
I'd actually say there's a third option which is still viable: continue with the current Foo/_Foo practice for optimised modules, and provide a function in test.support to get the original Python version's code out of Foo. That actually has the virtue of directly testing that the ImportError for the missing module is being trapped correctly.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, Feb 21, 2009, Brett Cannon wrote:
To what extent do we care about being able to select Python-only on a per-module basis, particularly in the face of threaded imports? That is, we could have a sys.python_only attribute that gets checked on import. That's simple and direct, and even allows per-module switching if the application really cares, and import doesn't need to worry about threads.

Alternatively, sys.python_only could be a set, but that gets ugly when setting it from the application. (The module checks to see whether it's listed in sys.python_only.)

Maybe we should move this discussion to python-ideas for now to kick around really oddball suggestions?

--
Aahz (aahz@pythoncraft.com)   <*>   http://www.pythoncraft.com/

Weinberg's Second Law: If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization.
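A tiny sketch of what the hypothetical sys.python_only switch might look like from a module's point of view. Both the attribute and the check are assumptions from this proposal, not an existing CPython feature:

```python
import sys

# Hypothetical: sys.python_only as a set that a module consults at import
# time before pulling in its accelerator extension (Aahz's suggestion).
sys.python_only = {"pickle"}

def wants_extension(mod_name):
    # what the import-time check inside a module might look like
    return mod_name not in getattr(sys, "python_only", ())

forced_pure = wants_extension("pickle")   # False: stick to the Python code
free = wants_extension("heapq")           # True: may import its extension
print(forced_pure, free)

del sys.python_only   # clean up the demo attribute
```

A module would then guard its `from _foo import *` with this check instead of relying solely on ImportError.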
On Sat, Feb 21, 2009 at 15:46, Aahz
To what extent do we care about being able to select Python-only on a per-module basis, particularly in the face of threaded imports? That is, we could have a sys.python_only attribute that gets checked on import. That's simple and direct, and even allows per-module switching if the application really cares and import doesn't need to worry about threads.
Alternatively, sys.python_only could be a set, but that gets ugly about setting from the application. (The module checks to see whether it's listed in sys.python_only.)
Maybe we should move this discussion to python-ideas for now to kick around really oddball suggestions?
This is all about testing. If a change is made to some extension code it should be mirrored in the Python code and vice-versa. -Brett
On Sat, Feb 21, 2009, Brett Cannon wrote:
This is all about testing. If a change is made to some extension code it should be mirrored in the Python code and vice-versa.
Okay, I don't see how that is a response to my suggestion -- I can imagine that someone might want to test a combination of pure-Python and binary libraries.
On Sat, Feb 21, 2009 at 20:12, Aahz
Maybe we should move this discussion to python-ideas for now to kick around really oddball suggestions?
This is all about testing. If a change is made to some extension code it should be mirrored in the Python code and vice-versa.
Okay, I don't see how that is a response to my suggestion -- I can imagine that someone might want to test a combination of pure-Python and binary libraries.
I don't want to move it because this isn't some idea for a new feature that may or may not be useful; this isn't an "idea", it's needed. -Brett
On Sun, Feb 22, 2009, Brett Cannon wrote:
On Sat, Feb 21, 2009 at 20:12, Aahz
wrote: On Sat, Feb 21, 2009, Brett Cannon wrote:
On Sat, Feb 21, 2009 at 15:46, Aahz
wrote: On Sat, Feb 21, 2009, Brett Cannon wrote:
I am seeing two approaches emerging. One is where pickle contains all Python code and then uses something like use_extension to make sure the original Python objects are still reachable at some point. This has the drawback that you have to use some function to make the extensions happen and there is some extra object storage.
The other approach is having pickle contain code known not to be overridden by anyone, import _pypickle for stuff that may be overridden, and then import _pickle for whatever is available. This approach has the perk of using a standard practice for how to pull in different implementation. But the drawback, thanks to how globals are bound, is that any code pulled in from _pickle/_pypickle will not be able to call into other optimized code; it's a take or leave it once the call chain enters one of those modules as they will always call the implementations in the module they originate from.
To what extent do we care about being able to select Python-only on a per-module basis, particularly in the face of threaded imports? That is, we could have a sys.python_only attribute that gets checked on import. That's simple and direct, and even allows per-module switching if the application really cares and import doesn't need to worry about threads.
Alternatively, sys.python_only could be a set, but that gets ugly about setting from the application. (The module checks to see whether it's listed in sys.python_only.)
Maybe we should move this discussion to python-ideas for now to kick around really oddball suggestions?
This is all about testing. If a change is made to some extension code it should be mirrored in the Python code and vice-versa.
Okay, I don't see how that is a response to my suggestion -- I can imagine that someone might want to test a combination of pure-Python and binary libraries.
I don't want to move it because this isn't some idea for a new feature that may or may not be useful; this isn't an "idea", it's needed.
That's fine, but what about my idea of using sys.python_only?
On Sun, Feb 22, 2009 at 22:41, Aahz
That's fine, but what about my idea of using sys.python_only?
But what is it supposed to represent? That only pure Python modules get imported? What if the module depends on another module that is an extension module? -Brett
Brett Cannon wrote:
I don't want to move it because this isn't some idea for a new feature that may or may not be useful; this isn't an "idea", it's needed.
It is needed, but it's only really needed in the test suite. The "sys.modules hackery" needed to get a Python-only version using the existing idiom really isn't that complicated, and the associated import behaviour is perfectly well defined (putting None in sys.modules may currently be a bit questionable, but I'd prefer to make sure that is officially supported with the desired effect rather than trying to define a new idiom in the actual library code for handling optional optimised extension modules).

So, I'm still not seeing any significant problem with providing a utility function in test.support that hides that hackery and returns the pure Python version of the module. For example, a version that allows any number of extension modules to be suppressed when importing a module (defaulting to the Foo/_Foo naming):

    import importlib
    import sys

    def import_python_only(mod_name, *ext_names):
        if not ext_names:
            ext_names = ("_" + mod_name,)
        orig_modules = {}
        if mod_name in sys.modules:
            orig_modules[mod_name] = sys.modules.pop(mod_name)
        try:
            for name in ext_names:
                if name in sys.modules:
                    orig_modules[name] = sys.modules[name]
                sys.modules[name] = None  # a None entry makes the import fail
            py_module = importlib.import_module(mod_name)
        finally:
            for name in ext_names:
                del sys.modules[name]
            sys.modules.pop(mod_name, None)
            sys.modules.update(orig_modules)
        return py_module

Cheers,
Nick.
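As a self-contained check that the sys.modules-blocking trick behaves as described, here is such a helper exercised against the stdlib heapq/_heapq pair (chosen as a stand-in; the thread itself uses pickle). The same idea later landed in the stdlib as test.support.import_fresh_module:

```python
import importlib
import sys

def import_python_only(mod_name, *ext_names):
    # sketch of the proposed test.support helper
    if not ext_names:
        ext_names = ("_" + mod_name,)
    orig_modules = {}
    if mod_name in sys.modules:
        orig_modules[mod_name] = sys.modules.pop(mod_name)
    try:
        for name in ext_names:
            if name in sys.modules:
                orig_modules[name] = sys.modules[name]
            sys.modules[name] = None   # None makes "import name" raise ImportError
        py_module = importlib.import_module(mod_name)
    finally:
        # drop the blocking entries and the fresh copy, restore the originals
        for name in ext_names:
            del sys.modules[name]
        sys.modules.pop(mod_name, None)
        sys.modules.update(orig_modules)
    return py_module

py_heapq = import_python_only("heapq")
print(type(py_heapq.heappush).__name__)   # 'function': pure Python, not the C builtin
```

Because the fresh copy is removed from sys.modules again, a later plain `import heapq` still gets the accelerated version.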
On Mon, Feb 23, 2009 at 04:02, Nick Coghlan
Well, neither do I, as your proposed approach is what I do for warnings. But I think you and I, Nick, are more comfortable with mucking with imports than most people. =) I think some people early on in this thread said they didn't like the idea of screwing around with sys.modules. But doing that along with an import * from the extension module is probably the simplest solution.

-Brett
On Mon, Feb 23, 2009 at 04:02, Nick Coghlan
On Mon, Feb 23, 2009 at 11:05 AM, Brett Cannon
Well, neither do I as your proposed approach below is what I do for warnings. But I think you and I, Nick, are more comfortable with mucking with imports than most people. =) I think some people early on in this thread said they didn't like the idea of screwing around with sys.modules. But doing that along with an import * from the extension module is probably the simplest solution.
+1 for something like Nick's code. No one has to know they're mucking around with sys.modules - they just have to use the import_python_only() function. (And I haven't seen anything in this thread that's really any better.)

Steve

--
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy
Brett Cannon wrote:
Well, neither do I as your proposed approach below is what I do for warnings.
It's possible I actually had test_warnings.py open in another window while writing that example function... ;)

As Steven said, your concerns are precisely why I'm suggesting hiding this in a helper function - so people who aren't quite as comfortable playing games with sys.modules can still use it to suppress particular extension modules when writing tests. Initially for the Python regression test suite only, but perhaps eventually in importlib if we're happy with the way it works out for us.

Created http://bugs.python.org/issue5354 and assigned it to myself so we don't forget about it.

Cheers,
Nick.
On Mon, Feb 23, 2009 at 13:23, Nick Coghlan
Brett Cannon wrote:
Well, neither do I as your proposed approach below is what I do for warnings.
It's possible I actually had test_warnings.py open in another window while writing that example function... ;)
As Steven said, your concerns are precisely why I'm suggesting hiding this in a helper function - so people that aren't quite as comfortable playing games with sys.modules can still use it to suppress particular extension modules when writing tests. Initially for the Python regression test suite only, but perhaps eventually in importlib if we're happy with the way it works out for us.
Sounds like a plan.
Created http://bugs.python.org/issue5354 and assigned it to myself so we don't forget about it.
If we do end up going with this approach I am willing to help out with moving the standard library over. -Brett
On Feb 21, 2009, at 2:17 PM, Brett Cannon wrote:
The other approach is having pickle contain code known not to be overridden by anyone, import _pypickle for stuff that may be overridden, and then import _pickle for whatever is available. This approach has the perk of using a standard practice for how to pull in different implementation. But the drawback, thanks to how globals are bound, is that any code pulled in from _pickle/_pypickle will not be able to call into other optimized code; it's a take or leave it once the call chain enters one of those modules as they will always call the implementations in the module they originate from.
fwiw, I believe this is the approach that I've been using when I'm faced with the need to optimize code but still want to retain a pure Python version. Thankfully, I haven't had a need for "implementation intersections" for access to common module globals, as the optimizations tend to be simple transformations or isolated classes. However, if I were faced with the problem of needing some common global data/functionality, I'd probably put it in yet-another-module and just import it explicitly in each implementation. Sure, it seems like it might be annoying, but so is maintaining multiple implementations. ;)

Specifically:

    pbuffer.py - The Python implementation
    buffer.c -> cbuffer.so - The C implementation
    buffer.py - The "abstraction module", trying to import the contents of the fast one first.

And in my unit tests (well, it almost works. I think. ;):

    if c_buffer_module is not None:
        class c_buffer(buffer_test, unittest.TestCase):
            bufferclass = c_buffer_module.pq_message_stream

    class p_buffer(buffer_test, unittest.TestCase):
        bufferclass = p_buffer_module.pq_message_stream

Of course, "buffer_test" is not invoked on its own because it's not a TestCase. However, Aahz is probably right about this thread belonging elsewhere. Hrm, day old, maybe it's been moved already.. sigh. :)
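The test-class-pair pattern described above can be reduced to a runnable sketch. The classes and names here are toy stand-ins for the pbuffer/cbuffer pair (the "C" implementation is simulated by aliasing the Python one):

```python
import unittest

# Toy implementation standing in for the pure Python module's class
class PyStream:
    def __init__(self):
        self._chunks = []
    def write(self, data):
        self._chunks.append(data)
    def getvalue(self):
        return b"".join(self._chunks)

CStream = PyStream   # stand-in: would really come from the extension module

# The shared body deliberately does NOT inherit TestCase, so it only runs
# through the concrete subclasses that pin down which implementation to test.
class StreamTestMixin:
    def test_roundtrip(self):
        s = self.streamclass()
        s.write(b"abc")
        self.assertEqual(s.getvalue(), b"abc")

class PyStreamTest(StreamTestMixin, unittest.TestCase):
    streamclass = PyStream

class CStreamTest(StreamTestMixin, unittest.TestCase):
    streamclass = CStream

loader = unittest.defaultTestLoader
suite = unittest.TestSuite([loader.loadTestsFromTestCase(PyStreamTest),
                            loader.loadTestsFromTestCase(CStreamTest)])
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun)   # 2: the same body ran once per implementation
```

This is the same structure test_warnings uses: one shared test body, one subclass per implementation.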
On Fri, Feb 20, 2009 at 1:45 PM, Brett Cannon
But there is another issue with this: the pure Python code will never call the extension code because the globals will be bound to _pypickle and not _pickle. So if you have something like::
    # _pypickle
    def A():
        return _B()

    def _B():
        return -13

    # _pickle
    def _B():
        return 42

    # pickle
    from _pypickle import *
    try:
        from _pickle import *
    except ImportError:
        pass
If you import pickle and call pickle.A() you will get -13 which is not what you are after.
Maybe I've missed this and someone else already suggested it, but couldn't we provide a (probably C-coded) function ``replace_globals(module, globals)`` that would, say, replace the globals in _pypickle with the globals in pickle? Then you could write something like::

    from _pypickle import *
    try:
        from _pickle import *
        module = __import__('_pickle')
    except ImportError:
        module = __import__('_pypickle')
    replace_globals(module, globals())

Steve
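Both the globals-binding problem and the replace_globals idea can be demonstrated without C. The modules are simulated with types.ModuleType, and this replace_globals is a pure Python sketch of the hypothetical helper (the real thing would presumably be more careful):

```python
import types

# Simulated _pypickle: pure Python A() calls the module-global _B()
pypickle = types.ModuleType("_pypickle")
exec("def A():\n    return _B()\n\ndef _B():\n    return -13\n",
     pypickle.__dict__)

# Simulated pickle namespace after "from _pypickle import *" followed by
# "from _pickle import *" overriding _B with a "fast" version:
pickle_ns = dict(A=pypickle.A, _B=lambda: 42)

# The problem: A's globals are still _pypickle's, so it never sees the override.
print(pickle_ns["A"]())   # -13, not 42

# Sketch of replace_globals(): rebind each pure Python function so that it
# resolves its globals in the combined namespace instead of its home module.
def replace_globals(module, new_globals):
    for name, obj in list(vars(module).items()):
        if isinstance(obj, types.FunctionType):
            setattr(module, name, types.FunctionType(
                obj.__code__, new_globals, name,
                obj.__defaults__, obj.__closure__))

replace_globals(pypickle, pickle_ns)
print(pypickle.A())       # 42: the rebound A now finds the overriding _B
```

The rebinding works because a function's global namespace is fixed at creation time; building a new function object from the same code with different globals is the only way to change it.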
Steven Bethard wrote:
Maybe I've missed this and someone else already suggested it, but couldn't we provide a (probably C-coded) function ``replace_globals(module, globals)`` that would, say, replace the globals in _pypickle with the globals in pickle? Then you could write something like::
    from _pypickle import *
    try:
        from _pickle import *
        module = __import__('_pickle')
    except ImportError:
        module = __import__('_pypickle')
    replace_globals(module, globals())
Swapping out module globals seems to be a backwards way to do things to me.

Why not have one set of tests that test the low-level APIs (identical tests whether they are written in C or Python) - and as they live in their own module they are easy to test in isolation. Then a separate set of tests for the higher-level APIs, which can even mock out the low-level APIs they use. There shouldn't be any need for switching out objects in the scope of the lower-level APIs - that seems like a design smell to me.

Michael

--
http://www.ironpythoninaction.com/
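A minimal sketch of the layered-testing approach suggested above, using unittest.mock to stub out the low-level API while testing the high-level one. The names (`lowlevel`, `dump`) are invented for the demo:

```python
import types
from unittest import mock

# Toy layering: a low level helper that could be either Python or C...
lowlevel = types.SimpleNamespace(serialize=lambda obj: repr(obj).encode())

# ...and a high level API that delegates to it.
def dump(obj):
    return b"HDR" + lowlevel.serialize(obj)

# Test the high level code with the low level API mocked out entirely:
with mock.patch.object(lowlevel, "serialize", return_value=b"X") as m:
    patched = dump(123)
m.assert_called_once_with(123)

print(patched)    # b'HDRX': the mock's canned value was used
print(dump(123))  # b'HDR123': the real helper is back after the with block
```

The low-level implementations (C and Python) would then share one test suite of their own, exactly as the quoted message proposes.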
On Sun, Feb 22, 2009 at 10:29, Michael Foord
Swapping out module globals seems to be a backwards way to do things to me.
I agree; I would rather muck with sys.modules at that point.

Why not have one set of tests that test the low level APIs (identical tests whether they are written in C or Python) - and as they live in their own module are easy to test in isolation. And then a separate set of tests for the higher level APIs, which can even mock out the low level APIs they use. There shouldn't be any need for switching out objects in the scope of the lower level APIs - that seems like a design smell to me.
That's possible. As I have said, my only worry with the separate py/extension module approach is that any time someone wants to do an extension version of something the Python code will need to be moved. But at this point I am honestly burning out on this topic (got a lot on my plate right now) so I am going to let somebody else lead this to the finish line. =) -Brett
Brett Cannon wrote:
So while this alleviates the worry above, it does mean that anything that gets rewritten needs to have a name that does not lead with an underscore for this to work.
You can use an __all__ list to explicitly say what is to be exported. -- Greg
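Greg's point can be verified directly: __all__ restricts what a star import exports, so underscore-prefixed helpers and module dunders stay behind. The module name here is a throwaway registered in sys.modules just so the import statement can find it:

```python
import sys
import types

# Build a throwaway module and register it so "from ... import *" works on it.
source = """
__all__ = ['Pickler']

class Pickler:
    pass

def _helper():
    pass
"""
fastmod = types.ModuleType("fastmod_demo")   # hypothetical module name
exec(source, fastmod.__dict__)
sys.modules["fastmod_demo"] = fastmod

ns = {}
exec("from fastmod_demo import *", ns)
exported = sorted(k for k in ns if not k.startswith("__"))
print(exported)   # ['Pickler']: _helper (and __doc__ etc.) stay behind

del sys.modules["fastmod_demo"]   # clean up the demo entry
```

So an extension module with an explicit __all__ can safely be star-imported over the pure Python definitions without clobbering module metadata.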
Brett Cannon wrote:
On Fri, Feb 20, 2009 at 12:31, Daniel Stutzbach wrote:

On Fri, Feb 20, 2009 at 1:44 PM, Brett Cannon wrote:

Now, from what I can tell, Antoine is suggesting having _pyio and a _io and then io is simply:
    try:
        from _io import *
    except ImportError:
        from _pyio import *
That works for testing as you can then have test classes have an attribute for the module to use and then create two subclasses which set what module to use (kind of like how test_warnings currently does it). But this only really works for complete module replacements, not modules like pickle where only key portions have been rewritten (which happens more often than the complete rewrite).
A slight change would make it work for modules where only key functions have been rewritten. For example, pickle.py could read:
    from _pypickle import *
    try:
        from _pickle import *
    except ImportError:
        pass
True, although that still suffers from the problem of overwriting things like __name__, __file__, etc.
What do you mean overwriting __name__ and __file__? Doing import * in a pure Python file doesn't override these. Michael
[Brett]
With io getting rewritten as an extension module, I think it's time to try to come up with a good best practice scenario for how to be able to control when a module uses a pure Python implementation and when it uses extension module optimizations. This is really only important for testing as if the extension is missing then the pure Python version is just flat-out used.
There is also a need in some modules where the two are not exactly equivalent, or where there are multiple C extensions to choose from. In PyYAML, there needs to be an easier way to switch parsers and emitters (i.e. LibYAML). There are similar issues with xmlrpclib with the choice of parsers, marshallers, and unmarshallers. Possibly, the same mechanism can offer the user more control over which dbm is used when there are several choices. Raymond
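A possible shape for the more general mechanism Raymond describes is a first-available-backend helper. This is a sketch, not an existing API; the `first_available` name and the accelerator module name are invented (the fallback to the stdlib `json` module is just for the demo):

```python
import importlib

def first_available(*candidates):
    """Return the first importable module from candidates, in priority order."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append(str(exc))
    raise ImportError("no backend available: " + "; ".join(errors))

# e.g. prefer a (nonexistent) third party accelerator, fall back to the stdlib:
json_mod = first_available("hypothetical_fast_json", "json")
print(json_mod.__name__)   # 'json' when no accelerator is installed
```

PyYAML's LibYAML switch, the xmlrpclib parser choice, and the dbm selection could all sit behind such a priority list, with the user reordering it to take control.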
On Fri, Feb 20, 2009 at 12:33, Raymond Hettinger
So are you saying you want something that takes multiple arguments, like ``use_extension(py_name, *ext_names)``? Or are you wanting to go down the road of modules needing to define their own functions to apply or unwind changes? -Brett
I don't have a particular solution in mind. Just wanted to reframe the question to be a more general one about controlling the selection between near-equivalent modules and extensions. Some variant of the problem seems to come up in many different contexts. No one best practice has emerged as dominant.
----- Original Message -----
From: Brett Cannon
To: Raymond Hettinger
Cc: Python Dev
Sent: Friday, February 20, 2009 12:41 PM
Subject: Re: [Python-Dev] Choosing a best practice solution for Python/extension modules
On Fri, Feb 20, 2009 at 12:33, Raymond Hettinger
So go ahead and tear this apart so that we can hopefully reach a consensus that makes sense so that at least testing can easily be done.
If I was developing an application and wanted to deal with two different versions of the same library, I would simply make sure that the version I wanted to use was first on sys.path. Perhaps something such as:

    lib/python-3.0/libdynload/   # extension module implementations
    lib/python-3.0/libpython/    # pure Python implementations

Then in the test suite simply ensure that either the Python implementation or the C implementation is first on sys.path. Both directories would contain a _pickle.py module, and then pickle.py could be changed from:

    try:
        from _pickle import *
    except ImportError:
        Pickler, Unpickler = _Pickler, _Unpickler

To just:

    from _pickle import Pickler, Unpickler

By default libdynload would be first on sys.path so that extension modules would be imported first if available; otherwise it would fall back to the pure Python versions. The test suite could then add/remove libdynload from sys.path as needed.

Well, OK, so this is a pretty big change as to how standard lib files are structured - so maybe there are quite a few reasons why this isn't feasible ... but it does make things a lot simpler and gets rid of the need for performing any magic with the loaded modules in the test suite.
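The sys.path-ordering idea above can be exercised end to end with two temporary directories standing in for libdynload and libpython. The module name `_pickle_demo` is invented for the demo so it cannot clash with the real _pickle:

```python
import importlib
import os
import sys
import tempfile

# Two directories, each holding a module named _pickle_demo, mimicking the
# proposed libdynload/libpython split.
base = tempfile.mkdtemp()
paths = {}
for subdir, impl in (("libdynload", "extension"), ("libpython", "pure")):
    d = os.path.join(base, subdir)
    os.makedirs(d)
    with open(os.path.join(d, "_pickle_demo.py"), "w") as f:
        f.write("IMPL = %r\n" % impl)
    paths[subdir] = d

importlib.invalidate_caches()   # make sure the freshly written files are seen
sys.path[:0] = [paths["libdynload"], paths["libpython"]]
first = importlib.import_module("_pickle_demo").IMPL    # 'extension' wins

# The test suite would drop libdynload from sys.path to get the pure version:
del sys.modules["_pickle_demo"]
sys.path.remove(paths["libdynload"])
second = importlib.import_module("_pickle_demo").IMPL   # now 'pure'
print(first, second)

# clean up the demo entries
del sys.modules["_pickle_demo"]
sys.path.remove(paths["libpython"])
```

This shows the mechanism works, though it says nothing about the packaging-level objections raised elsewhere in the thread.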
participants (12)
- Aahz
- Brett Cannon
- Daniel Stutzbach
- glyph@divmod.com
- Greg Ewing
- James Pye
- Jean-Paul Calderone
- Kevin Teague
- Michael Foord
- Nick Coghlan
- Raymond Hettinger
- Steven Bethard