change of strategy for the py3k branch?

Hi all,

after some months of work on the py3k branch, I realized that the current strategy/workflow does not scale well, and thus I'd like to change it.

For those who are not aware: currently we have the default branch, where the main development is done, and which includes code for both the rpython translator toolchain and the python 2 interpreter. The py3k branch does not touch the translator toolchain, but modifies the python 2 interpreter to make it a py3k interpreter. These changes actually destroy the python 2 semantics, which means that the long term goal is to never merge py3k to default, but to keep the development in parallel, and to regularly merge default into py3k to make sure that py3k benefits from the various improvements in the JIT, GC, etc.

In the past months, I ended up spending a considerable amount of time resolving merge conflicts. This happens every time someone modifies something in the python 2 interpreter, for example to apply a cool new optimization. While on the one hand it is cool to automatically get new optimizations on py3k, on the other hand it is a blocker which stops me from working on it effectively.

After a bit of discussion on IRC, I propose to solve this problem by detaching the development of the py3k interpreter from the development of the python 2 one.

Pros:

- faster development of py3k
- lower entry barrier for new contributors, because the relationship between the various parts will be much simpler
- it will be straightforward to apply the new features of the translator toolchain to the py3k branch
- it will be easier to split the toolchain from the actual interpreter the day we finally decide to do it

Cons:

- we will need to manually port the optimizations done in the interpreter on the default branch to py3k. Note that right now it's not really "automatic" anyway, because merging is painful.

If we decide to go for this route, the next question is: where to store the code? I think there are two main solutions:

1) add a new "pypy/py3k" directory where to copy all the relevant modules. E.g. pypy/py3k/interpreter, pypy/py3k/objspace/std, pypy/py3k/modules.

2) start a completely new repository which contains only the code for py3k.

Solution (2) is better and cleaner in theory. However I fear it would soon become a mess to handle, because every change in the translator toolchain would potentially break py3k. I don't want a situation in which we say "yes, you can build py3k but only if you take revision XXX and you use revision YYY of the toolchain, unless the phase of the moon is empty".

Solution (1) is more practical and it would probably lead to fewer problems in the short term. For now, I would still keep the code in the py3k branch, so the normal development of pypy would not be affected.

Before doing it, I'd like to hear opinions and comments, in particular from people who already worked on py3k and/or are generally interested in it. Please be constructive :-).

ciao, Anto

2012/5/30 Antonio Cuni <anto.cuni@gmail.com>
2) start a completely new repository which contains only the code for py3k.
How is this different from the current py3k branch? We could also just decide to never merge the default branch, or merge only after a release of the main PyPy version. -- Amaury Forgeot d'Arc

Hi Amaury, On 05/30/2012 11:23 AM, Amaury Forgeot d'Arc wrote:
the difference is that you would get the improvements in translator toolchain for free. See also my point below.
We could also just decide to never merge the default branch, or merge only after a release of the main PyPy version.
possibly, but delaying the merge would make it even more painful. The risk is that it'll become so painful that nobody will feel like doing it, and thus we diverge more and more. At the end, we end up with a py3k branch which can't make use of the cool new features of the JIT/GC/etc. and that will always lag behind python 2. Another point of view is that IMHO porting the changes by doing merges is harder/more time consuming than porting them by hand. ciao, Anto

On Wed, May 30, 2012 at 11:42 AM, Antonio Cuni <anto.cuni@gmail.com> wrote:
Hi Anto. I think 1) is a no-no for me. This would first mean we have py3k in the default checkout (why???) and also that we need to make sure that py3k tests pass all the time (they don't pass to start with). I don't see this being any more beneficial than the current model. Besides, this also means we'll never upgrade rpython to py3k (which might be a good thing, just saying). Overall I'm very against pushing *any* burden towards other pypy devs, we have quite enough work. How about you start with detaching interpreter and translation toolchain so those things can live separately? Cheers, fijal

On 05/30/2012 11:49 AM, Maciej Fijalkowski wrote:
no, it would live in the py3k branch. No merging to default unless there is a consensus.
Besides, this also means we'll never upgrade rpython to py3k (which might be a good thing, just saying).
nothing would stop us from taking the interpreter/ from py3k and porting rpython to python 3, although I'm not sure if it would be a good idea (euphemism :-)). But this is orthogonal to this discussion.
Overall I'm very against pushing *any* burden towards other pypy devs, we have quite enough work.
keeping it in the py3k branch would not change anything for people who don't care about py3k. That's why I asked for opinions of people who care :-)
no. This would be too much work for little benefit from the py3k point of view. Of course if it happens independently then py3k would benefit from it, but the task itself is not at the top of my priorities. ciao, Anto

2012/5/30 Antonio Cuni <anto.cuni@gmail.com>
Another point of view is that IMHO porting the changes by doing merges is harder/more time consuming than porting them by hand.
We probably don't have the same view on merges then :) I consider that when a merge is successful (no conflict), it's a win. And conflicts are markers to say "hey, port this change by hand". Of course, it would be better if we could just merge the translator/ or jit/ directories on a regular basis, and come back later to merge (or port) changes from the interpreter/ and objspace/ directories. But hg does not seem to allow this. I estimate that I spent ~2h on each merge from default to py3k. If one merge per month is enough, it's a task I can definitely find time for. -- Amaury Forgeot d'Arc

On 05/30/2012 12:53 PM, Amaury Forgeot d'Arc wrote:
yes, I also consider it a win. However, it happens rarely. Maybe it's just me that can't handle them, but conflict markers are usually put in places which make it very difficult to understand what's going on. I always end up looking at the unchanged files before the merge and the diff in the branch, forgetting the markers.
yes. That would be perfect, but we can't :-(
I estimate that I spent ~2h on each merge from default to py3k. If one merge per month is enough, it's a task I can definitely find time for.
My impression is that the time spent on merges is increasing, and this is not surprising because the two branches are slowly diverging. Anyway, if you volunteer to do the merges regularly I certainly won't stop you :-). We can always do the split later. ciao, Anto

Hi Antonio, On Wed, May 30, 2012 at 1:35 PM, Antonio Cuni <anto.cuni@gmail.com> wrote:
Then we can probably arrange things so that we use "translate.py" from default, and not from the "py3k branch", which would be stripped of the translation parts. More precisely, we could organize this "py3k branch" --- quotes, because likely living then in another repo --- with an only marginally different directory structure: e.g. call the top-level directory "py3k" instead of "pypy". Then you would use the default's "translate.py" to translate it, without getting conflicts between "pypy.interpreter" as used by translate.py and the new "py3k.interpreter" containing what you are translating. Of course the directories that would be in the py3k package would still have the same name as their original ones, so that we keep open the possibility to do merges without adding yet another layer of troubles. A bientôt, Armin.
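To illustrate the naming point in the proposal above: two top-level packages can carry identically named subpackages without clashing, because the import system keys modules on their full dotted name. The following is only a minimal sketch under assumptions. It builds two throwaway packages in a temporary directory rather than touching a real PyPy checkout, and the LANGUAGE_VERSION attribute is purely hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway tree with two top-level packages, "pypy" and "py3k",
# each containing an "interpreter" subpackage of the same name.
root = tempfile.mkdtemp()
for top, version in (("pypy", 2), ("py3k", 3)):
    pkg_dir = os.path.join(root, top, "interpreter")
    os.makedirs(pkg_dir)
    # Empty __init__.py for the top-level package.
    open(os.path.join(root, top, "__init__.py"), "w").close()
    # LANGUAGE_VERSION is a made-up marker, only here to tell them apart.
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("LANGUAGE_VERSION = %d\n" % version)

sys.path.insert(0, root)
importlib.invalidate_caches()

# What the toolchain would import, and what would be translated.
toolchain_side = importlib.import_module("pypy.interpreter")
target_side = importlib.import_module("py3k.interpreter")

print(toolchain_side.LANGUAGE_VERSION, target_side.LANGUAGE_VERSION)
```

Both modules coexist in sys.modules under different keys, which is why keeping the inner directory names unchanged would not by itself cause import conflicts.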

On 05/30/2012 04:04 PM, Armin Rigo wrote:
uhm, that's an interesting possibility, I didn't think of it. I wonder if mercurial handles merges well if we rename the top-level directory. To make things cleaner and easier to understand, we should probably also "hg rm" from py3k/ the directories which belong to the toolchain, just to avoid confusion. I think that in this case at each merge mercurial would ask what to do with file X which has been deleted locally but changed remotely, but this is probably something that we can handle. As I said earlier, the drawback of such "decoupling" solutions is that as soon as you have two separate repos, you'll get troubles such as "you can translate revision XXX only if the pypy repo is at version YYY", which can be frustrating especially when you want to go back in the history. In theory mercurial subrepos are supposed to solve this problem, but in practice we should stay as far as we can from them :-(. Amaury: opinions on Armin's proposed solution?

On 5/31/12 10:29 AM, Antonio Cuni wrote:
Hi, I really have to second that last statement: Subrepos are a red herring for me, after having lots of trouble in our much smaller project. The idea is good, but not in the next half or one year at least. -- Christian Tismer :^)<mailto:tismer@stackless.com> tismerysoft GmbH : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de work +49 173 24 18 776 mobile +49 173 24 18 776 fax n.a. PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/

Hi there, Just throwing in my little bit: any change that would make it easier to run Python 2 and Python 3 interpreters in the same process would be interesting, as I'm still vaguely dreaming (nothing more) of a combined interpreter that can run both Python 2 and Python 3 code. Regards, Martijn

On Wed, May 30, 2012 at 7:24 PM, Martijn Faassen <faassen@startifact.com> wrote:
Hi Martijn. Can you describe what sort of semantics you have in mind? Would you like to have two copies of builtin modules? How about namespaces? What about objects being passed from one interpreter to the other? Would they magically change or would they be "py2k dict" and "py3k dict"? If you can describe the semantics of a proposed beast I'm willing to answer how likely it is to happen. Cheers, fijal

On 31 May 2012 04:42, Maciej Fijalkowski <fijall@gmail.com> wrote:
I think we already discussed this at one point, here is what I remember getting out of it:

* Any such language integration that we do encourages people to write pypy-only programs. There was a question as to whether this was a good idea. I think someone suggested it could go further than python 2/3 and allow interaction with scheme or prolog or javascript since we are already there.

* There probably are arguments around semantics, but any solution is better than no solution. This is a good topic for further research imao.

* It is worthwhile considering the effect it has on python 3 uptake and porting. If pypy gave people an easy way out, it could have made quite a mess. I don't think this is as significant a problem now as it has been.

And of course if you don't want to do language integration, but just add eg a command line switch, you're not getting much out of it but the cost is significant, it means users have to co-ordinate the upgrades of two languages, it increases translation and testing time, etc.

----------------------------------------

By the way, I like solution 1; it's a bit closer to the way pypy/lang was done. I get the cases for moving the other languages away, but python 3 is different because so much of the existing code can be re-used.

-- William Leslie

Hey,
Can you describe what sort of semantics you have in mind?
Sure, I've discussed them before. The goal would be to have a Python 3 based project and use Python 2 modules/packages, or the other way around. That way it should become much easier to adopt Python 3, even in existing projects. To this end you'd need to have import magic that'd go across Pythons:

    # in python 3
    foo = python2_import('foo')

    # in python 2
    bar = python3_import('bar')

These would import the modules in the appropriate interpreter, and then wrap them in such a way that they become usable from the other interpreter. Later, you could come up with more sophisticated ways to designate a module "python 2" or "python 3" so that the normal 'import' statement will do something equivalent to the above (if you happen to know there are no namespace conflicts).
Would you like to have two copies of builtin modules?
Yes, a separate copy for each interpreter.
How about namespaces?
Yes, module name spaces should be separate. If you want to make a Python 2 library available in Python 3 you can use the import magic to do so.
They would be wrapped. I understand PyPy supports perfect proxies (I've seen the network-based demonstration). So you'd wrap a Python 3 object in a Python 2 wrapper, and vice versa. So a Python 3 proxy for a Python 2 object would:

* make sure any attribute accesses are translated to Python 3 objects (for immutables, a straight conversion is enough, otherwise a proxy).

* a method proxy would make sure that any arguments are proxied from Python 3 to Python 2 (or a straight conversion in case of an immutable if that'd be faster, or a proxy unwrapping in case you are dealing with a Python 2 to 3 proxy already), and any return values are proxied from Python 2 to Python 3.

The proxies for various built-ins such as dict would of course make sure that method calls are translated.

You need to be able to declare various things about arguments and return values in some tricky cases, like where a Python 2 string is involved; is it to be interpreted as a Python 3 string or a Python 3 bytes? Declarations could go into a central registry that is consulted by the proxy-ing mechanism; we can come up with nicer syntax later. The idea is that you could make declarations about a Python 2 library externally so you can use it within a Python 3 context.

One way to think about this is an FFI from Python to Python. You'd need Python 2 to 3 proxies, Python 3 to 2 proxies, and various proxy wrapping and unwrapping rules.

Regards, Martijn
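The wrapping rules described above could be sketched roughly as follows. This is only an illustration under loud assumptions: nothing here is an existing PyPy API, all names (Py2Proxy, declare_str, to_py3, to_py2) are hypothetical, both "sides" run in one ordinary Python 3 interpreter, and Python 3 bytes stands in for a Python 2 str.

```python
# Hypothetical sketch: none of these names exist in PyPy.

# Values that can be handed across by straight conversion.
IMMUTABLE = (int, float, bool, type(None), str)

# Central registry: (library name, attribute name) -> "text" or "bytes",
# declaring how a Python 2 str coming from that attribute is interpreted.
STR_DECLARATIONS = {}


def declare_str(library, name, kind):
    STR_DECLARATIONS[(library, name)] = kind


def to_py3(value, library=None, name=None):
    """Convert or proxy a value coming from the 'Python 2 side'."""
    if isinstance(value, bytes):
        # bytes plays the role of a Python 2 str in this sketch.
        kind = STR_DECLARATIONS.get((library, name), "bytes")
        return value.decode("ascii") if kind == "text" else value
    if isinstance(value, (IMMUTABLE, Py2Proxy)):
        return value
    return Py2Proxy(value, library)


def to_py2(value):
    """Unwrap an existing proxy; pass anything else through unchanged."""
    return value._target if isinstance(value, Py2Proxy) else value


class Py2Proxy:
    """Python 3 view of an object living on the 'Python 2 side'."""

    def __init__(self, target, library=None):
        self._target = target
        self._library = library

    def __getattr__(self, name):
        attr = getattr(self._target, name)
        if not callable(attr):
            return to_py3(attr, self._library, name)

        def method(*args, **kwargs):
            # Arguments go 3 -> 2, the return value goes 2 -> 3.
            result = attr(*[to_py2(a) for a in args],
                          **{k: to_py2(v) for k, v in kwargs.items()})
            return to_py3(result, self._library, name)

        return method


# Stand-ins for a Python 2 library.
class Counter:
    count = 0

    def incr(self):
        self.count += 1
        return self.count


class LegacyLib:
    def version(self):
        return b"1.0"

    def raw(self):
        return b"\x00\x01"

    def make_counter(self):
        return Counter()


declare_str("legacy", "version", "text")
lib = Py2Proxy(LegacyLib(), "legacy")
counter = lib.make_counter()
```

With the declaration in place, lib.version() comes back as text while lib.raw() stays bytes, and the mutable Counter is returned wrapped in another proxy. A real design would also have to handle special methods, exceptions and object identity, which this sketch ignores.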

Hi Martijn, On Wed, May 30, 2012 at 19:24 +0200, Martijn Faassen wrote:
Is there a strong reason you want this in the same process? If not you might look into using execnet [1] for connecting python2 and python3 interpreters which then run in two separate processes. One can build something higher level on top of the base execnet communication along the lines of your "python3_import" suggestion. It seems you anyway need largely disconnected interpreter states. On a sidenote, Quora uses execnet to connect python2 and PyPy [2]. best, holger [1] http://codespeak.net/execnet/example/hybridpython.html [2] http://www.quora.com/Quora-Infrastructure/Did-Quoras-switch-to-PyPy-result-i...

On Thu, May 31, 2012 at 11:12 AM, holger krekel <holger@merlinux.eu> wrote:
That's an interesting idea. I don't think it would accomplish exactly the same goals though. I can see two reasons not to do so: * developer simplicity: you just start up your Python 3 as usual, you can now use Python 2 modules in your project. No need to come up with a networked system. * efficiency: if you do a lot of calls using a Python 2 library in a Python 3 project, you'd like this to work pretty quickly. I think using in-process proxies can be made to be quite inexpensive. I think a networked approach is useful if you want communicating applications, but I'm talking more about a single application that uses libraries that might be written in another language. That is why I like the FFI analogy; when you interface with a C library from Python you generally also wouldn't want to use a networked approach. You *would* do this to interface with a networked application written in C. To use libxml2 in Python I'd use lxml. If there's already a C-based web service that uses libxml2 I'd use that service, but that's a different situation. I wouldn't want to have to *have* to do this just to be able to use libxml2. I think execnet blurs the line between library and application integration somewhat, as it allows very intimately communicating applications, but isn't the line still there? Regards, Martijn

On Thu, May 31, 2012 at 11:27 +0200, Martijn Faassen wrote:
not sure i understand what you mean with "networked" system here. With gw = execnet.makegateway("python3") a subprocess is created running with python3. It's true that the channel send/receive uses a network metaphor but underlying is process-to-process communication, no network involved.
it all depends i guess. It seems that for Quora it was fast enough to call from PyPy into several libraries deployed on cpython. Moreover, if you need to munge data coming out from library function calls the proxy approach may require a lot of communication between the two interpreters and even if this happens in-process it is overhead. With execnet you can execute the munging code with the interpreter running the library and only send back the result you need. (On a side note, with a proxy approach you also need to carefully design lifecycle/GC issues for out-of-interpreter references).
There is a line, sure. It is blurred because one side can send code to the other. Which makes a difference in a similar way how sending Javascript to the client makes a difference - it reduces communication overhead and makes things faster on the client side. To conclude, i wouldn't be overly concerned by process-to-subprocess communication costs. If i had to combine py3 and py2 code (throw PyPy in to your liking) i'd go down the Quora route and see how far it carries. After all, this is only an intermediate solution until everything happily runs on Python3 anyway, right? ;) best, holger
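The "send the code to the other side and only ship back the result" idea can be sketched with the standard library alone. To be clear, this is not execnet's API: it is a stand-in using subprocess and JSON, and run_remote and REMOTE_SOURCE are hypothetical names. The interpreter argument defaults to the current one so the sketch runs anywhere; in the scenario discussed here it would point at a different Python.

```python
import json
import subprocess
import sys

# Source sent to the other interpreter. The data munging happens over
# there, and only a small JSON summary is printed back.
REMOTE_SOURCE = """
import json, sys
data = json.load(sys.stdin)
print(json.dumps({"total": sum(data), "count": len(data)}))
"""


def run_remote(source, payload, interpreter=sys.executable):
    """Run `source` in a subprocess, feeding it `payload` as JSON on stdin."""
    proc = subprocess.run(
        [interpreter, "-c", source],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)


summary = run_remote(REMOTE_SOURCE, list(range(1000)))
print(summary)
```

One process start per call is of course much more expensive than a long-lived gateway with channels; the sketch only shows the shape of the exchange, not its performance.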

Hi there, On Thu, May 31, 2012 at 11:18 PM, holger krekel <holger@merlinux.eu> wrote:
After all, this is only an intermediate solution until everything happily runs on Python3 anyway, right? ;)
Well, I'm thinking about it more as a means to get to such a situation, not as an intermediate solution. If you have an integrated Python 2 and Python 3 in a single project you can decide to port code to Python 3 on a per-module basis instead of on a per-project basis. So it's not that I have an immediate need to run Python 3 libraries that I need to solve. Deciding to do so for existing projects at this point would cause quite a lot more hassle than I'm looking for. This is helping to retard Python 3 adoption, as only new projects can start using it. Concerning performance overhead of proxies, lifecycle issues would be tricky, though starting with the basic notion that a proxy is like another reference to that object in the interpreter would get you quite far, I think? I do suspect a proxy approach could be made to be very efficient. I don't know enough about execnet in practice to know how it would feel. It'd be nice if this were a documented way to integrate different interpreters. Regards, Martijn

Hi Martijn, hi Holger, On 6/1/12, Martijn Faassen <faassen@startifact.com> wrote:
Concerning performance overhead of proxies, lifecycle issues would be tricky
If, hypothetically speaking, there is someone interested in writing a PyPy solution where both a Python2 and a Python3 interpreter are running in the same process, then you gain the advantage of having only one GC to run both. At least it transparently solves the issues of lifetime and reference cycles. (You also have for free only one JIT, which can do cross-language optimizations like inlining a Python2 function into a Python3 context or vice-versa). I see these two points as benefits that you don't have in any multi-process solution. It would require some work on the PyPy side, and I'm not aware of anybody ready to invest time in that, but it shouldn't be particularly hard (once PyPy's Python3 interpreter is ready, and once people agree about which API to use to do cross-language calls.) A bientôt, Armin.

Hey Armin, Cool, having a shared GC and a shared JIT would be pretty neat features indeed! Regards, Martijn

On 4 June 2012 14:33, Armin Rigo <arigo@tunes.org> wrote:
Having multiple interpreter instances within a single process allows for lots of interesting possibilities. IronPython permits this and it was used by Resolver One - just one of the reasons (along with the GIL) that it would have been much harder to write Resolver One in CPython than in IronPython (although .NET was chosen as a platform before Python was chosen as an implementation language). Allowing Python 2 and Python 3 to live within the same process would be very interesting. All the best, Michael Foord
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

On 6/4/12 4:26 PM, Michael Foord wrote:
I too like this idea very much, because _that_ removes the need for a decision which I really can't base upon taste, but on module availability, PyPy-readiness etc. Such a bridge would be very cool and quite 'peace-making' if I may say so. When thinking of it, a slightly crazy concern popped up: I just followed the pyvenv discussions. PEP 405 will not be back-ported to Python2.7. If we now have two interpreter versions in one binary, which one of the Janus-heads will lead decisions like pyvenv startup? just-semi-seriously ;-) -- chris

On 6/4/12 10:09 PM, Amaury Forgeot d'Arc wrote:
Sure, I was just kidding. But if the py3k support is very marginal, just enough to start an imported py2.7 interpreter, then this would be a way to have pyvenv running with Python2.7. ok I'll shut up/down now - chris

Hi Armin, Martijn, On Mon, Jun 04, 2012 at 15:33 +0200, Armin Rigo wrote:
Good point, PyPy could indeed bring considerable performance and resource benefits and avoid distributed GC issues. Of course there also are deployment and API questions that could maybe better first be tackled outside of PyPy in the form of a prototype that ignores GC issues. I am wondering, however, how many people really have that py2/py3 need and how much they care about performance in such situations. OTOH, a good solution could trigger needs - that's how a lot of technology development works, anyway :) best, holger

Hi there, On Mon, Jun 4, 2012 at 4:27 PM, holger krekel <holger@merlinux.eu> wrote:
I am wondering, however, how many people really have that py2/py3 need
I'm not sure that's precisely the right question, as you indicate yourself already:
Indeed. Right now many people are developing Python 2 projects. They might be interested in a way to use Python 3 libraries in such a project. Right now some people are developing Python 3 projects. They might be interested in a way to use Python 2 libraries in such a project. This might in fact motivate more people to start a Python 3 project in the first place, as they aren't faced with having to port any libraries they may want to use. If you asked these people a question "would you like to be able to use Python 2 libraries in Python 3 projects" (or reversed) many of them might say 'yes'. But none of these people may feel they *need* a py2/py3 interpreter. :)
and how much they care about performance in such situations.
Yes, such people might say "yes, if there isn't any other impact on my project" (like performance). Of course there are other consequences to adopting PyPy in a project, still. Bringing up this idea every once in a while is my little contribution to the Python 3 effort. At present I don't want to go any further. :) Regards, Martijn

Hi, 2012/6/4 Armin Rigo <arigo@tunes.org>
I did some experiments a few months ago, trying to embed two 2.7 interpreters in the same translation target (which could have slightly different options, say with/without rope strings). I stopped when I realized that it cannot work with a single pypy tree: if space.get_name() returns a RPython string in 2.7, it cannot return a RPython unicode in 3.2; and even then, the MethodOfFrozenPBCRepr class "assumes that all methods are the same function bound to different PBCs". Maybe we should retry with different pypy trees (i.e pypy/interpreter/ vs. pypy3k/interpreter/, and so on) But I'm sure we would like some objects to be compatible, for example PyFrame2.f_backref could be a PyFrame3. -- Amaury Forgeot d'Arc

Hi Amaury, On Mon, Jun 4, 2012 at 10:32 PM, Amaury Forgeot d'Arc <amauryfa@gmail.com> wrote:
Not necessarily. At least not if the API presented to the user is that of a foreign function invocation library. We'd start with a no-sharing model and focus on enabling some data exchanges beyond the minimal "foreign data" boxes in both directions. A bientôt, Armin.

Antonio Cuni <anto.cuni@gmail.com> writes:
Any solution where improvements to the translator have to be ported manually is a recipe for trouble. The translator toolchain is and should always be shared. The interpreters, on the other hand, are where the merge difficulties come from. These are fundamentally diverging since the python 2 and 3 languages are actually quite different. Because of these two things, there are only two sane solutions, and (2) above is not one of them. Adding a py3k interpreter to the default branch is reasonable, and could lead to sharing common parts in a top level directory at some point, but it comes with a burden on the current python 2 development work. The other solution is to split the current pypy tree in two. Having a translator and an interpreter as separate repositories makes the translator more accessible as a tool, and projects to implement other languages' interpreters need only depend on it. This is the prettiest solution architecturally, but adds the burden on developers to match translator with compatible interpreters, which may or may not end up being a pain point. Anyways, a completely separate pypy3k repository is no better than the current situation, and I would argue that it's far worse. The pains you're experiencing trying to do merges today aren't even close to the kind of pain you'll experience trying to merge in interpreter improvements manually.

On 31 May 2012 15:20, Justin Bogner <mail@justinbogner.com> wrote:
Do remember that the translator actually requires the python 2 interpreter, that is a fundamental part of the way it works. So moving the translator into a different repository now also means maintaining two python 2 interpreters. -- William Leslie

2012/5/30 Antonio Cuni <anto.cuni@gmail.com>
2) start a completely new repository which contains only the code for py3k.
How is this different from the current py3k branch? We could also just decide to never merge the default branch, or merge only after a release of the main PyPy version. -- Amaury Forgeot d'Arc

Hi Amaury, On 05/30/2012 11:23 AM, Amaury Forgeot d'Arc wrote:
the difference is that you would get the improvements in translator toolchain for free. See also my point below.
We could also just decide to never merge the default branch, or merge only after a release of the main PyPy version.
possibly, but delaying the merge would make it even more painful. The risk is that it'll become so painful that nobody will feel like doing it, and thus we diverge more and more. At the end, we end up with a py3k branch which can't make use of the cool new features of the JIT/GC/etc. and that will always lack behind python 2. Another point of view is that IMHO porting the changes by doing merges is harder/more time consuming than porting them by hand. ciao, Anto

On Wed, May 30, 2012 at 11:42 AM, Antonio Cuni <anto.cuni@gmail.com> wrote:
Hi Anto. I think 1) is a no-no for me. This would first mean we have py3k in the default checkout (why???) and also that we need to make sure that py3k tests pass all the time (they don't pass to start with). I don't see this being any beneficial to the current model. Besidies, this also means we'll never upgrade rpython to py3k (which might be a good thing, just saying). Overall I'm very against pushing *any* burden towards other pypy devs, we have quite enough work. How about you start with detaching interpreter and translation toolchain so those things can leave separately? Cheers, fijal

On 05/30/2012 11:49 AM, Maciej Fijalkowski wrote:
no, it would leave in the py3k branch. No merging to default unless there is a consensus.
Besidies, this also means we'll never upgrade rpython to py3k (which might be a good thing, just saying).
nothing would stops us to take the interpreter/ from py3k and port rpython to python 3, although I'm not sure if it would be a good idea (euphemism :-)). But this is orthogonal to this discussion.
Overall I'm very against pushing *any* burden towards other pypy devs, we have quite enough work.
keeping it in the py3k branch would not change anything for people who don't care about py3k. That's why I asked for opinions of people who care :-)
no. This would be too much work for little benefit from the py3k point of view. Of course if it happens independently then py3k would benefit of it, but the task itself is not on top of my priorities. ciao, Anto

2012/5/30 Antonio Cuni <anto.cuni@gmail.com>
Another point of view is that IMHO porting the changes by doing merges is harder/more time consuming than porting them by hand.
We probably don't have the same view on merges then :) I consider that when a merge is successful (no conflict), it's a win. And conflicts are markers to say "hey, port this change by hand". Of course, it would be better if we could just merge the translator/ or jit/ directories on a regular basis, and come back later to merge (or port) changes from the interpreter/ and objspace/ directories. But hg does not seem to allow this. I estimate that I spent ~2h on each merge from default to py3k. If one merge per month is enough, it's a task I can definitely find time for. -- Amaury Forgeot d'Arc

On 05/30/2012 12:53 PM, Amaury Forgeot d'Arc wrote:
yes, I also consider it a win. However, it happens rarely. Maybe it's just me that can't handle them, but conflicts markers are usually put in places which makes it very difficult to understand what's going on. I always end up at looking at the unchanged files before the merge and the diff in the branch, forgetting the markers.
yes. That would be perfect, but we can't :-(
I estimate that I spent ~2h on each merge from default to py3k. If one merge per month is enough, it's a task I can definitely find time for.
My impression is that the time spent for merges is increasing, and this is not surprising because the two branches are slowly diverging. Anyway, if you volunteer to do the merges regularly I won't certainly stop you :-). We can always do the split later. ciao, Anto

Hi Antonio, On Wed, May 30, 2012 at 1:35 PM, Antonio Cuni <anto.cuni@gmail.com> wrote:
Then we can probably arrange things so that we use "translate.py" from default, and not from the "py3k branch", which would be stripped of the translation parts. More precisely, we could organize this "py3k branch" --- quotes, because likely living then in another repo --- with an only marginally different directory structure: e.g. call the top-level directory "py3k" instead of "pypy". Then you would use the default's "translate.py" to translate it, without getting conflicts between "pypy.interpreter" as used by translate.py and the new "py3k.interpreter" containing what you are translating. Of course the directories that would be in the py3k package would still have the same name as their original ones, so that we keep open the possibility to do merges without adding yet another layer of troubles. A bientôt, Armin.

On 05/30/2012 04:04 PM, Armin Rigo wrote:
uhm, that's an interesting possibility, I didn't think of it. I wonder if mercurial handles merges well if we rename the top-level directory. To make things cleaner and easier to understand, we should probably also "hg rm" from py3k/ the directories which belong to the toolchain, just to avoid confusion. I think that in this case at each merge mercurial would ask what to do with file X which has been deleted locally but changed remotely, but this is probably something that we can handle. As I said earlier, the drawback of such "decoupling" solutions is that as soon as you have two separate repos, you'll get troubles such as "you can translate revision XXX only if the pypy repo is at version YYY", which can be frustrating especially when you want to go back in the history. In theory mercurial subrepos are supposed to solve this problem, but in practice we should stay as far as we can from them :-(. Amaury: opinions on Armin's proposed solution?

On 5/31/12 10:29 AM, Antonio Cuni wrote:
Hi, I really have to second that last statement: Subrepos are a red herring for me, after having lots of trouble in our much smaller project. The idea is good, but not for at least the next six months to a year. -- Christian Tismer :^)<mailto:tismer@stackless.com> tismerysoft GmbH : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de work +49 173 24 18 776 mobile +49 173 24 18 776 fax n.a. PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/

Hi there, Just throwing in my little bit: any change that is made that would make it easier to run Python 2 and Python 3 interpreters in the same process would be interesting, as I'm still vaguely dreaming (nothing more) of a combined interpreter that can run both Python 2 and Python 3 code. Regards, Martijn

On Wed, May 30, 2012 at 7:24 PM, Martijn Faassen <faassen@startifact.com> wrote:
Hi Martijn. Can you describe what sort of semantics you have in mind? Would you like to have two copies of builtin modules? How about namespaces? What about objects being passed from one interpreter to another? Would they magically change or would they be "py2k dict" and "py3k dict"? If you can describe the semantics of a proposed beast I'm willing to answer how likely it is to happen. Cheers, fijal

On 31 May 2012 04:42, Maciej Fijalkowski <fijall@gmail.com> wrote:
I think we already discussed this at one point, here is what I remember getting out of it:
* Any such language integration that we do encourages people to write pypy-only programs. There was a question as to whether this was a good idea. I think someone suggested it could go further than python 2/3 and allow interaction with scheme or prolog or javascript since we are already there.
* There probably are arguments around semantics, but any solution is better than no solution. This is a good topic for further research imao.
* It is worthwhile considering the effect it has on python 3 uptake and porting. If pypy gave people an easy way out, it could have made quite a mess. I don't think this is as significant a problem now as it has been.
And of course if you don't want to do language integration, but just add e.g. a command line switch, you're not getting much out of it but the cost is significant: it means users have to co-ordinate the upgrades of two languages, it increases translation and testing time, etc.
----------------------------------------
By the way, I like solution 1; it's a bit closer to the way pypy/lang was done. I get the cases for moving the other languages away, but python 3 is different because so much of the existing code can be re-used. -- William Leslie

Hey,
Can you describe what sort of semantics you have in mind?
Sure, I've discussed them before. The goal would be to have a Python 3 based project and use Python 2 modules/packages, or the other way around. That way it should become much easier to adopt Python 3, even in existing projects. To this end you'd need to have import magic that'd go across Pythons:
    # in python 3
    foo = python2_import('foo')
    # in python 2
    bar = python3_import('bar')
These would import the modules in the appropriate interpreter, and then wrap them in such a way that they become usable from the other interpreter. Later, you could come up with more sophisticated ways to designate a module "python 2" or "python 3" so that the normal 'import' statement will do something equivalent to the above (if you happen to know there are no namespace conflicts).
Would you like to have two copies of builtin modules?
Yes, a separate copy for each interpreter.
How about namespaces?
Yes, module name spaces should be separate. If you want to make a Python 2 library available in Python 3 you can use the import magic to do so.
They would be wrapped. I understand PyPy supports perfect proxies (I've seen the network-based demonstration). So you'd wrap a Python 3 object in a Python 2 wrapper, and vice versa. So a Python 3 proxy for a Python 2 object would:
* make sure any attribute accesses are translated to Python 3 objects (for immutables, a straight conversion is enough, otherwise a proxy).
* a method proxy would make sure that any arguments are proxied from Python 3 to Python 2 (or straight conversion in case of an immutable if that'd be faster. Or a proxy unwrapping in case you are dealing with a Python 2 to 3 proxy already), and any return values are proxied from Python 2 to Python 3.
The proxies for various built-ins such as dict would of course make sure that method calls are translated. You need to be able to declare various things about arguments and return values in some tricky cases like where a Python 2 string is involved; is it to be interpreted as a Python 3 string or a Python 3 bytes? Declarations could go into a central registry that is consulted by the proxy-ing mechanism, we can come up with nicer syntax later. The idea is that you could make declarations about a Python 2 library externally so you can use it within a Python 3 context. One way to think about this is a FFI from Python to Python. You'd need Python 2 to 3 proxies, Python 3 to 2 proxies, and various proxy wrapping and unwrapping rules. Regards, Martijn
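The wrapping rules described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it runs in a single ordinary Python 3 process, with `bytes` standing in for a Python 2 `str`, and every name in it (`Proxy`, `wrap`, `unwrap`, `declare_string_result`, `STRING_DECLARATIONS`, `LegacyRecord`) is hypothetical. It is not PyPy's transparent proxy API and not an existing cross-interpreter mechanism.

```python
# Hypothetical sketch: "foreign" objects are simulated inside one interpreter.
IMMUTABLE_TYPES = (int, float, bool, str, bytes, type(None))

# Hypothetical central registry: (class name, method name) -> "text" or "bytes".
STRING_DECLARATIONS = {}


def declare_string_result(class_name, method_name, kind):
    """Record how a byte-string result of a foreign method should be read."""
    STRING_DECLARATIONS[(class_name, method_name)] = kind


def wrap(value, declaration=None):
    """Bring a foreign value into the local interpreter."""
    if isinstance(value, bytes) and declaration == "text":
        return value.decode("ascii")   # declared as text: convert to str
    if isinstance(value, IMMUTABLE_TYPES):
        return value                   # immutables: straight conversion
    if isinstance(value, Proxy):
        return value                   # already a proxy: do not wrap twice
    return Proxy(value)                # everything else: proxy


def unwrap(value):
    """Pass a local value to the foreign side, undoing an existing proxy."""
    if isinstance(value, Proxy):
        return object.__getattribute__(value, "_target")
    return value


class Proxy:
    """Stand-in for a foreign object; forwards attribute access and calls."""

    def __init__(self, target):
        object.__setattr__(self, "_target", target)

    def __getattr__(self, name):
        target = object.__getattribute__(self, "_target")
        attribute = getattr(target, name)
        if not callable(attribute):
            return wrap(attribute)
        declaration = STRING_DECLARATIONS.get((type(target).__name__, name))

        def method_proxy(*args, **kwargs):
            # Arguments cross in one direction, the result in the other.
            result = attribute(*[unwrap(a) for a in args],
                               **{k: unwrap(v) for k, v in kwargs.items()})
            return wrap(result, declaration)

        return method_proxy


class LegacyRecord:
    """Stands in for an object owned by the other interpreter."""

    def __init__(self):
        self.tags = ["a", "b"]
        self.count = 2

    def title(self):
        return b"hello"

    def payload(self):
        return b"\x00\x01"

    def same(self, other):
        return other is self


# External declaration: title() yields text, payload() stays bytes.
declare_string_result("LegacyRecord", "title", "text")
record = Proxy(LegacyRecord())
```

With this in place, `record.count` comes back as a plain `int`, `record.tags` comes back as another `Proxy`, and passing `record` back into `record.same(...)` unwraps it first, so the foreign side sees its own object rather than a proxy of a proxy. Special methods such as `len()` are looked up on the type and would need explicit forwarding, which the sketch leaves out.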

Hi Martijn, On Wed, May 30, 2012 at 19:24 +0200, Martijn Faassen wrote:
Is there a strong reason you want this in the same process? If not you might look into using execnet [1] for connecting python2 and python3 interpreters which then run in two separate processes. One can build something higher level on top of the base execnet communication along the lines of your "python3_import" suggestion. It seems you anyway need largely disconnected interpreter states. On a sidenote, Quora uses execnet to connect python2 and PyPy [2]. best, holger [1] http://codespeak.net/execnet/example/hybridpython.html [2] http://www.quora.com/Quora-Infrastructure/Did-Quoras-switch-to-PyPy-result-i...

On Thu, May 31, 2012 at 11:12 AM, holger krekel <holger@merlinux.eu> wrote:
That's an interesting idea. I don't think it would accomplish exactly the same goals though. I can see two reasons not to do so:
* developer simplicity: you just start up your Python 3 as usual, you can now use Python 2 modules in your project. No need to come up with a networked system.
* efficiency: if you do a lot of calls using a Python 2 library in a Python 3 project, you'd like this to work pretty quickly. I think using in-process proxies can be made to be quite inexpensive.
I think a networked approach is useful if you want communicating applications, but I'm talking more about a single application that uses libraries that might be written in another language. That is why I like the FFI analogy; when you interface with a C library from Python you generally also wouldn't want to use a networked approach. You *would* do this to interface with a networked application written in C. To use libxml2 in Python I'd use lxml. If there's already a C-based web service that uses libxml2 I'd use that service, but that's a different situation. I wouldn't want to *have* to do this just to be able to use libxml2. I think execnet blurs the line between library and application integration somewhat, as it allows very intimately communicating applications, but isn't the line still there? Regards, Martijn

On Thu, May 31, 2012 at 11:27 +0200, Martijn Faassen wrote:
not sure i understand what you mean with "networked" system here. With gw = execnet.makegateway("popen//python=python3") a subprocess is created running with python3. It's true that the channel send/receive uses a network metaphor but underlying is process-to-process communication, no network involved.
it all depends i guess. It seems that for Quora it was fast enough to call from PyPy into several libraries deployed on cpython. Moreover, if you need to munge data coming out from library function calls the proxy approach may require a lot of communication between the two interpreters and even if this happens in-process it is overhead. With execnet you can execute the munging code with the interpreter running the library and only send back the result you need. (On a side note, with a proxy approach you also need to carefully design lifecycle/GC issues for out-of-interpreter references).
There is a line, sure. It is blurred because one side can send code to the other. Which makes a difference in a similar way how sending Javascript to the client makes a difference - it reduces communication overhead and makes things faster on the client side. To conclude, i wouldn't be overly concerned by process-to-subprocess communication costs. If i had to combine py3 and py2 code (throw PyPy in to your liking) i'd go down the Quora route and see how far it carries. After all, this is only an intermediate solution until everything happily runs on Python3 anyway, right? ;) best, holger
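The point about sending code to the other side and getting only the result back can be illustrated with a standard-library stand-in. This is not execnet and does not use its API; it is a minimal sketch that assumes JSON-serialisable data and a one-shot child process, and the names `run_in_other_interpreter`, `CHILD_SOURCE`, and `MUNGE` are made up for the example. By default it spawns the same interpreter that runs it; the `interpreter` argument is where a different Python executable would go.

```python
import json
import subprocess
import sys

# Source run by the child interpreter: read one JSON request from stdin,
# execute the shipped code next to the data, write one JSON reply to stdout.
CHILD_SOURCE = r"""
import json, sys
request = json.loads(sys.stdin.read())
namespace = {}
exec(request["code"], namespace)
result = namespace["munge"](request["data"])
sys.stdout.write(json.dumps(result))
"""


def run_in_other_interpreter(code, data, interpreter=sys.executable):
    """Run `code` (which must define munge(data)) in a child process.

    Communication goes over the child's stdin/stdout pipes, so no network
    is involved; only the final result is sent back to the caller.
    """
    request = json.dumps({"code": code, "data": data})
    completed = subprocess.run(
        [interpreter, "-c", CHILD_SOURCE],
        input=request,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


# The munging code travels to the child; only the total travels back.
MUNGE = """
def munge(rows):
    return sum(row["size"] for row in rows)
"""

total = run_in_other_interpreter(MUNGE, [{"size": 3}, {"size": 4}])
```

The trade-off matches the one discussed above: each call pays for process startup and serialisation, but intermediate objects never have to be proxied across the boundary.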

Hi there, On Thu, May 31, 2012 at 11:18 PM, holger krekel <holger@merlinux.eu> wrote:
After all, this is only an intermediate solution until everything happily runs on Python3 anyway, right? ;)
Well, I'm more thinking about it as a means to get to such a situation, not as an intermediate solution. If you have an integrated Python 2 and Python 3 in a single project you can decide to port code to Python 3 on a per-module basis instead of on a per-project basis. So I don't have any immediate need to run Python 3 libraries that I need to solve. Deciding to do so for existing projects at this point would cause quite a lot more hassle than I'm looking for. This is helping to retard Python 3 adoption, as only new projects can start using it. Concerning performance overhead of proxies, lifecycle issues would be tricky, though starting with the basic notion that a proxy is like another reference to that object in the interpreter would get you quite far, I think? I do suspect a proxy approach could be made to be very efficient. I don't know enough about execnet in practice to know how it would feel. It'd be nice if this were a documented way to integrate different interpreters. Regards, Martijn

Hi Martijn, hi Holger, On 6/1/12, Martijn Faassen <faassen@startifact.com> wrote:
Concerning performance overhead of proxies, lifecycle issues would be tricky
If, hypothetically speaking, there is someone interested in writing a PyPy solution where both a Python2 and a Python3 interpreter are running in the same process, then you gain the advantage of having only one GC to run both. At least it transparently solves the issues of lifetime and reference cycles. (You also have for free only one JIT, which can do cross-language optimizations like inlining a Python2 function into a Python3 context or vice-versa). I see these two points as benefits that you don't have in any multi-process solution. It would require some work on the PyPy side, and I'm not aware of anybody ready to invest time in that, but it shouldn't be particularly hard (once PyPy's Python3 interpreter is ready, and once people agree about which API to use to do cross-language calls.) A bientôt, Armin.

Hey Armin, Cool, having a shared GC and a shared JIT would be pretty neat features indeed! Regards, Martijn

On 4 June 2012 14:33, Armin Rigo <arigo@tunes.org> wrote:
Having multiple interpreter instances within a single process allows for lots of interesting possibilities. IronPython permits this and it was used by Resolver One - just one of the reasons (along with the GIL) that it would have been much harder to write Resolver One in CPython than in IronPython (although .NET was chosen as a platform before Python was chosen as an implementation language). Allowing Python 2 and Python 3 to live within the same process would be very interesting. All the best, Michael Foord
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

On 6/4/12 4:26 PM, Michael Foord wrote:
I too like this idea very much, because _that_ removes the need for a decision which I really can't base upon taste, but module availabilities, PyPy-readiness etc. Such a bridge would be very cool and quite 'peace-making' if I may say so. When thinking of it, a slightly crazy concern popped up: I just followed the pyvenv discussions. PEP 405 will not be back-ported to Python2.7. If we now have two interpreter versions in one binary, which one of the Janus-heads will lead decisions like pyvenv startup? just-semi-seriously ;-) -- chris

On 6/4/12 10:09 PM, Amaury Forgeot d'Arc wrote:
Sure, I was just kidding. But if the py3k support is very marginal, just enough to start an imported py2.7 interpreter, then this would be a way to have pyvenv running with Python2.7. ok I'll shut up/down now - chris

Hi Armin, Martijn, On Mon, Jun 04, 2012 at 15:33 +0200, Armin Rigo wrote:
Good point, PyPy could indeed bring considerable performance and resource benefits and avoid distributed GC issues. Of course there also are deployment and API questions that could maybe better first be tackled outside of PyPy in the form of a prototype that ignores GC issues. I am wondering, however, how many people really have that py2/py3 need and how much they care about performance in such situations. OTOH, a good solution could trigger needs - that's how a lot of technology development works, anyway :) best, holger

Hi there, On Mon, Jun 4, 2012 at 4:27 PM, holger krekel <holger@merlinux.eu> wrote:
I am wondering, however, how many people really have that py2/py3 need
I'm not sure that's precisely the right question, as you indicate yourself already:
Indeed. Right now many people are developing Python 2 projects. They might be interested in a way to use Python 3 libraries in such a project. Right now some people are developing Python 3 projects. They might be interested in a way to use Python 2 libraries in such a project. This might in fact motivate more people to start a Python 3 project in the first place, as they aren't faced with having to port any libraries they may want to use. If you asked these people a question "would you like to be able to use Python 2 libraries in Python 3 projects" (or reversed) many of them might say 'yes'. But none of these people may feel they *need* a py2/py3 interpreter. :)
and how much they care about performance in such situations.
Yes, such people might say "yes, if there isn't any other impact on my project" (like performance). Of course there are other consequences to adopting PyPy in a project, still. Bringing up this idea every once in a while is my little contribution to the Python 3 effort. At present I don't want to go any further. :) Regards, Martijn

Hi, 2012/6/4 Armin Rigo <arigo@tunes.org>
I did some experiments a few months ago, trying to embed two 2.7 interpreters in the same translation target (which could have slightly different options, say with/without rope strings). I stopped when I realized that it cannot work with a single pypy tree: if space.get_name() returns a RPython string in 2.7, it cannot return a RPython unicode in 3.2; and even then, the MethodOfFrozenPBCRepr class "assumes that all methods are the same function bound to different PBCs". Maybe we should retry with different pypy trees (i.e. pypy/interpreter/ vs. pypy3k/interpreter/, and so on). But I'm sure we would like some objects to be compatible, for example PyFrame2.f_backref could be a PyFrame3. -- Amaury Forgeot d'Arc

Hi Amaury, On Mon, Jun 4, 2012 at 10:32 PM, Amaury Forgeot d'Arc <amauryfa@gmail.com> wrote:
Not necessarily. At least not if the API presented to the user is that of a foreign function invocation library. We'd start with a no-sharing model and focus on enabling some data exchanges beyond the minimal "foreign data" boxes in both directions. A bientôt, Armin.

Antonio Cuni <anto.cuni@gmail.com> writes:
Any solution where improvements to the translator have to be ported manually is a recipe for trouble. The translator toolchain is and should always be shared. The interpreters, on the other hand, are where the merge difficulties come from. These are fundamentally diverging since the python 2 and 3 languages are actually quite different. Because of these two things, there are only two sane solutions, and (2) above is not one of them. Adding a py3k interpreter to the default branch is reasonable, and could lead to sharing common parts in a top level directory at some point, but it comes with a burden on the current python 2 development work. The other solution is to split the current pypy tree in two. Having a translator and an interpreter as separate repositories makes the translator more accessible as a tool, and projects to implement other languages' interpreters need only depend on it. This is the prettiest solution architecturally, but adds the burden on developers to match translator with compatible interpreters, which may or may not end up being a pain point. Anyways, a completely separate pypy3k repository is no better than the current situation, and I would argue that it's far worse. The pains you're experiencing trying to do merges today aren't even close to the kind of pain you'll experience trying to merge in interpreter improvements manually.

On 31 May 2012 15:20, Justin Bogner <mail@justinbogner.com> wrote:
Do remember that the translator actually requires the python 2 interpreter, that is a fundamental part of the way it works. So moving the translator into a different repository now also means maintaining two python 2 interpreters. -- William Leslie
participants (10)
- Amaury Forgeot d'Arc
- Antonio Cuni
- Armin Rigo
- Christian Tismer
- holger krekel
- Justin Bogner
- Maciej Fijalkowski
- Martijn Faassen
- Michael Foord
- William ML Leslie