PyPy as part of a larger, bundled project?

Hi,
A number of Python applications (e.g. http://calibre-ebook.com/, http://www.psychopy.org/ ... http://en.wikipedia.org/wiki/List_of_Python_software#Applications) are deployed together with the libraries and interpreter that they will use.
Often, these applications are larger, and can end up performing operations that are computationally intensive. In the case of Calibre, e.g., large batch conversions from one book format to another can take more than an hour (for sufficiently large batches).
This motivates a couple questions:
1) How difficult might it be, currently, to swap in PyPy[1] as the interpreter for, say, Calibre (http://calibre-ebook.com/download_linux)? (I am in the process of trying to do so presently, but the question is meant to be a bit more general.)
2) Might these larger, interpreter-bundled applications be good targets for "high impact" deployments of PyPy? PyPy could, here, provide a tangible benefit without requiring any extra work by end-users, thereby potentially serving as a useful demonstration platform while quickly increasing PyPy usage.
Leo
[1] https://bugs.launchpad.net/calibre/+bug/977453

Leo Trottier, 11.04.2012 02:23:
> A number of Python applications (e.g. http://calibre-ebook.com/, http://www.psychopy.org/ ... http://en.wikipedia.org/wiki/List_of_Python_software#Applications) are deployed together with the libraries and interpreter that they will use.
> Often, these applications are larger, and can end up performing operations that are computationally intensive. In the case of Calibre, e.g., large batch conversions from one book format to another can take more than an hour (for sufficiently large batches).
Are you sure the bottleneck is in Python code here? PyPy won't magically speed up image conversions for you, for example. You can expect it to be faster for HTML processing with its bundled html5lib, though, and maybe also for PDF generation, for which it seems to be using pyPDF. However, for XML processing, which I would expect to be a substantial part of the work when converting between e-book formats, it appears to be using lxml - you can't beat that with PyPy.
Calibre likely won't run in PyPy directly, as the GUI uses PyQt4 and it also uses extension modules for plugins. So I'm rather confident that it will not be easy to make it work at all with PyPy, or even to make any of the more interesting conversion pipelines work entirely in PyPy.
You can still give it a try, though. Maybe you can manage to get at least an HTML-to-PDF pipeline working by forking off an external PyPy process and porting the libraries.
However, you seem to be more interested in making it run fast than in making it run in PyPy. Your time may be better invested in pushing more parallel processing into the right places. You mentioned batch processing; that sounds like the bulk of the workload is trivially parallelisable. And maybe a bit of profiling against your specific processing needs would hint at a specific bottleneck that's easy to fix?
Stefan
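To make the two suggestions above concrete (handing work to an external PyPy process, and parallelising a batch of independent items), here is a minimal sketch. It is not Calibre's code and makes no claim about how Calibre structures its conversions. The names `WORKER_SOURCE`, `find_interpreter`, `convert_one` and `convert_batch` are hypothetical, the "conversion" is a stand-in that only upper-cases a string, and it assumes a PyPy executable would be found on PATH as `pypy` or `pypy3`, falling back to the running interpreter otherwise.

```python
import shutil
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a pure-Python conversion step: it upper-cases
# its single command-line argument and prints the result.
WORKER_SOURCE = "import sys; print(sys.argv[1].upper())"


def find_interpreter():
    """Return a PyPy executable from PATH if one exists (assumed to be
    named `pypy` or `pypy3`), else the interpreter running this script."""
    return shutil.which("pypy") or shutil.which("pypy3") or sys.executable


def convert_one(interpreter, item):
    """Run the worker for one item in a separate interpreter process."""
    result = subprocess.run(
        [interpreter, "-c", WORKER_SOURCE, item],
        capture_output=True,  # collect the worker's stdout/stderr
        text=True,            # decode output to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()


def convert_batch(items, interpreter=None, max_workers=4):
    """Convert independent items concurrently; results keep input order."""
    interpreter = interpreter or find_interpreter()
    # Threads suffice here: each job already runs in its own process,
    # so the threads only wait on subprocess I/O.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda item: convert_one(interpreter, item), items))
```

Calling `convert_batch(["book-a", "book-b"])` starts one worker process per item, at most `max_workers` at a time. Whether this actually helps depends on where the time goes, which is what the profiling suggestion is for: if the heavy lifting happens in C extensions such as lxml, swapping the interpreter for the worker is unlikely to change much.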

Actually, my motivation was not to get Calibre to be faster -- I use it only occasionally. All I knew was that Calibre was an application (1) built on Python, that (2) the Python interpreter it used was baked in to the distribution, and (3) it seemed to perform a number of operations somewhat slowly.
It seems that whenever (1) and (2) hold, there is a potential opportunity for the wide-scale deployment of PyPy, taking it from being used on a handful of servers and enthusiasts' computers to instead being deployed in thousands or tens of thousands of end-user applications.
Perhaps PyPy *might* not immediately lead to an increase in performance (though one suspects that in general, it would), but the mere fact that it's available to the application developer could inspire new development paradigms that take advantage of PyPy's features. And it could serve as a practical test-bed for deploying PyPy and for evaluating tweaks to it.
Leo
_______________________________________________ pypy-dev mailing list pypy-dev@python.org http://mail.python.org/mailman/listinfo/pypy-dev

Leo Trottier, 11.04.2012 20:56:
> Actually, my motivation was not to get Calibre to be faster -- I use it only occasionally. All I knew was that Calibre was an application (1) built on Python, that (2) the Python interpreter it used was baked in to the distribution, and (3) it seemed to perform a number of operations somewhat slowly.
> It seems that whenever (1) and (2) hold, there is a potential opportunity for the wide-scale deployment of PyPy, taking it from being used on a handful of servers and enthusiasts' computers to instead being deployed in thousands or tens of thousands of end-user applications.
> Perhaps PyPy *might* not immediately lead to an increase in performance (though one suspects that in general, it would), but the mere fact that it's available to the application developer could inspire new development paradigms that take advantage of PyPy's features. And it could serve as a practical test-bed for deploying PyPy and for evaluating tweaks to it.
Ah, ok. Then your question is somewhat backwards, though. You should start by looking for an application that matches the above properties *and* that runs well in PyPy or is at least not too difficult to port. Otherwise, this discussion will stay at a rather theoretical level and the answer is "yes, sure, whenever you find an application for which it works, it will work for that application".
You could start by looking through PyPy's compatibility list to see if any of the names look familiar and suitable. When users report problems that are listed there, that's usually because they have an interest in making something run in PyPy.
Stefan

My hope is that someone who is already familiar with the PyPy build process and its various compatibility quirks might be able both to determine quickly whether it would build against PyPy and, perhaps, to succeed in actually building it. When I fail to build something against PyPy, it's less obvious to me than to many of the people here whether the failure can be easily resolved by the use of cpyext or some other sophisticated, PyPy-specific hack.
My hope and suspicion, here, is that this kind of task is one that benefits significantly from experience with the subtleties of PyPy interoperability, rather than mere cleverness or a more general familiarity with software development. I.e., that this is a challenge that might be on the one hand quite straightforward (if a little tedious) to some, while perhaps nearly impossible to many others.
This just seemed like potential "low-hanging fruit" -- minimal work that might lead to greatly expanded deployment of PyPy.
Leo

Leo Trottier, 12.04.2012 09:08:
> My hope is that someone who is already familiar with the PyPy build process and its various compatibility quirks might be able both to determine quickly whether it would build against PyPy and, perhaps, to succeed in actually building it. When I fail to build something against PyPy, it's less obvious to me than to many of the people here whether the failure can be easily resolved by the use of cpyext or some other sophisticated, PyPy-specific hack.
> My hope and suspicion, here, is that this kind of task is one that benefits significantly from experience with the subtleties of PyPy interoperability, rather than mere cleverness or a more general familiarity with software development. I.e., that this is a challenge that might be on the one hand quite straightforward (if a little tedious) to some, while perhaps nearly impossible to many others.
That seems to answer the question of why it's not more commonly done.
Stefan

2012/4/12 Stefan Behnel <stefan_ml@behnel.de>
>> My hope and suspicion, here, is that this kind of task is one that benefits significantly from experience with the subtleties of PyPy interoperability, rather than mere cleverness or a more general familiarity with software development. I.e., that this is a challenge that might be on the one hand quite straightforward (if a little tedious) to some, while perhaps nearly impossible to many others.
> Seems to answer the question why it's not more commonly done.
I don't know the application you are referring to, and don't have the time to do it myself, but I definitely want to help anyone who would like to take this route.
-- Amaury Forgeot d'Arc
participants (3)
- Amaury Forgeot d'Arc
- Leo Trottier
- Stefan Behnel