Google Summer of Code proposal: improving long int and adding new types/modules.
I wish to participate in Google Summer of Code as a Python developer. I have a few ideas for what could be improved in and added to Python. Since these changes and add-ons would be coded in C and added to the Python core and/or as modules, I am not sure whether you are willing to agree to these ideas.

First of all, I think it would be a good idea to speed up the long int implementation in Python, especially multiplication and conversion from radix 2^k to radix 10^l. This could be done using much faster algorithms than the ones already in use, and would transparently improve the efficiency of multiplying and printing/reading big integers.

The next thing I would add is a multi-precision floating point type in the core, plus a fraction type, which in some cases greatly improves on operations that would otherwise have to be done in floating point. Of course, the math module would need an update to support multi-precision floating point; with that, one could compute asin or any other function the module provides, with precision limited only by memory and time. It would also be a good idea to add functions which compute pi and exp with unlimited precision.

And one last thing - it would be nice to add some number-theory functions to the math module (or a new one), like primality tests, factorization, etc.

Each of the above improvements would be coded in pure C, without using external libraries, so Python's portability won't be compromised and no new dependencies will be added.

I await your responses about my ideas - which of them you think are good, which poor, etc. The main reason I am posting my question here is that I am not sure how much you are willing to change the core of Python.

Finally, let me apologize for my poor English; I am trying my best.

Best regards, Mateusz Rukowicz.
On 4/21/06, Mateusz Rukowicz <mateusz.rukowicz@vp.pl> wrote:
I wish to participate in Google Summer of Code as a Python developer. I have a few ideas for what could be improved in and added to Python. Since these changes and add-ons would be coded in C and added to the Python core and/or as modules, I am not sure whether you are willing to agree to these ideas.
First of all, I think it would be a good idea to speed up the long int implementation in Python, especially multiplication and conversion from radix 2^k to radix 10^l. This could be done using much faster algorithms than the ones already in use, and would transparently improve the efficiency of multiplying and printing/reading big integers.
That would be a good project, if someone is available to mentor it. (Tim Peters?)
The next thing I would add is a multi-precision floating point type in the core, plus a fraction type, which in some cases greatly improves on operations that would otherwise have to be done in floating point. Of course, the math module would need an update to support multi-precision floating point; with that, one could compute asin or any other function the module provides, with precision limited only by memory and time. It would also be a good idea to add functions which compute pi and exp with unlimited precision.
I would suggest doing this as a separate module first, rather than as a patch to the Python core. Can you show us some practical applications of these operations?
And one last thing - it would be nice to add some number-theory functions to the math module (or a new one), like primality tests, factorization, etc.
Probably better a new module. But how many people do you think need these?
Each of the above improvements would be coded in pure C, without using external libraries, so Python's portability won't be compromised and no new dependencies will be added.
Great -- that's important!
I await your responses about my ideas - which of them you think are good, which poor, etc. The main reason I am posting my question here is that I am not sure how much you are willing to change the core of Python.
Finally, let me apologize for my poor English; I am trying my best.
No problem. We can understand you fine! -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
On 4/21/06, Mateusz Rukowicz <mateusz.rukowicz@vp.pl> wrote:
The next thing I would add is a multi-precision floating point type in the core, plus a fraction type, which in some cases greatly improves on operations that would otherwise have to be done in floating point. Of course, the math module would need an update to support multi-precision floating point; with that, one could compute asin or any other function the module provides, with precision limited only by memory and time. It would also be a good idea to add functions which compute pi and exp with unlimited precision.
I would suggest doing this as a separate module first, rather than as a patch to the Python core.
Can you show us some practical applications of these operations?
Multi-precision floats are mostly used by physicians and mathematicians. Interpreted languages are particularly good for physics simulations, in which a small error would otherwise grow so much that the results become useless. Rational numbers would be used in code where precision is of critical importance and one cannot accept rounding of results. So the fraction type (and also multi-precision floating point) would mostly be used in linear-programming algorithms, like solving sets of equations, or in some optimization methods.
And one last thing - it would be nice to add some number-theory functions to the math module (or a new one), like primality tests, factorization, etc.
Probably better a new module. But how many people do you think need these?
Mostly cryptography would exploit fast number-theory algorithms. Adding these as a module would speed up, and make easier to code, everything related to cryptography, such as SSL, RSA, etc. Hobbyists looking for huge primes would also appreciate it, but that is not their main purpose ;). I understand that most of these improvements have quite a limited audience, but I still think Python should be friendly to them ;) Best regards, Mateusz Rukowicz.
On 4/21/06, Mateusz Rukowicz <mateusz.rukowicz@vp.pl> wrote:
Guido van Rossum wrote:
On 4/21/06, Mateusz Rukowicz <mateusz.rukowicz@vp.pl> wrote:
The next thing I would add is a multi-precision floating point type in the core, plus a fraction type, which in some cases greatly improves on operations that would otherwise have to be done in floating point. Of course, the math module would need an update to support multi-precision floating point; with that, one could compute asin or any other function the module provides, with precision limited only by memory and time. It would also be a good idea to add functions which compute pi and exp with unlimited precision.
I would suggest doing this as a separate module first, rather than as a patch to the Python core.
Can you show us some practical applications of these operations?
Multi-precision floats are mostly used by physicians and mathematicians.
(Aside: you probably mean physicist, someone who practices physics. A physician is a doctor; don't ask me why. :-)
Interpreted languages are particularly good for physics simulations, in which a small error would otherwise grow so much that the results become useless.
We already have decimal floating point which can be configured to use however many digits of precision you want. Would this be sufficient? If you want more performance, perhaps you could tackle the very useful project of translating decimal.py into C?
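For illustration, here is the kind of user-settable precision Guido is referring to. This uses the real decimal module API; the choice of 50 digits is arbitrary:

    from decimal import Decimal, getcontext

    getcontext().prec = 50    # 50 significant decimal digits
    print(Decimal(1) / Decimal(7))
    # 0.14285714285714285714285714285714285714285714285714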
Rational numbers would be used in code where precision is of critical importance and one cannot accept rounding of results. So the fraction type (and also multi-precision floating point) would mostly be used in linear-programming algorithms, like solving sets of equations, or in some optimization methods.
I'm not sure I see the value of rational numbers implemented in C; they're easy to write in Python and all the time goes into division of two ints which is already implemented in C.
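A minimal sketch of the sort of pure-Python rational type Guido means; the class and method set here are made up for illustration (no comparisons, hashing, or mixed-type arithmetic):

    def gcd(a, b):
        # Euclid's algorithm; the heavy lifting below is bigint ops in C.
        while b:
            a, b = b, a % b
        return a

    class Rational(object):
        def __init__(self, num, den=1):
            if den == 0:
                raise ZeroDivisionError("zero denominator")
            if den < 0:
                num, den = -num, -den
            g = gcd(abs(num), den)
            self.num, self.den = num // g, den // g

        def __add__(self, other):
            return Rational(self.num * other.den + other.num * self.den,
                            self.den * other.den)

        def __mul__(self, other):
            return Rational(self.num * other.num, self.den * other.den)

        def __repr__(self):
            return "%d/%d" % (self.num, self.den)

    print(Rational(1, 3) + Rational(1, 6))    # 1/2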
And one last thing - it would be nice to add some number-theory functions to the math module (or a new one), like primality tests, factorization, etc.
Probably better a new module. But how many people do you think need these?
Mostly cryptography would exploit fast number-theory algorithms. Adding these as a module would speed up, and make easier to code, everything related to cryptography, such as SSL, RSA, etc. Hobbyists looking for huge primes would also appreciate it, but that is not their main purpose ;).
Have you looked at the existing wrappers around OpenSSL, such as pyopenssl and m2crypto? ISTM that these provide most of the needed algorithms, already coded in open source C.
I understand that most of these improvements have quite a limited audience, but I still think Python should be friendly to them ;)
Sure. The question is, if there are more student applications than Google wants to fund, which projects will be selected? I would personally vote for the projects that potentially help out the largest number of Python users. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
(Aside: you probably mean physicist, someone who practices physics. A physician is a doctor; don't ask me why. :-)
;) I'll remember ;)
Interpreted languages are particularly good for physics simulations, in which a small error would otherwise grow so much that the results become useless.
We already have decimal floating point which can be configured to use however many digits of precision you want. Would this be sufficient? If you want more performance, perhaps you could tackle the very useful project of translating decimal.py into C?
Yes, that seems like a better idea - already-written software would benefit from it transparently. I think I could develop the 'marginal' ideas later.
I'm not sure I see the value of rational numbers implemented in C; they're easy to write in Python and all the time goes into division of two ints which is already implemented in C.
Well, quite true ;P
Have you looked at the existing wrappers around OpenSSL, such as pyopenssl and m2crypto? ISTM that these provide most of the needed algorithms, already coded in open source C.
Well, you have already convinced me not to do that right now, but I still think Python would benefit from it; it could be done later, so this discussion may just be postponed.
I understand that most of these improvements have quite a limited audience, but I still think Python should be friendly to them ;)
Sure. The question is, if there are more student applications than Google wants to fund, which projects will be selected? I would personally vote for the projects that potentially help out the largest number of Python users.
So I think the most valuable of my ideas would be improving long int plus coding decimal in C. Anyway, I think it would be possible to add the other ideas later.
On 4/21/06, Mateusz Rukowicz <mateusz.rukowicz@vp.pl> wrote: ...
So I think the most valuable of my ideas would be improving long int plus coding decimal in C. Anyway, I think it would be possible to add the other ideas later.
I see "redo Decimal in C" (possibly with the addition of some fast elementary transcendentals) and "enhance operations on longs" (multiplication first and foremost, and base-conversions probably next, as you suggested -- possibly with the addition of some fast number-theoretical functions) as two separate projects, each of just about the right magnitude for an SoC project. I would be glad to mentor either (not both); my preference would be for the former -- it appears to me that attempting to do both together might negatively impact both. Remember, it isn't just the coding...: thorough testing, good docs, accurate performance measurements on a variety of platforms, ..., are all important part of a project that, assuming success, it's scheduled to become a core part of Python 2.6, after all. Alex
Alex Martelli wrote:
I see "redo Decimal in C" (possibly with the addition of some fast elementary transcendentals) and "enhance operations on longs" (multiplication first and foremost, and base-conversions probably next, as you suggested -- possibly with the addition of some fast number-theoretical functions) as two separate projects, each of just about the right magnitude for an SoC project. I would be glad to mentor either (not both); my preference would be for the former -- it appears to me that attempting to do both together might negatively impact both. Remember, it isn't just the coding...: thorough testing, good docs, accurate performance measurements on a variety of platforms, ..., are all important part of a project that, assuming success, it's scheduled to become a core part of Python 2.6, after all.
Alex
If that's enough, it's OK with me - I would concentrate on one thing and very likely do it better. My main reason for proposing both at the same time was that they would share a lot of code, and since I have done quite a bit of research in this area (comparing algorithms of different complexity, comparing different cache-friendly implementations, timing on machines ranging from a P200 up to a dual 3GHz Xeon), I have quite a bit of experience with this stuff.

But anyway, going by Tim Peters' mail, it's not likely anything will change in long int ;) - I understand that, and I don't want to change your Python development policy (actually, I was expecting that once I realized that multiplication is Karatsuba-only). Here is a little comparison, made by a friend of mine, of Python versus my implementation of multiplication (coded in C, without any assembly - I also have an assembly version ;P): http://parkour.pl/mat.png It computes the product k*l with log(k) = log(l); the X axis is sqrt(n), where n is the 'length' of the numbers, i.e. O(n) = O(log k). I think decimal coded in C by me would achieve quite similar times (I would also eliminate the 'staircase' effect), if you wish.

I am quite efficiency-concerned; in my opinion, efficiency is the third most important thing after easy-to-use code and functionality, and should not be forgotten. I also maintain that efficient code/algorithms aren't necessarily hard to maintain. And all in all, these algorithms are not so complicated (in my opinion, FFT multiplication - which is asymptotically the best - is less complicated than Karatsuba, but that's my opinion).

I am now quite sure what I would like to do, and what you might be able to accept - coding decimal in C. The most important things about that would be:
1. Compatibility with the old decimal module.
2. Code that is easy to maintain and extend.
3. Not breaking Python's portability, nor adding new dependencies.
4. Efficiency (if and only if points 1-3 are not compromised).

I am quite confident I can achieve these points. But everything is open for discussion ;) I'd like to know as much as possible about what you think of this idea. (Though I already know quite a lot.) Best regards, Mateusz Rukowicz.
2006/4/22, Mateusz Rukowicz <mateusz.rukowicz@vp.pl>:
I am now quite sure what I would like to do, and what you might be able to accept - coding decimal in C. The most important things about that would be:
I'd be glad to mentor this. Regards, Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/
[Mateusz Rukowicz]
I wish to participate in Google Summer of Code as a Python developer. I have a few ideas for what could be improved in and added to Python. Since these changes and add-ons would be coded in C and added to the Python core and/or as modules, I am not sure whether you are willing to agree to these ideas.
First of all, I think it would be a good idea to speed up the long int implementation in Python, especially multiplication and conversion from radix 2^k to radix 10^l. This could be done using much faster algorithms than the ones already in use, and would transparently improve the efficiency of multiplying and printing/reading big integers.
[Guido van Rossum]
That would be a good project, if someone is available to mentor it. (Tim Peters?)
Generally speaking, I think longobject.c is already at the limit for what's _reasonable_ for the core to support in C. In fact, I regret that we added Karatsuba multiplication. That grossly complicated CPython's bigint multiplication code, yet anyone slinging bigints seriously would still be far better off using a Python wrapper for a library (like GMP) seriously devoted to gonzo bigint algorithms. CPython is missing mountains of possible bigint optimizations and algorithms, and that's fine by me -- the core can't afford to keep up with (or even approach) the state of the art for such stuff, but there are already Python-callable libraries that do keep up. Speeding the decimal module is a more attractive project. Note that's an explicit possibility for next month's "Need for Speed" sprint: http://mail.python.org/pipermail/python-dev/2006-April/063428.html Mateusz, if that interests you, it would be good to contact the sprint organizers.
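For readers unfamiliar with it, the Karatsuba trick Tim mentions trades one multiplication for a few additions. A pure-Python sketch, not CPython's actual algorithm (that is C in longobject.c, splitting by 15-bit digits rather than like this); the 2**64 cutoff is an arbitrary illustrative choice, and int.bit_length() postdates this thread but keeps the sketch short:

    def karatsuba(x, y):
        # Multiply non-negative ints by splitting each in half (by bits)
        # and doing three half-size multiplies instead of four.
        if x < 2**64 or y < 2**64:
            return x * y                      # base case
        half = max(x.bit_length(), y.bit_length()) // 2
        mask = (1 << half) - 1
        x1, x0 = x >> half, x & mask          # x == x1 * 2**half + x0
        y1, y0 = y >> half, y & mask
        z2 = karatsuba(x1, y1)
        z0 = karatsuba(x0, y0)
        z1 = karatsuba(x1 + x0, y1 + y0) - z2 - z0   # the one saved multiply
        return (z2 << (2 * half)) + (z1 << half) + z0

    assert karatsuba(3**300, 7**250) == 3**300 * 7**250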
The next thing I would add is a multi-precision floating point type in the core, plus a fraction type, which in some cases greatly improves on operations that would otherwise have to be done in floating point.
You can have this today in Python by installing the GMP library and its Python wrapper.
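A sketch of what that looks like in practice, assuming the gmpy wrapper is installed and that its mpf constructor takes an optional precision in bits (the values here are arbitrary):

    import gmpy  # Python wrapper around GMP; not in the stdlib

    print(gmpy.mpz(2) ** 521 - 1)             # bigints backed by GMP
    print(gmpy.mpq(1, 3) + gmpy.mpq(1, 6))    # exact rationals: 1/2
    print(gmpy.mpf(1, 300) / 3)               # ~300 bits of precision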
Of course, the math module would need an update to support multi-precision floating point; with that, one could compute asin or any other function the module provides, with precision limited only by memory and time. It would also be a good idea to add functions which compute pi and exp with unlimited precision.
I don't think the CPython core can afford to support such sophisticated and limited-audience algorithms. OTOH, the decimal type has potentially unbounded precision already, and the math-module functions have no idea what to do about that. Perhaps some non-gonzo (straightforward even if far from optimal, and coded in Python) arbitrary-precision algorithms would make a decent addition. For example, the Emacs `calc` package uses non-heroic algorithms that are fine for casual use.
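A concrete example of the "non-gonzo" style Tim describes: exp() by plain Taylor summation on top of decimal, essentially the recipe included in the decimal documentation (the function name dec_exp is ours):

    from decimal import Decimal, getcontext

    def dec_exp(x):
        # Sum the Taylor series for e**x until the partial sum stops
        # changing at the current precision; two guard digits cover
        # rounding in the intermediate steps.
        getcontext().prec += 2
        i, last, s, num, fact = 0, 0, Decimal(1), Decimal(1), 1
        while s != last:
            last = s
            i += 1
            fact *= i
            num *= x
            s += num / fact
        getcontext().prec -= 2
        return +s            # unary plus rounds back to the precision

    getcontext().prec = 40
    print(dec_exp(Decimal(1)))   # 2.718281828459045235360287471352662497757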
I would suggest doing this as a separate module first, rather than as a patch to the Python core.
Can you show us some practical applications of these operations?
He could, yes, but they wouldn't meet any web developer's definition of "practical". The audience is specialized (he says as if web developers' peculiar desires were universal ;-)).
And one last thing - it would be nice to add some number-theory functions to the math module (or a new one), like primality tests,
GMP again, although the developers note: http://www.swox.com/gmp/projects.html ... GMP is not really a number theory library and probably shouldn't have large amounts of code dedicated to sophisticated prime testing algorithms, ... If they think doing much more here is out of bounds for them, trying to sneak it into longobject.c is crazy :-) Note that tests like Miller-Rabin are very easy to code in Python.
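Indeed -- a sketch of Miller-Rabin in plain Python; the trial-division bound and the 20 rounds are arbitrary choices:

    import random

    def is_probable_prime(n, rounds=20):
        if n < 2:
            return False
        for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
            if n % p == 0:
                return n == p
        d, s = n - 1, 0                 # write n-1 as d * 2**s, d odd
        while d % 2 == 0:
            d, s = d // 2, s + 1
        for _ in range(rounds):
            a = random.randrange(2, n - 1)
            x = pow(a, d, n)            # built-in modular exponentiation
            if x in (1, n - 1):
                continue
            for _ in range(s - 1):
                x = (x * x) % n
                if x == n - 1:
                    break
            else:
                return False            # found a witness: composite
        return True                     # probably prime

    print(is_probable_prime(2**89 - 1))     # True (a Mersenne prime)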
factorization, etc.
Very large topic all by itself. On Windows, I use factor.exe: http://indigo.ie/~mscott/ and call it from Python via os.popen(). That doesn't implement the Number Field Sieve, but does use most earlier high-powered methods (including elliptic curve and MPQS).
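The calling pattern Tim describes is just a pipe. A hedged sketch: 'factor' stands in for whatever factoring binary is on your PATH (factor.exe from the URL above on Windows, GNU coreutils' factor elsewhere), and output formats differ between tools:

    import os

    def factor_external(n):
        # Expects "N: f1 f2 ..." style output, as GNU factor prints.
        output = os.popen('factor %d' % n).read()
        return [int(tok) for tok in output.split()[1:]]

    print(factor_external(1234567890))   # [2, 3, 3, 5, 3607, 3803]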
Probably better a new module. But how many people do you think need these?
The people who install GMP <0.5 wink>.
Each of the above improvements would be coded in pure C, without using external libraries, so Python's portability won't be compromised and no new dependencies will be added.
Great -- that's important!
Since when does coding a module in pure C do anything for Jython or IronPython or PyPy besides make their lives more difficult? It would be better to code in Python, if the functionality is all that's desired; but it isn't, and getting gonzo speed too is the proper concern of libraries and development teams devoted to gonzo-speed math libraries.
I await your responses about my ideas - which of them you think are good, which poor, etc. The main reason I am posting my question here is that I am not sure how much you are willing to change the core of Python.
All the things you mentioned are Good Things, solving some real problems for some real people. What I question is the wisdom of trying to implement them in the CPython core, especially when most of them are already available in CPython via downloading a Python-wrapped library.
Finally, let me apologize for my poor English; I am trying my best.
No problem. We can understand you fine!
Yup!
On 4/21/06, Tim Peters <tim.peters@gmail.com> wrote:
OTOH, the decimal type has potentially unbounded precision already, and the math-module functions have no idea what to do about that. Perhaps some non-gonzo (straightforward even if far from optimal, and coded in Python) arbitrary-precision algorithms would make a decent addition. For example, the Emacs `calc` package uses non-heroic algorithms that are fine for casual use.
(Slightly off-topic.) I wonder if the math module would actually be a good proving ground for generic/overloaded functions. It seems a clean fit (and even has a few applications for multiple dispatch, probably). -- --Guido van Rossum (home page: http://www.python.org/~guido/)
"Tim Peters" <tim.peters@gmail.com> wrote:
[Mateusz Rukowicz]
And one last thing - it would be nice to add some number-theory functions to the math module (or a new one), like primality tests,
If they think doing much more here is out of bounds for them, trying to sneak it into longobject.c is crazy :-) Note that tests like Miller-Rabin are very easy to code in Python.
I've seen at least two examples of such in the ASPN Python cookbook. - Josiah
On Fri, Apr 21, 2006, Mateusz Rukowicz wrote:
The next thing I would add is a multi-precision floating point type in the core, plus a fraction type, which in some cases greatly improves on operations that would otherwise have to be done in floating point. Of course, the math module would need an update to support multi-precision floating point; with that, one could compute asin or any other function the module provides, with precision limited only by memory and time. It would also be a good idea to add functions which compute pi and exp with unlimited precision. And one last thing - it would be nice to add some number-theory functions to the math module (or a new one), like primality tests, factorization, etc.
To echo and amplify what Guido said: an excellent project would be to rewrite the decimal module in C. Another option would be to pick up and enhance either of the GMP wrappers: http://gmpy.sourceforge.net/ http://www.egenix.com/files/python/mxNumber.html That would also deal with your suggestion of rational numbers. Working with long ints would also be excellent, but I'd hesitate to tackle that unless Tim Peters volunteers to help (because there may be special implementation constraints). -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Argue for your limitations, and sure enough they're yours." --Richard Bach
On Apr 21, 2006, at 7:46 AM, Aahz wrote:
On Fri, Apr 21, 2006, Mateusz Rukowicz wrote:
The next thing I would add is a multi-precision floating point type in the core, plus a fraction type, which in some cases greatly improves on operations that would otherwise have to be done in floating point. Of course, the math module would need an update to support multi-precision floating point; with that, one could compute asin or any other function the module provides, with precision limited only by memory and time. It would also be a good idea to add functions which compute pi and exp with unlimited precision. And one last thing - it would be nice to add some number-theory functions to the math module (or a new one), like primality tests, factorization, etc.
To echo and amplify what Guido said: an excellent project would be to rewrite the decimal module in C. Another option would be to pick up and enhance either of the GMP wrappers:
http://gmpy.sourceforge.net/ http://www.egenix.com/files/python/mxNumber.html
That would also deal with your suggestion of rational numbers.
GMP is covered by the LGPL, and so must be any derivative work (presumably ruling it out for embedding in Python's core itself). That being said, as gmpy's author, I'd be enthusiastically happy to mentor anybody who wants to work on gmpy or another multiprecision arithmetic extension for Python. Alex
Alex Martelli wrote:
On Apr 21, 2006, at 7:46 AM, Aahz wrote:
On Fri, Apr 21, 2006, Mateusz Rukowicz wrote:
The next thing I would add is a multi-precision floating point type in the core, plus a fraction type, which in some cases greatly improves on operations that would otherwise have to be done in floating point. Of course, the math module would need an update to support multi-precision floating point; with that, one could compute asin or any other function the module provides, with precision limited only by memory and time. It would also be a good idea to add functions which compute pi and exp with unlimited precision. And one last thing - it would be nice to add some number-theory functions to the math module (or a new one), like primality tests, factorization, etc.
To echo and amplify what Guido said: an excellent project would be to rewrite the decimal module in C. Another option would be to pick up and enhance either of the GMP wrappers:
http://gmpy.sourceforge.net/ http://www.egenix.com/files/python/mxNumber.html
That would also deal with your suggestion of rational numbers.
GMP is covered by the LGPL, and so must be any derivative work (presumably ruling it out for embedding in Python's core itself). That being said, as gmpy's author, I'd be enthusiastically happy to mentor anybody who wants to work on gmpy or another multiprecision arithmetic extension for Python.
Rewriting the decimal module in C is not far from my initial idea, and I am quite happy that you think it's a good idea. If I were doing it, I would write everything needed myself - I am quite experienced in coding multi-precision computation algorithms (and algorithms in general).
Working with long ints would also be excellent, but I'd hesitate to tackle that unless Tim Peters volunteers to help (because there may be special implementation constraints).
I am quite confident that I would be able to make those changes transparently - that is, the representation of long int wouldn't change. And as far as I saw in the code, it's quite straightforward (actually, it would be strange otherwise). But of course it's up to you. Anyway, I hope it will be possible to add those changes. I guess that's quite enough for an SoC project, but I hope it will be possible to add some of my initial ideas anyway. Best regards, Mateusz Rukowicz.
Alex Martelli wrote:
GMP is covered by the LGPL, and so must be any derivative work
But the wrapper is just using GMP as a library, so it shouldn't be infected with LGPLness, should it? -- Greg
On 4/21/06, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote: ...
GMP is covered by the LGPL, and so must be any derivative work
But the wrapper is just using GMP as a library, so it shouldn't be infected with LGPLness, should it?
If a lawyer for the PSF can confidently assert that gmpy is not a derivative work of GMP, I'll have no problem changing gmpy's licensing. But I won't make such a call myself: for example, gmpy.c #include's gmp.h and uses (==expands) some of the C macros there defined -- doesn't that make gmpy.o a derived work of gmp.h? I'm quite confident that the concept of "derived work" would not apply if gmpy.so only accessed a gmp.so (or other kinds of dynamic libraries), but I fear the connection is stronger than that, so, prudently, I'm assuming the "derived work" status until further notice. Alex
On Apr 21, 2006, at 5:58 PM, Alex Martelli wrote:
On 4/21/06, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote: ...
GMP is covered by the LGPL, and so must be any derivative work
But the wrapper is just using GMP as a library, so it shouldn't be infected with LGPLness, should it?
If a lawyer for the PSF can confidently assert that gmpy is not a derivative work of GMP, I'll have no problem changing gmpy's licensing. But I won't make such a call myself: for example, gmpy.c #include's gmp.h and uses (==expands) some of the C macros there defined -- doesn't that make gmpy.o a derived work of gmp.h?
I'm quite confident that the concept of "derived work" would not apply if gmpy.so only accessed a gmp.so (or other kinds of dynamic libraries), but I fear the connection is stronger than that, so, prudently, I'm assuming the "derived work" status until further notice.
Well, we already wrap readline; would this really be any worse? Readline is GPL. -bob
Alex Martelli wrote:
gmpy.c #include's gmp.h and uses (==expands) some of the C macros there defined -- doesn't that make gmpy.o a derived work of gmp.h?
Assuming that gmp.h is the header which defines the public interface of the gmp library, any code which uses it as a library will be including gmp.h. So that in itself doesn't imply that you're doing more than using it as a library. -- Greg
Hi all, Mateusz Rukowicz wrote:
I wish to participate in Google Summer of Code as a Python developer. I have a few ideas for what could be improved in and added to Python. Since these changes and add-ons would be coded in C and added to the Python core and/or as modules, I am not sure whether you are willing to agree to these ideas.
First of all, I think it would be a good idea to speed up the long int implementation in Python, especially multiplication and conversion from radix 2^k to radix 10^l. This could be done using much faster algorithms than the ones already in use, and would transparently improve the efficiency of multiplying and printing/reading big integers.
The next thing I would add is a multi-precision floating point type in the core, plus a fraction type, which in some cases greatly improves on operations that would otherwise have to be done in floating point. Of course, the math module would need an update to support multi-precision floating point; with that, one could compute asin or any other function the module provides, with precision limited only by memory and time. It would also be a good idea to add functions which compute pi and exp with unlimited precision. And one last thing - it would be nice to add some number-theory functions to the math module (or a new one), like primality tests, factorization, etc.
Sorry for pitching in late, I was away for a while. I'd just like to point out in the context of this discussion: http://sage.scipy.org/sage/ SAGE is a fairly comprehensive system built on top of python to do all sorts of research-level number theory work, from basic things up to unpronouncable ones. It includes wrappers to many of the major number-theory related libraries which exist with an open-source license. I am not commenting one way or another on your proposal, just bringing up a project with a lot of relevance to what you are talking about. Cheers, f
participants (10)
- Aahz
- Alex Martelli
- Bob Ippolito
- Facundo Batista
- Fernando Perez
- Greg Ewing
- Guido van Rossum
- Josiah Carlson
- Mateusz Rukowicz
- Tim Peters