Should jitviewer come with a warning?

I have been noticing a pattern where many people who are writing Python code to run on PyPy are relying more and more on the jitviewer to help them write faster code. Unfortunately, many of them don't look at improving the design of their code as a way to make it run faster under PyPy, but instead start writing obscure Python code that happens to run faster under PyPy.
I know that the PyPy core developers, at least, would like to see everyone just write good, clean Python code, and that code that has been made obscure was often made so in an attempt to optimize it for CPython, which in many cases causes it to run slower on PyPy than it would if the code just followed typical Python idioms.
I feel that a normal developer should be using tools like cProfile and RunSnakeRun and cleaning up design issues well before they even consider using the jitviewer.
In a recent case, I saw someone using the jitviewer who likely doesn't need to use it, at least not given the current design of their code, and I said the following:
"The jitviewer should be mainly used by PyPy core developers and those building PyPy VMs. A normal developer writing Python code to run on PyPy shouldn’t have a need to use it. They can use it to point out an inefficiency that PyPy has to the core developers, but it should not be used as a way to get you to write Python code in a way that has a better chance of being optimized under PyPy, except on very rare occasions, and even then only by those who closely follow and understand PyPy’s development."
Do others here share this opinion, and should some warning be added to the jitviewer?
John
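A minimal sketch of the profile-first step suggested above, using only the standard library's cProfile and pstats modules. The functions parse_records and main are hypothetical stand-ins for real application code; this is an illustration of the workflow, not a prescribed procedure.

```python
import cProfile
import io
import pstats


def parse_records(lines):
    # Hypothetical workload standing in for "your program".
    return [line.split(",") for line in lines]


def main():
    lines = ["a,b,c"] * 10000
    for _ in range(20):
        parse_records(lines)


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time and report the ten most expensive entries.
out = io.StringIO()
stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
stats.print_stats(10)
report = out.getvalue()
print(report)
```

Calling profiler.dump_stats("app.prof") (a hypothetical filename) instead writes the raw statistics to a file, which graphical viewers such as RunSnakeRun are meant to open.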

On 05:39 pm, john.m.camara@gmail.com wrote:
> Do others here share this same opinion and should some warning be added to the jitviewer?
What makes you think people will even read this warning, let alone prioritize it over their immediate desire to make their program run faster?
(Not that I am objecting to adding the warning, but I think you might be fooling yourself if you think it will have any impact.)
Jean-Paul

On Sun, Feb 3, 2013 at 9:25 PM, <exarkun@twistedmatrix.com> wrote:
> What makes you think people will even read this warning, let alone prioritize it over their immediate desire to make their program run faster?
> (Not that I am objecting to adding the warning, but I think you might be fooling yourself if you think it will have any impact)
> Jean-Paul
> _______________________________________________
> pypy-dev mailing list
> pypy-dev@python.org
> http://mail.python.org/mailman/listinfo/pypy-dev
Let me rephrase it: where did you look for such a warning, not find it, and so assume it was OK?
Cheers, fijal

On 04/02/13 06:25, exarkun@twistedmatrix.com wrote:
> What makes you think people will even read this warning, let alone prioritize it over their immediate desire to make their program run faster?
> (Not that I am objecting to adding the warning, but I think you might be fooling yourself if you think it will have any impact)
I think that if the coder is actually using some sort of profiling tool, any sort of profiling tool, that makes them 1000 times more likely to read and pay attention to the warning than the average coder who optimizes code by guessing.
Other than that observation, I don't have an opinion on whether jitviewer should come with a warning.
(Oh, and another thing... I'm assuming you mean for jitviewer to print the warning as part of its normal output.)
-- Steven

On Mon, Feb 4, 2013 at 12:39 AM, Steven D'Aprano <steve@pearwood.info> wrote:
> (Oh, and another thing... I'm assuming you mean for jitviewer to print the warning as part of its normal output.)
That is definitely a no (my screen is too small to have some noise there, if for no other reason). It might have a warning in the documentation, though, if that's of any use. But honestly, I doubt such a warning makes any sense. People who are capable of using jitviewer already "know better".

> What makes you think people will even read this warning, let alone prioritize it over their immediate desire to make their program run faster?
> (Not that I am objecting to adding the warning, but I think you might be fooling yourself if you think it will have any impact)
> Jean-Paul
I agree with you, and I was not being naive in thinking this alone was going to solve the problem, but it does give us something to point to when we see someone abusing the jitviewer.
Maybe a more effective approach is not to advertise the jitviewer to everyone who has performance issues, and only tell experienced programmers who have already done the obvious and fixed any design issues in their code. Having inexperienced developers use the normal profiling tools will still help them find the hot spots in their code and keep them from picking up habits that lead to writing un-Pythonic code.
I'm sure we all agree that code with a better design will run faster on PyPy than a poor design propped up with optimizations that only work on PyPy.
I don't think we want to end up with a lot of Python code that looks like C code. This is what happens when inexperienced developers start relying on the jitviewer.
For instance, take a look at this code [1] and blog post [2], which led me to post this. This is not the first example of this issue I have come across, and unfortunately it appears to be increasing at an alarming rate.
I guess I feel we have a responsibility to try to promote good programming practices when we can.
[1] - https://github.com/msgpack/msgpack-python/blob/master/msgpack/fallback.py
[2] - http://blog.affien.com/archives/2013/01/29/msgpack-for-pypy/
John

On Sun, Feb 3, 2013 at 10:08 PM, John Camara <john.m.camara@gmail.com> wrote:
> Maybe a more effective approach is not to advertise the jitviewer to everyone who has performance issues, and only tell experienced programmers who have already done the obvious and fixed any design issues in their code. Having inexperienced developers use the normal profiling tools will still help them find the hot spots in their code and keep them from picking up habits that lead to writing un-Pythonic code.
Hi John.
I don't believe jitviewer is really advertised in that many places. We tell people who come to IRC, yes, but that's about it (it's not prominently featured on pypy.org, for example). It's hard enough to make people read docs.

On Sun, Feb 3, 2013 at 10:12 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
> Hi John.
> I don't believe jitviewer is really advertised in that many places. We tell people who come to IRC, yes, but that's about it (it's not prominently featured on pypy.org, for example). It's hard enough to make people read docs.
Also, looking at msgpack: this code is maybe not ideal, but if you're dealing with buffer-level protocols, you end up with code that looks a lot like C.

> Let me rephrase it: where did you look for such a warning, not find it, and so assume it was OK?
> Cheers, fijal
Having a warning on https://bitbucket.org/pypy/jitviewer would be good.

> Also, looking at msgpack: this code is maybe not ideal, but if you're dealing with buffer-level protocols, you end up with code that looks a lot like C.
I do agree that this type of code will likely end up looking like C, but it's not necessary for all of it to look like C. For example, there shouldn't be a need for long chains of if/elif statements. It could use pack_into and unpack_from instead of the pack and unpack methods, so that it deals directly with the buffer instead of making substrings. Even if PyPy can optimize this away, why write Python code like this when it's not necessary?
Plus, I felt that initially the code should just use cffi to connect to the native C library. I believe this approach is likely to give very close to the best performance you could get on PyPy for this type of library. I'm not sure how much of a performance increase would be gained by writing the library completely in Python versus using cffi. Is there anything wrong with this line of thinking? Do you feel a pure Python approach could achieve better results than using cffi under PyPy?
John
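A small sketch of the difference between slicing then unpacking and calling unpack_from, assuming a hypothetical buffer laid out like msgpack's uint 32 encoding (a 0xce type byte followed by a big-endian 32-bit value). It is an illustration, not code from msgpack-python.

```python
import struct

# Hypothetical buffer modeled on msgpack's "uint 32" encoding:
# one type byte (0xce) followed by a big-endian 32-bit unsigned int.
buf = bytes([0xCE]) + struct.pack(">I", 123456789)

# Slice first: buf[1:5] builds a new 4-byte object that unpack then reads.
(value_sliced,) = struct.unpack(">I", buf[1:5])

# unpack_from reads straight from buf at offset 1; no substring is made.
(value_direct,) = struct.unpack_from(">I", buf, 1)

print(value_sliced, value_direct)  # 123456789 123456789
```

Both calls yield the same value; the only difference is the temporary 4-byte copy the slice creates, which, as noted in this thread, PyPy may be able to optimize away.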

On Sun, Feb 3, 2013 at 10:29 PM, John Camara <john.m.camara@gmail.com> wrote:
> I do agree that this type of code will likely end up looking like C, but it's not necessary for all of it to look like C. For example, there shouldn't be a need for long chains of if/elif statements. It could use pack_into and unpack_from instead of the pack and unpack methods, so that it deals directly with the buffer instead of making substrings. Even if PyPy can optimize this away, why write Python code like this when it's not necessary?
Er, strings are immutable in Python; you can't pack into them. Other kinds of buffers are kind of dodgy, because Python never grew a correct buffer.
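To illustrate the immutability point in Python 3 terms, where bytes plays the role of Python 2's immutable str: pack_into needs a writable buffer such as a bytearray, so it succeeds there and raises TypeError for bytes. This is a sketch for illustration, not code from msgpack-python.

```python
import struct

fmt = struct.Struct(">H")  # big-endian unsigned 16-bit integer

# bytearray is mutable, so pack_into can write into it in place.
out = bytearray(4)
fmt.pack_into(out, 2, 0xBEEF)
print(out)  # bytearray(b'\x00\x00\xbe\xef')

# bytes, like Python 2's str, is immutable, so pack_into rejects it.
try:
    fmt.pack_into(bytes(4), 0, 0xBEEF)
    packed_into_bytes = True
except TypeError as exc:
    packed_into_bytes = False
    print("cannot pack into bytes:", exc)
```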
> Plus, I felt that initially the code should just use cffi to connect to the native C library. I believe this approach is likely to give very close to the best performance you could get on PyPy for this type of library. I'm not sure how much of a performance increase would be gained by writing the library completely in Python versus using cffi. Is there anything wrong with this line of thinking? Do you feel a pure Python approach could achieve better results than using cffi under PyPy?
Python is nicer. It does not segfault. Besides, how do you get a string out of a C library? If you do a raw malloc, it's prone to be bad. Etc. etc.
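One common answer to "how do you get a string out of a C library" is to copy the bytes into a Python object before the C side frees or reuses its buffer. Because cffi is a third-party package, this sketch uses the standard library's ctypes as a stand-in; the C buffer and its "clearing" are simulated here, not calls into the real msgpack C library.

```python
import ctypes

# A C-allocated buffer standing in for memory owned by a C library
# (for example, an unpacker's internal buffer). Simulated here.
c_buf = ctypes.create_string_buffer(b"hello, msgpack", 32)

# Copy the first 14 bytes into an independent Python bytes object.
# With cffi, the analogous copy would be ffi.buffer(ptr, size)[:].
copied = ctypes.string_at(ctypes.addressof(c_buf), 14)

# Simulate the library clearing its buffer after we have copied out.
ctypes.memset(ctypes.addressof(c_buf), 0, ctypes.sizeof(c_buf))

print(copied)                   # b'hello, msgpack'
print(c_buf.raw == bytes(32))   # True
```

The copy survives because string_at returns a new bytes object; reading through a pointer after the buffer is cleared or freed is exactly the kind of bug fijal is warning about.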

> That is definitely a no (my screen is too small to have some noise there, if for no other reason). It might have a warning in the documentation, though, if that's of any use. But honestly, I doubt such a warning makes any sense. People who are capable of using jitviewer already "know better".
I agree it should not be part of the normal output. I would say add it to the docstring in app.py and to the README file.
As for people using the jitviewer already "knowing better": if that were the case, I wouldn't have started this thread. Like you said earlier, the use of jitviewer is only promoted on IRC, and yet over the last 2 weeks I have come across 3 people working on different projects who are using it for the wrong reasons. It's as if this is the new RPython, where people start using it for the wrong reasons.

On Mon, Feb 4, 2013 at 2:12 AM, John Camara <john.m.camara@gmail.com> wrote:
> I agree it should not be part of the normal output. I would say add it to the docstring in app.py and to the README file.
> As for people using the jitviewer already "knowing better": if that were the case, I wouldn't have started this thread. Like you said earlier, the use of jitviewer is only promoted on IRC, and yet over the last 2 weeks I have come across 3 people working on different projects who are using it for the wrong reasons. It's as if this is the new RPython, where people start using it for the wrong reasons.
Seriously, which ones? I think the msgpack usage is absolutely legit. You seem to have different opinions about the design of that software, but you did not even respond to my concerns, not to mention that it does not sound like the code was "obfuscated by jitviewer".
Cheers, fijal

On Mon, Feb 4, 2013 at 3:42 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
> Seriously, which ones? I think the msgpack usage is absolutely legit. You seem to have different opinions about the design of that software, but you did not even respond to my concerns, not to mention that it does not sound like the code was "obfuscated by jitviewer".
> Cheers, fijal
First, I would have tried using cffi with the msgpack C library. If I wasn't happy with that, I would do a Python port. So for now, let's forget about cffi and just deal with the current design of this library.
I had tried to minimize the discussion about this library on this forum, as I had already written extensive comments on the original blog post [1]. I didn't do an extensive review of the code; I concentrated only on a small portion of it, namely the unpacking of msgpack messages. I'll just highlight a couple of concerns I had.
The first thing that shocked me was the use of the struct.pack and struct.unpack functions. Normally, when you need to pack and unpack often with the same format, you would create a struct object with the desired format and use its pack and unpack methods. That way the format string is parsed only once, when the struct object is created, instead of on every call. As Bas pointed out, PyPy is able to optimize the parsing of the format, which is great, but why would you prefer to write code that runs with horrible performance under CPython when there is an alternative available? Toward the end of the comments on the blog, Bas stated that he tried the struct object under PyPy and found it ran slower. So there is likely an opportunity for PyPy to add another optimization: if PyPy can optimize the struct functions, it should be able to handle struct objects, which, looking at it purely from a high-level perspective, I would think is an easier case.
Another issue I had is that the msgpack spec is designed to minimize the need to copy data; that is, you should be able to use the data directly from the message buffers. The normal way to do this with the struct module is to use the unpack_from and pack_into methods instead of the pack and unpack methods.
These methods take a buffer and an offset, whereas unpack would require you to slice a copy out of the original buffer to pass to it. As Bas pointed out again, PyPy is able to optimize away the copy created by slicing, which is great, but again, why code it in a way that will be slow on CPython when there is an alternative?
The other issue I mentioned on the blog was the large number of if/elif statements used to handle each type of msgpack message. I instead suggested creating, essentially, a list that holds references to struct objects, so that the message type can be used as an index into this list. That way you remove all the if/elif statements and end up with something like struct_objects[message_type].unpack_from().
I understand that PyPy is able to optimize all these if and elif statements by creating bridges for the various paths through this code, but again, why code it this way when it will be slow on CPython? I would also assume that the if/elif statements still have more overhead on PyPy than a list of references, although maybe there is not much of a difference.
Anyway, these are just the issues I saw with this library, which, by the way, is nowhere near as bad as other code I have seen written as a result of users relying on the jitviewer. Unfortunately, I can't discuss those other projects as they are closed source.
Anyway, to get to the other part of your reply, I assume "not responding to your concerns" refers to the following: "Python is nicer. It does not segfault. Besides, how do you get a string out of a C library? If you do a raw malloc, it's prone to be bad. Etc. etc." Sorry, that was an oversight. I feel the same way about Python, but what's the real issue with taking the practical approach of using a C library that is well written and robust? I would love to see everything written in Python, but who has the time to port everything over?
With the msgpack C library, the library itself would be responsible for maintaining the buffers. Its API supports creating and freeing them. The msgpack library would do most of the work. The only data that has to go back and forth between the Python code and the library is basic types like int, float, double, strings, etc. To get a string out of the C library, just slice an ffi.buffer to create a copy of it in Python before calling the function that clears the msgpack buffer.
With cffi, this slicing to copy strings into Python and the overhead of calling into the C functions do add work that code written purely in Python would not need. That assumes PyPy has all the optimizations in place for the pure-Python code to match the performance of the msgpack C library. The question is how much overhead cffi really adds in this use case, and whether it is worth doing the Python port to remove it. I don't know the answer. It would require profiling both cases.
[1] - http://blog.affien.com/archives/2013/01/29/msgpack-for-pypy/
John

On Mon, Feb 4, 2013 at 6:28 PM, John Camara <john.m.camara@gmail.com> wrote:
On Mon, Feb 4, 2013 at 3:42 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
Seriously, which ones? I think msgpack usage is absolutely legit. You seem to have different opinions about the design of that software, but you did not even respond to my concerns, not to mention the fact that it sounds like it's not "obfuscated by jitviewer".
Cheers, fijal
First, I would have tried using cffi with the msgpack C library. If I wasn't happy with that, I would do a Python port. So for now, let's forget about cffi and just deal with the current design of this library.
I had tried to keep discussion of this library on this forum to a minimum, since I had already written extensive comments on the original blog post [1]. I didn't do an extensive review of the code. I concentrated only on a small portion of it, namely the unpacking of msgpack messages. I'll just highlight a couple of concerns I had.
The first thing that shocked me was the use of the struct.pack and struct.unpack functions. Normally, when you need to pack and unpack often with the same format, you would create a struct object with the desired format and use its pack and unpack methods. That way the format string is parsed once, when the struct object is created, instead of on every call.
As Bas pointed out, PyPy is able to optimize the parsing of the format, which is great. But why would you prefer to write code that runs with horrible performance under CPython when an alternative is available? Toward the end of the comments on the blog, Bas stated that he tried the struct object under PyPy and found it ran slower. So there is likely an opportunity for PyPy to add another optimization: if PyPy can optimize the struct functions, it should also be able to handle struct objects. Looking at it purely from a high-level perspective, I would think that is the easier case.
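For readers less familiar with the struct module, here is a minimal sketch of the two styles being compared. It assumes Python 3, and the `>BI` header layout and the function names are hypothetical, chosen only for illustration. Whether the Struct version is actually faster depends on the interpreter. As noted above, Bas measured it as slower on PyPy at the time.

```python
import struct

# Module-level function: the format string is passed on every call.
# CPython looks it up in a small internal cache of compiled formats
# (compiling it on a cache miss) before unpacking.
def read_header_func(data):
    return struct.unpack(">BI", data)

# Struct object: the format is compiled once, when HEADER is created,
# and every call reuses that compiled object.
HEADER = struct.Struct(">BI")  # hypothetical header: 1-byte type, 4-byte length

def read_header_obj(data):
    return HEADER.unpack(data)
```

Both functions return the same tuple for the same input. For example, `HEADER.pack(1, 2)` produces a 5-byte message that both decode to `(1, 2)`.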
It's a fallback for PyPy, so CPython speed is irrelevant. Also CPython has tons of weird quirks and "faster for PyPy and slower for CPython" is not always a bad thing. Personally I don't care. This particular example however should be reported as a bug in PyPy - using Struct is *nicer*, so it should be as fast (and there is no good reason why not).
Another issue I had is that the msgpack spec is designed to minimize the need to copy data. That is, you should be able to use the data directly from the message buffers. The normal way to do this with the struct module is to use the unpack_from and pack_into methods instead of the pack and unpack methods. These methods take a buffer and an offset. Pack and unpack, by contrast, would require you to slice a copy out of the original buffer to pass to the unpack method. As Bas pointed out, PyPy is again able to optimize away the copy created by slicing, which is great. But again, why code it in a way that will be slow on CPython when there is an alternative?
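Here is a minimal sketch of the slice-then-unpack style next to the offset-based unpack_from/pack_into style. It assumes Python 3 and a hypothetical big-endian uint32 field. The helper names are illustrative and are not part of the msgpack library under discussion.

```python
import struct

UINT32 = struct.Struct(">I")  # hypothetical big-endian 4-byte field

def read_u32_slice(buf, offset):
    # On CPython, slicing builds a new 4-byte object before unpacking
    # (per the thread, PyPy can optimize this copy away).
    return UINT32.unpack(buf[offset:offset + UINT32.size])[0]

def read_u32_from(buf, offset):
    # unpack_from reads directly from buf starting at offset; no slice needed.
    return UINT32.unpack_from(buf, offset)[0]

def write_u32_into(buf, offset, value):
    # pack_into writes into a writable buffer (e.g. a bytearray) in place.
    UINT32.pack_into(buf, offset, value)
```

With `buf = bytearray(8)`, calling `write_u32_into(buf, 4, 0xDEADBEEF)` fills the last four bytes and leaves the first four untouched. Both readers then return `0xDEADBEEF` for offset 4.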
Python buffer support sucks. For example, you don't get a string out (because strings are immutable). PyPy buffer support double sucks, because the buffer protocol is broken and we also didn't care. Fortunately we're able to optimize string slicing here (strings are nicer than buffers or bytearrays to play with), but we should fix buffers. Sorry about that. Again, the CPython speed does not apply.
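To illustrate the string-versus-buffer point, here is a sketch in Python 3 terms (the thread dates from the Python 2 era). Slicing a `memoryview` shares memory with the underlying data, while getting an immutable `bytes` object out always means making a copy. `split_payload` and the `HDR` header are hypothetical.

```python
def split_payload(data, header_len):
    """Return (header, payload) views over data without copying it."""
    view = memoryview(data)
    return view[:header_len], view[header_len:]

data = bytearray(b"HDRpayload")
header, payload = split_payload(data, 3)
data[3] = ord("P")             # visible through payload: no copy was made
snapshot = payload.tobytes()   # independent, immutable bytes copy
data[4] = ord("A")             # changes payload, but not snapshot
```

The views are cheap to create but tie the result to the mutable buffer. `tobytes()` (or `bytes(view)`) is the step that produces a string-like object you can safely keep.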
The other issue I mentioned on the blog was the large number of if/elif statements used to handle each type of msgpack message. I instead suggested creating what is essentially a list that holds references to struct objects, so that the message type is used as an index into this list. That way you remove all the if/elif statements and end up with something like
struct_objects[message_type].unpack_from(buffer, offset)
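Here is a minimal sketch of the list-of-Struct-objects dispatch suggested above, next to the equivalent if/elif chain. It rests on a few assumptions:

- It uses Python 3, and the input is `bytes` or `bytearray`.
- Only a handful of msgpack's fixed-width numeric types are handled. The type codes follow the msgpack spec, but check them against the spec before relying on this.
- `NUMERIC_STRUCTS`, `unpack_number`, and `unpack_number_ifelif` are hypothetical names.

Per fijal's reply, on PyPy the difference comes mainly from whether the JIT can see the format as a constant, not from the lookup cost itself.

```python
import struct

# Hypothetical dispatch table: index = msgpack type byte, value = Struct
# for the big-endian payload that follows it (None = not handled here).
NUMERIC_STRUCTS = [None] * 256
for type_code, fmt in {
    0xCC: ">B", 0xCD: ">H", 0xCE: ">I", 0xCF: ">Q",  # uint 8/16/32/64
    0xD0: ">b", 0xD1: ">h", 0xD2: ">i", 0xD3: ">q",  # int 8/16/32/64
    0xCA: ">f", 0xCB: ">d",                          # float 32/64
}.items():
    NUMERIC_STRUCTS[type_code] = struct.Struct(fmt)


def unpack_number(buf, offset=0):
    """Table lookup: return (value, offset just past the value)."""
    codec = NUMERIC_STRUCTS[buf[offset]]
    if codec is None:
        raise ValueError("unsupported type byte 0x%02x" % buf[offset])
    (value,) = codec.unpack_from(buf, offset + 1)
    return value, offset + 1 + codec.size


def unpack_number_ifelif(buf, offset=0):
    """Same result via an if/elif chain (only two branches shown)."""
    type_code = buf[offset]
    if type_code == 0xCC:
        fmt = ">B"
    elif type_code == 0xCD:
        fmt = ">H"
    else:
        raise ValueError("unsupported type byte 0x%02x" % type_code)
    (value,) = struct.unpack_from(fmt, buf, offset + 1)
    return value, offset + 1 + struct.calcsize(fmt)
```

Both functions return the decoded value together with the offset of the next item, so a caller can walk a buffer of concatenated values without slicing. For example, `unpack_number(b"\xcd\x01\x00")` yields `(256, 3)`.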
Lack of constant propagation. Again, a potential bug in PyPy, but a hard one.
Now, I understand that PyPy is able to optimize all these if and elif statements by creating bridges for the various paths through this code. But again, why code it this way when it will be slow on CPython? I would also assume that the if/elif statements still carry more overhead in PyPy than a list of references, although maybe there is not much of a difference.
It's not about if/elif or references (all those things are incredibly cheap), but about constant propagation, notably determining that a format is constant. This would disappear if we fix Struct (it's an easy fix, a few hours of work for someone not experienced with PyPy).
Anyway, these are just the issues I saw with this library. By the way, it is nowhere near as bad as other code I have seen written as a result of users relying on the jitviewer. Unfortunately, I cannot discuss those other projects, as they are closed source.
And we're unable to help you because of that.
Anyway, to get to the other part of your reply, I assume the concerns I didn't respond to are the following:
"python is nicer. It does not segfault. Besides, how do you get a string out of a C library? if you do raw malloc it's prone to be bad. Etc. etc."
Sorry, that was an oversight. I feel the same way about Python. But what's the real issue with taking the practical approach of using a C library that is well written and robust? I would love to see everything written in Python, but who has the time to port everything over?
If you're dealing with data coming from the outside, using Python over a C lib sounds like a very sensible idea security-wise. I can't blame anyone here. I would do the same (given that the protocol is simple enough as well). Cheers, fijal