Python parser performance optimizations
Hello,

Back in March, I posted a patch at http://bugs.python.org/issue26526 -- "In parsermodule.c, replace over 2KLOC of hand-crafted validation code, with a DFA".

The motivation for this patch was to enable a memory footprint optimization, discussed at http://bugs.python.org/issue26415. My proposed optimization reduces the memory footprint by up to 30% on the standard benchmarks, and by 200% on a degenerate case which sparked the discussion. The run time stays unaffected by this optimization.

The Python Developer's Guide says: "If you don't get a response within a few days after pinging the issue, then you can try emailing python-dev@python.org asking for someone to review your patch." So, here I am.
On Thu, May 26, 2016 at 10:19:05AM +0000, Artyom Skrobov wrote: [...]
The motivation for this patch was to enable a memory footprint optimization, discussed at http://bugs.python.org/issue26415 My proposed optimization reduces the memory footprint by up to 30% on the standard benchmarks, and by 200% on a degenerate case which sparked the discussion. The run time stays unaffected by this optimization.
That can't be right. How can you reduce memory usage by more than one hundred percent? That would mean you have saved more memory than was originally used and are now using a negative amount of memory. -- Steve
On 05/29/2016 10:53 PM, Steven D'Aprano wrote:
That can't be right. How can you reduce memory usage by more than one hundred percent? That would mean you have saved more memory than was originally used and are now using a negative amount of memory.
It is not. It would be nice to have the values that were used to calculate these percentages.
Steven D'Aprano wrote:
That can't be right. How can you reduce memory usage by more than one hundred percent? That would mean you have saved more memory than was originally used and are now using a negative amount of memory.
It emails an order for more RAM to Amazon, who send out a robot drone to install it in your computer. -- Greg
I know we're all just having fun, but that's probably a rather stressful welcome to the list. Maybe we can tone down the humor a bit and instead review the OP's patches? --Guido (mobile)
Hello,

This is a monthly ping to get a review on http://bugs.python.org/issue26415 -- "Excessive peak memory consumption by the Python parser". The first patch of the series (an NFC refactoring) was successfully committed earlier in June, so the next step is to get the second patch, "the payload", reviewed and committed.

To address the concerns raised by the commenters back in May: the patch doesn't lead to negative memory consumption, of course. The base for calculating percentages is the smaller number of the two; this is the same style of reporting that perf.py uses. In other words, "200% less memory usage" is a threefold shrink. The absolute values, and the way they were produced, are all reported under the ticket.
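The percentage convention described here can be illustrated with a quick calculation; the figures below are made up for illustration (the real measurements are on the ticket):

```python
def percent_change(before, after):
    """Report a change with the smaller of the two values as the base --
    the same reporting style perf.py uses."""
    base = min(before, after)
    return abs(before - after) / base * 100

# Hypothetical peak-memory figures, in MiB: a threefold shrink,
# reported in this convention as "200% less memory".
before_mib, after_mib = 3.0, 1.0
print(f"{percent_change(before_mib, after_mib):.0f}% less memory")  # 200% less memory
```

With the smaller number as the base, a shrink from 3 units to 1 is 200%, and a shrink by more than 100% is perfectly possible; with the larger number as the base, the same change would read as a 67% reduction.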
Hello,

This is a monthly ping to get a review on http://bugs.python.org/issue26415 -- "Excessive peak memory consumption by the Python parser". Following the comments from July, the patches now include updating Misc/NEWS and compiler.rst to describe the change. The code change itself is still the same as a month ago.
Hello,

This is a monthly ping to get a review on http://bugs.python.org/issue26415 -- "Excessive peak memory consumption by the Python parser". Following the comments from August, the patches now include a more detailed comment for Init_ValidationGrammar(). The code change itself is still the same as two months ago.
I wonder if this patch could just be rejected instead of lingering forever? It clearly has no champion among the current core devs, and therefore it won't be included in Python 3.6 (we're all volunteers, so that's how it goes). The use case for the patch is also debatable: Python's parser wasn't designed to *efficiently* parse huge data tables like that, and if you have that much data, using JSON is the right answer. So this doesn't really scratch anyone's itch except the patch author's (Artyom's).
From a quick look it seems the patch is very disruptive in terms of what it changes, so it's not easy to review.
I recommend giving up, closing the issue as "won't fix", recommending to use JSON, and moving on. Sometimes a change is just not worth the effort.

--Guido van Rossum (python.org/~guido)
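The "huge data tables" case is easy to reproduce in miniature: when a module is one big nested literal, the parser's peak allocation far exceeds the source text, because every literal element becomes a node in the concrete syntax tree. A rough sketch with tracemalloc (the figures on the ticket were produced with different tooling; this is purely illustrative):

```python
import tracemalloc

# Synthetic stand-in for a huge literal data table: every number and
# string below becomes its own node in the parse tree.
source = "DATA = " + repr([[i, i * 2, str(i)] for i in range(10_000)])

tracemalloc.start()
code = compile(source, "<generated>", "exec")
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"source: {len(source) // 1024} KiB, "
      f"peak while compiling: {peak // 1024} KiB")
```

On a real dozens-of-megabytes table the same effect scales up, which is what made peak parser memory, rather than run time, the pain point in issue26415.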
Thank you very much for your comments,

I appreciate that we're all volunteers, and that if nobody fancies reviewing a big invasive patch, then it won't get reviewed.

Still, I want to note that the suggested optimization has a noticeable positive effect on many benchmarks -- even though the effect may only become of practical value in such uncommon use cases as parsing huge data tables.

As I found out later, JSON wasn't a viable option for storing dozens of megabytes of deeply-nested data, either. To get acceptable deserialization performance, I eventually had to resort to pickled files.
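The serialization trade-off Artyom describes can be sketched as follows; the payload here is a tiny hypothetical stand-in for the real dozens-of-megabytes tables, and relative timings will vary with the shape of the data:

```python
import json
import pickle
import timeit

# Tiny hypothetical stand-in for a deeply nested data table.
data = [[{"k": i, "v": [i, i + 1, i + 2]} for i in range(100)]
        for _ in range(100)]

json_blob = json.dumps(data)
pickle_blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# Both formats round-trip the same structure; only speed differs.
t_json = timeit.timeit(lambda: json.loads(json_blob), number=20)
t_pickle = timeit.timeit(lambda: pickle.loads(pickle_blob), number=20)
print(f"json.loads:   {t_json:.3f}s for 20 rounds")
print(f"pickle.loads: {t_pickle:.3f}s for 20 rounds")
```

Which loader wins depends heavily on the data; in Artyom's case pickle's deserialization proved acceptably fast where JSON's did not, at the cost of pickle being Python-specific and unsafe to load from untrusted sources.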
OK, but if nobody responds within a week we should close it. IMO there's no value in keeping things around that nobody is going to apply. I don't expect that a year from now we'll suddenly see a surge of interest in this patch, sorry.

--Guido van Rossum (python.org/~guido)
participants (6)

- Artyom Skrobov
- Bernardo Sulzbach
- Greg Ewing
- Guido van Rossum
- Guido van Rossum
- Steven D'Aprano