Question regarding: Lib/_markupbase.py
Hello all,

I was having a look at the file: Lib/_markupbase.py (@ 82151), function: "_parse_doctype_element" and have seen something that has caught my attention:

    if '>' in rawdata[j:]:
        return rawdata.find(">", j) + 1

Wouldn't it be better to do the following?

    pos = rawdata.find(">", j)
    if pos != -1:
        return pos + 1

Otherwise I think we are scanning rawdata[j:] twice.

Best regards
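For readers following along, the two variants from the question can be put side by side as standalone functions. This is only a minimal sketch: the function names and the sample string are hypothetical, and the final `return -1` fallback is an assumption about what the surrounding method does when no '>' is found.

```python
def end_after_gt_sliced(rawdata, j):
    """Variant quoted from the question: test a slice, then call find()."""
    if '>' in rawdata[j:]:               # builds a new string, then scans it
        return rawdata.find(">", j) + 1  # scans rawdata again from j
    return -1                            # assumed fallback when '>' is absent


def end_after_gt_direct(rawdata, j):
    """Proposed variant: a single find() call, no slice."""
    pos = rawdata.find(">", j)
    if pos != -1:
        return pos + 1
    return -1                            # assumed fallback when '>' is absent


# Hypothetical input, chosen only for illustration.
sample = "<!ELEMENT note (to,from)>rest"

# Both variants should agree for every starting offset.
for start in range(len(sample) + 1):
    assert end_after_gt_sliced(sample, start) == end_after_gt_direct(sample, start)
```

The two functions return the same value for every input; the difference under discussion is only how much work each one does to get there.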
On Mon, Feb 11, 2013 at 12:16:48PM +0000, Developer Developer <just_another_developer@yahoo.de> wrote:
> I was having a look at the file: Lib/_markupbase.py (@ 82151), function: "_parse_doctype_element" and have seen something that has caught my attention:
>
>     if '>' in rawdata[j:]:
>         return rawdata.find(">", j) + 1
>
> Wouldn't it be better to do the following?
>
>     pos = rawdata.find(">", j)
>     if pos != -1:
>         return pos + 1
>
> Otherwise I think we are scanning rawdata[j:] twice.

Is it really a significant optimization? Can you do an experiment and show figures?

Oleg.
--
Oleg Broytman http://phdru.name/ phd@phdru.name
Programmers don't die, they just GOSUB without RETURN.
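One way such an experiment could be set up is with the standard `timeit` module. The sketch below is an assumption about how to measure, not something posted in the thread: the helper names, the input sizes, and the three test strings are all hypothetical, and no particular outcome is implied, since the figures will vary with string length and with where (or whether) the character occurs.

```python
import timeit


def sliced(rawdata, j):
    # Variant currently in the code: membership test on a slice, then find().
    if '>' in rawdata[j:]:
        return rawdata.find(">", j) + 1
    return -1  # assumed fallback


def direct(rawdata, j):
    # Proposed variant: one find() call.
    pos = rawdata.find(">", j)
    return pos + 1 if pos != -1 else -1


def compare(rawdata, j=0, number=1000):
    """Return (sliced_seconds, direct_seconds) for `number` calls of each."""
    t_sliced = timeit.timeit(lambda: sliced(rawdata, j), number=number)
    t_direct = timeit.timeit(lambda: direct(rawdata, j), number=number)
    return t_sliced, t_direct


# Hypothetical inputs covering different positions of '>'.
cases = {
    "hit near start": ">" + "a" * 100_000,
    "hit near end": "a" * 100_000 + ">",
    "no hit": "a" * 100_000,
}

if __name__ == "__main__":
    for label, data in cases.items():
        t_sliced, t_direct = compare(data)
        print(f"{label:15s} sliced={t_sliced:.4f}s direct={t_direct:.4f}s")
```

Running the script prints one line per case; repeating the run a few times and varying the string length gives a rough idea of how stable the numbers are.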
Warning: see http://bugs.python.org/issue17170. Depending on the length of the string being scanned and the probability of finding the specific character, the proposed change could actually be a *pessimization*. OTOH if the character occurs many times, the slice will actually cause O(N**2) behavior. So yes, it depends greatly on the distribution of the input data.

--
--Guido van Rossum (python.org/~guido)

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
On Mon, 11 Feb 2013 11:02:04 -0800, Guido van Rossum <guido@python.org> wrote:
> Warning: see http://bugs.python.org/issue17170. Depending on the length of the string being scanned and the probability of finding the specific character, the proposed change could actually be a *pessimization*. OTOH if the character occurs many times, the slice will actually cause O(N**2) behavior. So yes, it depends greatly on the distribution of the input data.

That said, the savings are still puny unless you spend your time calling str.find().

Regards

Antoine.
Sorry, I just thought that:

    if '>' in rawdata[j:]

would do a search, that is, that the implementation of "in" would just reuse/call the implementation of "find" and that the position returned would be used as:

    -1: not in
    != -1: in

which seemed to me like the easy implementation of "in". That's why I was wondering why to search twice.

Now I realize that it doesn't work the way I thought. Thank you for showing me and sorry for the confusion.

Best regards,
Guido (another Guido ;-) )
On Mon, Feb 11, 2013 at 7:16 AM, Developer Developer <just_another_developer@yahoo.de> wrote:
> Wouldn't it be better to do the following?
...
> Otherwise I think we are scanning rawdata[j:] twice.

Yes, that would be better, and avoids a string object creation as well.

-Fred

--
Fred L. Drake, Jr. <fred at fdrake.net>
"A storm broke loose in my mind." --Albert Einstein
Same thing in the function: "_parse_doctype_attlist":

    if ")" in rawdata[j:]:
        j = rawdata.find(")", j) + 1
    else:
        return -1

I would change it to:

    pos = rawdata.find(")", j)
    if pos != -1:
        j = pos + 1
    else:
        return -1

Best regards,
Guido
If these don't get reported as tracker issues they will probably get lost.

--David
Thank you David, I didn't think of the issue tracker. I have just done it.

Guido
participants (6)
- Antoine Pitrou
- Developer Developer
- Fred Drake
- Guido van Rossum
- Oleg Broytman
- R. David Murray