Question regarding: Lib/_markupbase.py - Python-Dev


Developer Developer

11 Feb 2013

1:16 p.m.

Hello all,

I was having a look at the file Lib/_markupbase.py (@ 82151), function "_parse_doctype_element", and have seen something that has caught my attention:

    if '>' in rawdata[j:]:
        return rawdata.find(">", j) + 1

Wouldn't it be better to do the following?

    pos = rawdata.find(">", j)
    if pos != -1:
        return pos + 1

Otherwise I think we are scanning rawdata[j:] twice.

Best regards


Oleg Broytman

11 Feb 2013

1:37 p.m.

On Mon, Feb 11, 2013 at 12:16:48PM +0000, Developer Developer <just_another_developer@yahoo.de> wrote:

> I was having a look at the file: Lib/_markupbase.py (@ 82151), function: "_parse_doctype_element" and have seen something that has caught my attention:
>
>     if '>' in rawdata[j:]:
>         return rawdata.find(">", j) + 1
>
> Wouldn't it be better to do the following?
>
>     pos = rawdata.find(">", j)
>     if pos != -1:
>         return pos + 1
>
> Otherwise I think we are scanning rawdata[j:] twice.

Is it really a significant optimization? Can you do an experiment and show figures?

Oleg.
--
Oleg Broytman http://phdru.name/ phd@phdru.name
Programmers don't die, they just GOSUB without RETURN.
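One way to run the experiment asked for here is a small timeit harness. The following is only a sketch: the function names `with_slice`, `with_find`, and `benchmark` are made up for illustration, the `return -1` fallback is an assumption about what the surrounding function does when nothing is found, and the sample inputs are arbitrary. No figures are claimed; they depend on the interpreter version and on the input.

```python
import timeit


def with_slice(rawdata, j):
    # Variant quoted from the thread: slice first, then search again.
    if ">" in rawdata[j:]:
        return rawdata.find(">", j) + 1
    return -1  # assumed fallback, not shown in the thread


def with_find(rawdata, j):
    # Proposed variant: a single search, no intermediate slice.
    pos = rawdata.find(">", j)
    if pos != -1:
        return pos + 1
    return -1  # assumed fallback, not shown in the thread


def benchmark(rawdata, j=0, number=10_000):
    """Return the total time in seconds of each variant over `number` calls."""
    results = {}
    for func in (with_slice, with_find):
        results[func.__name__] = timeit.timeit(
            lambda: func(rawdata, j), number=number
        )
    return results


if __name__ == "__main__":
    # Arbitrary sample inputs: '>' near the start, at the end, and absent.
    samples = {
        "early hit": ">" + "x" * 10_000,
        "late hit": "x" * 10_000 + ">",
        "no hit": "x" * 10_000,
    }
    for label, data in samples.items():
        print(label, benchmark(data))
```

Running it with short and long inputs, and with the character present or absent, shows how much the result depends on the data.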

Guido van Rossum

8:02 p.m.

Warning: see http://bugs.python.org/issue17170. Depending on the length of the string being scanned and the probability of finding the specific character, the proposed change could actually be a *pessimization*. OTOH, if the character occurs many times, the slice will actually cause O(N**2) behavior. So yes, it depends greatly on the distribution of the input data.

--
--Guido van Rossum (python.org/~guido)
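The O(N**2) remark can be made concrete by counting how many characters end up in the slices when a scan visits every occurrence of the character. This is a sketch for illustration only: `sliced_chars` and `find_positions` are hypothetical helpers, not code from Lib/_markupbase.py, and the loop is a simplified stand-in for a parser that keeps advancing past each match.

```python
def sliced_chars(rawdata, ch=">"):
    """Total length of all tails created by the slice-then-search pattern."""
    total = 0
    j = 0
    while True:
        # The slice is a new string of len(rawdata) - j characters
        # (CPython may skip the copy when j == 0).
        tail = rawdata[j:]
        total += len(tail)
        if ch not in tail:
            return total
        j = rawdata.find(ch, j) + 1


def find_positions(rawdata, ch=">"):
    """Visit the same occurrences with find() alone, creating no slices."""
    positions = []
    j = 0
    while True:
        pos = rawdata.find(ch, j)
        if pos == -1:
            return positions
        positions.append(pos)
        j = pos + 1


if __name__ == "__main__":
    for n in (100, 200, 400):
        # Doubling the input roughly quadruples the sliced characters.
        print(n, sliced_chars(">" * n), len(find_positions(">" * n)))
```

When the character never occurs, the slice variant creates only one tail, so the quadratic growth appears only for inputs with many matches.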

Antoine Pitrou

12 Feb 2013

11:22 a.m.

On Mon, 11 Feb 2013 11:02:04 -0800, Guido van Rossum <guido@python.org> wrote:

> Warning: see http://bugs.python.org/issue17170. Depending on the length of the string being scanned and the probability of finding the specific character, the proposed change could actually be a *pessimization*. OTOH if the character occurs many times, the slice will actually cause O(N**2) behavior. So yes, it depends greatly on the distribution of the input data.

That said, the savings are still puny unless you spend your time calling str.find().

Regards

Antoine.

Developer Developer

14 Feb 2013

7:25 p.m.

Sorry, I just thought that:

    if '>' in rawdata[j:]

would do a search, that is, that the implementation of "in" would just reuse/call the implementation of "find" and that the position returned would be used as:

    -1: not in
    != -1: in

which seemed to me like the easy implementation of "in". That's why I was wondering why to search twice.

Now I realize that it doesn't work the way I thought. Thank you for showing me and sorry for the confusion.

Best regards,
Guido (another Guido ;-) )

Fred Drake

11 Feb 2013

3:10 p.m.

On Mon, Feb 11, 2013 at 7:16 AM, Developer Developer <just_another_developer@yahoo.de> wrote:

> Wouldn't it be better to do the following?
> ...
> Otherwise I think we are scanning rawdata[j:] twice.

Yes, that would be better, and avoids a string object creation as well.

  -Fred

--
Fred L. Drake, Jr. <fred at fdrake.net>
"A storm broke loose in my mind." --Albert Einstein

Developer Developer

3:47 p.m.

Same thing in the function "_parse_doctype_attlist":

    if ")" in rawdata[j:]:
        j = rawdata.find(")", j) + 1
    else:
        return -1

I would change it to:

    pos = rawdata.find(")", j)
    if pos != -1:
        j = pos + 1
    else:
        return -1

Best regards,
Guido
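Before swapping one form for the other in `_parse_doctype_attlist`, it is easy to check that both compute the same `j`. The sketch below isolates the fragment into two hypothetical helpers (`skip_paren_slice` and `skip_paren_find`, names invented here) that return the new `j` or -1, and compares them on a few made-up inputs. It assumes `j` is non-negative, as a parser offset would be.

```python
def skip_paren_slice(rawdata, j):
    # Fragment as quoted in the thread, wrapped in a function.
    if ")" in rawdata[j:]:
        j = rawdata.find(")", j) + 1
    else:
        return -1
    return j


def skip_paren_find(rawdata, j):
    # Proposed rewrite: one search, no temporary slice.
    pos = rawdata.find(")", j)
    if pos != -1:
        j = pos + 1
    else:
        return -1
    return j


def variants_agree(rawdata):
    """True if both variants give the same result for every offset j >= 0."""
    return all(
        skip_paren_slice(rawdata, j) == skip_paren_find(rawdata, j)
        for j in range(len(rawdata) + 2)
    )


if __name__ == "__main__":
    # Made-up sample inputs, including an unterminated group and empty data.
    for sample in ("(a|b) #REQUIRED", "(a|b", "", "))"):
        print(repr(sample), variants_agree(sample))
```

The check says nothing about speed; it only confirms that the rewrite preserves behavior on these inputs.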

R. David Murray

4:59 p.m.

If these don't get reported as tracker issues they will probably get lost.

--David

Developer Developer

5:15 p.m.

Thank you David, I didn't think of the issue tracker. I have just done it.

Guido
