partition() (was: Remove str.find in 3.0?)
Michael Chermside wrote:
Raymond writes:
That suggests that we need a variant of split() that has been customized for typical find/index use cases. Perhaps introduce a new pair of methods, partition() and rpartition()
+1
My only suggestion is that when you're about to make a truly inspired suggestion like this one, that you use a new subject header. It will make it easier for the Python-Dev summary authors and for the people who look back in 20 years to ask "That str.partition() function is really swiggy! It's everywhere now, but I wonder what language had it first and who came up with it?"
+1 This is very useful behaviour IMO.

Have the precise return values of partition() been defined? Specifically, given:

    'a'.partition('b')

we could get back:

    ('a', '', '')
    ('a', None, None)

Similarly:

    'ab'.partition('b')

could be either:

    ('a', 'b', '')
    ('a', 'b', None)

IMO the most useful (and intuitive) behaviour is to return strings in all cases.

My major issue is with the names - partition() doesn't sound right to me. split() of course sounds best, but it has additional stuff we don't necessarily want. However, I think we should aim to get the idea accepted first, then work out the best name.

Tim Delaney
[Delaney, Timothy (Tim)]
+1
This is very useful behaviour IMO.
Thanks. It seems to be getting +1s all around.
Have the precise return values of partition() been defined? . . . IMO the most useful (and intuitive) behaviour is to return strings in all cases.
Yes, there is a precise spec and yes it always returns three strings.

Motivation and spec: http://mail.python.org/pipermail/python-dev/2005-August/055764.html

Pure python implementation, sample invocations, and tests: http://mail.python.org/pipermail/python-dev/2005-August/055764.html
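As a quick illustration of the "always three strings" rule, here is a minimal sketch using str.partition() and str.rpartition() as they exist in Python 2.5 and later. The sample strings are made up for illustration and are not taken from the linked spec.

```python
# Separator found: head, separator, tail.
print('ab'.partition('b'))        # ('a', 'b', '')

# Separator missing: the whole string, then two empty strings.
print('a'.partition('b'))         # ('a', '', '')

# rpartition() searches from the right; on a miss the original string comes last.
print('a.b.c'.rpartition('.'))    # ('a.b', '.', 'c')
print('abc'.rpartition('.'))      # ('', '', 'abc')

# The three parts always rejoin to the original string.
s = 'key=value=more'
assert ''.join(s.partition('=')) == s
```

Because the miss case returns empty strings rather than None, the result can be unpacked and concatenated without any type checks.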
My major issue is with the names - partition() doesn't sound right to me.
FWIW, I am VERY happy with the name partition(). It has a long and delightful history in conjunction with the quicksort algorithm where it does something very similar to what we're doing here: partitioning data into three groups (left,center,right) with a small center element (called a pivot in the quicksort context and called a separator in our string parsing context). This name has enjoyed great descriptive success in communicating that the total data size is unchanged and that the parts can be recombined to the whole. IOW, it is exactly the right word. I won't part with it easily. http://www.google.com/search?q=quicksort+partition Raymond
On Tuesday 30 August 2005 11:26, Raymond Hettinger wrote:
My major issue is with the names - partition() doesn't sound right to me.
FWIW, I am VERY happy with the name partition().
I'm +1 on the functionality, and +1 on the name partition(). The only other name that comes to mind is 'separate()', but a) I always spell it 'seperate' (and I don't need another lamdba <wink>) b) It's too similar in name to 'split()' Anthony -- Anthony Baxter <anthony@interlink.com.au> It's never too late to have a happy childhood.
Hi, How about piece() ? Anthony can have his "e"s that way too! ;-) and it's the same number of characters as .split(). Cheers, --ldl On 8/29/05, Anthony Baxter <anthony@interlink.com.au> wrote:
On Tuesday 30 August 2005 11:26, Raymond Hettinger wrote:
My major issue is with the names - partition() doesn't sound right to me.
FWIW, I am VERY happy with the name partition().
I'm +1 on the functionality, and +1 on the name partition(). The only other name that comes to mind is 'separate()', but a) I always spell it 'seperate' (and I don't need another lamdba <wink>) b) It's too similar in name to 'split()'
Anthony
-- Anthony Baxter <anthony@interlink.com.au> It's never too late to have a happy childhood.
-- LD Landis - N0YRQ - from the St Paul side of Minneapolis
Hi, Re: multiples, etc... Check out (and Pythonify) the ANSI M[UMPS] $PIECE(). See: http://www.jacquardsystems.com/Examples/function/piece.htm Cheers, --ldl On 8/29/05, LD Gus Landis <ldlandis@gmail.com> wrote:
Hi,
How about piece() ? Anthony can have his "e"s that way too! ;-) and it's the same number of characters as .split().
Cheers, --ldl
-- LD Landis - N0YRQ - from the St Paul side of Minneapolis
At 10:33 PM 8/29/2005 -0500, LD "Gus" Landis wrote:
Hi,
Re: multiples, etc...
Check out (and Pythonify) the ANSI M[UMPS] $PIECE(). See: http://www.jacquardsystems.com/Examples/function/piece.htm
Cheers, --ldl
As far as I can see, either you misunderstand what partition() does, or I'm completely misunderstanding what $PIECE does. As far as I can tell, $PIECE and partition() have absolutely nothing in common except that they take strings as arguments. :) -1 on piece(), +1 for partition().
Phillip J. Eby wrote:
Check out (and Pythonify) the ANSI M[UMPS] $PIECE(). See: http://www.jacquardsystems.com/Examples/function/piece.htm
As far as I can see, either you misunderstand what partition() does, or I'm completely misunderstanding what $PIECE does. As far as I can tell, $PIECE and partition() have absolutely nothing in common except that they take strings as arguments. :)
both split on a given token. partition splits once, and returns all three parts, while piece returns the part you ask for (the 3-argument form is similar to x.split(s)[i]) </F>
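The comparison can be sketched roughly as follows. The piece() helper is hypothetical: it is only a loose Python reading of the three-argument $PIECE form (1-based field index, empty string when the field does not exist, which is an assumption here), not an implementation of the M[UMPS] function.

```python
def piece(s, sep, i):
    """Hypothetical helper: return the i-th (1-based) field of s split on sep.

    Returns '' when there is no such field (assumed behaviour).
    """
    parts = s.split(sep)
    return parts[i - 1] if 1 <= i <= len(parts) else ''

record = 'a,b,c'

# piece() hands back exactly one field and never the separator.
print(piece(record, ',', 2))      # b

# partition() splits once and hands back all three parts.
print(record.partition(','))      # ('a', ',', 'b,c')
```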
At 10:01 AM 8/30/2005 +0200, Fredrik Lundh wrote:
Phillip J. Eby wrote:
Check out (and Pythonify) the ANSI M[UMPS] $PIECE(). See: http://www.jacquardsystems.com/Examples/function/piece.htm
As far as I can see, either you misunderstand what partition() does, or I'm completely misunderstanding what $PIECE does. As far as I can tell, $PIECE and partition() have absolutely nothing in common except that they take strings as arguments. :)
both split on a given token. partition splits once, and returns all three parts, while piece returns the part you ask for
No, because looking at that URL, there is no piece that is the token split on. partition() always returns 3 parts for 1 occurrence of the token, whereas $PIECE only has 2.
(the 3-argument form is similar to x.split(s)[i])
Which is quite thoroughly unlike partition.
Phillip J. Eby wrote:
both split on a given token. partition splits once, and returns all three parts, while piece returns the part you ask for
No, because looking at that URL, there is no piece that is the token split on. partition() always returns 3 parts for 1 occurrence of the token, whereas $PIECE only has 2.
so "absolutely nothing in common" has now turned into "does the same thing but doesn't return the value you passed to it" ? sorry for wasting my time. </F>
At 07:54 PM 8/30/2005 +0200, Fredrik Lundh wrote:
Phillip J. Eby wrote:
both split on a given token. partition splits once, and returns all three parts, while piece returns the part you ask for
No, because looking at that URL, there is no piece that is the token split on. partition() always returns 3 parts for 1 occurrence of the token, whereas $PIECE only has 2.
so "absolutely nothing in common" has now turned into "does the same thing but doesn't return the value you passed to it" ?
$PIECE returns exactly one value. partition returns exactly 3. partition always returns the separator as one of the three values. $PIECE never does. How many more differences does it have to have before you consider them to be nothing alike?
sorry for wasting my time.
And sorry for you being either illiterate or humor-impaired, to have missed the smiley on the sentence that said "absolutely nothing in common except having string arguments". You quoted it in your first reply, so it's not like it didn't make it into your email client.
On 2005-08-30, Anthony Baxter <anthony@interlink.com.au> wrote:
On Tuesday 30 August 2005 11:26, Raymond Hettinger wrote:
My major issue is with the names - partition() doesn't sound right to me.
FWIW, I am VERY happy with the name partition().
I'm +1 on the functionality, and +1 on the name partition(). The only other name that comes to mind is 'separate()', but a) I always spell it 'seperate' (and I don't need another lamdba <wink>) b) It's too similar in name to 'split()'
trisplit()
On 30/08/05, JustFillBug <mozbugbox@yahoo.com.au> wrote:
On 2005-08-30, Anthony Baxter <anthony@interlink.com.au> wrote:
On Tuesday 30 August 2005 11:26, Raymond Hettinger wrote:
My major issue is with the names - partition() doesn't sound right to me.
FWIW, I am VERY happy with the name partition().
I'm +1 on the functionality, and +1 on the name partition(). The only other name that comes to mind is 'separate()', but a) I always spell it 'seperate' (and I don't need another lamdba <wink>) b) It's too similar in name to 'split()'
trisplit()
split3() ? I'm +1 on the name "partition" but I think this is shorter, communicates the similarity to split and the fact that it always returns exactly three parts. Oren
JustFillBug wrote:
trisplit()
And then for when you need to record the result somewhere, tricord(). :-)

-- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg.ewing@canterbury.ac.nz
"A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc."
"Raymond" == Raymond Hettinger <raymond.hettinger@verizon.net> writes:
Raymond> FWIW, I am VERY happy with the name partition().
Raymond> ... [I]t is exactly the right word. I won't part with it easily.

+1 I note that Emacs has a split-string function which does not have those happy properties. In particular it never preserves the separator, and (by default) it discards empty strings.

Raymond> It has a long and delightful history in conjunction with the quicksort algorithm

Now, that is a delightful mnemonic!

-- School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Ask not how you can "do" free software business; ask what your business can "do for" free software.
"Raymond Hettinger" <raymond.hettinger@verizon.net> wrote in
Yes, there is a precise spec and yes it always returns three strings.
While the find/index discussion was about "what is the best way to indicate 'cannot answer'", part of the conclusion is that any way can be awkward. So I am generally in favor of defining a function, when possible, so that it can always deliver an answer (giving inputs of the appropriate types) and so that the 'best way' question is moot. Nicely done. I think the name 'partition' is fine too. It does not preclude putting a quicksort-type partition function in a module of list functions. The only alternative I can think of is 'tripart', but I do *not* prefer that. Terry J. Reedy
Well, I want to come back on a point that wasn't discussed. I only found one positive comment here : http://mail.python.org/pipermail/python-dev/2005-August/055775.html It's about that : Raymond Hettinger wrote:
* The function always succeeds unless the separator argument is not a string type or is an empty string. So, a typical call doesn't have to be wrapped in a try-suite for normal usage.
Well, I wonder if it's so good! Almost all the use cases I find would require something like:

    head, sep, tail = s.partition(t)
    if sep:
        do something
    else:
        do something else

Like, if you want to extract the drive letter from a Windows path:

    drive, sep, tail = path.partition(":")
    if not sep:
        drive = get_current_drive() # Because it's a local path

Or, if I want to iterate over all the path parts in a UNIX path:

    sep = '/'
    while sep:
        head, sep, path = path.partition(sep)

IMO, that reads strangely ... partitioning until sep is None :S

Then, testing with "if" in Python is always a lot slower than having an exception launched from a C extension inside a try...except block. So both constructs would read like a lot of existing Python code:

    try:
        head,sep,tail = s.partition(t)
        do something
    except SeparatorException:
        do something else

and:

    sep='/'
    try:
        while 1:
            head, drop, path = path.partition(sep)
    except SeparatorException:
        The end

To me, try..except blocks that test end or error conditions are just part of Python's design. So I don't understand why you don't want them!

As for the separator, keeping it in the return values may be very useful, mainly because I would really like to use this function with a regexp in place of the string (like a simplified version of the Qt method QStringList::split) and, in that case, the separator would be the actual matched separator string.

Pierre

-- Pierre Barbier de Reuille INRA - UMR Cirad/Inra/Cnrs/Univ.MontpellierII AMAP Botanique et Bio-informatique de l'Architecture des Plantes TA40/PSII, Boulevard de la Lironde 34398 MONTPELLIER CEDEX 5, France tel : (33) 4 67 61 65 77 fax : (33) 4 67 61 56 68
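For concreteness, the path-iteration loop above can be written as a small runnable function. This is only a sketch: path_parts() and the variable name found are illustrative, and it assumes the always-three-strings behaviour of str.partition(), under which the loop ends when the separator comes back as an empty string (not None).

```python
def path_parts(path, sep='/'):
    """Collect the components of path using repeated partition() calls."""
    parts = []
    found = sep                       # non-empty, so the loop runs at least once
    while found:
        # found becomes '' once sep no longer occurs in the remainder
        head, found, path = path.partition(sep)
        parts.append(head)
    return parts

print(path_parts('usr/local/lib'))    # ['usr', 'local', 'lib']
print(path_parts('/usr/lib'))         # ['', 'usr', 'lib']
```

No exception handling is needed to end the loop; the empty separator doubles as the termination flag.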
Pierre Barbier de Reuille <pierre.barbier@cirad.fr> wrote:
Well, I want to come back on a point that wasn't discussed. I only found one positive comment here : http://mail.python.org/pipermail/python-dev/2005-August/055775.html
You apparently haven't been reading python-dev for around 36 hours, because there have been over a dozen positive comments in regards to str.partition().
Raymond Hettinger wrote:
* The function always succeeds unless the separator argument is not a string type or is an empty string. So, a typical call doesn't have to be wrapped in a try-suite for normal usage.
Well, I wonder if it's so good ! Almost all the use case I find would require something like:
    head, sep, tail = s.partition(t)
    if sep:
        do something
    else:
        do something else
Why don't you pause for a second and read Raymond's post here: http://mail.python.org/pipermail/python-dev/2005-August/055781.html In that email there is a listing of standard library translations from str.find to str.partition, and in every case, it is improved. If you believe that str.index would be better used, take a moment and do a few translations of the sections provided and compare them with the str.partition examples.
Like, if you want to extract the drive letter from a windows path :
    drive, sep, tail = path.partition(":")
    if not sep:
        drive = get_current_drive() # Because it's a local path
Or, if I want to iterate over all the path parts in a UNIX path:
    sep = '/'
    while sep:
        head, sep, path = path.partition(sep)
IMO, that reads strangely ... partitioning until sep is None :S Then, testing with "if" in Python is always a lot slower than having an exception launched from a C extension inside a try...except block.
In the vast majority of cases, all three portions of the returned partition result are used. The remaining few are generally split between one or two instances. In the microbenchmarks I've conducted, manually generating the slicings is measurably slower than when Python does it automatically. Also, exceptions are actually quite slow in relation to comparisons, specifically in the case of find vs. index (using 2.4)...
    >>> if 1:
    ...     x = 'h'
    ...     t = time.time()
    ...     for i in xrange(1000000):
    ...         if x.find('i')>=0:
    ...             pass
    ...     print time.time()-t
    ...
    0.953000068665
    >>> if 1:
    ...     x = 'h'
    ...     t = time.time()
    ...     for i in xrange(1000000):
    ...         try:
    ...             x.index('i')
    ...         except ValueError:
    ...             pass
    ...     print time.time()-t
    ...
    6.53100013733
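The same comparison can be rerun on a current interpreter with the timeit module. This is a sketch for Python 3, not a reproduction of the 2.4 figures above; the function names and the iteration count are arbitrary, and the timings will vary by machine and version.

```python
import timeit

def with_find(x='h'):
    # A miss is reported through the -1 return value.
    if x.find('i') >= 0:
        pass

def with_index(x='h'):
    # A miss is reported through a ValueError.
    try:
        x.index('i')
    except ValueError:
        pass

n = 100_000
t_find = timeit.timeit(with_find, number=n)
t_index = timeit.timeit(with_index, number=n)
print(f"find:  {t_find:.4f}s for {n} misses")
print(f"index: {t_index:.4f}s for {n} misses")
```

Both functions exercise only the "separator not found" path, which is the case where the cost of raising and catching an exception shows up.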
I urge you to take some time to read Raymond's translations. - Josiah
Josiah Carlson a écrit :
Pierre Barbier de Reuille <pierre.barbier@cirad.fr> wrote:
Well, I want to come back on a point that wasn't discussed. I only found one positive comment here : http://mail.python.org/pipermail/python-dev/2005-August/055775.html
You apparently haven't been reading python-dev for around 36 hours, because there have been over a dozen positive comments in regards to str.partition().
Well, I wasn't criticizing the overall idea of str.partition, which I found very useful ! I'm just discussing one particular idea, which is to avoid the use of exceptions.
Raymond Hettinger wrote:
* The function always succeeds unless the separator argument is not a string type or is an empty string. So, a typical call doesn't have to be wrapped in a try-suite for normal usage.
Well, I wonder if it's so good ! Almost all the use case I find would require something like:
    head, sep, tail = s.partition(t)
    if sep:
        do something
    else:
        do something else
Why don't you pause for a second and read Raymond's post here: http://mail.python.org/pipermail/python-dev/2005-August/055781.html
In that email there is a listing of standard library translations from str.find to str.partition, and in every case, it is improved. If you believe that str.index would be better used, take a moment and do a few translations of the sections provided and compare them with the str.partition examples.
Well, what it does is exactly what I thought: you can express most of the use cases of partition with:

    head, sep, tail = s.partition(sep)
    if not sep:
        #do something when it does not work
    else:
        #do something when it works

And I propose to replace it by:

    try:
        head, sep, tail = s.partition(sep)
        # do something when it works
    except SeparatorError:
        # do something when it does not work

What I'm talking about is consistency. In most cases in Python, or at least AFAIU, error testing is avoided and exception launching is preferred, mainly for efficiency reasons. So my question remains: why prefer, for that specific method, returning an "error" value (i.e. an empty separator) over raising an exception?

Pierre

-- Pierre Barbier de Reuille INRA - UMR Cirad/Inra/Cnrs/Univ.MontpellierII AMAP Botanique et Bio-informatique de l'Architecture des Plantes TA40/PSII, Boulevard de la Lironde 34398 MONTPELLIER CEDEX 5, France tel : (33) 4 67 61 65 77 fax : (33) 4 67 61 56 68
Pierre Barbier de Reuille wrote:
What I'm talking about is consistency. In most cases in Python, or at least AFAIU, error testing is avoided and exception launching is preferred mainly for efficiency reasons. So my question remains: why prefer for that specific method returning an "error" value (i.e. an empty separator) against an exception ?
Because, in many cases, there is more to it than just the separator not being found. Given a non-empty some_str and some_sep:

    head, sep, tail = some_str.partition(some_sep)

There are actually five possible results:

    head and not sep and not tail (the separator was not found)
    head and sep and not tail (the separator is at the end)
    head and sep and tail (the separator is somewhere in the middle)
    not head and sep and tail (the separator is at the start)
    not head and sep and not tail (the separator is the whole string)

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.blogspot.com
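The five outcomes listed above can be told apart with plain truth tests on the returned strings. A minimal sketch, assuming non-empty inputs as stated; the classify() function and its labels are illustrative only.

```python
def classify(some_str, some_sep):
    """Name which of the five partition() outcomes applies."""
    head, sep, tail = some_str.partition(some_sep)
    if not sep:
        return 'separator not found'
    if head and tail:
        return 'separator in the middle'
    if head:
        return 'separator at the end'
    if tail:
        return 'separator at the start'
    return 'separator is the whole string'

for s in ['abc', 'ab:', 'a:b', ':ab', ':']:
    print(repr(s), '->', classify(s, ':'))
```

A single "separator not found" exception would only distinguish the first case from the other four.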
I have some use cases with:

    cut_at = some_str.find(sep)
    head, tail = some_str[:cut_at], some_str[cut_at:]

and:

    cut_at = some_str.find(sep)
    head, tail = some_str[:cut_at], some_str[cut_at+offset:] # offset != len(sep)

So if partition() [or whatever it'll be called] could have an optional second argument that defines the width of the 'cut' made, I would be helped enormously. The default for this second argument would be len(sep), to preserve the current proposal.

--eric
Eric Nieuwland a écrit :
I have some use cases with:

    cut_at = some_str.find(sep)
    head, tail = some_str[:cut_at], some_str[cut_at:]

and:

    cut_at = some_str.find(sep)
    head, tail = some_str[:cut_at], some_str[cut_at+offset:] # offset != len(sep)
So if partition() [or whatever it'll be called] could have an optional second argument that defines the width of the 'cut' made, I would be helped enormously. The default for this second argument would be len(sep), to preserve the current proposal.
Well, IMO, your example is much better written:

    import re
    rsep = re.compile(sep + '.'*offset)
    lst = re.split(rsep, some_str, 1)
    head = lst[0]
    tail = lst[1]

Or you want to have some "partition" method which accepts regular expressions:

    head, sep, tail = some_str.partition(re.compile(sep+'.'*offset))
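The "width of the cut" idea can also be sketched without regular expressions. The partition_width() helper below is hypothetical: it is not part of str, and the choice to return (s, '', '') on a miss is an assumption borrowed from partition(). Note that the find()-based snippets quoted above slice incorrectly when find() returns -1, which this sketch avoids.

```python
def partition_width(s, sep, width=None):
    """Hypothetical partition() variant with a configurable cut width.

    width defaults to len(sep), which matches plain partition().
    """
    if width is None:
        width = len(sep)
    i = s.find(sep)
    if i < 0:
        return s, '', ''              # separator not found
    return s[:i], s[i:i + width], s[i + width:]

print(partition_width('key: value', ':'))      # ('key', ':', ' value')
print(partition_width('key: value', ':', 2))   # ('key', ': ', 'value')
print(partition_width('key value', ':'))       # ('key value', '', '')
```

One design wrinkle: with width=0 the middle element is empty even on a hit, so it can no longer serve as the "found" flag.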
--eric
-- Pierre Barbier de Reuille INRA - UMR Cirad/Inra/Cnrs/Univ.MontpellierII AMAP Botanique et Bio-informatique de l'Architecture des Plantes TA40/PSII, Boulevard de la Lironde 34398 MONTPELLIER CEDEX 5, France tel : (33) 4 67 61 65 77 fax : (33) 4 67 61 56 68
Neat! +1 on regexps as an argument to partition().
It sounds better to have a separate function and call it re.partition, doesn't it? By the way, re.partition() is *really* useful compared to re.split() because with the latter you don't know which string precisely matched the pattern (it isn't an issue with str.split() since matching is exact).

Regards

Antoine.
Antoine> By the way, re.partition() is *really* useful compared to
Antoine> re.split() because with the latter you don't know which string
Antoine> precisely matched the pattern (it isn't an issue with
Antoine> str.split() since matching is exact).

Just group your re:

    >>> import re
    >>>
    >>> re.split("ab", "abracadabra")
    ['', 'racad', 'ra']
    >>> re.split("(ab)", "abracadabra")
    ['', 'ab', 'racad', 'ab', 'ra']

and you get it in the return value. In fact, re.split with a grouped re is very much like Raymond's str.partition method without the guarantee of returning a three-element list.

Skip
Le mardi 30 août 2005 à 12:29 -0500, skip@pobox.com a écrit :
Just group your re:
    >>> import re
    >>>
    >>> re.split("ab", "abracadabra")
    ['', 'racad', 'ra']
    >>> re.split("(ab)", "abracadabra")
    ['', 'ab', 'racad', 'ab', 'ra']
and you get it in the return value. In fact, re.split with a grouped re is very much like Raymond's str.partition method without the guarantee of returning a three-element list.
Thanks! I guess I should have read the documentation carefully instead of assuming re.split() worked like in some other language (namely, PHP). Regards Antoine.
In fact, re.split with a grouped re is very much like Raymond's str.partition method without the guarantee of returning a three-element list.

Whoops... Should also have included the maxsplit=1 constraint.

Skip
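With the maxsplit=1 constraint added, the grouped re.split() call gives three elements on a match but only one on a miss. A regex analogue that always returns three strings could be sketched as below; re_partition() is a hypothetical helper built on re.search(), not a function in the re module.

```python
import re

def re_partition(pattern, string):
    """Hypothetical helper: partition string at the first match of pattern."""
    m = re.search(pattern, string)
    if m is None:
        return string, '', ''         # same miss convention as str.partition()
    return string[:m.start()], m.group(0), string[m.end():]

# Grouped split, limited to one split: three elements when there is a match...
print(re.split("(ab)", "abracadabra", maxsplit=1))   # ['', 'ab', 'racadabra']
# ...but a single element when there is none.
print(re.split("(xyz)", "abracadabra", maxsplit=1))  # ['abracadabra']

print(re_partition("ab", "abracadabra"))             # ('', 'ab', 'racadabra')
print(re_partition("xyz", "abracadabra"))            # ('abracadabra', '', '')
```

When the separator text comes from user data rather than a hand-written pattern, passing it through re.escape() first keeps metacharacters from being interpreted.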
On 30 aug 2005, at 17:40, Antoine Pitrou wrote:
Neat! +1 on regexps as an argument to partition().
It sounds better to have a separate function and call it re.partition, doesn't it? By the way, re.partition() is *really* useful compared to re.split() because with the latter you don't know which string precisely matched the pattern (it isn't an issue with str.split() since matching is exact).
Nice, too. BUT, "spam! and eggs".partition(re.compile("!.*d")) more closely resembles "xyz".split(), and that is the way things have evolved up-to now. --eric
Eric Nieuwland wrote:
Pierre Barbier de Reuille wrote:
Or you want to have some "partition" method which accept regular expressions:
head, sep, tail = some_str.partition(re.compile(sep+'.'*offset))
Neat! +1 on regexps as an argument to partition().
Are you sure? I would instead expect to find a .partition method on a regexp object: head, sep, tail = re.compile(sep+'.'*offset).partition(some_str) Shane
Shane Hathaway a écrit :
Eric Nieuwland wrote:
Pierre Barbier de Reuille wrote:
Or you want to have some "partition" method which accept regular expressions:
head, sep, tail = some_str.partition(re.compile(sep+'.'*offset))
Neat! +1 on regexps as an argument to partition().
Are you sure? I would instead expect to find a .partition method on a regexp object:
head, sep, tail = re.compile(sep+'.'*offset).partition(some_str)
Well, to be consistent with current re module, it would be better to follow Antoine's suggestion : head, sep, tail = re.partition(re.compile(sep+'.'*offset), some_str) Pierre
Shane
-- Pierre Barbier de Reuille INRA - UMR Cirad/Inra/Cnrs/Univ.MontpellierII AMAP Botanique et Bio-informatique de l'Architecture des Plantes TA40/PSII, Boulevard de la Lironde 34398 MONTPELLIER CEDEX 5, France tel : (33) 4 67 61 65 77 fax : (33) 4 67 61 56 68
Pierre Barbier de Reuille wrote:
Shane Hathaway a écrit :
Are you sure? I would instead expect to find a .partition method on a regexp object:
head, sep, tail = re.compile(sep+'.'*offset).partition(some_str)
Well, to be consistent with current re module, it would be better to follow Antoine's suggestion :
head, sep, tail = re.partition(re.compile(sep+'.'*offset), some_str)
Actually, consistency with the current re module requires new methods to be added in *both* places. Apparently Python believes TMTOWTDI is the right practice here. ;-) See search, match, split, findall, finditer, sub, and subn: http://docs.python.org/lib/node114.html http://docs.python.org/lib/re-objects.html Shane
Anyone remember why setdefault's second argument is optional?
    >>> d = {}
    >>> d.setdefault(666)
    >>> d
    {666: None}
just doesn't seem useful. In fact, it's so silly that someone calling setdefault with just one arg seems far more likely to have a bug in their code than to get an outcome they actually wanted. Haven't found any 1-arg uses of setdefault() either, except for test code verifying that you _can_ omit the second arg. This came up in ZODB-land, where someone volunteered to add setdefault() to BTrees. Some flavors of BTrees are specialized to hold integer or float values, and then setting None as a value is impossible. I resolved it there by making BTree.setdefault() require both arguments. It was a surprise to me that dict.setdefault() didn't also require both. If there isn't a sane use case for leaving the second argument out, I'd like to drop the possibility in P3K (assuming setdefault() survives).
Tim Peters <tim.peters@gmail.com> wrote:
Anyone remember why setdefault's second argument is optional?
    >>> d = {}
    >>> d.setdefault(666)
    >>> d
    {666: None}
For quick reference for other people, d.setdefault(key [, value]) returns the value that is currently there, or just assigned. The only case where it makes sense to omit the value parameter is in the case where value=None.
just doesn't seem useful. In fact, it's so silly that someone calling setdefault with just one arg seems far more likely to have a bug in their code than to get an outcome they actually wanted. Haven't found any 1-arg uses of setdefault() either, except for test code verifying that you _can_ omit the second arg.
This came up in ZODB-land, where someone volunteered to add setdefault() to BTrees. Some flavors of BTrees are specialized to hold integer or float values, and then setting None as a value is impossible. I resolved it there by making BTree.setdefault() require both arguments. It was a surprise to me that dict.setdefault() didn't also require both.
If there isn't a sane use case for leaving the second argument out, I'd like to drop the possibility in P3K (assuming setdefault() survives).
I agree, at least that in the case where people actually want None (the only time where the second argument is really optional), I think that they should have to specify it. EIBTI and all that.

- Josiah
[Tim Peters]
Anyone remember why setdefault's second argument is optional?
    >>> d = {}
    >>> d.setdefault(666)
    >>> d
    {666: None}
    ...
[Josiah Carlson]
For quick reference for other people, d.setdefault(key [, value]) returns the value that is currently there, or just assigned. The only case where it makes sense to omit the value parameter is in the case where value=None.
Yes, that's right. Overwhelmingly most often in the wild, a just-constructed empty container object is passed as the second argument. Rarely, I see 0 passed. I've found no case where None is wanted (except in the test suite, verifying that the 1-argument form does indeed default to using None).
... I agree, at least that in the case where people actually want None (the only time where the second argument is really optional), I think that they should have to specify it. EIBTI and all that.
And since there apparently aren't any such cases outside of Python's test suite, that wouldn't be much of a burden on them <wink>.
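The contrast between the one-argument form and the usual two-argument form can be seen in a short sketch. The word list and variable names are illustrative only.

```python
# One-argument form: inserts the key with None and returns None.
d = {}
print(d.setdefault(666))              # None
print(d)                              # {666: None}

# Usual two-argument form: a just-constructed empty container as the default.
groups = {}
for word in ['apple', 'avocado', 'banana']:
    groups.setdefault(word[0], []).append(word)
print(groups)                         # {'a': ['apple', 'avocado'], 'b': ['banana']}

# An existing value is returned unchanged; the default is ignored.
print(groups.setdefault('a', []))     # ['apple', 'avocado']
```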
[Tim]
Anyone remember why setdefault's second argument is optional?
IIRC, this is a vestige from its ancestor. The proposal for setdefault() described it as behaving like dict.get() but inserting the key if not found.
Haven't found any 1-arg uses of setdefault() either, except for test code verifying that you _can_ omit the second arg.
Likewise, I found zero occurrences in the library, in my cumulative code base, and in the third-party packages on my system.
If there isn't a sane use case for leaving the second argument out, I'd like to drop the possibility in P3K (assuming setdefault() survives).
Given a lack of legitimate use cases, do we have to wait for Py3.0? It could likely be fixed directly and not impact any code that people care about.

Raymond
[Raymond]
setdefault() described it as behaving like dict.get() but inserting the key if not found.
...
Likewise, I found zero occurrences in the library, in my cumulative code base, and in the third-party packages on my system.
[Tim]
If there isn't a sane use case for leaving the second argument out, I'd like to drop the possibility in P3K (assuming setdefault() survives).
[Raymond]
Given a lack of legitimate use cases, do we have to wait for Py3.0? It could likely be fixed directly and not impact any code that people care about.
That would be fine by me, but any change really requires a deprecation-warning release first.

Dang! I may have just found a use, in Zope's lib/python/docutils/parsers/rst/directives/images.py (which is part of docutils, not really part of Zope):

    figwidth = options.setdefault('figwidth')
    figclass = options.setdefault('figclass')
    del options['figwidth']
    del options['figclass']

I'm still thinking about what that's trying to do <0.5 wink>. Assuming options is a dict-like thingie, it probably meant to do:

    figwidth = options.pop('figwidth', None)
    figclass = options.pop('figclass', None)

David, are you married to that bizarre use of setdefault <wink>? Whatever, I can't claim there are _no_ uses of 1-arg setdefault() in the wild any more.
[Tim Peters]
Dang! I may have just found a use, in Zope's lib/python/docutils/parsers/rst/directives/images.py (which is part of docutils, not really part of Zope):
    figwidth = options.setdefault('figwidth')
    figclass = options.setdefault('figclass')
    del options['figwidth']
    del options['figclass']
If a feature is available, it *will* eventually be used! Whose law is that?
I'm still thinking about what that's trying to do <0.5 wink>.
The code needs to store the values of certain dict entries, then delete them. This is because the "options" dict is passed on to another function, where those entries are not welcome. The code above is simply shorter than this:

    if options.has_key('figwidth'):
        figwidth = options['figwidth']
        del options['figwidth']
    # again for 'figclass'

Alternatively,

    try:
        figwidth = options['figwidth']
        del options['figwidth']
    except KeyError:
        pass

It saves between one line and three lines of code per entry. But since those entries are probably not so common, it would actually be faster to use one of the above patterns.
Assuming options is a dict-like thingie, it probably meant to do:
    figwidth = options.pop('figwidth', None)
    figclass = options.pop('figclass', None)
Yes, but the "pop" method was only added in Python 2.3. Docutils currently maintains compatibility with Python 2.1, so that's RIGHT OUT!
David, are you married to that bizarre use of setdefault <wink>?
No, not at all. In fact, I will vehemently deny that I ever wrote such code, and will continue to do so until someone looks up its history and proves that I'm guilty, which I probably am. -- David Goodger <http://python.net/~goodger>
[Tim Peters]
Dang! I may have just found a use, in Zope's lib/python/docutils/parsers/rst/directives/images.py (which is part of docutils, not really part of Zope):
    figwidth = options.setdefault('figwidth')
    figclass = options.setdefault('figclass')
    del options['figwidth']
    del options['figclass']
[David Goodger]
If a feature is available, it *will* eventually be used! Whose law is that?
This is a different law, about design mistakes getting used by people who should know better ;-)
The code needs to store the values of certain dict entries, then delete them. This is because the "options" dict is passed on to another function, where those entries are not welcome. The code above is simply shorter than this:
    if options.has_key('figwidth'):
        figwidth = options['figwidth']
        del options['figwidth']
    # again for 'figclass'
Alternatively,
    try:
        figwidth = options['figwidth']
        del options['figwidth']
    except KeyError:
        pass
Those wouldn't work in context, because they leave figwidth unbound if it's not a key in options. Later code unconditionally references figwidth.
It saves between one line and three lines of code per entry. But since those entries are probably not so common, it would actually be faster to use one of the above patterns.
Changing

    figwidth = options.setdefault('figwidth')
    figclass = options.setdefault('figclass')

to

    figwidth = options.setdefault('figwidth', None)
    figclass = options.setdefault('figclass', None)

is a minimal semantics-neutral edit to avoid the unloved 1-argument case.
Assuming options is a dict-like thingie, it probably meant to do:
    figwidth = options.pop('figwidth', None)
    figclass = options.pop('figclass', None)
Yes, but the "pop" method was only added in Python 2.3. Docutils currently maintains compatibility with Python 2.1, so that's RIGHT OUT!
Oh, stop torturing yourself. Nobody uses Python 2.1 anymore ;-)
David, are you married to that bizarre use of setdefault <wink>?
No, not at all. In fact, I will vehemently deny that I ever wrote such code, and will continue to do so until someone looks up its history and proves that I'm guilty, which I probably am.
No, I checked, and this code was actually added by an Asian spammer, who polluted the docutils codebase with thousands of porn links hidden in triple-quoted strings. Google reveals that 1-argument setdefault() is a favorite of Asian porn spammers. So you should add a second argument just to avoid getting in trouble with Interpol ;-)
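The equivalence discussed above, between the setdefault-then-del pair and a single dict.pop() with a default, can be checked with a small sketch. The helper names take_with_setdefault and take_with_pop are made up for illustration and do not come from docutils; the code assumes a Python version that has dict.pop (2.3 or later).

```python
def take_with_setdefault(options, key):
    # docutils-style idiom: bind the value (None if the key is absent),
    # then remove the key so it is not passed on to other code.
    value = options.setdefault(key, None)
    del options[key]
    return value


def take_with_pop(options, key):
    # Same effect in one step: return the value or None, and remove the key.
    return options.pop(key, None)


a = {'figwidth': 100, 'align': 'left'}
b = dict(a)
results = [(take_with_setdefault(a, k), take_with_pop(b, k))
           for k in ('figwidth', 'figclass')]
```

In both variants the name is always bound (to None when the key is missing), which is the property the has_key and try/except rewrites lack.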
On Tue, 30 Aug 2005 18:14:55 +0200, Tim Peters <tim.peters@gmail.com> wrote:
    >>> d = {}
    >>> d.setdefault(666)
    >>> d
    {666: None}
just doesn't seem useful. In fact, it's so silly that someone calling setdefault with just one arg seems far more likely to have a bug in their code than to get an outcome they actually wanted. Haven't found
reminds me of dict.get()... i think in both cases being explicit::

    beast = d.setdefault( 666, None )
    beast = d.get( 666, None )

just reads better, all the more since at least in my code what comes next is invariably a test 'if beast is None:...'. so

    beast = d.setdefault( 666 )
    if beast is None: ...

and

    beast = d.get( 666 )
    if beast is None: ...

are shorter but a tad too implicit for my feeling.

_wolf
[Wolfgang Lipp]
reminds me of dict.get()... i think in both cases being explicit::
    beast = d.setdefault( 666, None )
    beast = d.get( 666, None )
just reads better, all the more since at least in my code what comes next is invariably a test 'if beast is None:...'. so
    beast = d.setdefault( 666 )
    if beast is None: ...
Do you actually do this with setdefault()? It's not at all the same as the get() example next, because d.setdefault(666) may _also_ have the side effect of permanently adding a 666->None mapping to d. d.get(...) never mutates d.
and
    beast = d.get( 666 )
    if beast is None: ...
are shorter but a tad too implicit for my feeling.
Nevertheless, 1-argument get() is used a lot. Outside the test suite, I've only found one use of 1-argument setdefault() so far, and it was a poor use (used two lines of code to emulate what dict.pop() does directly).
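The difference Tim points out can be seen directly in a short sketch: 1-argument get() leaves the dict alone, while 1-argument setdefault() inserts a None entry as a side effect. The variable names are illustrative only.

```python
d = {}

got = d.get(666)              # lookup only
after_get = dict(d)           # snapshot: still empty

defaulted = d.setdefault(666)  # lookup, and insert 666 -> None if missing
after_setdefault = dict(d)     # snapshot: now contains the new key
```

Both calls return None here, so a following 'if beast is None:' test cannot tell them apart; only the state of the dict differs.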
On Tue, 30 Aug 2005 20:55:45 +0200, Tim Peters <tim.peters@gmail.com> wrote:
[Wolfgang Lipp]
reminds me of dict.get()... i think in both cases being explicit::
beast = d.setdefault( 666, None ) ...
Do you actually do this with setdefault()?
well, actually more like::

    def f( x ): return x % 3
    R = {}
    for x in range( 30 ):
        R.setdefault( f( x ), [] ).append( x )

still contrived, but you get the idea. i was really excited when finding out that d.pop, d.get and d.setdefault work in very much the same way in respect to the default argument, and my code has greatly benefitted from that. e.g.

    def f( **Q ):
        myoption = Q.pop( 'myoption', 42 )
        if Q: raise TypeError(...)

_w
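The two idioms in the message above, grouping with 2-argument setdefault() and draining keyword options with pop(), can be written out as a runnable sketch. The function names group_by and configure, and the error message text, are made up for this example.

```python
def group_by(items, key):
    # setdefault returns the existing list for this key, or installs and
    # returns a fresh empty list, so append always has a target.
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return groups


def configure(**options):
    # pop each known option with its default; anything left over is unknown.
    myoption = options.pop('myoption', 42)
    if options:
        raise TypeError('unexpected keyword arguments: %s' % sorted(options))
    return myoption


grouped = group_by(range(6), lambda x: x % 3)
```

Both uses pass an explicit default, so neither depends on the 1-argument form under discussion.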
Tim Peters wrote:
Anyone remember why setdefault's second argument is optional?
Some kind of symmetry with get, probably. if d.get(x) returns None if x doesn't exist, it makes some kind of sense that d.setdefault(x) returns None as well. Anyone remember why nobody managed to come up with a better name for setdefault (which is probably the worst name ever given to a method in the standard Python distribution) ? (if I were in charge, I'd rename it to something more informative. I'd also add a "join" built-in (similar to the good old string.join) and a "textfile" built-in (similar to open("U") plus support for encodings). but that's me. I want my code nice and tidy.) </F>
On Tue, 2005-08-30 at 14:53, Fredrik Lundh wrote:
Some kind of symmetry with get, probably. if
d.get(x)
returns None if x doesn't exist, it makes some kind of sense that
d.setdefault(x)
I think that's right, and IIRC the specific detail about the optional second argument was probably hashed out in private Pythonlabs email, or over a tasty lunch of kung pao chicken. I don't have access to my private archives at the moment, though the public record seems to start about here: http://mail.python.org/pipermail/python-dev/2000-August/007819.html
Anyone remember why nobody managed to come up with a better name for setdefault (which is probably the worst name ever given to a method in the standard Python distribution) ?
Heh. http://mail.python.org/pipermail/python-dev/2000-August/008059.html
(if I were in charge, I'd rename it to something more informative.
Maybe like getorset() <wink>. Oh, and yeah, I don't care if we change .setdefault() to require its second argument -- I've never used it without one. But don't remove the method, it's quite handy. -Barry
[Fredrik Lundh]
... Anyone remember why nobody managed to come up with a better name for setdefault (which is probably the worst name ever given to a method in the standard Python distribution) ?
I suggested a perfect name at the time: http://mail.python.org/pipermail/python-dev/2000-August/008036.html To save you from following that link, to this day I still mentally translate "setdefault" to "getorset" whenever I see it. That it didn't get that name is probably Skip's fault, for whining that "getorsetandget" would be "more accurate" <wink>. Actually, there's no evidence that Guido noticed: http://mail.python.org/pipermail/python-dev/2000-August/008059.html
(if I were in charge, I'd rename it to something more informative. I'd also add a "join" built-in (similar to the good old string.join) and a "textfile" built-in (similar to open("U") plus support for encodings). but that's me. I want my code nice and tidy.)
I'm not sure who is in charge, but I am sure they can be bribed ;-)
Tim Peters wrote:
Anyone remember why nobody managed to come up with a better name for setdefault (which is probably the worst name ever given to a method in the standard Python distribution) ?
I suggested a perfect name at the time:
http://mail.python.org/pipermail/python-dev/2000-August/008036.html
To save you from following that link, to this day I still mentally translate "setdefault" to "getorset" whenever I see it.
from this day, I'll do that as well. I have to admit that I had to follow that link anyway, just to make sure I wasn't involved in the decision at that time (which I wasn't, from what I can tell). But I stumbled upon this little naming protocol

    Protocol: if you have a suggestion for a name for this function, mail it to me. DON'T MAIL THE LIST. (If you mail it to the list, that name is disqualified.) Don't explain me why the name is good -- if it's good, I'll know, if it needs an explanation, it's not good.

which I thought was most excellent, and something that we might PEP:ify for future use, until I realized that it gave us the "worst name ever"... oh well. </F>
On Tue, 2005-08-30 at 16:46, Fredrik Lundh wrote:
But I stumbled upon this little naming protocol
Protocol: if you have a suggestion for a name for this function, mail it to me. DON'T MAIL THE LIST. (If you mail it to the list, that name is disqualified.) Don't explain me why the name is good -- if it's good, I'll know, if it needs an explanation, it's not good.
which I thought was most excellent, and something that we might PEP:ify for future use, until I realized that it gave us the "worst name ever"...
/And/ the rule was self-admittedly broken by Guido not a few posts after that one. ;) -Barry
[Shane Hathaway writes about the existence of both module-level functions and object methods to do the same regex operations]
Apparently Python believes TMTOWTDI is the right practice here. ;-) See search, match, split, findall, finditer, sub, and subn:
http://docs.python.org/lib/node114.html http://docs.python.org/lib/re-objects.html
Dare I ask whether the uncompiled versions should be considered for removal in Python 3.0? *puts on his asbestos jacket* -- Michael Hoffman <hoffman@ebi.ac.uk> European Bioinformatics Institute
    >> http://docs.python.org/lib/re-objects.html

    Michael> Dare I ask whether the uncompiled versions should be considered
    Michael> for removal in Python 3.0?

It is quite convenient to not have to compile regular expressions in most cases. The module takes care of compiling your patterns and caching them for you.

Skip
Michael Hoffman wrote:
Dare I ask whether the uncompiled versions should be considered for removal in Python 3.0?
*puts on his asbestos jacket*
there are no uncompiled versions, so that's not a problem. if you mean the function level api, it's there for convenience. if you're using less than 100 expressions in your program, you don't really have to *explicitly* compile your expressions. the function api will do that for you all by itself. </F>
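As a small illustration of the point above, the function-level API and the pattern-object API give the same results; the module-level function compiles the pattern on your behalf. The sample text and pattern are invented for this sketch, and nothing here depends on how large the module's internal cache is.

```python
import re

text = "figwidth=100 figclass=wide"

# Function-level API: the re module compiles (and caches) the pattern itself.
via_function = re.findall(r"(\w+)=(\w+)", text)

# Object-level API: compile explicitly once, then call methods on the object.
pattern = re.compile(r"(\w+)=(\w+)")
via_object = pattern.findall(text)
```

Explicit compilation mostly pays off when the same pattern object is passed around or reused in a hot loop; for occasional use the function form is shorter.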
At 04:28 PM 8/30/2005 +0200, Eric Nieuwland wrote:
I have some use cases with:

    cut_at = some_str.find(sep)
    head, tail = some_str[:cut_at], some_str[cut_at:]

and:

    cut_at = some_str.find(sep)
    head, tail = some_str[:cut_at], some_str[cut_at+offset:]  # offset != len(sep)
So if partition() [or whatever it'll be called] could have an optional second argument that defines the width of the 'cut' made, I would be helped enormously. The default for this second argument would be len(sep), to preserve the current proposal.
Unrelated comment: maybe 'cut()' and rcut() would be nice short names.

I'm not seeing the offset parameter, though, because this:

    head,__,tail = some_str.cut(sep)
    tail = tail[offset:]

is still better than the original example.
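For comparison, here is a sketch of the find-and-slice pattern next to the three-way split, using str.partition (the name the method eventually shipped under in Python 2.5) rather than the hypothetical cut(). The helper names are invented for this example, and the explicit not-found branch in the find version is an addition: with a bare find(), a result of -1 would silently slice off the last character.

```python
def split_with_find(s, sep):
    cut_at = s.find(sep)
    if cut_at < 0:
        # find() signals "not found" with -1, which must be checked by hand.
        return s, ''
    return s[:cut_at], s[cut_at + len(sep):]


def split_with_partition(s, sep):
    # partition always returns three strings; the middle one is empty
    # when sep does not occur.
    head, _found, tail = s.partition(sep)
    return head, tail


def split_and_skip(s, sep, extra):
    # An extra offset beyond the separator is a plain slice on the tail.
    head, _found, tail = s.partition(sep)
    return head, tail[extra:]
```

For instance, split_and_skip('key=  value', '=', 2) drops the two characters that follow the separator, which covers the "width of the cut" use case without a second argument.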
On Tue, 2005-08-30 at 11:27, Phillip J. Eby wrote:
So if partition() [or whatever it'll be called] could have an optional second argument that defines the width of the 'cut' made, I would be helped enormously. The default for this second argument would be len(sep), to preserve the current proposal.
+1 on the concept -- very nice Raymond.
Unrelated comment: maybe 'cut()' and rcut() would be nice short names.
FWIW, +1 on .cut(), +0 on .partition() -Barry
    >> Unrelated comment: maybe 'cut()' and rcut() would be nice short names.

    Barry> FWIW, +1 on .cut(), +0 on .partition()

As long as people are free associating: snip(), excise(), explode(), invade_iraq()... <wink>

Skip
Pierre Barbier de Reuille <pierre.barbier@cirad.fr> wrote:
Well, what it does is exactly what I thought, you can express most of the use-cases of partition with:
    head, sep, tail = s.partition(sep)
    if not sep:
        #do something when it does not work
    else:
        #do something when it works
And I propose to replace it by :
    try:
        head, sep, tail = s.partition(sep)
        # do something when it works
    except SeparatorError:
        # do something when it does not work
No, you can't. As Tim Peters pointed out, in order to be correct, you need to use...

    try:
        head, found, tail = s.partition(sep)
    except ValueError:
        # do something when it can't find sep
    else:
        # do something when it can find sep

By embedding the 'found' case inside the try/except clause as you offer, you could be hiding another exception, which is incorrect.
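The try/except/else structure described above can be shown with str.index(), which really does raise ValueError when the substring is missing (str.partition, as proposed, does not raise). The function name locate is invented for this sketch.

```python
def locate(s, sep):
    try:
        # Keep only the call that may legitimately fail inside the try.
        position = s.index(sep)
    except ValueError:
        # sep was not found.
        return None
    else:
        # Runs only when index() succeeded; a ValueError raised by code
        # placed here would propagate instead of being swallowed.
        return s[:position], s[position + len(sep):]
```

Putting the "found" handling in the else clause is what keeps an unrelated ValueError from being mistaken for "separator not found".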
What I'm talking about is consistency. In most cases in Python, or at least AFAIU, error testing is avoided and exception launching is preferred mainly for efficiency reasons. So my question remains: why prefer for that specific method returning an "error" value (i.e. an empty separator) against an exception ?
It is known among those who tune their Python code that try/except is relatively expensive when exceptions are raised, but not significantly faster (if any) when they are not. I'll provide an updated set of microbenchmarks...
    >>> if 1:
    ...     x = 'h'
    ...     t = time.time()
    ...     for i in xrange(1000000):
    ...         _ = x.find('h')
    ...         if _ >= 0:
    ...             pass
    ...         else:
    ...             pass
    ...     print time.time()-t
    ...
    0.84299993515
    >>> if 1:
    ...     x = 'h'
    ...     t = time.time()
    ...     for i in xrange(1000000):
    ...         try:
    ...             _ = x.index('h')
    ...         except ValueError:
    ...             pass
    ...         else:
    ...             pass
    ...     print time.time()-t
    ...
    0.81299996376
BUT!
    >>> if 1:
    ...     x = 'h'
    ...     t = time.time()
    ...     for i in xrange(1000000):
    ...         try:
    ...             _ = x.index('i')
    ...         except ValueError:
    ...             pass
    ...         else:
    ...             pass
    ...     print time.time()-t
    ...
    4.29700016975
We should subtract the time of the for loop, the method call overhead, perhaps the integer object creation/fetch, and the assignment. str.__len__() is pretty fast (really just a member check, which is at a constant offset...), let us use that.
    >>> if 1:
    ...     x = 'h'
    ...     t = time.time()
    ...     for i in xrange(1000000):
    ...         _ = x.__len__()
    ...     print time.time()-t
    ...
    0.5
So, subtracting that .5 seconds from all the cases gives us...

    0.343 seconds for .find's comparison
    0.313 seconds for .index's exception handling when an exception is not raised
    3.797 seconds for .index's exception handling when an exception is raised.

In the case of a string being found, .index is about 10% faster than .find. In the case of a string not being found, .index's exception handling mechanics are over 11 times slower than .find's comparison.

Those numbers should speak for themselves. In terms of the strings being automatically chopped up vs. manually chopping them up with slices, it is obvious which will be faster: C-level slicing.

I agree with Raymond that if you are going to poo-poo on str.partition() not raising an exception, you should do some translations using the correct structure that Tim Peters provided, and post them here on python-dev as 'proof' that raising an exception in the cases provided is better.

- Josiah
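For anyone wanting to repeat the comparison, here is a rough sketch of the same four cases using the standard timeit module. It is written for Python 3 (the session above is Python 2), the helper names and the iteration count are arbitrary choices, and the absolute numbers will differ by machine and interpreter version, so none are quoted here.

```python
import timeit

x = 'h'


def with_find(needle):
    # Sentinel style: compare the returned position against -1.
    return x.find(needle) >= 0


def with_index(needle):
    # Exception style: a miss is reported by raising ValueError.
    try:
        x.index(needle)
    except ValueError:
        return False
    else:
        return True


def measure(func, needle, number=20000):
    # Total seconds for `number` calls of func(needle).
    return timeit.timeit(lambda: func(needle), number=number)


timings = {
    'find, hit': measure(with_find, 'h'),
    'find, miss': measure(with_find, 'i'),
    'index, hit': measure(with_index, 'h'),
    'index, miss': measure(with_index, 'i'),
}

for label, seconds in sorted(timings.items()):
    print('%-12s %.4f s' % (label, seconds))
```

Unlike the session above, this does not subtract loop and call overhead, so only the relative ordering of the four cases is meaningful.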
Josiah Carlson wrote:
Pierre Barbier de Reuille <pierre.barbier@cirad.fr> wrote:
0.5
So, subtracting that .5 seconds from all the cases gives us...
    0.343 seconds for .find's comparison
    0.313 seconds for .index's exception handling when an exception is not raised
    3.797 seconds for .index's exception handling when an exception is raised.
Well, when I did benchmark that (two years ago) the difference was, AFAIR, much greater ! But well, I just have to adjust my internal data sets ;) Pierre
In the case of a string being found, .index is about 10% faster than .find. In the case of a string not being found, .index's exception handling mechanics are over 11 times slower than .find's comparison.
[...]
- Josiah
-- Pierre Barbier de Reuille INRA - UMR Cirad/Inra/Cnrs/Univ.MontpellierII AMAP Botanique et Bio-informatique de l'Architecture des Plantes TA40/PSII, Boulevard de la Lironde 34398 MONTPELLIER CEDEX 5, France tel : (33) 4 67 61 65 77 fax : (33) 4 67 61 56 68
Raymond Hettinger wrote:
[Delaney, Timothy (Tim)]
+1
This is very useful behaviour IMO.
Thanks. It seems to be getting +1s all around.
Wow, a lot of approvals! :)
Have the precise return values of partition() been defined?
+1 on the Name partition, I considered split or parts, but i agree partition reads better and since it's not so generic as something like get_parts, it creates a stronger identity making the code clearer.
IMO the most useful (and intuitive) behaviour is to return strings in all cases.
Wow, a lot of approvals! :-)

A possibility to consider: Instead of partition() and rpartition(), have just partition with an optional step or skip value which can be a positive or negative non-zero integer.

    head, found, tail = partition(sep, [step=1])

step = -1 would look for sep from the right.
step = 2 would look for the second sep from the left.
step = -2 would look for the second sep from the right.

Default of course would be 1, find the first sep from the left.

This would allow creating an iterator that could iterate through a string splitting on each sep from either the left, or right. Whether a 0 or a |value|>len(string) causes an exception would need to be decided.

I can't think of an obvious use for a partition iterator at the moment, maybe someone could find an example. In any case, finding the second, or third sep is probably common enough.

Cheers, Ron
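To make the proposal above concrete, here is one possible reading of it as a plain function built on find() and rfind(). Everything about it is an assumption for illustration: the name partition_nth, raising ValueError for n == 0, and returning the "not found" shape of partition() or rpartition() when there are fewer than |n| separators. None of this was specified in the thread.

```python
def partition_nth(s, sep, n=1):
    """Split s around the n-th occurrence of sep (negative n counts from the right)."""
    if n == 0:
        raise ValueError('n must be non-zero')
    if not sep:
        raise ValueError('empty separator')
    if n > 0:
        start = 0
        for _ in range(n):
            pos = s.find(sep, start)
            if pos < 0:
                return s, '', ''       # same shape as a failed partition()
            start = pos + len(sep)
    else:
        end = len(s)
        for _ in range(-n):
            pos = s.rfind(sep, 0, end)
            if pos < 0:
                return '', '', s       # same shape as a failed rpartition()
            end = pos
    return s[:pos], sep, s[pos + len(sep):]
```

With n=1 and n=-1 this agrees with str.partition and str.rpartition, so the open question in the thread is only whether the other values of n are common enough to justify the extra argument.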
[Ron Adam]
This would allow creating an iterator that could iterate through a string splitting on each sep from either the left, or right.
For uses more complex than basic partitioning, people should shift to more powerful tools like re.finditer(), re.findall(), and re.split().
I can't think of an obvious use for a partition iterator at the moment, maybe someone could find an example.
I prefer to avoid variants that are in search of a purpose.
In any case, finding the second, or third sep is probably common enough.
That case should be handled with consecutive partitions:

    # keep everything after the second 'X'
    head, found, s = s.partition('X')
    head, found, s = s.partition('X')

Raymond
At 02:25 PM 8/30/2005 -0400, Raymond Hettinger wrote:
That case should be handled with consecutive partitions:
    # keep everything after the second 'X'
    head, found, s = s.partition('X')
    head, found, s = s.partition('X')
Or:

    s=s.partition('X')[2].partition('X')[2]

which actually suggests a shorter, clearer way to do it:

    s = s.after('X').after('X')

And the corresponding 'before' method, of course, such that if sep in s:

    (s.before(sep), sep, s.after(sep)) == s.partition(sep)

Technically, these should probably be before_first and after_first, with the corresponding before_last and after_last corresponding to rpartition.
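The before/after idea can be tried out today as plain functions over str.partition. This is only a sketch: str has no such methods, and the function names before and after simply mirror the hypothetical method names in the message above.

```python
def before(s, sep):
    # Everything to the left of the first sep (all of s if sep is absent).
    return s.partition(sep)[0]


def after(s, sep):
    # Everything to the right of the first sep ('' if sep is absent).
    return s.partition(sep)[2]


s = 'aXbXc'
after_second = after(after(s, 'X'), 'X')
recombined = (before(s, 'X'), 'X', after(s, 'X'))
```

The identity with partition() holds only when sep occurs in s, which is why the message qualifies it with "if sep in s".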
Phillip J. Eby wrote:
At 02:25 PM 8/30/2005 -0400, Raymond Hettinger wrote:
That case should be handled with consecutive partitions:
    # keep everything after the second 'X'
    head, found, s = s.partition('X')
    head, found, s = s.partition('X')
I was thinking of cases where head is everything before the second 'X'. A possible use case might be getting items in a comma-delimited string.
Or:
s=s.partition('X')[2].partition('X')[2]
which actually suggests a shorter, clearer way to do it:
s = s.after('X').after('X')
And the corresponding 'before' method, of course, such that if sep in s:
    (s.before(sep), sep, s.after(sep)) == s.partition(sep)
Technically, these should probably be before_first and after_first, with the corresponding before_last and after_last corresponding to rpartition.
Do you really think these are easier than:

    head, found, tail = s.partition('X',2)

I don't feel there is a need to avoid numbers entirely. In this case I think it's the better way to find the n'th separator and since it's an optional value I feel it doesn't add a lot of complication. Anyway... It's just a suggestion.

Cheers, Ron
Ron Adam wrote:
I don't feel there is a need to avoid numbers entirely. In this case I think it's the better way to find the n'th separator and since it's an optional value I feel it doesn't add a lot of complication. Anyway... It's just a suggestion.
Avoid overengineering this without genuine use cases. Raymond's review of the standard library shows that the basic version of str.partition provides definite readability benefits and also makes it easier to write correct code - enhancements can wait until we have some real experience with how people use the method. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.blogspot.com
Nick Coghlan wrote:
Ron Adam wrote:
I don't feel there is a need to avoid numbers entirely. In this case I think it's the better way to find the n'th separator and since it's an optional value I feel it doesn't add a lot of complication. Anyway... It's just a suggestion.
Avoid overengineering this without genuine use cases. Raymond's review of the standard library shows that the basic version of str.partition provides definite readability benefits and also makes it easier to write correct code - enhancements can wait until we have some real experience with how people use the method.
Cheers, Nick.
The use cases for nth items 1 and -1 are the same ones for partition() and rpartition. It's only values greater or less than those that need use cases. (I'll try to find some.) True, a directional index enhancement could be added later, but not considering it now and then adding it later would mean rpartition() would become redundant and/or an argument against doing it later. As it's been stated fairly often, it's hard to remove something once it's put in. So it's prudent to consider a few alternative forms and rule them out, rather than try to change things later. Cheers, Ron
participants (25)
- Anthony Baxter
- Antoine Pitrou
- Barry Warsaw
- David Goodger
- Delaney, Timothy (Tim)
- Eric Nieuwland
- Fred L. Drake, Jr.
- Fredrik Lundh
- Greg Ewing
- Josiah Carlson
- JustFillBug
- LD "Gus" Landis
- Michael Hoffman
- Nick Coghlan
- Oren Tirosh
- Phillip J. Eby
- Pierre Barbier de Reuille
- Raymond Hettinger
- Ron Adam
- Shane Hathaway
- skip@pobox.com
- Stephen J. Turnbull
- Terry Reedy
- Tim Peters
- Wolfgang Lipp