Here's a simple one I've reinvented in my own apps often enough that it might be worth adding to the built-in split() method:

s.split(sep[, maxsplit[, pad]])

pad, if set True, would pad out the returned list with empty strings (strs/unicodes depending on returned datatype) so that the list was always (maxsplit+1) elements long. This allows one to do things like unpacking assignments:

user, hostname= address.split('@', 1, True)

without having to worry about exceptions when the number of ‘sep’s in the string is unexpectedly fewer than ‘maxsplit’.

--
And Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/
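For readers who want to try the idea, here is a minimal sketch of the proposed behaviour written as a standalone helper rather than as a change to str.split(). The name splitpad is the one used later in the thread; the exact signature and the pad default are assumptions made for illustration, and the sketch assumes maxsplit is a non-negative integer.

```python
def splitpad(s, sep, maxsplit, pad=''):
    """Split s on sep at most maxsplit times, then pad to maxsplit + 1 items.

    Hypothetical helper illustrating the proposal; not part of the standard library.
    """
    parts = s.split(sep, maxsplit)
    # str.split() returns at most maxsplit + 1 items, so only padding is needed.
    parts.extend([pad] * (maxsplit + 1 - len(parts)))
    return parts


# Unpacking works whether or not the separator is present.
user, hostname = splitpad('alice@example.com', '@', 1)
local_user, local_host = splitpad('alice', '@', 1)
```

With this sketch, the second call yields an empty string for the missing host part instead of raising ValueError on unpacking.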
And Clover wrote:
Here's a simple one I've reinvented in my own apps often enough that it might be worth adding to the built-in split() method:
s.split(sep[, maxsplit[, pad]])
pad, if set True, would pad out the returned list with empty strings (strs/unicodes depending on returned datatype) so that the list was always (maxsplit+1) elements long. This allows one to do things like unpacking assignments:
user, hostname= address.split('@', 1, True)
without having to worry about exceptions when the number of ‘sep’s in the string is unexpectedly fewer than ‘maxsplit’.
Can you find a better use case? For splitting email address, I think I would want to know if the address turned out to be invalid (e.g. it does not contain exactly 1 @s)
Lie Ryan wrote:
Can you find a better use case?
Well here are some random uses from projects that a search on splitpad (one of the names I used for it) is turning up:

command, parameters= splitpad(line, ' ', 1) # get SMTP command
y, m, d= splitpad(t, '-', 2) # split date, month and day optional
headers, body= splitpad(request, '\n\n', 1) # there might be no body
table, column= rsplitpad(colname, '.', 1) # extract SQL [table.]column name
id, cat, name, price= splitpad(line, ',', 3) # should be four columns, but editor might have lost trailing commas
user, pwd= splitpad(base64.decodestring(authtoken), ':', 1) # will always contain ':' unless malformed
pars= dict(splitpad(p, '=', 1) for p in input.split(';')) # no '=value' part is allowable
server, version= splitpad(environ.get('SERVER_SOFTWARE', ''), '/', 1) # might not have a version

And so on. (Obviously these have an internetty bias, where “be liberal in what you accept” is desirable.)
For splitting email address, I think I would want to know if the address turned out to be invalid (e.g. it does not contain exactly 1 @s)
Maybe, maybe not. In this case I wanted to accept the case of a bare username, with or without ‘@’, as a local user. An empty string instead of an exception for a missing part is something I find very common; it kind of fits with Python's “string processing does what you usually want” behaviour (as compared to other languages that are still tediously throwing exceptions when you try to slice outside the string length range).

For example with an HTTP command (eg. “GET / HTTP/1.0”):

method, path, version= splitpad(command, ' ', 2)

‘version’ might be missing, on ancient HTTP/0.9 clients. ‘path’ could be missing, on malformed requests. In either of those cases I don't want an exception, and I don't particularly want to burden my split code with extra checking; I'll probably have to do further checking on ‘path’ anyway so setting it to an empty string is the best I can do here.

The alternative I use if I can't be bothered to define splitpad() again is something like:

parts= command.split(' ', 2)
method= parts[0]
path= parts[1] if len(parts)>=2 else ''
....

which is pretty ugly.

--
And Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/
And Clover wrote:
Lie Ryan wrote:
Can you find a better use case?
Well here are some random uses from projects that a search on splitpad (one of the names I used for it) is turning up:
command, parameters= splitpad(line, ' ', 1) # get SMTP command
y, m, d= splitpad(t, '-', 2) # split date, month and day optional
headers, body= splitpad(request, '\n\n', 1) # there might be no body
table, column= rsplitpad(colname, '.', 1) # extract SQL [table.]column name
id, cat, name, price= splitpad(line, ',', 3) # should be four columns, but editor might have lost trailing commas
user, pwd= splitpad(base64.decodestring(authtoken), ':', 1) # will always contain ':' unless malformed
pars= dict(splitpad(p, '=', 1) for p in input.split(';')) # no '=value' part is allowable
server, version= splitpad(environ.get('SERVER_SOFTWARE', ''), '/', 1) # might not have a version
And so on. (Obviously these have an internetty bias, where “be liberal in what you accept” is desirable.)
For splitting email address, I think I would want to know if the address turned out to be invalid (e.g. it does not contain exactly 1 @s)
Maybe, maybe not. In this case I wanted to accept the case of a bare username, with or without ‘@’, as a local user. An empty string instead of an exception for a missing part is something I find very common; it kind of fits with Python's “string processing does what you usually want” behaviour (as compared to other languages that are still tediously throwing exceptions when you try to slice outside the string length range).
For example with an HTTP command (eg. “GET / HTTP/1.0”):
method, path, version= splitpad(command, ' ', 2)
‘version’ might be missing, on ancient HTTP/0.9 clients. ‘path’ could be missing, on malformed requests. In either of those cases I don't want an exception, and I don't particularly want to burden my split code with extra checking; I'll probably have to do further checking on ‘path’ anyway so setting it to an empty string is the best I can do here.
The alternative I use if I can't be bothered to define splitpad() again is something like:
parts= command.split(' ', 2)
method= parts[0]
path= parts[1] if len(parts)>=2 else ''
....
which is pretty ugly.
I am honestly quite surprised: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html
On Sat, 14 Mar 2009 01:43:28 pm Lie Ryan wrote:
And Clover wrote:
Here's a simple one I've reinvented in my own apps often enough that it might be worth adding to the built-in split() method:
s.split(sep[, maxsplit[, pad]])
pad, if set True, would pad out the returned list with empty strings (strs/unicodes depending on returned datatype) so that the list was always (maxsplit+1) elements long. This allows one to do things like unpacking assignments:
user, hostname= address.split('@', 1, True)
without having to worry about exceptions when the number of ‘sep’s in the string is unexpectedly fewer than ‘maxsplit’.
Can you find a better use case? For splitting email address, I think I would want to know if the address turned out to be invalid (e.g. it does not contain exactly 1 @s)
What makes you think that email address must contain exactly one @ sign?

Email being sent locally may contain zero @ signs, and email being sent externally can contain one or more @ signs. Andy's code:

user, hostname= address.split('@', 1, True)

will fail on syntactically valid email addresses like this:

fred(away @ the pub)@example.com

--
Steven D'Aprano
Steven D'Aprano wrote:
On Sat, 14 Mar 2009 01:43:28 pm Lie Ryan wrote:
And Clover wrote:
Here's a simple one I've reinvented in my own apps often enough that it might be worth adding to the built-in split() method:
s.split(sep[, maxsplit[, pad]])
pad, if set True, would pad out the returned list with empty strings (strs/unicodes depending on returned datatype) so that the list was always (maxsplit+1) elements long. This allows one to do things like unpacking assignments:
user, hostname= address.split('@', 1, True)
without having to worry about exceptions when the number of ‘sep’s in the string is unexpectedly fewer than ‘maxsplit’.
Can you find a better use case? For splitting email address, I think I would want to know if the address turned out to be invalid (e.g. it does not contain exactly 1 @s)
What makes you think that email address must contain exactly one @ sign?
Email being sent locally may contain zero @ signs, and email being sent externally can contain one or more @ signs. Andy's code:
user, hostname= address.split('@', 1, True)
will fail on syntactically valid email addresses like this:
fred(away @ the pub)@example.com
From Wikipedia:

RFC invalid e-mail addresses
* Abc.example.com (character @ is missing)
* Abc.@example.com (character dot(.) is last in local part)
* Abc..123@example.com (character dot(.) is double)
* A@b@c@example.com (only one @ is allowed outside quotations marks)
* ()[]\;:,<>@example.com (none of the characters before the @ in this example, are allowed outside quotation marks)

Your example is valid email address if and only if it is enclosed in quotation mark:

"fred(away @ the pub)"@example.com
On Fri, Mar 13, 2009 at 9:59 PM, Lie Ryan <lie.1296@gmail.com> wrote:
Steven D'Aprano wrote:
Email being sent locally may contain zero @ signs, and email being sent externally can contain one or more @ signs. Andy's code:
user, hostname= address.split('@', 1, True)
will fail on syntactically valid email addresses like this:
fred(away @ the pub)@example.com
From Wikipedia:

RFC invalid e-mail addresses
* Abc.example.com (character @ is missing)
* Abc.@example.com (character dot(.) is last in local part)
* Abc..123@example.com (character dot(.) is double)
* A@b@c@example.com (only one @ is allowed outside quotations marks)
* ()[]\;:,<>@example.com (none of the characters before the @ in this example, are allowed outside quotation marks)
Your example is valid email address if and only if it is enclosed in quotation mark: "fred(away @ the pub)"@example.com
That is valid but not because you can have nested email addresses like that.** The (...) part is a comment. I wouldn't bet that very many mail clients handle that according to the rfc. Many don't handle quoted strings either. And there are those that have a narrow view of which characters (Hint: if you don't want to get mail from hotmail users, just make sure your email address has '/' in it.)

http://www.ietf.org/rfc/rfc0822.txt

**Way back people wrote nested email addresses with % replacing the @ in the nested address (sna%foo@bar). I haven't seen that for a while.

On topic: Making split more complicated seems like overspecialization. Wouldn't a generic padding function be more useful?

FWIW, this has been discussed before. http://bugs.python.org/issue5034

--- Bruce
(sorry for the digression)
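As an illustration of what a generic padding function might look like, here is a small sketch built on itertools. The name pad_to and its parameters are hypothetical and do not come from the thread or the linked issue. Note that it also truncates input longer than the requested length, which goes beyond what the split proposal describes; that never triggers when it is fed the result of split() with a matching maxsplit.

```python
from itertools import chain, islice, repeat


def pad_to(iterable, length, fill=''):
    """Return a list of exactly `length` items taken from `iterable`.

    Missing items are replaced by `fill`; surplus items are dropped.
    Hypothetical helper, shown only to illustrate the suggestion.
    """
    return list(islice(chain(iterable, repeat(fill)), length))


# An HTTP/0.9-style request line with no version part.
method, path, version = pad_to('GET /'.split(' ', 2), 3)
```

Because the helper works on any iterable, the same function could pad the output of split(), rsplit(), or a csv row without str.split() needing a new argument.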
And Clover wrote:
Here's a simple one I've reinvented in my own apps often enough that it might be worth adding to the built-in split() method:
s.split(sep[, maxsplit[, pad]])
pad, if set True, would pad out the returned list with empty strings (strs/unicodes depending on returned datatype) so that the list was always (maxsplit+1) elements long. This allows one to do things like unpacking assignments:
user, hostname= address.split('@', 1, True)
without having to worry about exceptions when the number of ‘sep’s in the string is unexpectedly fewer than ‘maxsplit’.
Note that for maxsplit=1, you can use str.partition().

Georg
Georg Brandl wrote:
Note that for maxsplit=1, you can use str.partition().
Indeed, though it does slightly spoil the cleanness of the unpacking assignment to include a dummy lvalue for the middle element.

[Thanks for the on-topic reply! I'm surprised more people haven't felt the need to write unpacking splits like this to be honest, but I guess engaging in SMTP syntax law is much more fun. Yes guys, I'm well aware of the capabilities of the RFC2822 addr-spec format, thanks, and no, it's not relevant to the particular program that example came from. Cheers for the concern though.]

--
And Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/
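For reference, a short sketch of the str.partition() approach for the maxsplit=1 case, including the dummy middle element mentioned above. The function name split_address is made up for this example.

```python
def split_address(address):
    """Split on the first '@'; the host part is '' when no '@' is present."""
    # partition() always returns a 3-tuple: (head, separator, tail).
    user, _sep, hostname = address.partition('@')
    return user, hostname


with_host = split_address('alice@example.com')
bare_user = split_address('alice')
```

partition() never raises on a missing separator, so the unpacking is always safe, but it only covers a single split.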
And Clover wrote:
Here's a simple one I've reinvented in my own apps often enough that it might be worth adding to the built-in split() method:
s.split(sep[, maxsplit[, pad]])
pad, if set True, would pad out the returned list with empty strings (strs/unicodes depending on returned datatype) so that the list was always (maxsplit+1) elements long. This allows one to do things like unpacking assignments:
user, hostname= address.split('@', 1, True)
without having to worry about exceptions when the number of ‘sep’s in the string is unexpectedly fewer than ‘maxsplit’.
I would make pad = <padding string>. Example use case:

major,minor,micro = pyversion.split('.', 2, '0') # 3.0 = 3.0.0, etc.
# or
major,minor,micro = (int(s) for s in pyversion.split('.', 2, '0') )

I suppose a counter argument is that one could write

(pyversion+'.0').split('.',2)

Terry Jan Reedy
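A small sketch comparing the two options from this message. The splitpad helper with a pad-string argument is hypothetical (str.split() accepts no such argument), and parse_version is a made-up name. The comparison suggests that appending '.0' only gives three clean parts when the input has exactly two components.

```python
def splitpad(s, sep, maxsplit, pad=''):
    """Hypothetical helper: split, then pad with `pad` up to maxsplit + 1 items."""
    parts = s.split(sep, maxsplit)
    parts.extend([pad] * (maxsplit + 1 - len(parts)))
    return parts


def parse_version(pyversion):
    """Parse 'X', 'X.Y' or 'X.Y.Z' into a 3-tuple of ints, defaulting to 0."""
    major, minor, micro = (int(p) for p in splitpad(pyversion, '.', 2, '0'))
    return major, minor, micro


# The string-concatenation alternative, applied to three input shapes.
two_parts = ('3.0' + '.0').split('.', 2)      # three items, as wanted
one_part = ('3' + '.0').split('.', 2)         # only two items
three_parts = ('3.0.1' + '.0').split('.', 2)  # last item keeps the extra '.0'
```

So the concatenation trick would need extra handling for inputs that already have one or three components, whereas the padded split handles all three shapes the same way.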
participants (6)
- And Clover
- Bruce Leban
- Georg Brandl
- Lie Ryan
- Steven D'Aprano
- Terry Reedy