s1 == (sf % (s1 / sf))? A bad idea?

At the moment it is very silent on Python-dev. I guess you guys are all out hunting dead parrots, which escaped from the cages on April 1st. ;-)

So this might be the right moment to present a possibly bad idea (TM). See below.

Regards, Peter
--
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany)

PEP: XXXX
Title: String Scanning
Version: $Revision$
Author: pf@artcom-gmbh.de (Peter Funk)
Status: Not yet Draft
Type: Standards Track
Python-Version: 2.2
Created: 02-Apr-2001
Post-History:

Abstract

    This document proposes a string scanning feature for Python to allow easier string parsing. The suggested syntax change is to allow the use of the division '/' operator for string operands as the counterpart to the already existing '%' string interpolation operator. In current Python this raises an exception: 'TypeError: bad operand type(s) for /'. With the proposed enhancement the expression string1 / format2 should return either a simple value, a tuple of values or a dictionary, depending on the content of the right operand (aka. format) string.

Copyright

    This document is in the public domain.

Specification

    The feature should mimic the behaviour of the scanf function well known to C programmers. For any format string sf and any matching input string si the following pseudo condition should be true:

        string.split( sf % (si / sf) ) == string.split( si )

    That is, modulo any differences in white space, the result of the string interpolation using the intermediate result from the string scanning operation should look similar to the original input string.

    All conversions are introduced by the % (percent sign) character. The format string may also contain other characters. White space (such as blanks, tabs, or newlines) in the format string matches any amount of white space, including none, in the input. Everything else matches only itself.

    Scanning stops when an input character does not match such a format character. Scanning also stops when an input conversion cannot be made (see below).

Examples

    Here is an example of an interactive session exhibiting the expected behaviour of this feature.

        >>> "12345 John Doe" / "%5d %8s"
        (12345, 'John Doe')
        >>> "12 34 56 7.890" / "%d %d %d %f"
        (12, 34, 56, 7.8899999999999997)
        >>> "12345 John Doe, Foo Bar" / "%(num)d %(n)s, %(f)s %(b)s"
        {'n': 'John Doe', 'f': 'Foo', 'b': 'Bar', 'num': 12345}
        >>> "1 2" / "%d %d %d"
        Traceback (innermost last):
          File "<stdin>", line 1, in ?
        TypeError: not all arguments filled

Discussion

    This should fix the asymmetry between arithmetic types and strings. It should also make life easier for C programmers migrating to Python (see FAQ 4.33). Those poor souls are accustomed to scanf as the counterpart of printf and usually feel uneasy about converting to string splitting, slicing or the syntax of regular expressions.

Security Issues

    There should be no security issues.

Implementation

    There is no implementation yet. This is just an idea.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:
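To make the specification above more concrete, here is a minimal sketch of the scanning mechanism as a plain function in present-day Python 3. It is not part of the PEP, which explicitly has no implementation. The names scan, _TOKEN, _CONVERT and _fragment are hypothetical, only %d, %f and %s are handled, and %s follows C's rule of stopping at white space, so this sketch would not reproduce the 'John Doe' results shown in the session above.

```python
import re

# One format token: a conversion (optional name and width), white space,
# or any other single character that must match itself.
_TOKEN = re.compile(r"%(?:\((\w+)\))?(\d*)([dfs])|(\s+)|(.)", re.S)
_CONVERT = {"d": int, "f": float, "s": str}


def _fragment(conv, width):
    """Regex for one conversion; width is a digit string or ''."""
    if conv == "d":
        return r"[-+]?\d{1,%s}" % width
    if conv == "s":
        return r"\S{1,%s}" % width  # assumption: C-style, stops at white space
    return r"[-+]?(?:\d+\.?\d*|\.\d+)"  # %f: width is ignored in this sketch


def scan(text, fmt):
    """Extract values from text according to fmt (hypothetical helper)."""
    tokens = _TOKEN.findall(fmt)
    expected = sum(1 for token in tokens if token[2])
    pos, names, values = 0, [], []
    for name, width, conv, space, literal in tokens:
        if conv:
            pattern = _fragment(conv, width)
        elif space:
            pattern = r"\s*"  # any amount of white space, including none
        else:
            pattern = re.escape(literal)
        match = re.compile(pattern).match(text, pos)
        if match is None:
            break  # scanning stops at the first mismatch
        pos = match.end()
        if conv:
            names.append(name)
            values.append(_CONVERT[conv](match.group()))
    if len(values) < expected:
        raise TypeError("not all arguments filled")
    if names and all(names):
        return dict(zip(names, values))  # %(name)d style yields a dictionary
    return values[0] if len(values) == 1 else tuple(values)


print(scan("12 34 56 7.890", "%d %d %d %f"))  # (12, 34, 56, 7.89)
print(scan("12345 Doe Foo", "%(num)d %(n)s %(f)s"))

# The pseudo condition from the Specification, checked for one input.
sf, si = "%d %d %s", "7   8 nine"
print((sf % scan(si, sf)).split() == si.split())  # True
```

Matching token by token, instead of compiling the whole format into one regular expression, keeps the scanf-like property that a conversion never gives characters back once it has consumed them.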

Peter Funk wrote:
I would prefer "foo".scanf("%5d %8s") or maybe "parse" or "parseformats" or something like that. I know that punctuation abuse leads inexorably to further punctuation abuse but the cycle must stop somewhere. It's too late for "%" but let's save "/" while we still can!

--
Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook

On Mon, Apr 02, 2001 at 06:06:49PM -0700, Paul Prescod wrote:
Peter Funk wrote:
Agreed, on both issues. We don't have 'printf', let's not use something as inexplicable as 'scanf'!

--
Thomas Wouters <thomas@xs4all.net>
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

Peter, if you can do a prototype implementation (in Python would be best), the idea might be received better. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum:
Peter, if you can do a prototype implementation (in Python would be best), the idea might be received better.
I believe a strawman derived from the UserString class could be done in pure Python. But I'm sorry: I've no time for this during April. I'm also not sure whether this is really a worthwhile effort and whether I should champion this idea further.

From Paul's response I got the impression that people already consider the '%' string interpolation operator a language wart rather than an elegant feature.

I however often like the infix notation better. That may be a matter of taste. Imagine we had to write:

    "%5d %20s %s\n".printf((num, name, adr))

instead of

    "%5d %20s %s\n" % (num, name, adr)

I'm happy that this is not the case in today's Python.

Regards, Peter
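For readers wondering what such a strawman might look like, here is one possible sketch in present-day Python 3, where the operator hook is __truediv__ rather than the __div__ of 2001. The class name ScanString is hypothetical, and the scanning itself is deliberately crude: it assumes that both the input and the format consist of white-space separated fields and knows only %d, %f and %s.

```python
from collections import UserString


class ScanString(UserString):
    """Hypothetical strawman: a string whose '/' operator scans with a format."""

    # Assumption: each white-space separated format field is exactly one of these.
    _convert = {"%d": int, "%f": float, "%s": str}

    def __truediv__(self, fmt):
        fields = self.data.split()
        specs = str(fmt).split()
        if len(fields) < len(specs):
            raise TypeError("not all arguments filled")
        # Pair each conversion with the input field in the same position.
        return tuple(self._convert[spec](field)
                     for spec, field in zip(specs, fields))


values = ScanString("12 34 56 7.890") / "%d %d %d %f"
print(values)                  # (12, 34, 56, 7.89)
print("%d %d %d %.3f" % values)  # 12 34 56 7.890
```

The last two lines show the symmetry the proposal is after: '/' takes the values out of a string and '%' puts them back in.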

Oh well, maybe someone else will like the idea.
Well, that was one response. Besides, it's easy to factor out two separate design decisions: (1) a string scanning mechanism that takes two strings (a format and an input string) and returns one or more values extracted from the input string according to the rules set by the format string, and (2) how to spell this: scanf(format, input) or format/input or input/format or whatever.
I however often like the infix notation better.
See my two examples above for a concern: already I cannot recall whether your PEP proposes format/input or input/format. That's a bad sign for either spelling. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum: [...]
Hmmm.... Maybe I've stressed the analogy to arithmetic a bit too far: If the string interpolation operator were '*' instead of '%', then you could think of "multiplying" a format string with one or more values as giving a result string which then represents some kind of "product". Taking this result string as input to the scanning operation is some kind of "division", reverting the previous string interpolation operation.

From that POV it would be pretty obvious that "dividing" the input string by the format string as denominator returns the values previously formatted into it. But since the string interpolation operator is '%', the analogy from multiplication to formatting is obviously not at all that obvious. :-(

Regards, Peter

Peter Funk wrote:
Either way it is infix (as opposed to prefix or postfix). The question is whether it is an infix *operator* or a method. I believe that the only thing aesthetically wrong with this:

    "%5d %20s %s\n".insert(num, name, adr)

is that people are not "used" to seeing method invocations on literal strings. But then new Python programmers are not used to seeing people divide or mod strings either! And the nice thing about using a method name is that you can look method names up in the indexes of books easily and even guess the meaning of them from their English meanings.

Symbols are (IMHO) best reserved for usages where their meanings are already set by real-world convention. (i.e. 5+3!) If some other language convinces millions of programmers that string division is natural then we could follow suit but I'd rather not lead the way.

[Peter Funk]
I believe a strawman derived from the UserString class could be done in pure Python. But I'm sorry: I've no time for this during April.
sscanf for Python gets reinvented like clockwork; e.g., see

    ftp://ftp.python.org/pub/python/contrib-09-Dec-1999/Misc/sscanfmodule.README

for 1995's version of this crusade.
Not me! Infix "%" is great. But while "%" was mnemonic for the heavy use of "%" in format strings, "/" doesn't say anything to me. Combine that with the relative infrequency of sscanf vs sprintf calls (in C code, Perl code, or (I sure suspect) in Python code too), and I'm -1 on infix "/" for sscanf.

Making it a method of the format string would be fine (why the format string? because capturing a bound method object like

    parse3d = "%d %d %d".whatever

would be darned useful, but the other way wouldn't be).

Finally, since .scanf() is a rotten method name (like .join() before it, it doesn't make clear which operand is scanned and which format), try something like format.scanning(string) instead.

language-design-is-easy<wink>-ly y'rs - tim

"TP" == Tim Peters <tim.one@home.com> writes:
    TP> Making it a method of the format string would be fine (why the
    TP> format string? because capturing a bound method object like
    TP>     parse3d = "%d %d %d".whatever
    TP> would be darned useful, but the other way wouldn't be).

    TP> Finally, since .scanf() is a rotten method name (like .join()
    TP> before it, it doesn't make clear which operand is scanned and
    TP> which format), try something like format.scanning(string)
    TP> instead.

My preference would be to have a separate module with the necessary support. It sure would be easy to add to the language. I imagine something like this:

    import fileinput
    import scanf

    fmt = scanf.Format("%d %d %d")
    for line in fileinput.input():
        mo = fmt.scan(line)
        if mo:
            print mo.group(1, 2, 3)

Jeremy
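As an illustration of how the module idea and the bound-method idea fit together, the following Python 3 sketch defines a stand-in for the imagined scanf.Format class. No such module exists in the standard library, so everything here is hypothetical. The format is compiled to a regular expression once, scan returns a match object (or None) so that mo.group(1, 2, 3) works as in the example above, and the captured groups are left as strings. It assumes white-space separated fields and only %d and %s.

```python
import re


class Format:
    """Hypothetical stand-in for the scanf.Format class imagined above."""

    # Assumption: each white-space separated format field is one of these
    # conversions; anything else must match itself literally.
    _fragments = {"%d": r"([-+]?\d+)", "%s": r"(\S+)"}

    def __init__(self, fmt):
        pieces = [self._fragments.get(token, re.escape(token))
                  for token in fmt.split()]
        # Compile once so the same format can scan many lines cheaply.
        self._regex = re.compile(r"\s*" + r"\s+".join(pieces))

    def scan(self, line):
        """Return a match object, or None if the line does not fit."""
        return self._regex.match(line)


# Capturing the bound method gives a reusable parser for one format.
parse3d = Format("%d %d %d").scan

for line in ["1 2 3", "no numbers here", "  10 20 30 trailing text"]:
    mo = parse3d(line)
    if mo:
        print(mo.group(1, 2, 3))
```

Because the format object owns the compiled pattern, binding the method to the format rather than to the input string is what makes parse3d reusable across lines.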

    Jeremy> I imagine something like this:

    Jeremy>     import fileinput
    Jeremy>     import scanf
    ...

Placing the functionality in a module is fine as well, but again, "scanf" only means something if you've programmed in C before. I suspect there are college students graduating from CS departments now who have used C++ but not C and wouldn't have the slightest idea what "scanf" means.

Skip

"SM" == Skip Montanaro <skip@pobox.com> writes:
    Jeremy> I imagine something like this:
    Jeremy>     import fileinput
    Jeremy>     import scanf

    SM> ...

    SM> Placing the functionality in a module is fine as well, but
    SM> again, "scanf" only means something if you've programmed in C
    SM> before. I suspect there are college students graduating from CS
    SM> departments now who have used C++ but not C and wouldn't have
    SM> the slightest idea what "scanf" means.

I don't care much about the name. scanf is fine with me ("scan with format") but so is "scan" -- or "parrot."

I do care about it being based on a module rather than a builtin operator or a string method. I see scanf-based scanning as roughly equivalent to regular expressions, which live happily in a module. If we're going to add a scan method to strings, I can imagine people wanting "\d+".re_match() and "\d+".re_search() methods on strings, too.

Jeremy

    Tim> Finally, since .scanf() is a rotten method name (like .join()
    Tim> before it, it doesn't make clear which operand is scanned and which
    Tim> format), try something like format.scanning(string) instead.

Hmmm... If method names are the way to go, I'd much rather we found a more active verb name than "scanning". How about "extract" or "slice"? Even simply "scan" sounds better to me.

Back to the infix operator idea, I agree with Peter on the one hand that there's a certain symmetry to using infix "/" and with the opposing camp that the only reason "%" works for emitting strings is the use of C's % format character. "*" sort of suggests exploding... ;-)

Skip

participants (8): Guido van Rossum, Jeremy Hylton, Michel Pelletier, Paul Prescod, pf@artcom-gmbh.de, Skip Montanaro, Thomas Wouters, Tim Peters