Death to string functions!

On Fri, Dec 15, 2000 at 07:09:46AM -0800, Guido van Rossum wrote:
Can you explain the logic behind this recent interest in removing string functions from the standard library? Is it performance? Some unicode issue? I don't have a great attachment to string.py but I also don't see the justification for the amount of work it requires. Neil

I figure that at *some* point we should start putting our money where our mouth is, deprecate most uses of the string module, and start warning about it. Not in 2.1 probably, given my experience below. As a realistic test of the warnings module I played with some warnings about the string module, and then found that say most of the std library modules use it, triggering an extraordinary amount of warnings. I then decided to experiment with the conversion. I quickly found out it's too much work to do manually, so I'll hold off until someone comes up with a tool that does 99% of the work. (The selection of std library modules to convert manually was triggered by something pretty random -- I decided to silence a particular cron job I was running. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)
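A rough sketch of what such a conversion tool might look like, written in modern Python (3.9+ for ast.unparse) rather than the Python 2.0 of this thread. Everything here is an assumption for illustration: the names StringModuleRewriter, SIMPLE_FUNCTIONS and convert are hypothetical, the allowlist is deliberately short, and ast.unparse discards comments and original formatting, so this is nowhere near the "99%" tool described above.

```python
import ast

# Hypothetical allowlist: string-module functions whose first argument
# simply becomes the receiver of a method with the same name.
SIMPLE_FUNCTIONS = {"lower", "upper", "strip", "split", "find", "replace"}


class StringModuleRewriter(ast.NodeTransformer):
    """Rewrite string.func(s, ...) calls as s.func(...) method calls."""

    def visit_Call(self, node):
        self.generic_visit(node)  # convert nested calls first
        func = node.func
        is_string_call = (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "string"
            and bool(node.args)
            and not node.keywords
        )
        if not is_string_call:
            return node
        args = list(node.args)
        if func.attr == "join" and len(args) == 2:
            # string.join(words, sep) becomes sep.join(words)
            receiver, rest = args[1], [args[0]]
        elif func.attr in SIMPLE_FUNCTIONS:
            receiver, rest = args[0], args[1:]
        else:
            return node  # e.g. string.maketrans: leave untouched
        method = ast.Attribute(value=receiver, attr=func.attr, ctx=ast.Load())
        return ast.copy_location(ast.Call(func=method, args=rest, keywords=[]), node)


def convert(source):
    """Return source with convertible string-module calls rewritten."""
    tree = StringModuleRewriter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(convert("x = string.lower(string.strip(fields[i]))"))
print(convert("string.join(words, ', ')"))
```

Working on the syntax tree rather than on raw text is a design choice: it handles nested calls naturally, at the cost of losing the original layout. A real tool would also have to cope with aliased imports and names that merely happen to be called "string".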

[Neil Schemenauer]
[Guido]
I think this begs Neil's questions: *is* our mouth there <ahem>, and if so, why? The only public notice of impending string module deprecation anyone came up with was a vague note on the 1.6 web page, and one not repeated in any of the 2.0 release material. "string" is right up there with "os" and "sys" as a FIM (Frequently Imported Module), so the required code changes will be massive. As a user, I don't see what's in it for me to endure that pain: the string module functions work fine! Neither are they warts in the language, any more than that we say sin(pi) instead of pi.sin(). Keeping the functions around doesn't hurt anybody that I can see.
Ah, so that's the *easy* way to kill this crusade -- forget I said anything <wink>.

Hm. I'm not saying that this one will be easy. But I don't like having "two ways to do it". It means more learning, etc. (you know the drill). We could have chosen to make the strop module support Unicode; instead, we chose to give string objects methods and promote the use of those methods instead of the string module. (And in a generous mood, we also supported Unicode in the string module -- by providing wrappers that invoke string methods.) If you're saying that we should give users ample time for the transition, I'm with you. If you're saying that you think the string module is too prominent to ever start deprecating its use, I'm afraid we have a problem. I'd also like to note that using the string module's wrappers incurs the overhead of a Python function call -- using string methods is faster. Finally, I like the look of fields[i].strip().lower() much better than that of string.lower(string.strip(fields[i])) -- an actual example from mimetools.py. Ideally, I would like to deprecate the entire string module, so that I can place a single warning at its top. This will cause a single warning to be issued for programs that still use it (no matter how many times it is imported). Unfortunately, there are a couple of things that still need it: string.letters etc., and string.maketrans(). --Guido van Rossum (home page: http://www.python.org/~guido/)
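A minimal sketch of the two spellings compared above, in modern Python. The module-level strip and lower functions are hypothetical stand-ins for the string module's wrappers (assumed, as the message describes, to do nothing but invoke the corresponding method), and the contents of fields are made up for illustration.

```python
# Hypothetical stand-ins for the string-module wrappers: each one is an
# extra Python-level function call that just delegates to the method.
def strip(s):
    return s.strip()


def lower(s):
    return s.lower()


fields = ["  Text/Plain ", " CHARSET=us-ascii "]  # invented sample data
i = 0

function_style = lower(strip(fields[i]))   # reads inside-out
method_style = fields[i].strip().lower()   # reads left to right

print(function_style == method_style)
```

Both spellings produce the same value; the difference is the extra call layer and the order in which the operations read.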

Guido van Rossum wrote:
Can't we come up with a module similar to unicodedata[.py]? string.py could then still provide the interfaces, but the implementation would live in stringdata.py [Perhaps we won't need stringdata by then... Unicode will have taken over and the discussion be moot ;-)] -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

Hi, Guido van Rossum:
If you're saying that you think the string module is too prominent to ever start deprecating its use, I'm afraid we have a problem.
I strongly believe the string module is too prominent.
I think most care more about readability than about run time performance. For people without much OOP experience, the method syntax hurts readability.
Hmmmm.... Maybe this is just a matter of taste? Like my preference for '<>' instead of '!='? Personally I still like the old-fashioned form more. Especially if string.join() or string.split() are involved. Since Python 1.5.2 will stay around for several years, keeping backward compatibility in our Python coding is still a major issue for us. So we won't change our Python coding style soon, if ever.
Just my $0.02, Peter

I don't believe one bit of this. By that standard, we would do better to define a new module "list" and start writing list.append(L, x) for L.append(x).
You are entitled to your opinion, but given that your arguments seem very weak I will continue to ignore it (except to argue with you :-). --Guido van Rossum (home page: http://www.python.org/~guido/)

[string.function(S, ...) vs. S.method(...)] Guido van Rossum:
list objects have only very few methods. Strings have so many methods. Some of them have names that clash easily with the method names of other kinds of objects. Since there are no type declarations in Python, looking at the code in isolation and seeing a line i = string.index(some_parameter) tells you at first glance that some_parameter should be a string object, even if the doc string of this function is too terse. However, in i = some_parameter.index() it could be a list, a database or whatever.
You are entitled to your opinion, but given that your arguments seem very weak I will continue to ignore it (except to argue with you :-).
I see. But given the time frame that the string module wouldn't go away any time soon, I guess I have a lot of time to either think about some stronger arguments or to get finally accustomed to that new style of coding. But since we have to keep compatibility with Python 1.5.2 for at least the next two years chances for the latter are bad. Regards and have a nice vacation, Peter
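The readability point about index() can be illustrated with a small sketch in modern Python. The function names find_position and find_in_string are hypothetical, and str.index called through the type is used here only as a present-day analogue of the old string.index(s, sub) function, not as something proposed in this thread.

```python
def find_position(some_parameter, item):
    # Method style: works for any object that provides .index(),
    # so the call site says nothing about the expected type.
    return some_parameter.index(item)


def find_in_string(some_parameter, item):
    # Function style: calling through the str type only accepts strings.
    return str.index(some_parameter, item)


print(find_position("spam", "a"))                 # a string works
print(find_position(["s", "p", "a", "m"], "a"))   # so does a list

try:
    find_in_string(["s", "p", "a", "m"], "a")
except TypeError as exc:
    print("rejected:", exc)
```

The method form is more general, which is exactly what makes it less self-documenting at the call site.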

"PF" == Peter Funk <pf@artcom-gmbh.de> writes:
PF> Hmmmm.... Maybe this is just a matter of taste? Like my preference for '<>' instead of '!='? Personally I still like the old-fashioned form more. Especially if string.join() or string.split() are involved.
Hey cool! I prefer <> over != too, but I also (not surprisingly) strongly prefer string methods over string module functions. TOOWTDI-MA-ly y'rs, -Barry

+1 on deprecating string functions. Every Python book and tutorial (including mine) emphasizes Python's simplicity and lack of Perl-ish redundancy; the more we practice what we preach, the more persuasive this argument is. Greg (who admittedly only has a few thousand lines of Python to maintain)

On Sun, Dec 17, 2000 at 11:25:17AM -0500, Greg Wilson wrote:
+1 on deprecating string functions.
How wonderfully ambiguous! Do you mean string methods, or the string module? :) FWIW, I agree that in time, the string module should be deprecated. But I also think that 'in time' should be a considerable timespan. Don't deprecate it before everything it provides is available through some other means. Wait a bit longer than that, even, before calling it deprecated -- that scares people off. And then keep it for practically forever (until Py3K) just to support old code. And don't forget to document it 'deprecated' everywhere, not just one minor release note. -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

Thomas Wouters writes:
FWIW, I agree that in time, the string module should be deprecated. But I also think that 'in time' should be a considerable timespan. Don't deprecate
*If* most functions in the string module are going to be deprecated, that should be done *now*, so that the documentation will include the appropriate warning to users. When they should actually be removed is another matter, and I think Guido is sufficiently aware of their widespread use and won't remove them too quickly -- his creation of Python isn't the reason he's *accepted* as BDFL, it just made it a possibility. He's had to actually *earn* the BDFL position, I think. With regard to converting the standard library to string methods: that needs to be done as part of the deprecation. The code in the library is commonly used as example code, and should be good example code wherever possible.
support old code. And don't forget to document it 'deprecated' everywhere, not just one minor release note.
When Guido tells me exactly what is deprecated, the documentation will be updated with proper deprecation notices in the appropriate places. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations

As a longstanding Python advocate and user, I find this thread disturbing, and feel compelled to add a few words:
But with all due respect, there are already _lots_ of places in Python that provide at least two ways to do something already. Why be so strict on this one alone? Consider lambda and def; tuples and lists; map and for loops; the loop else and boolean exit flags; and so on. The notion of Python forcing a single solution is largely a myth. And as someone who makes a living teaching this stuff, I can tell you that none of the existing redundancies prevent anyone from learning Python. More to the point, many of those shiny new features added to 2.0 fall squarely into this category too, and are completely redundant with other tools. Consider list comprehensions and simple loops; extended print statements and sys.std* assignments; augmented assignment statements and simpler ones. Eliminating redundancy at a time when we're also busy introducing it seems a tough goal to sell. I understand the virtues of aesthetics too, but removing the string module seems an incredibly arbitrary application of it.
And to me, this seems the real crux of the matter. For a decade now, the string module _has_ been the right way to do it. And today, half a million Python developers absolutely rely on it as an essential staple in their toolbox. What could possibly be wrong with keeping it around for backward compatibility, albeit as a less recommended option? If almost every Python program ever written suddenly starts issuing warning messages, then I think we do have a problem indeed. Frankly, a Python that changes without regard to its user base seems an ominous thing to me. And keep in mind that I like Python; others will look much less generously upon a tool that seems inclined to rip the rug out from under its users. Trust me on this; I've already heard the rumblings out there. So please: can we keep string around? Like it or not, we're way past the point of removing such core modules. Such a radical change might pass in a future non-backward-compatible Python mutation; I'm not sure such a different system will still be "Python", but that's a topic for another day. All IMHO, of course, --Mark Lutz (http://www.rmi.net~lutz)

[Mark Lutz]
So please: can we keep string around? Like it or not, we're way past the point of removing such core modules at this point.
Of course we're keeping string around. I already said that for backwards compatibility reasons it would not disappear before Py3K. I think there's a misunderstanding about the meaning of deprecation, too. That word doesn't mean to remove a feature. It doesn't even necessarily mean to warn every time a feature is used. It just means (to me) that at some point in the future the feature will change or disappear, there's a new and better way to do it, and that we encourage users to start using the new way, to save them from work later. In my mind, there's no reason to start emitting warnings about every deprecated feature. The warnings are only needed late in the deprecation cycle. PEP 5 says "There must be at least a one-year transition period between the release of the transitional version of Python and the release of the backwards incompatible version." Can we now stop getting all bent out of shape over this? String methods *are* recommended over equivalent string functions. Those string functions *are* already deprecated, in the informal sense (i.e. just that it is recommended to use string methods instead). This *should* (take notice, Fred!) be documented per 2.1. We won't however be issuing run-time warnings about the use of string functions until much later. (Lint-style tools may start warning sooner -- that's up to the author of the lint tool to decide.) Note that I believe Java makes a useful distinction that PEP 5 misses: it defines both deprecated features and obsolete features. *Deprecated* features are simply features for which a better alternative exists. *Obsolete* features are features that are only being kept around for backwards compatibility. Deprecated features may also be (and usually are) *obsolescent*, meaning they will become obsolete in the future. --Guido van Rossum (home page: http://www.python.org/~guido/)
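As a sketch of what the late-cycle run-time warning stage could look like, here is a deprecated wrapper written against the standard warnings module in modern Python. The function lower below is a hypothetical stand-in for a string-module function, and the message text is invented; the thread does not specify either.

```python
import warnings


def lower(s):
    """Hypothetical deprecated function that delegates to the method."""
    warnings.warn(
        "lower(s) is deprecated; use s.lower() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return s.lower()


# Record warnings instead of printing them, so the effect can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = lower("Content-Type")

print(result)
print(caught[0].category.__name__, "-", caught[0].message)
```

The old call keeps working throughout; the warning only tells the caller that a better spelling exists, which matches the informal sense of deprecation described above.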

[Guido]
Then we're with each other, for suitably large values of "ample" <wink>.
If you're saying that you think the string module is too prominent to ever start deprecating its use, I'm afraid we have a problem.
We may. Time will tell. It needs a conversion tool, else I think it's unsellable.
I happen to like string methods better myself; I don't think that's at issue (except that loads of people apparently don't like "join" as a string method -- idiots <wink>). The issue to me is purely breaking old code someday -- "string" is in very heavy use, and unlike as when deprecating regex in favor of re (either pre or especially sre), string methods aren't orders of magnitude better than the old way; and also unlike regex-vs-re it's not the case that the string module has become unmaintainable (to the contrary, string.py has become trivial). IOW, this one would be unprecedented fiddling.
I agree it would be useful to define these terms, although those particular definitions appear to be missing the most important point from the user's POV (not a one says "going away someday"). A Google search on "java obsolete obsolescent deprecated" doesn't turn up anything useful, so I doubt the usages you have in mind come from Java (it has "deprecated", but doesn't appear to have any well-defined meaning for the others). In keeping with the religious nature of the battle -- and religion offers precise terms for degrees of damnation! -- I suggest:
    struggling -- a supported feature; the initial state of all features; may transition to Anathematized
    anathematized -- this feature is now cursed, but is supported; may transition to Condemned or Struggling; intimacy with Anathematized features is perilous
    condemned -- a feature scheduled for crucifixion; may transition to Crucified, Anathematized (this transition is called "a pardon"), or Struggling (this transition is called "a miracle"); intimacy with Condemned features is suicidal
    crucified -- a feature that is no longer supported; may transition to Resurrected
    resurrected -- a once-Crucified feature that is again supported; may transition to Condemned, Anathematized or Struggling; although since Resurrection is a state of grace, there may be no point in human time at which a feature is identifiably Resurrected (i.e., it may *appear*, to the unenlightened, that a feature moved directly from Crucified to Anathematized or Struggling or Condemned -- although saying so out loud is heresy).
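Taken at face value, the states above form a small transition table. The sketch below encodes only the transitions listed in the message; the names TRANSITIONS and can_transition are hypothetical, and nothing like this was proposed as an actual mechanism.

```python
# Allowed next states for each feature state, as listed in the message.
TRANSITIONS = {
    "struggling": {"anathematized"},
    "anathematized": {"condemned", "struggling"},
    "condemned": {"crucified", "anathematized", "struggling"},
    "crucified": {"resurrected"},
    "resurrected": {"condemned", "anathematized", "struggling"},
}


def can_transition(current, target):
    """Return True if the message allows moving from current to target."""
    return target in TRANSITIONS[current]


print(can_transition("condemned", "anathematized"))  # "a pardon"
print(can_transition("crucified", "struggling"))     # must be resurrected first
```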

Guido van Rossum wrote:
This would also help a lot of programmers out there who are stuck with 100k LOCs of Python code using string.py ;) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

participants (10)
- barry@digicool.com
- Fred L. Drake, Jr.
- Greg Wilson
- Guido van Rossum
- M.-A. Lemburg
- Mark Lutz
- Neil Schemenauer
- pf@artcom-gmbh.de
- Thomas Wouters
- Tim Peters