Re: [Python-Dev] A house upon the sand
"Tim Peters" <tim.one@home.com> writes:
On the deprecation of the string module: where did this idea come from? I've never seen anything saying that the string module is deprecated.
I thought this, and went looking. I found on http://www.python.org/1.6/, about four fifths of the way down:

    Changed Modules

    string - most of this module is deprecated now that strings have methods. This no longer uses the built-in strop module, but takes advantage of the new string methods to provide transparent support for both Unicode and ordinary strings.

I hope (and believe) this is Wrong.

http://www.python.org/2.0/new-python.html says:

    The old string module is still around for backwards compatibility, but it mostly acts as a front-end to the new string methods.

which is IMHO better.

Cheers,
M.

--
Every day I send overnight packages filled with rabid weasels to people who use frames for no good reason. -- The Usenet Oracle, Oracularity #1017-1
Michael Hudson <mwh21@cam.ac.uk>:
http://www.python.org/2.0/new-python.html says:
The old string module is still around for backwards compatibility,
This still suggests that continuing to use it is frowned upon, though.

I think there are still legitimate reasons for using some parts of the string module. For example, if you're one of those stubborn people who refuse to accept that ",".join(foo) is a better way of writing string.join(foo,",").

Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc.
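To make the comparison concrete, here is a minimal sketch of the two spellings. The `join` function below is a hypothetical stand-in for the old `string.join(words, sep)`, which no longer exists in modern Python; it is not the library's implementation, only an illustration that the function form can be a thin front-end over the method.

```python
def join(words, sep=" "):
    """Hypothetical stand-in for the old string.join(words, sep)."""
    # The function form simply delegates to the separator's join method.
    return sep.join(words)

foo = ["a", "b", "c"]

function_style = join(foo, ",")   # spelled like string.join(foo, ",")
method_style = ",".join(foo)      # the string-method spelling
```

Both spellings produce the same string; the disagreement in the thread is about readability, not behaviour.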
Greg Ewing writes:
This still suggests that continuing to use it is frowned upon, though.
Backward compatible code is still being written, certainly, and not everything is available as a method (just try ''.letters! ;).
I think there are still legitimate reasons for using some parts of the string module. For example, if you're one of those stubborn people who refuse to accept that ",".join(foo) is a better way of writing string.join(foo,",").
There will never be an excuse for that! .join() should never have been added as a method! -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations
Fred L. Drake, Jr. writes:
There will never be an excuse for that! .join() should never have been added as a method!
On the other hand it is not clear to me why "capwords" and "zfill" did not become methods of string objects.
Charles wrote:
On the other hand it is not clear to me why "capwords" and "zfill" did not become methods of string objects.
fwiw, they were both present in my original unicode implementation:
    >>> from unicode import unicode
    >>> a = unicode("hello world")
    >>> a.capwords()
    'Hello World'
    >>> a.zfill(20)
    '000000000hello world'
</F>
Fredrik Lundh wrote:
Charles wrote:
On the other hand it is not clear to me why "capwords" and "zfill" did not become methods of string objects.
fwiw, they were both present in my original unicode implementation:
    >>> from unicode import unicode
    >>> a = unicode("hello world")
    >>> a.capwords()
    'Hello World'
    >>> a.zfill(20)
    '000000000hello world'
.zfill() is implemented for both strings and Unicode, .capwords() only for Unicode. Both are disabled, though.

I talked with Guido about these methods and we decided to leave those two methods disabled in the implementation. They just don't provide much extra generally useful functionality.

s.capwords() can be emulated with ' '.join(s.capitalize().split()), BTW.

--
Marc-Andre Lemburg
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
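A hedged aside for anyone trying the emulation quoted above: in current Python, `s.capitalize()` upper-cases only the first character of the whole string, so the one-liner as posted capitalizes just the first word. The sketch below capitalizes each word separately. The helper name `capwords` is my own choice, modelled on the `string.capwords` function; it is not the disabled method discussed in the thread.

```python
import string

def capwords(s):
    """Sketch: capitalize each whitespace-separated word, rejoin with single spaces."""
    return " ".join(word.capitalize() for word in s.split())

s = "hello world"

as_posted = " ".join(s.capitalize().split())  # capitalizes only the first word
per_word = capwords(s)                        # capitalizes every word
library = string.capwords(s)                  # the string-module function
padded = s.zfill(20)                          # str.zfill exists in current Python
```

Note that both variants collapse runs of whitespace, because `split()` without arguments discards them.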
.zfill() is implemented for both strings and Unicode, .capwords() only for Unicode. Both are disabled, though. I talked with Guido about these methods and we decided to leave those two methods disabled in the implementation. They just don't provide much extra generally useful functionality.
Exactly. There's a price to pay for adding methods, and I think these two are below the threshold. --Guido van Rossum (home page: http://www.python.org/~guido/)
Charles G Waldman writes:
On the other hand it is not clear to me why "capwords" and "zfill" did not become methods of string objects.
Fredrik Lundh writes:
fwiw, they were both present in my original unicode implementation:
Interesting. I'd have expected capwords() on strings. I wonder why /F's implementation wasn't retained? As for zfill(), I can imagine no one thought it was sufficiently useful to keep around. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Digital Creations
"GE" == Greg Ewing <greg@cosc.canterbury.ac.nz> writes:
GE> I think there are still legitimate reasons for using some
GE> parts of the string module. For example, if you're one of
GE> those stubborn people who refuse to accept that ",".join(foo)
GE> is a better way of writing string.join(foo,",").

Actually, an even better way is

    COMMA = ','
    COMMA.join(foo)

To me, that is substantially easier to read than either of the above two alternatives.
"Fred" == Fred L Drake, Jr <fdrake@acm.org> writes:
Fred> There will never be an excuse for that! .join() should
Fred> never have been added as a method!

Of course, I completely disagree. However, I have noticed that I often define string constants like

    SPACE = ' '
    EMPTY = ''
    NL = '\n'

just so I can write code like

    NL.join(headers)
    SPACE.join(names)
    EMPTY.join(lines)

I doubt it's worth it, but maybe having a standard module called stringconstants.py with some of the more obvious choices would make things better?

toowtdi-my-foot-ly y'rs,
-Barry
[Barry A. Warsaw]
... I have noticed that I often define string constants like
    SPACE = ' '
    EMPTY = ''
    NL = '\n'
just so I can write code like
    NL.join(headers)
    SPACE.join(names)
    EMPTY.join(lines)
I doubt it's worth it, but maybe having a standard module called stringconstants.py with some of the more obvious choices would make things better?
-0. Then I'd expect to see

    from stringconstants import *

at least once, which is at least once too often. Sick trick:

    SPACE, TAB, NL = " \t\n"

sequence-unpacking-is-more-general-than-you-think<wink>-ly y'rs - tim
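The trick works because a string is itself a sequence of one-character strings, so it can be unpacked like a tuple. A small sketch, with `headers` and `names` as made-up sample data:

```python
# Unpack a three-character string into three one-character names.
SPACE, TAB, NL = " \t\n"

# Hypothetical sample data, just to exercise the constants.
headers = ["From: tim", "To: barry"]
names = ["Tim", "Barry"]

header_block = NL.join(headers)
name_line = SPACE.join(names)
```

The unpacking fails with a ValueError if the string length and the number of names differ, which gives a cheap sanity check on the constant list.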
"TP" == Tim Peters <tim.one@home.com> writes:
TP> at least once, which is at least once too often. Sick trick:
TP> SPACE, TAB, NL = " \t\n"

Oh, that is perversely delicious! I love it.

super-cali-fragi-listic-expi-ali-chomp-chomp-ly y'rs,
-Barry
Anyone with an interest in the string functions vs. string methods debate should read this article, which was referenced in comp.lang.python recently:

    How Non-Member Functions Improve Encapsulation
    by Scott Meyers
    http://www.cuj.com/archive/1802/feature.html

Mr. Meyers presents some very well-reasoned arguments against the everything-should-be-a-method mentality.

Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc.
Greg Ewing wrote:
Anyone with an interest in the string functions vs. string methods debate should read this article, which was referenced in comp.lang.python recently:
How Non-Member Functions Improve Encapsulation by Scott Meyers http://www.cuj.com/archive/1802/feature.html
Mr. Meyers presents some very well-reasoned arguments against the everything-should-be-a-method mentality.
Note that the motivation for turning to string methods was that of migrating from strings to Unicode. Adding Unicode support to the strop C module would have caused very complicated code -- methods helped by enabling polymorphic code which is one of the great advantages of writing software for an interface rather than an implementation. Note that functions can make very good use of methods and thus implement polymorphic functionality -- this is not about methods vs. functions it's about methods to enable polymorphic functions. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
Mr. Meyers presents some very well-reasoned arguments against the everything-should-be-a-method mentality.
Note that the motivation for turning to string methods was that of migrating from strings to Unicode. Adding Unicode support to the strop C module would have caused very complicated code -- methods helped by enabling polymorphic code which is one of the great advantages of writing software for an interface rather than an implementation.
Of course. Meyers starts by saying that if it needs implementation details it must be a method. This is true (if only for efficiency reasons) for most string methods.
Note that functions can make very good use of methods and thus implement polymorphic functionality -- this is not about methods vs. functions it's about methods to enable polymorphic functions.
Meyers also says that if it needs to be virtual it needs to be a method. Polymorphism is roughly equivalent to virtual in this context, and this alone justifies the move to methods. But join() is special: it is polymorphic in two arguments, and making it a method of the separator argument doesn't help. --Guido van Rossum (home page: http://www.python.org/~guido/)
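The two points above (functions built on methods, and join() depending on both of its arguments) can be sketched with modern Python's `str` and `bytes` types. This is an analogy only: it uses today's text/bytes pair as a stand-in for the 2.0 string/Unicode pair and does not reproduce how those two mixed in 2.0. The helper names are my own.

```python
def join(items, sep):
    """Free function that is polymorphic only because it calls a method on sep."""
    return sep.join(items)

text = join(["a", "b"], ", ")       # str separator, str items
raw = join([b"a", b"b"], b", ")     # bytes separator, bytes items

def mixed():
    """The method is picked from the separator alone, so mismatched items fail."""
    try:
        join([b"a", b"b"], ", ")
    except TypeError:
        return "TypeError"
    return "no error"
```

Dispatching on the separator chooses the implementation, but the item type still has to agree with it, which is one way to read the remark that making join() a method of the separator "doesn't help".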
"GvR" == Guido van Rossum <guido@python.org> writes:
GvR> But join() is special: it is polymorphic in two arguments,
GvR> and making it a method of the separator argument doesn't
GvR> help.

What about calling the built-in strjoin()? That avoids the conflict with os.path.join() and strengthens the connection with the intended behavior -- that the return value is a string and the elements are str()-ified.

or-use-sjoin()-ly y'rs,
-Barry
Guido van Rossum wrote:
But join() is special: it is polymorphic in two arguments, and making it a method of the separator argument doesn't help.
join() is special indeed, but what about the semantics we talked about last year (?)...

    join(seq, sep) := seq[0] + sep + seq[1] + sep + ... + seq[n]

This should fit all uses of join() (except maybe os.path.join).

How about naming the beast concat() with sep defaulting to '' to avoid the problems with os.path.join() ?!

--
Marc-Andre Lemburg
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
join() is special indeed, but what about the semantics we talked about last year (?)...
join(seq, sep) := seq[0] + sep + seq[1] + sep + ... + seq[n]
This should fit all uses of join() (except maybe os.path.join).
This is much more general than the current definition -- e.g. join(range(5), 0) would yield 10. I'm not too keen on widening the definition this much.
How about naming the beast concat() with sep defaulting to '' to avoid the problems with os.path.join() ?!
Hm... if we can stick to the string semantics this would be okay. But we'd lose the symmetry of split/join. Note that string.join has a default separator of ' ' to roughly match the default behavior of split.

--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
join() is special indeed, but what about the semantics we talked about last year (?)...
join(seq, sep) := seq[0] + sep + seq[1] + sep + ... + seq[n]
This should fit all uses of join() (except maybe os.path.join).
This is much more general than the current definition -- e.g. join(range(5), 0) would yield 10. I'm not too keen on widening the definition this much.
No, if n is the length of the sequence, the above definition would calculate 10 and then raise IndexError :-)

ciao - chris

--
Christian Tismer :^) <mailto:tismer@tismer.com>
Mission Impossible 5oftware : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com
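The proposed "+"-based semantics and the off-by-one remark can both be checked with a short sketch. The names `general_join` and `literal_definition` are hypothetical; neither is an existing built-in, and the handling of empty sequences is my own assumption.

```python
from functools import reduce

def general_join(seq, sep):
    """Sketch of seq[0] + sep + seq[1] + ... + seq[n-1], using only the + operator."""
    items = list(seq)
    if not items:
        raise ValueError("general_join() needs at least one item")
    return reduce(lambda acc, item: acc + sep + item, items[1:], items[0])

numbers = general_join(range(5), 0)            # 0+0+1+0+2+0+3+0+4
strings = general_join(["a", "b", "c"], ",")

def literal_definition(seq, sep):
    """The formula taken literally, with n == len(seq), reads one past the end."""
    n = len(seq)
    total = seq[0]
    for i in range(1, n + 1):   # includes seq[n], which does not exist
        total = total + sep + seq[i]
    return total
```

With numbers the reduction degenerates into a sum, which is the widening of the definition that was objected to; with strings it matches the familiar result.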
mal wrote:
Mr. Meyers presents some very well-reasoned arguments against the everything-should-be-a-method mentality.
Note that the motivation for turning to string methods was that of migrating from strings to Unicode. Adding Unicode support to the strop C module would have caused very complicated code -- methods helped by enabling polymorphic code which is one of the great advantages of writing software for an interface rather than an implementation.
as Alex pointed out on comp.lang.python, the first if-clause in Meyers algorithm is:

    if (f needs to be virtual)
        make f a member function of C;

which probably covers the join case pretty well...
Note that functions can make very good use of methods and thus implement polymorphic functionality -- this is not about methods vs. functions it's about methods to enable polymorphic functions.
a bit further down, the algorithm goes:

    else if (f can be implemented via C's public interface)
        make f a non-member function;

which covers all remaining non-trivial functions in the string module...

(okay, you could get around this by renaming the method to something a bit more annoying; let's say the "public interface" contains a virtual method called "use_myself_to_join_sequence", in which "join" could be a non-member convenience function using nothing but the public interface. but that's just silly...)

:::

fwiw, I still think that Meyers article cannot be fully applied to Python, since he's ignoring the case:

    else if (f can be implemented via C's public interface, but may have to be overridden by a subclass to C)
        make f a member function;

this isn't a problem in a language that supports function overloading (see the interfaces/packages section in meyers article), but as we all know, Python doesn't...

</F>
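The parenthetical and the "may have to be overridden" case can be illustrated together. This is a sketch under assumptions: the `Separator` and `LoudSeparator` classes are hypothetical, and only the method name `use_myself_to_join_sequence` comes from the message above.

```python
class Separator:
    """Hypothetical class whose public interface has one overridable hook."""

    def __init__(self, text):
        self.text = text

    def use_myself_to_join_sequence(self, seq):
        return self.text.join(seq)

class LoudSeparator(Separator):
    """A subclass can change the behaviour only because the hook is a method."""

    def use_myself_to_join_sequence(self, seq):
        return self.text.join(item.upper() for item in seq)

def join(seq, sep):
    """Non-member convenience function built on the public interface alone."""
    return sep.use_myself_to_join_sequence(seq)

plain = join(["a", "b"], Separator("-"))
loud = join(["a", "b"], LoudSeparator("-"))
```

The free function never needs to change; without overloading, the subclass-specific behaviour has to live in a method for the dispatch to happen.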
Fredrik Lundh <fredrik@effbot.org>:
he's ignoring the case:
else if (f can be implemented via C's public interface, but may have to be overridden by a subclass to C)
Isn't that the same as "f needs to be virtual"?

Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc.
Greg Ewing wrote:
he's ignoring the case:
else if (f can be implemented via C's public interface, but may have to be overridden by a subclass to C)
Isn't that the same as "f needs to be virtual"?
in Python, sure. in C++, it depends. thanks to function overloading, you can have polymorphism without having to make everything virtual... </F>
Anyone with an interest in the string functions vs. string methods debate should read this article, which was referenced in comp.lang.python recently:
How Non-Member Functions Improve Encapsulation by Scott Meyers http://www.cuj.com/archive/1802/feature.html
Mr. Meyers presents some very well-reasoned arguments against the everything-should-be-a-method mentality.
Just skimmed it -- seems to be a good argument. (Also for why capwords and zfill aren't worth adding as methods. :-) It would be easy to propose join() as a built-in, and this looks attractive, if it weren't for the existence of os.path.join(). Some people probably write ``from os.path import join'' and once join() is a built-in function, this may be confusing for readers who miss that a different join is imported. (I'm not saying that it breaks existing code -- just that it's confusing to have two joins.) But maybe this is a moot argument since string.join() is already a function. I wonder what Java enthusiasts would say after reading Meyers' article: there *are* no non-member non-friend functions in Java. The best approximation is static methods, but they still live in a class... --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido said:
It would be easy to propose join() as a built-in, and this looks attractive, if it weren't for the existence of os.path.join().
and
join() is special: it is polymorphic in two arguments, and making it a method of the separator argument doesn't help.
How about an operator, then?

We already have % rather than a string.format or some such. Presumably this is because Guido thought it was such a handy thing that it should be instantly available at our fingertips. I think a similar argument could be made for string.join, and also its inverse string.split.

So, how about:

    x & s == string.join(x, s)
    s1 | s2 == string.split(s1, s2)

Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc.
How about an operator, then?
We already have % rather than a string.format or some such. Presumably this is because Guido thought it was such a handy thing that it should be instantly available at our fingertips. I think a similar argument could be made for string.join, and also its inverse string.split.
So, how about:
x & s == string.join(x, s)
s1 | s2 == string.split(s1, s2)
I don't see the mnemonic. (For %, the mnemonic is clear: %s, %d etc.) --Guido van Rossum (home page: http://www.python.org/~guido/)
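For readers who want to see what the proposed operators would feel like, they can be approximated today with a subclass. This is purely a sketch: `S` is a hypothetical `str` subclass, and nothing like it was added to the language.

```python
class S(str):
    """Hypothetical str subclass that adds the two proposed operators."""

    def __rand__(self, seq):
        # seq & sep: join the items of seq using this string as the separator.
        return S(self.join(seq))

    def __or__(self, sep):
        # s | sep: split this string on sep.
        return self.split(sep)

joined = ["a", "b", "c"] & S(",")
pieces = S("a,b,c") | ","
```

Lists do not define `&`, so Python falls back to the right operand's `__rand__`; that fallback is what lets the separator, rather than the sequence, implement the operation.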
"GE" == Greg Ewing <greg@cosc.canterbury.ac.nz> writes:
GE> Anyone with an interest in the string functions vs.
GE> string methods debate should read this article, which
GE> was referenced in comp.lang.python recently:

    | How Non-Member Functions Improve Encapsulation
    | by Scott Meyers
    | http://www.cuj.com/archive/1802/feature.html

GE> Mr. Meyers presents some very well-reasoned arguments
GE> against the everything-should-be-a-method mentality.

Remember that Meyers is talking about C++ here, where there's a distinction in the `friendliness' of a function. He himself says that the choice is between non-member non-friend functions and member functions (i.e. methods). Friend functions have the same dependency on implementation changes as methods.

In Python, there is no such friend distinction, although you can come close to faking it with private names. So the argument based on degrees of encapsulation doesn't hold water in Python.

in-python-everyone's-your-friend-and-thank-guido-for-that-ly y'rs,
-Barry
Remember that Meyers is talking about C++ here, where there's a distinction in the `friendliness' of a function. He himself says that the choice is between non-member non-friend functions and member functions (i.e. methods). Friend functions have the same dependency on implementation changes as methods.
In Python, there is no such friend distinction, although you can come close to faking it with private names. So the argument based on degrees of encapsulation doesn't hold water in Python.
in-python-everyone's-your-friend-and-thank-guido-for-that-ly y'rs,
I disagree. While non-methods *can* look inside an implementation, it's still implied that they break encapsulation if they do. (This may explain why you like to use private names -- they make the encapsulation explicit. For me, it exists even when it is implicit.) --Guido van Rossum (home page: http://www.python.org/~guido/)
"GvR" == Guido van Rossum <guido@python.org> writes:
GvR> I disagree. While non-methods *can* look inside an
GvR> implementation, it's still implied that they break
GvR> encapsulation if they do. (This may explain why you like to
GvR> use private names -- they make the encapsulation explicit.
GvR> For me, it exists even when it is implicit.)

True, but it still means that changing the implementation has an unknown effect on existing code, because there /could/ be a lot of breakage. That can be (and has been! :) written off as the fault of the 3rd party author, but still, it isn't enforceable like it is in C++, where, if it ain't a friend, it's not going to break. In C++ you can count the number of functions that will break to determine the degree of encapsulation, but you can't in Python.

Witness the recent change to cgi.FieldStorage. We removed the self.lines attribute because it was a memory bloat, and we /thought/ that it wouldn't affect existing code, but there was no way to really know for sure because the intent of the cgi module author wasn't at all clear. Was self.lines intended to be public but just accidentally undocumented? Was it a debugging tool?

As an aside, this argument's not really true with built-in types such as strings because the internal implementation isn't publicly available to the Python programmer.

-Barry
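The difference between implicit and explicit encapsulation can be sketched with Python's private-name conventions. The `FieldStore` class is hypothetical and only loosely modelled on the self.lines situation described above; it is not the cgi module's code.

```python
class FieldStore:
    """Hypothetical class with three attributes of differing 'privacy'."""

    def __init__(self):
        self.lines = []       # no marker: was this meant to be public?
        self._cache = {}      # single underscore: private by convention only
        self.__buffer = []    # double underscore: name-mangled by the compiler

    def add(self, line):
        self.lines.append(line)
        self.__buffer.append(line)

store = FieldStore()
store.add("a=1")

def peek(obj):
    """A non-method can still reach inside; nothing enforces the boundary."""
    return obj._FieldStore__buffer

def try_plain_access(obj):
    """Outside the class body the name is not mangled, so the lookup fails."""
    try:
        return obj.__buffer
    except AttributeError:
        return None
```

Name mangling makes the intent visible and accidental use unlikely, but as `peek` shows it does not make outside access impossible, so breakage cannot be counted the way it can with C++ friends.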
----------
My feelings about this topic have been expressed adequately by others. I'm not sure what to say about the code that worked in 1.5.2 but broke in 2.0 -- if these were book examples, I suspect that the book was probably using undocumented (if commonly seen) features, like multi-argument append() or connect(). As someone else said: Change happens. Get over it. :-)
"Tim Peters" <tim.one@home.com> writes:
On the deprecation of the string module: where did this idea come from? I've never seen anything saying that the string module is deprecated.
Michael Hudson:
I thought this, and went looking. I found on http://www.python.org/1.6/, about four fifths of the way down:
Changed Modules
string - most of this module is deprecated now that strings have methods. This no longer uses the built-in strop module, but takes advantage of the new string methods to provide transparent support for both Unicode and ordinary strings.
I hope (and believe) this is Wrong.
Because of its importance, the deprecation time of the string module will be longer than that of most deprecated modules. I expect it won't be removed until Python 3000.
http://www.python.org/2.0/new-python.html says:
The old string module is still around for backwards compatibility, but it mostly acts as a front-end to the new string methods.
which is IMHO better.
Yes. And Greg Ewing:
I think there are still legitimate reasons for using some parts of the string module. For example, if you're one of those stubborn people who refuse to accept that ",".join(foo) is a better way of writing string.join(foo,",").
This has been discussed -- just note that continuing to use the string module *is* frowned upon, and such stubborn code will get its just deserts when Py3K arrives.

I suggest adding the following to the string module's documentation (rather than marking it as explicitly deprecated):

    This module exists for backwards compatibility only. It will eventually be deprecated and its use should be avoided in new code.

I also suggest that someone go through the standard library and get rid of all uses of the string module. (What to do with string.letters c.s.?)

--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
I suggest adding the following to the string module's documentation (rather than marking it as explicitly deprecated):
This module exists for backwards compatibility only. It will eventually be deprecated and its use should be avoided in new code.
I also suggest that someone go through the standard library and get rid of all uses of the string module. (What to do with string.letters c.s.?)
We will need some place to put constants and other non-method based string related functions (much like the Unicode database module which serves a similar purpose)... the ideal complement would be a stringdata module which provides the same interfaces as the unicodedata module if reasonably possible (we'd get into encoding problems again...) ! -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
[Guido]
... Because of its importance, the deprecation time of the string module will be longer than that of most deprecated modules. I expect it won't be removed until Python 3000.
I see nothing in the 2.0 docs, code, or "what's new" web pages saying that it's deprecated. So I don't think you can even start the clock on this one before 2.1 (a fuzzy stmt on the web page for the unused 1.6 release doesn't count ...).
... This has been discussed -- just note that continuing to use the string module *is* frowned upon, and such stubborn code will get its just deserts when Py3K arrives.
I suggest adding the following to the string module's documentation (rather than marking it as explicitly deprecated):
This module exists for backwards compatibility only. It will eventually be deprecated and its use should be avoided in new code.
I also suggest that someone go through the standard library and get rid of all uses of the string module. (What to do with string.letters c.s.?)
They have to be in a module. They've always been in the string module. Common sense thus dictates "leave them in the string module" <0.3 wink>. change-for-the-sake-of-irritation-is-arguably-irritating-ly y'rs - tim
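For what it's worth, this is roughly where things ended up. The sketch below describes modern Python, not the 2.0 module discussed in the thread: the constants still live in the string module, with `string.letters` surviving as `string.ascii_letters`. The helper `is_identifier_char` is a made-up example of using them.

```python
import string

# Module-level constants that remain in the string module today.
letters = string.ascii_letters
digits = string.digits

def is_identifier_char(ch):
    """Example use: membership tests against the module-level constants."""
    return ch in letters or ch in digits or ch == "_"
```

Keeping the constants in a module avoids spellings like ''.letters, where the constant would hang off an arbitrary string instance.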
On 29 November 2000, Tim Peters said:
[Guido]
... Because of its importance, the deprecation time of the string module will be longer than that of most deprecated modules. I expect it won't be removed until Python 3000.
I see nothing in the 2.0 docs, code, or "what's new" web pages saying that it's deprecated. So I don't think you can even start the clock on this one before 2.1 (a fuzzy stmt on the web page for the unused 1.6 release doesn't count ...).
FWIW, I would argue against *ever* removing (much less "deprecating", ie. threatening to remove) the string module. To a rough approximation, every piece of Python code in existence prior to Python 1.6 depends on the string module. I for one do not want to have to change all occurrences of string.foo(x) to x.foo() -- it just doesn't buy enough to make it worth changing all that code.

Not only does the amount of code to change mean the change would be non-trivial, it's not always the right thing, especially if you happen to be one of the people who dislikes the "delim.join(list)" idiom. (I'm still undecided.)

Greg
I fully support Greg Ward's view. If string was removed I'd not update the old code but add in my own string module.

Given the effort you guys went to to keep the C extension protocol the same (in the context of crashing on importing a 1.5 dll into 2.0) I am amazed you think that string could be removed...

Could you split the lib into blessed and backward compatibility sections? Then by some suitable mechanism I can choose the compatibility I need?

Oh and as for join obviously a method of a list...

    ['thats','better'].join(' ')

Barry
Barry Scott wrote:
I fully support Greg Ward's view. If string was removed I'd not update the old code but add in my own string module.
Given the effort you guys went to to keep the C extension protocol the same (in the context of crashing on importing a 1.5 dll into 2.0) I am amazed you think that string could be removed...
Could you split the lib into blessed and backward compatibility sections? Then by some suitable mechanism I can choose the compatibility I need?
Oh and as for join obviously a method of a list...
['thats','better'].join(' ')
The above is the way as it is defined for JavaScript. But in JavaScript, the list join method performs an implicit str() on the list elements.

As has been discussed some time ago, Python's lists are too versatile to justify a string-centric method.

Marc André pointed out that one could do a reduction with the semantics of the "+" operator, but Guido said that he wouldn't like to see

    [2, 3, 5].join(7)

being reduced to 2+7+3+7+5 == 24. That could only be avoided if there were a way to distinguish numeric addition from concatenation.

but-I-could-live-with-it - ly y'rs - chris

--
Christian Tismer :^) <mailto:tismer@tismer.com>
Mission Impossible 5oftware : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com
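The two readings of a list-side join can be put next to each other. Both functions are hypothetical sketches, not proposals from the thread: `js_style_join` applies str() to every element first, while `plus_reduction` relies only on "+".

```python
def js_style_join(items, sep=","):
    """Sketch of a JavaScript-like join: str() each element, then join."""
    return sep.join(str(item) for item in items)

def plus_reduction(items, sep):
    """The '+'-based reading: items[0] + sep + items[1] + ... (non-empty items assumed)."""
    total = items[0]
    for item in items[1:]:
        total = total + sep + item
    return total

as_text = js_style_join([2, 3, 5], "7")    # string-centric: always concatenates
as_number = plus_reduction([2, 3, 5], 7)   # numeric: 2+7+3+7+5
```

The same call shape yields text in one reading and arithmetic in the other, which is the ambiguity that made a list method unattractive.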
participants (12)
- Barry Scott
- barry@digicool.com
- Charles G Waldman
- Christian Tismer
- Fred L. Drake, Jr.
- Fredrik Lundh
- Greg Ewing
- Greg Ward
- Guido van Rossum
- M.-A. Lemburg
- Michael Hudson
- Tim Peters