
It just occurred to me as I was replying to a request on the main list that Python's text handling capabilities could be a bit better than they are. This will probably not come as a revelation to many of you, but I finally put it together with the standard argument against beefing things up:

    One fix would be to add regular expressions to the language core and have special syntax for them, as Perl has done. However, I don't like this solution because Python is a general-purpose language, and regular expressions are used for the single application domain of text processing. For other application domains, regular expressions may be of no interest, and you might want to remove them to save memory and code size.

and the observation that Python does support some builtin objects and syntax that are fairly specific to some much more restricted application domains than text processing.

I stole the above quote from Andrew Kuchling's Python Warts page, which I also happened to read earlier today. What AMK says makes perfect sense until you examine some of the other things that are in the language, like the Ellipsis object and complex numbers. If I recall correctly, both were added as a result of the NumPy package development.

I have nothing against ellipses or complex numbers. They are fine first class objects that should remain in the language. But I have never used either one in my day-to-day work. On the other hand, I read files and manipulate them with regular expressions all the time. I rather suspect that more people use Python for some sort of text processing than any other single application domain. Python should be good at it.

While I don't want to turn Python into Perl, I would like to see it do a better job of what most people probably use the language for. Here is a very short list of things I think need attention:

1. When using something like the simple file i/o idiom

       for line in f.readlines():
           dofunstuff(line)

   the programmer should not have to care how big the file is. It should just work in a reasonably efficient manner without gobbling up all of memory. I realize this may require some change to the syntax of the common idiom.

2. The re module needs to be sped up, if not to catch up with Perl, then to catch up with the deprecated regex module. Depending how far people want to go with things, adding some language syntax to support regular expressions might be in order. I don't see that as compelling as adding complex numbers, however. Another possibility, now that Barry Warsaw has opened the floodgates, is to add regular expression methods to strings.

3. I've not yet used it, but I am told the pattern matching in Marc-Andre Lemburg's mxTextTools (http://starship.python.net/crew/lemburg/) is both powerful and efficient (though it certainly appears complex). Perhaps it deserves consideration for incorporation into the core Python distribution.

I'm sure other people will come up with other suggestions.

Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/
847-971-7098   | Python: Programming the way Guido indented...
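
To make item 1 concrete: one possible way to keep memory bounded without changing the language is to pass a size hint to readlines() and loop over the batches it returns. The following is only a rough sketch. The helper name process_lines and the 64 KiB default are made up for illustration, and an in-memory io.StringIO stands in for a real file.

```python
import io


def process_lines(f, handler, sizehint=64 * 1024):
    """Call handler(line) for every line in f, reading in bounded batches.

    process_lines and the 64 KiB default are illustrative choices,
    not an existing library API.
    """
    count = 0
    while True:
        # readlines(hint) stops collecting lines once roughly `hint`
        # characters have been read, and returns [] at end of file.
        batch = f.readlines(sizehint)
        if not batch:
            break
        for line in batch:
            handler(line)
            count += 1
    return count


# Stand-in for a real file so the example is self-contained.
sample = io.StringIO("first\nsecond\nthird\n")
seen = []
total = process_lines(sample, seen.append, sizehint=8)
```

Only one batch of lines is held in memory at a time, so the size of the file no longer matters much. The cost is exactly the one item 1 complains about: the one-line idiom turns into a helper plus a callback.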