Most important text processing examples
hat at se-46.wpa.wtb.tue.nl
Thu Jul 5 05:22:10 EDT 2001
In comp.lang.python, you wrote:
>Always so impressed by the Python discussion group, I figure I will ask
>its advice. I have recently contracted to write a book called _Text
>Processing in Python_ for Sybex. The proposal and outline can be
>skimmed at <http://gnosis.cx/tpip.proposal>, if you want to see what I
>am aiming at (I've made minor adjustments as I've started writing, and
>will certainly make more as I go along).
Although I am not really happy about providing clues to someone who aims to make
money from my suggestions, here goes.
My normal text processing problems are two-fold:
- Process structured text, as in a simple language (thus scanning, parsing,
checking, and code generation). I use Spark for the first two tasks, and a pseudo
ad-hoc approach for the latter two (but I know that I need a more powerful
approach, especially for code generation).
PS Spark uses the Earley parsing algorithm, which handles any context-free
grammar, so it covers everything the LR and LL parsing techniques do, and more.
- Perform interaction with other programs. For existing programs, I do this in a
PyExpect-like way (there is not a PyExpect yet, is there?).
For new programs, I will use XML techniques to exchange data.
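To make the first two tasks of the first item (scanning and parsing) concrete, here is a minimal sketch in plain Python. It does not use Spark or the Earley algorithm; it is a hand-written regular-expression scanner plus a recursive-descent parser for a made-up arithmetic grammar. The names tokenize, Parser, and evaluate are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical token definitions for a tiny arithmetic language:
# optional whitespace, then either an integer or a one-character operator.
TOKEN_RE = re.compile(r"\s*(?:(?P<num>\d+)|(?P<op>[-+*/()]))")


def tokenize(text):
    """Scan text into a list of (kind, value) tuples."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at or after position {pos}")
        if match.group("num") is not None:
            tokens.append(("num", int(match.group("num"))))
        else:
            tokens.append(("op", match.group("op")))
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent parser that evaluates while it parses.

    Grammar (an assumption made for this sketch):
        expr   := term (('+' | '-') term)*
        term   := factor (('*' | '/') factor)*
        factor := NUMBER | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Return the current token, or (None, None) at end of input.
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return (None, None)

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            _, op = self.advance()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        value = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            _, op = self.advance()
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        kind, value = self.advance()
        if kind == "num":
            return value
        if (kind, value) == ("op", "("):
            inner = self.expr()
            if self.advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token {value!r}")


def evaluate(text):
    """Scan and parse text, rejecting any trailing input."""
    parser = Parser(tokenize(text))
    result = parser.expr()
    if parser.peek() != (None, None):
        raise SyntaxError("trailing input")
    return result
```

For example, evaluate("2 + 3 * (4 - 1)") walks the grammar top-down and respects operator precedence. The checking and code generation stages would normally work on a parse tree rather than evaluating on the fly as this sketch does.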
Almost everything I do involves executing multiple programs on one or more
Linux machines that interact with each other through TCP/IP.
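As a rough sketch of what exchanging XML between programs over TCP/IP could look like, the following uses only the Python standard library. Everything specific here is an assumption made for illustration: the <request>/<reply> message layout, the "upper" command, and the function names are hypothetical, and for brevity both ends run in one process on the loopback interface.

```python
import socket
import threading
import xml.etree.ElementTree as ET


def build_request(command, argument):
    """Serialize a hypothetical <request> message to UTF-8 bytes."""
    root = ET.Element("request")
    ET.SubElement(root, "command").text = command
    ET.SubElement(root, "argument").text = argument
    return ET.tostring(root, encoding="utf-8")


def handle_request(payload):
    """Parse a request and build a <reply> message as UTF-8 bytes."""
    root = ET.fromstring(payload)
    command = root.findtext("command")
    argument = root.findtext("argument") or ""
    reply = ET.Element("reply")
    if command == "upper":
        reply.set("status", "ok")
        reply.text = argument.upper()
    else:
        reply.set("status", "error")
        reply.text = f"unknown command: {command}"
    return ET.tostring(reply, encoding="utf-8")


def read_all(conn):
    """Read until the peer closes its sending side of the connection."""
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def serve_once(server):
    """Accept one connection, answer one request, then return."""
    conn, _ = server.accept()
    with conn:
        conn.sendall(handle_request(read_all(conn)))


def send_request(port, command, argument):
    """Send one request to the local server and return the parsed reply."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(build_request(command, argument))
        sock.shutdown(socket.SHUT_WR)  # signal end of the request
        return ET.fromstring(read_all(sock))


def demo():
    # Port 0 lets the operating system pick a free port.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(server,))
        worker.start()
        reply = send_request(port, "upper", "hello")
        worker.join()
    return reply.get("status"), reply.text
```

The sketch marks the end of a message by shutting down the sending side of the socket, which only works for one request per connection. A longer-lived connection would need explicit framing, such as a length prefix, which is left out here.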
>thing where my focus will differ somewhat from the ASPN cookbooks is
>that I want to keep the emphasis on the problem itself, and make the
This sounds good. I have learned to stay away from optimization as much as
possible. You may want to add something along the lines of 'having a good,
cleanly designed algorithm beats any optimization'.
PS Speed is always over-emphasized in my opinion. If you want speed, go to
assembler or C. Don't try to do anything fast in an interpreted language.
One of the 'big' problems of cookbooks is that they only cover small
pieces of the puzzle. With a good design of the solution for the entire
problem, low-level details like the optimal way of doing string manipulation
are no longer important, because the good design makes up for non-optimal
solutions of the details.
In my opinion, this is really lacking in 90% of the computer books.
(ok, I am biased. I have a degree in computer science and software engineering,
and will defend my PhD on design on November 6, so I am not your average
Constructing a computer program is like writing a painting