How to Write grep in Emacs Lisp (tutorial)
xahlee at gmail.com
Tue Feb 8 13:54:05 CET 2011
〈How to Write grep in Emacs Lisp〉
On Feb 8, 12:22 am, Tassilo Horn <tass... at member.fsf.org> wrote:
> Hi Xah,
> > • Often, the string i need to search is long, containing 300 hundred
> > chars or more. You could put your search string in a file with grep,
> > but it is not convenient.
> Well, you seem to encode the search string in your script, so I don't
> see how that is better than relying on your shell history, which is
> managed automatically, searchable, editable...
not sure what you meant above. Also, I made a mistake above: I meant to say
my search string is a few hundred chars, usually a snippet of HTML code like
<div class="chtk"><script type="text/
> > • grep can't really deal with directories recursively. (there's -r,
> > but then you can't specify file pattern such as “*\.html” (maybe it is
> > possible, but i find it quite frustrating to trial error man page loop
> > with unix tools.))
> You can rely on shell globbing, so that grep gets a list of all files in
> all subdirectories. For example, I can grep all header files of the
> linux kernel using
> % grep FOO /usr/src/linux/**/*.h
say, i want to search in the dir, but no more than 2 levels deep, and
only in files ending in “.html”. This is not a toy question. I actually
need to do that.
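For the concrete case above, here is a minimal Python sketch. It assumes the search string is plain literal text (not a regex) and that the files can be read as UTF-8. The function name `grep_dir` and its parameters are made up for illustration. Depth is counted the way `find -maxdepth 2` counts it: files directly in the dir are level 1, and files one subdirectory down are level 2. Each file is read whole, so a search string that spans several lines (like a pasted HTML snippet) can still match. Line-oriented grep does not do this by default.

```python
import fnmatch
import os


def grep_dir(root, needle, pattern="*.html", max_depth=2):
    """Return paths of files under `root` whose names match the glob
    `pattern` and whose contents contain the literal string `needle`.

    Depth works like `find -maxdepth`: files directly in `root` are at
    depth 1, files in its subdirectories at depth 2, and so on.
    Pass max_depth=None to search the whole tree.
    """
    root = os.path.abspath(root)
    base = root.rstrip(os.sep).count(os.sep)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath.rstrip(os.sep).count(os.sep) - base + 1
        if max_depth is not None and depth >= max_depth:
            dirnames[:] = []  # prune: os.walk won't descend any further
        else:
            dirnames.sort()   # deterministic traversal order
        for name in sorted(filenames):
            if not fnmatch.fnmatch(name, pattern):
                continue
            path = os.path.join(dirpath, name)
            # Read the whole file, so a multi-line needle can still match.
            with open(path, encoding="utf-8", errors="replace") as f:
                if needle in f.read():
                    hits.append(path)
    return hits
```

For example, `grep_dir(".", '<div class="chtk">')` lists the .html files at most 2 levels down that contain that snippet. Passing `max_depth=None` gives roughly what `grep -R FOO --include='*.html'` does.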
> However, on older systems or on windows, that may produce a too long
> command line. Alternatively, you can use the -R option to grep a
> directory recursively, and specify an include globbing pattern (or many,
> and/or one or many exclude patterns).
> % grep -R FOO --include='*.h' /usr/src/linux/
> You can also use a combination of `find', `xargs' and `grep' (with some
> complications for allowing spaces in file names [-print0 to find]), or,
> when using zsh, you can use
> % zargs /usr/src/linux/**/*.h -- grep FOO
> which does all relevant quoting and stuff for you.
the problem with find + xargs is that they spawn grep for each file,
which becomes too slow to be usable.
Using “find ... -exec” instead of xargs is possible, of course, but i
always have problems with the syntax...
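Spawning a process per file is costly when grep is started once for every file, which is what `find ... -exec grep FOO {} \;` does. By contrast, `xargs` (without `-n`) and `find ... -exec grep FOO {} +` pack many file names into each grep invocation. Below is a rough Python sketch of that batching idea. The names `batches` and `grep_batched` and the 100,000-character limit are illustrative assumptions; the real command-line limit is system-dependent. It also assumes a `grep` that accepts `-l`, `-e` and `--`.

```python
import subprocess


def batches(paths, max_chars=100_000):
    """Split `paths` into lists whose total length (one separator per path
    included) stays within `max_chars`, so each list fits on a command line.
    A single path longer than the limit still gets a batch of its own."""
    batch, size = [], 0
    for p in paths:
        cost = len(p) + 1
        if batch and size + cost > max_chars:
            yield batch
            batch, size = [], 0
        batch.append(p)
        size += cost
    if batch:
        yield batch


def grep_batched(pattern, paths, max_chars=100_000):
    """Run `grep -l -e PATTERN -- FILES...` once per batch and return
    the names of the files that matched."""
    matches = []
    for batch in batches(paths, max_chars):
        result = subprocess.run(
            ["grep", "-l", "-e", pattern, "--", *batch],
            capture_output=True, text=True,
        )
        # grep exits 1 when nothing matched; only 2+ signals a real error.
        if result.returncode > 1:
            raise RuntimeError(result.stderr)
        matches.extend(result.stdout.splitlines())
    return matches
```

With a few thousand short paths this starts grep only a handful of times rather than once per file.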
> > • unix grep and associated tool bag (sort, wc, uniq, pipe, sed, awk,
> > …) is not flexible. When your need is slightly more complex, unix
> > shell tool bag can't handle it. For example, suppose you need to find
> > a string in HTML file only if the string happens inside another tag.
> > (extending the limit of unix tool bag is how Perl was born in 1987.)
> There are many things you can also do with a plain shell script. I'm
> always amazed how good and concise you can do all sorts of file/text
> manipulation using `zsh' builtins.
never really got into bash for shell scripting... i tried sometimes, but
the power-to-syntax ratio isn't tolerable. Knowing perl well pretty much
killed any incentive left.
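The quoted HTML case (a string that counts only when it occurs inside another tag) is where a line-based grep pipeline struggles, because it has no notion of element nesting. Here is a sketch using Python's standard `html.parser` module. It tracks how many of the target elements are currently open and counts matches only in text seen while inside one. The class and function names are hypothetical. The sketch assumes the target elements are explicitly closed, and it will miss a search string that is split by markup.

```python
from html.parser import HTMLParser


class InsideTagFinder(HTMLParser):
    """Count occurrences of `needle` in text that sits inside a `tag` element.
    HTMLParser lowercases tag names, so `tag` should be given in lowercase."""

    def __init__(self, tag, needle):
        super().__init__()
        self.tag = tag
        self.needle = needle
        self.depth = 0   # how many `tag` elements are currently open
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Only text inside at least one open target element is counted.
        if self.depth > 0:
            self.count += data.count(self.needle)


def count_inside(html, tag, needle):
    finder = InsideTagFinder(tag, needle)
    finder.feed(html)
    finder.close()
    return finder.count
```

For example, `count_inside(page, "div", "FOO")` ignores a FOO that appears in a `<p>` outside any `<div>`.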
... in the late 1990s, my thought was that i'd just learn perl well and
never need to learn another lang or shell for any text processing and
sys admin tasks for personal use (not job requirements in a company).
The thinking was that it'd be efficient, in the sense of not wasting
time learning multiple langs to do the same thing. So i wrote a lot of
perl scripts for find/replace and file management stuff, and tried to
make them as general as possible, lol. But what turned out is that,
over the years, for one reason or another, i learned python, php, then
in 2007 elisp. Maybe the love for languages inevitably won over my
one-powerful-coherent-system efficiency idea.
Also, i ended up rewriting many of my text processing scripts in each
lang. Part of it is exercise when learning a new lang.
... anyway, i guess i'm babbling randomly, but one thing i learned is
that for misc text processing scripts, the idea of writing a generic,
flexible script once and for all just doesn't work, because the
coverage is too wide and the tasks that need to be done at any one time
are too specific. (And i think this makes sense, because the idea of one
language or one generic script for everything comes from ideology, not
really from practical need. If we look at the real world, it's almost
always a disparate mess of components and systems.)
my text processing scripts end up being a mess. There are lots of them,
in different langs. A few are general, but most are basically used once
or in a particular year only (many of them do more or less the same
thing). When i need to do some particular task, i find it easier just to
write a new one in whatever lang is currently in my brain memory than to
spend time fishing out and revisiting old scripts.
some concrete examples...
e.g. in 2000 i wrote this general script, intended to be a one-stop
solution for all find/replace needs:
〈Perl: Find ＆ Replace on Multiple Files〉
in 2005, while i was learning python, i wrote several versions in python:
〈Python: Find ＆ Replace Strings in Unicode Files〉
It's not a port of the perl code. The python version doesn't have as
many features as the perl one, but for some reason, i have stopped using
the perl version. I didn't need all those perl features, and when i do
need them, i have several other python scripts, each addressing a
particular need (e.g. one for unicode, one for multiple pairs in one
shot, one for regex, one for plain text, one for find only, one for
find+replace, several for find/replace only if a particular condition is
met, etc.).
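Here is a minimal Python sketch of the “multiple pairs in one shot” idea. It is not the author's script; `replace_pairs` is a hypothetical name, and the sketch assumes the search strings are literal text. All pairs are combined into one regex and applied in a single pass, so a later pair never re-matches an earlier replacement's output. Swapping "a" and "b" therefore works, which two sequential `str.replace` calls would get wrong. Longer search strings come first in the alternation, so an overlapping longer match wins.

```python
import re


def replace_pairs(text, pairs):
    """Apply several literal (old, new) replacements to `text` in one pass."""
    table = {old: new for old, new in pairs if old}  # skip empty search strings
    if not table:
        return text
    # Longest first, so "ab" is preferred over "a" at the same position.
    alternatives = sorted(table, key=len, reverse=True)
    regex = re.compile("|".join(re.escape(s) for s in alternatives))
    return regex.sub(lambda m: table[m.group(0)], text)
```

For whole files, one would read the text, call `replace_pairs`, and write the result back only if it changed.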
then in 2006, i fell into the emacs hole and started to learn elisp. In
the process, i realized that elisp is more powerful for text processing
than perl or python. Not due to lisp the lang, but more due to emacs the
text-editing environment and system. I tried to explain this in a few
places, but mostly here:
〈Text Processing: Emacs Lisp vs Perl〉
so, all my new text processing scripts are in elisp. I still use a few
of my python scripts, but almost everything is now in elisp.
also, sometime in 2008, i grew a shell script that processes weblogs
using the usual unix tool bag: cat, grep, awk, sort, uniq. It's about 100
lines. You can see it here:
at one point i wondered why i was doing it. Didn't i think that perl
would replace all shell scripts? I gave it a little thought, and i think
the conclusion is that for this task, the shell script is actually more
suitable and simpler to write. Possibly, if i had started with perl for
this task, i might have ended up with well-structured code that's not
necessarily less efficient... but you know, things in life aren't all
planned. It began when i just needed a few greps to see something in my
web log. Then, over the years, i added another line, then another, all
need-based. If at any of those times i had thought “let's scratch this
and restart with perl”, that would have been a waste of time.
Besides that, i have some doubt that perl would do a better job for
this. With unix tools, each line just does one simple thing with piping.
To do it in perl, i'd have to read in the huge log file, then maintain
some data structure and try to parse it... too much memory and thinking
would be involved. If i coded it in perl emulating the shell code
line-by-line, then it would make no sense to do it, since it's just the
shell bag in perl.
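The script itself isn't reproduced here, so what follows is only an assumed example of one typical stage: counting requests per client address, roughly `grep 'GET ' access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head`. In Python this can be done in a single pass that reads one line at a time. Memory then grows with the number of distinct addresses rather than with the size of the log. The log layout (client address as the first whitespace-separated field, as in Apache-style logs) and the name `top_ips` are assumptions.

```python
from collections import Counter


def top_ips(lines, must_contain="GET ", n=10):
    """Count the first whitespace-separated field (assumed: client address)
    of every line containing `must_contain`; return the n most common as
    (address, count) pairs, most frequent first."""
    counts = Counter()
    for line in lines:           # a file object yields one line at a time
        if must_contain in line:
            fields = line.split()
            if fields:
                counts[fields[0]] += 1
    return counts.most_common(n)
```

Usage, with a hypothetical log path: `with open("access.log", encoding="utf-8", errors="replace") as f: print(top_ips(f))`.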
Also note, this shell script can't be replaced by elisp, because elisp
is not suitable when the file size is large.
well, that's my story — extempore! ☺