How to Write grep in Emacs Lisp (tutorial)

Xah Lee xahlee at gmail.com
Tue Feb 8 13:54:05 CET 2011


hi Tass,

Xah wrote:
〈How to Write grep in Emacs Lisp〉
http://xahlee.org/emacs/elisp_grep_script.html

On Feb 8, 12:22 am, Tassilo Horn <tass... at member.fsf.org> wrote:
> Hi Xah,
>
> > • Often, the string i need to search is long, containing 300 hundred
> > chars or more. You could put your search string in a file with grep,
> > but it is not convenient.
>
> Well, you seem to encode the search string in your script, so I don't
> see how that is better than relying on your shell history, which is
> managed automatically, searchable, editable...

not sure what you meant above. I made a mistake in what i wrote. I
meant to say my search string is a few hundred chars. Usually a snippet
of html code that may contain javascript code and also unicode chars.

e.g.

<div class="chtk"><script type="text/javascript">ch_client="thoucm";ch_width=550;ch_height=90;ch_type="mpu";ch_sid="Chitika Default";ch_backfill=1;ch_color_site_link="#00C";ch_color_title="#00C";ch_color_border="#FFF";ch_color_text="#000";ch_color_bg="#FFF";</script><script src="http://scripts.chitika.net/eminimalls/amm.js" type="text/javascript"></script></div>

> > • grep can't really deal with directories recursively. (there's -r,
> > but then you can't specify file pattern such as “*\.html” (maybe it is
> > possible, but i find it quite frustrating to trial error man page loop
> > with unix tools.))
>
> You can rely on shell globbing, so that grep gets a list of all files in
> all subdirectories.  For example, I can grep all header files of the
> linux kernel using
>
>   % grep FOO /usr/src/linux/**/*.h

say, i want to search in the dir
~/web/xahlee_org/

but no more than 2 levels deep, and only files ending in “.html”. This
is not a toy question. I actually need to do that.
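
A minimal sketch of that kind of search, assuming a find that supports
-maxdepth (GNU and BSD find do; POSIX does not require it). The function
name grep_html_shallow is made up for illustration, and “2 levels deep”
is read here as files directly in the dir plus files one subdirectory
down:

```shell
# Sketch: list *.html files, at most 2 levels below a directory,
# that contain a literal string.
# grep_html_shallow is a hypothetical helper name.
grep_html_shallow() {
    pattern=$1   # literal string to look for
    dir=$2       # directory to start from
    # -maxdepth 2 : the start dir is depth 0, so dir/x.html and
    #               dir/sub/x.html qualify, dir/sub/sub/x.html does not
    # -name       : quoted so the shell does not expand the glob itself
    # grep -F     : treat the pattern as a fixed string, not a regex
    # grep -l     : print only the names of matching files
    # {} +        : hand many file names to each grep call
    find "$dir" -maxdepth 2 -type f -name '*.html' \
        -exec grep -F -l -e "$pattern" {} +
}
```

It could be called as: grep_html_shallow 'ch_client="thoucm"' ~/web/xahlee_org/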

> However, on older systems or on windows, that may produce a too long
> command line.  Alternatively, you can use the -R option to grep a
> directory recursively, and specify an include globbing pattern (or many,
> and/or one or many exclude patterns).
>
>   % grep -R FOO --include='*.h' /usr/src/linux/
>
> You can also use a combination of `find', `xargs' and `grep' (with some
> complications for allowing spaces in file names [-print0 to find]), or,
> when using zsh, you can use
>
>   % zargs /usr/src/linux/**/*.h -- grep FOO
>
> which does all relevant quoting and stuff for you.

problem with find is that the plain “-exec ... \;” form spawns grep
for each file, which becomes too slow to be usable. xargs, or
“find ... -exec ... {} +”, batches many files per grep call and avoids
that, but i always have problems with the syntax...
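
For reference, a sketch of the three common ways to hand find's results
to grep. It assumes find and xargs versions that support -print0 and -0
(GNU and BSD ones do); the function names are hypothetical:

```shell
# Sketch: three ways to run grep over the files that find selects.
# Each function takes a literal search string and a start directory.

# One grep process per file: simple, but slow on large trees.
grep_per_file() {
    find "$2" -type f -name '*.html' -exec grep -F -l -e "$1" {} \;
}

# Many files per grep process. "{} +" replaces "{} \;" and must come
# last in the -exec clause.
grep_batched() {
    find "$2" -type f -name '*.html' -exec grep -F -l -e "$1" {} +
}

# Same batching through xargs. -print0 and -0 separate names with NUL
# bytes, so file names containing spaces survive intact.
grep_xargs() {
    find "$2" -type f -name '*.html' -print0 |
        xargs -0 grep -F -l -e "$1"
}
```

All three print the same list of matching files; only the number of
grep processes differs.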

> > • unix grep and associated tool bag (sort, wc, uniq, pipe, sed, awk,
> > …) is not flexible. When your need is slightly more complex, unix
> > shell tool bag can't handle it. For example, suppose you need to find
> > a string in HTML file only if the string happens inside another tag.
> > (extending the limit of unix tool bag is how Perl was born in 1987.)
>
> There are many things you can also do with a plain shell script.  I'm
> always amazed how good and concise you can do all sorts of file/text
> manipulation using `zsh' builtins.

never really got into bash for shell scripting... sometimes tried but
the ratio power/syntax isn't tolerable. Knowing perl well pretty much
killed any possible incentive left.

... in the late 1990s, my thought was that i'd just learn perl well and
never need to learn another lang or shell for any text processing and
sys admin tasks for personal use. The thinking was that it'd be
efficient in the sense of not having to waste time learning multiple
langs for doing the same thing. (not counting job requirements in a
company) So i have written a lot of perl scripts for find & replace and
file management stuff and tried to make them as general as possible.
lol. But what turned out is that, over the years, for one reason or
another, i just learned python, php, then in 2007 elisp. Maybe the love
for languages inevitably won over my one-powerful-coherent-system
efficiency obsession. But also, i ended up rewriting many of my text
processing scripts in each lang. I guess part of it is exercise when
learning a new lang.

... anyway, i guess i am randomly babbling, but one thing i learned is
that for misc text processing scripts, the idea of writing a generic,
flexible, powerful one once and for all just doesn't work, because the
coverage is too wide and the tasks that need to be done at any one time
are too specific. (and i think this makes sense, because the idea of one
language or one generic script for all is mostly from ideology, not
really out of practical need. If we look at the real world, it's almost
always a disparate mess of components and systems.)

my text processing scripts ended up being a mess. There are like several
versions in different langs. A few are general, but most were basically
used once or in a particular year only. (many of them do more or less
the same thing). When i need to do some particular task, i find it
easier just to write a new one in whatever lang is currently in my
brain memory than to spend time fishing out and revisiting old scripts.

some concrete examples...

e.g. i wrote this general script in 2000, intended to be one-stop for
all find/replace needs

〈Perl: Find & Replace on Multiple Files〉
http://xahlee.org/perl-python/find_replace_perl.html

in 2005, while i was learning python, i wrote (several) versions in
python. e.g.

〈Python: Find & Replace Strings in Unicode Files〉
http://xahlee.org/perl-python/find_replace_unicode.html

it's not a port of the perl code. The python version doesn't have as
many features as the perl one. But for some reason, i have stopped
using the perl version. I didn't need all of the perl version's
features, and when i do need them, i have several other python scripts
that each address a particular need. (e.g. one for unicode, one for
multiple pairs in one shot, one for regex, one for plain text, one for
find only, one for find+replace, several for find/replace only if a
particular condition is met, etc.)

then in 2006, i fell into the emacs hole and started to learn elisp. In
the process, i realized that elisp for text processing is more
powerful than perl or python. Not due to lisp the lang, but more due
to emacs the text-editing environment and system. I tried to explain
this in a few places but mostly here:

〈Text Processing: Emacs Lisp vs Perl〉
http://xahlee.org/emacs/elisp_text_processing_lang.html

so, all my new scripts for text processing are in elisp. A few of my
python scripts i still use, but almost everything is now in elisp.

also, sometime in 2008, i grew a shell script that processes weblogs
using the usual unix tool bag: cat, grep, awk, sort, uniq. It's about
100 lines. You can see it here:

http://xahlee.org/comp/weblog_process.sh

at one time i wondered, why am i doing it. Didn't i think that perl
would replace all shell scripts?  I gave it a little thought, and i
think the conclusion is that for this task, the shell script is
actually more efficient and simpler to write. Possibly if i had started
with perl for this task i might have ended up with well-structured code
that's not necessarily less efficient... but you know things in life
aren't all planned. It began when i just needed a few lines of grep to
see something in my web log. Then, over the years, i added another
line, another line, then another, all need based. If at any of those
times i had thought “let's scratch this and restart with perl”, that'd
have been wasting time. Besides that, i have some doubt that perl would
do a better job for this. With shell tools, each line just does one
simple thing with piping. To do it in perl, one'd have to read in the
huge log file then maintain some data structure and try to parse it...
too much memory and thinking would be involved. If i code perl by
emulating the shell code line-by-line, then it makes no sense to do it
in perl, since it's just shell bag in perl.
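
To illustrate the “each line does one simple thing” style, here is a
small sketch of such a pipeline. It is not taken from weblog_process.sh;
the function name top_pages is hypothetical, and it assumes the common
or combined log format, where the request path is the 7th
whitespace-separated field:

```shell
# Sketch: show the most requested paths in a web server log.
# Assumes common/combined log format lines such as:
#   1.2.3.4 - - [08/Feb/2011:13:54:05 +0100] "GET /a.html HTTP/1.1" 200 1234
top_pages() {
    logfile=$1
    grep -F '"GET ' "$logfile" |   # keep only GET requests
        awk '{ print $7 }' |       # extract the requested path
        sort |                     # bring identical paths together
        uniq -c |                  # count each run of identical paths
        sort -rn |                 # highest count first
        head -n 10                 # keep the top ten
}
```

Each stage reads the log as a stream, so nothing has to hold the whole
file in a data structure the way the perl approach described above
would.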

Also note, this shell script can't be replaced by elisp, because elisp
is not suitable when the file size is large.

well, that's my story — extempore! ☺

 Xah Lee



More information about the Python-list mailing list