agrepy 1.1: Python port of agrep (string matching with errors)
Michael Wise
Fri, 01 Oct 99 09:57:36 GMT
Agrep, written by Sun Wu and Udi Manber (described in "Fast Text Searching
Allowing Errors", CACM, 35(10), 1992), is a suite of C functions which
together perform various string matching operations under UNIX (i.e.
specified at the command line).
agrepy takes agrep from its user-level setting and makes it available as a
Python module. Specifically, this port implements those functions from agrep
that relate to inexact matching of text strings which contain no
metacharacters. For example, agrep is able to find matches despite
differences due to American versus British spelling.
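
To make "matching with errors" concrete, here is a minimal sketch of
approximate substring matching. It is not agrepy's code and not the
bit-parallel method of Wu and Manber; it uses plain dynamic programming
(usually credited to Sellers) because that is easier to read. The function
name has_approx_match and its parameters are assumptions made for this
illustration only.

```python
def has_approx_match(pattern, text, max_errors):
    """Return True if some substring of text is within max_errors edits
    (insertions, deletions, substitutions) of pattern.

    Illustrative only: a simple dynamic-programming scan, not the
    Wu-Manber algorithm that agrep and agrepy are based on.
    """
    m = len(pattern)
    # column[i] = fewest edits needed to match pattern[:i] against some
    # substring of the text that ends at the current text position.
    column = list(range(m + 1))
    if column[m] <= max_errors:
        return True
    for ch in text:
        prev_diag = column[0]  # column[i - 1] as it was one text position ago
        column[0] = 0          # a match may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            best = min(prev_diag + cost,    # exact match or substitution
                       column[i] + 1,       # extra character in the text
                       column[i - 1] + 1)   # pattern character missing from text
            prev_diag = column[i]
            column[i] = best
        if column[m] <= max_errors:
            return True
    return False


# American pattern, British text: one extra "u" counts as one error.
print(has_approx_match("color", "the colour of money", 1))  # True
print(has_approx_match("color", "the colour of money", 0))  # False
```

Like agrep itself, this sketch only reports whether a match exists; it
does not say where the match lies in the text.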
agrepy also extends agrep in the following sense: given a pattern and a
text string, agrepy returns a list of all non-overlapping pairs of text
indexes such that the start index is the first character of the text that
matches the earliest pattern character exactly, and the end is the last text
character that matches exactly. The end index of each match is 1 place
greater than the actual index so it can be immediately used to construct a
slice. (agrep itself is content with recognizing that the input line
contains a match, but does not say where or differentiate multiple matches.)
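
The following sketch shows how index pairs of this kind can be used as
slices. The pairs are written out by hand for illustration and are not
actual agrepy output, since this announcement does not show the module's
call signature. The helper names matched_substrings and is_non_overlapping
are likewise hypothetical.

```python
def matched_substrings(text, pairs):
    """Turn (start, end) pairs into the matched pieces of text.

    Each end index is one past the last matched character, so a pair
    can be used directly as a slice.
    """
    return [text[start:end] for start, end in pairs]


def is_non_overlapping(pairs):
    """Check that no two (start, end) spans share a text position."""
    ordered = sorted(pairs)
    return all(prev_end <= next_start
               for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]))


text = "The colour of the color wheel"
# Hand-written pairs of the kind described above for the pattern "color",
# allowing one error; these are assumed values, not real agrepy output.
example_pairs = [(4, 10), (18, 23)]

print(matched_substrings(text, example_pairs))  # ['colour', 'color']
print(is_non_overlapping(example_pairs))        # True
```

Because the end index is exclusive, text[start:end] needs no "+ 1"
adjustment, and end - start gives the length of the matched text.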
agrepy 1.0 (July 1999) had an end-of-text bug in the short pattern module,
and both the long and short pattern modules at times had problems finding
the precise ends of a match. There can also be genuine ambiguity in
specifying the ends of a match when alternative ends give an equal number of
errors. The methodology now is that the original algorithms find the ends of
matches fairly accurately, and a separate, recursive function firms up the
end and finds the start position.

The authors of the original programs are Sun Wu and Udi Manber, as described
above. See below for the copyright on the original algorithms. Other portions
have been written by Michael J. Wise and are copyright under the terms of
the Open Source Definition (http://www.opensource.org/osd.html).

This material was developed by Sun Wu and Udi Manber
at the University of Arizona, Department of Computer Science.
Permission is granted to copy this software, to redistribute it
on a nonprofit basis, and to use it for any purpose, subject to
the following restrictions and understandings.
1. Any copy made of this software must include this copyright notice.
2. All materials developed as a consequence of the use of this
software shall duly acknowledge such use, in accordance with the usual
standards of acknowledging credit in academic research.
3. The authors have made no warranty or representation that the
operation of this software will be error-free or suitable for any
application, and they are under no obligation to provide any
services, by way of maintenance, update, or otherwise. The software
is an experimental prototype offered on an as-is basis.
4. Redistribution for profit requires the express, written permission
of the authors.

The code for the port is available from:
Dr Michael J. Wise
Bristol-Myers Squibb Senior Research Fellow
Pembroke College | Centre for Communications Systems Research (CCSR)
Cambridge CB2 1RF | 10 Downing St
England | Cambridge CB2 3DS
Telephone: (+44 1223) 740 121
FAX: (+44 1223) 740 099
Internet: M.Wise1@ccsr.cam.ac.uk (remove 1 to use the address)
Research Visitor at the European Bioinformatics Institute
"If I'm not for myself, who is for me?
But if I am only for myself, what am I?
AND IF NOT NOW, WHEN?"
- Sayings of the Fathers (Hillel)
(emphasis my own)
----------- comp.lang.python.announce (moderated) ----------
Article Submission Address: email@example.com
Python Language Home Page: http://www.python.org/
Python Quick Help Index: http://www.python.org/Help.html