hachoir-regex 1.0 released

haypo victor.stinner at haypocalc.com
Fri Jun 29 01:10:31 CEST 2007


hachoir-regex is a Python library for regular expression (regex or
regexp) manipulation. You can combine expressions with the a|b (or)
and a+b (and) operators. Expressions are optimized as they are
constructed: ranges are merged, repetitions are simplified, etc. It
also provides a class for pattern matching that can search for
multiple strings and regexes at the same time.

Website: http://hachoir.org/wiki/hachoir-regex

Regex examples
==============

Different ways to create a regex:

   >>> from hachoir_regex import parse, createRange, createString
   >>> createString("bike") + createString("motor")
   <RegexString 'bikemotor'>
   >>> parse('(foo|fooo|foot|football)')
   <RegexAnd 'foo(|[ot]|tball)'>
   >>> regex = createString("1") | createString("3"); regex
   <RegexRange '[13]'>
   >>> regex |= createRange("2", "4"); regex
   <RegexRange '[1-4]'>

As you can see, you can use classic "a|b" (or) and "a+b" (and) Python
operators, and expressions are optimized for fast pattern matching.
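
To give an idea of what "merge ranges" means in the example above, here
is a minimal standalone sketch in plain Python. It does not use
hachoir-regex and is not its implementation: merge_ranges() and
to_char_class() are hypothetical helper names chosen for this
illustration, and special characters are not escaped.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent (first, last) character ranges."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and ord(lo) <= ord(merged[-1][1]) + 1:
            # Overlaps or touches the previous range: extend it.
            prev_lo, prev_hi = merged[-1]
            merged[-1] = (prev_lo, max(prev_hi, hi))
        else:
            merged.append((lo, hi))
    return merged


def to_char_class(ranges):
    """Render ranges as a character class (no escaping, sketch only)."""
    parts = []
    for lo, hi in merge_ranges(ranges):
        if lo == hi:
            parts.append(lo)            # single character
        elif ord(hi) == ord(lo) + 1:
            parts.append(lo + hi)       # two neighbours, no dash needed
        else:
            parts.append(lo + "-" + hi)
    return "[" + "".join(parts) + "]"
```

With this sketch, the ranges ("1", "1") and ("3", "3") stay separate
and render as '[13]'; adding ("2", "4") closes the gap and the result
collapses to '[1-4]', as in the session above.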

Regex using repetition:

   >>> parse("(a{2,}){3,4}")
   <RegexRepeat 'a{6,}'>
   >>> parse("(a*|b)*")
   <RegexRepeat '[ab]*'>
   >>> parse("(a*|b|){4,5}")
   <RegexRepeat '(a+|b){0,5}'>

Compute the minimum and maximum length of a matched pattern:

   >>> r=parse('(cat|horse)')
   >>> r.minLength(), r.maxLength()
   (3, 5)
   >>> r=parse('(a{2,}|b+)')
   >>> r.minLength(), r.maxLength()
   (1, None)
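
The length bounds above can be reproduced by hand with a few rules.
The following is only a sketch under assumptions, not the library's
code: the function names are hypothetical, a bound is a (min, max)
tuple, and None stands for "no upper limit" as in the output above.

```python
def string_bounds(text):
    """A literal string always matches exactly len(text) characters."""
    return (len(text), len(text))


def or_bounds(*alternatives):
    """a|b: shortest of the minimums, longest of the maximums."""
    mins = [lo for lo, hi in alternatives]
    maxs = [hi for lo, hi in alternatives]
    upper = None if None in maxs else max(maxs)
    return (min(mins), upper)


def repeat_bounds(inner, rep_min, rep_max):
    """inner{rep_min,rep_max}; rep_max=None means unbounded."""
    lo, hi = inner
    if rep_max == 0 or hi == 0:
        upper = 0
    elif rep_max is None or hi is None:
        upper = None
    else:
        upper = hi * rep_max
    return (lo * rep_min, upper)


# '(cat|horse)'
cat_or_horse = or_bounds(string_bounds("cat"), string_bounds("horse"))
# '(a{2,}|b+)'
a_or_b = or_bounds(repeat_bounds(string_bounds("a"), 2, None),
                   repeat_bounds(string_bounds("b"), 1, None))
```

Here cat_or_horse is (3, 5) and a_or_b is (1, None), the same values
as in the session above.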

Pattern matching
================

Use PatternMatching if you would like to match multiple strings and
regexes at the same time:

    >>> from hachoir_regex import PatternMatching
    >>> p = PatternMatching()
    >>> p.addString("a")
    >>> p.addString("b")
    >>> p.addRegex("[cd]")
    >>> for start, end, item in p.search("a b c d"):
    ...    print "%s..%s: %s" % (start, end, item)
    ...
    0..1: a
    2..3: b
    4..5: [cd]
    6..7: [cd]

You can also attach user data to a pattern:

    >>> p = PatternMatching()
    >>> p.addString("un", 1)
    >>> p.addString("deux", 2)
    >>> for start, end, item in p.search("un deux"):
    ...    print "%r at %s: userdata=%r" % (item, start, item.user)
    ...
    <StringPattern 'un'> at 0: userdata=1
    <StringPattern 'deux'> at 3: userdata=2
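
For readers who want to see the general idea without installing the
library, searching for several patterns in one pass can be
approximated with the standard re module. This is a sketch only, not
how hachoir-regex works internally: MultiSearch and its methods are
hypothetical names, and unlike the library it does no optimization.
It also assumes the added regexes contain no named groups of their
own.

```python
import re


class MultiSearch(object):
    """Hypothetical helper, not part of hachoir-regex."""

    def __init__(self):
        # Each entry: (display text, regex source, user data)
        self.patterns = []

    def add_string(self, text, user=None):
        # Literal strings are escaped so they match verbatim.
        self.patterns.append((text, re.escape(text), user))

    def add_regex(self, source, user=None):
        self.patterns.append((source, source, user))

    def search(self, data):
        if not self.patterns:
            return
        # One named group per pattern, joined into a single alternation.
        combined = re.compile("|".join(
            "(?P<p%d>%s)" % (index, source)
            for index, (_, source, _) in enumerate(self.patterns)))
        for match in combined.finditer(data):
            # lastgroup names the alternative that matched, e.g. "p2".
            index = int(match.lastgroup[1:])
            text, _, user = self.patterns[index]
            yield match.start(), match.end(), text, user
```

With "a", "b" and "[cd]" added, list(search("a b c d")) yields the
same offsets as the first session above: 0..1, 2..3, 4..5 and 6..7.
Note that when several alternatives could match at the same position,
re takes the first one in the alternation, not the longest.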

Download hachoir-regex from the Python Cheese Shop. It is distributed
under the GNU GPL license.

Victor Stinner aka haypo
http://hachoir.org/


