[Tutor] Regular Expressions part 2

Michael P. Reilly arcege@shore.net
Fri, 19 May 2000 19:52:47 -0400 (EDT)


> I think you need to post your code.
> 
> I tried what you said and it worked for me:
> 
> >>> import re
> >>> g = "bbbbbbkkdkdkdkdk"
> >>> r = re.findall('b',g)
> >>> type(r)
> <type 'list'>
> >>> r
> ['b', 'b', 'b', 'b', 'b', 'b']
> >>> len(r)
> 6
> >>> r[0]
> 'b'
> >>>

Using findall is fine as far as it goes, but it returns only the
matched strings, with no context such as where in the source each match
occurred.  Retrieving the match objects and working with those is
probably more useful:

Python 1.5.2 (#2, May 11 1999, 17:14:37)  [GCC 2.7.2.1] on freebsd3
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import re
>>> s = 'bbbbbbkkdkdkdkdk'
>>> r = re.findall('b', s)
>>> r
['b', 'b', 'b', 'b', 'b', 'b']
>>> type(r[0])
<type 'string'>
>>> p = re.compile('b')
>>> i = 0
>>> while i < len(s):
...   m = p.search(s, i)  # search scans forward; match only tries position i
...   if not m: # no match found
...     break
...   (st, ed) = (m.start(0), m.end(0))
...   print (st, ed), m.string[st:ed]
...   i = ed
...
(0, 1) b
(1, 2) b
(2, 3) b
(3, 4) b
(4, 5) b
(5, 6) b
>>>

The first (findall) returns just the strings; going through the loop
also gives you the position of each match inside the string.  It's
fairly easy to abstract this into something like findall, but I'll
leave that as an exercise.
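
For readers who want to check their answer to that exercise, here is one
possible sketch, not necessarily what the author had in mind.  It is
written in current Python 3 syntax rather than the 1.5.2 shown above,
and the name find_matches is made up for illustration.  It uses search
rather than match so that matches after non-matching text are found
too, and it steps past empty matches so the loop always advances.

```python
import re

def find_matches(pattern, string):
    """Return a list of match objects, one per non-overlapping match.

    Hypothetical helper: like findall, but keeps the match objects so
    the caller can ask for positions as well as the matched text.
    """
    p = re.compile(pattern)
    matches = []
    i = 0
    while i <= len(string):
        m = p.search(string, i)   # scan forward from position i
        if not m:                 # no further match
            break
        matches.append(m)
        # Continue after this match; move one extra character if the
        # match was empty, otherwise the loop would never advance.
        if m.end() > m.start():
            i = m.end()
        else:
            i = m.end() + 1
    return matches

for m in find_matches('b', 'bbbbbbkkdkdkdkdk'):
    print(m.span(), m.group(0))
```

Python 2.2 and later also provide re.finditer, which yields match
objects directly and is usually the simpler choice where it is
available.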

  -Arcege

-- 
------------------------------------------------------------------------
| Michael P. Reilly, Release Engineer | Email: arcege@shore.net        |
| Salem, Mass. USA  01970             |                                |
------------------------------------------------------------------------