regex: multiple matching for one string
Scott David Daniels
Scott.Daniels at Acm.Org
Fri Jul 24 11:54:32 EDT 2009
rurpy at yahoo.com wrote:
> Nick Dumas wrote:
>> On 7/23/2009 9:23 AM, Mark Lawrence wrote:
>>> scriptlearner at gmail.com wrote:
>>>> For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
>>>> will like to take out the values (valuea, valueb, and valuec). How do
>>>> I do that in Python? The group method will only return the matched
>>>> part. Thanks.
>>>>
>>>> p = re.compile('#a=*;b=*;c=*;')
>>>> m = p.match(line)
>>>> if m:
>>>>     print m.group(),
>>> IMHO a regex for this is overkill, a combination of string methods such
>>> as split and find should suffice.
>
> You're saying that something like the following
> is better than the simple regex used by the OP?
> [untested]
> values = []
> parts = line.split(';')
> if len(parts) != 4: raise SomeError()
> for p, expected in zip (parts[-1], ('#a','b','c')):
>     name, x, value = p.partition ('=')
>     if name != expected or x != '=':
>         raise SomeError()
>     values.append (value)
> print values[0], values[1], values[2]
I call straw man: [tested]
line = "#a=valuea;b=valueb;c=valuec;"
d = dict(single.split('=', 1)
         for single in line.split(';') if single)
d['#a'], d['b'], d['c']
If you want checking code, add:
if len(d) != 3:
    raise ValueError('Expected 3 keys, got %s in %r' % (
        sorted(d), line))
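Put together, the parsing and the check might look like the sketch below. The function name parse_pairs and the expected tuple are hypothetical, added only for illustration. The check compares key names rather than just counting them, which is slightly stricter than the len() test.

```python
def parse_pairs(line, expected=('#a', 'b', 'c')):
    """Turn 'k=v;k=v;' text into a dict and verify the key names."""
    # line.split(';') leaves an empty piece after the trailing ';',
    # so the 'if single' filter drops it.
    d = dict(single.split('=', 1)
             for single in line.split(';') if single)
    if sorted(d) != sorted(expected):
        raise ValueError('Expected keys %s, got %s in %r' % (
            sorted(expected), sorted(d), line))
    return d


d = parse_pairs("#a=valuea;b=valueb;c=valuec;")
print('%s %s %s' % (d['#a'], d['b'], d['c']))
```

A piece with no '=' in it also ends in ValueError, because dict() refuses a one-element sequence where it wants a key/value pair.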
> Blech, not in my book. The regex checks the
> format of the string, extracts the values, and
> does so very clearly. Further, it is easily
> adapted to other similar formats, or evolutionary
> changes in format. It is also (once one is
> familiar with regexes -- a useful skill outside
> of Python too) easier to get right (at least in
> a simple case like this.)
The posted regex doesn't work; this might be homework, so
I'll not fix the two problems. The fact that you did not
see the failure weakens your claim of "does so very clearly."
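Anyone who wants to confirm the failure without having it fixed for them can run the posted pattern against the sample line. This is only a check using the OP's own pattern and string, unchanged:

```python
import re

line = "#a=valuea;b=valueb;c=valuec;"
p = re.compile('#a=*;b=*;c=*;')   # the pattern exactly as posted
m = p.match(line)

# The 'if m:' branch in the posted code is never taken for this line.
print(m is None)
```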
--Scott David Daniels
Scott.Daniels at Acm.Org