a regexp riddle: re.search(r'(?:(\w+), |and (\w+))+', 'whatever a, bbb, and c') =? ('a', 'bbb', 'c')
Steve Holden
steve at holdenweb.com
Thu Nov 25 10:25:52 EST 2010
On 11/24/2010 10:46 PM, Phlip wrote:
> HypoNt:
>
> I need to turn a human-readable list into a list():
>
> print re.search(r'(?:(\w+), |and (\w+))+', 'whatever a, bbb, and c').groups()
>
> That currently returns ('c',). I'm trying to match "any word \w+
> followed by a comma, or a final word preceded by and."
>
> The match returns 'a, bbb, and c', but the groups return ('bbb', 'c').
> What do I type for .groups() to also get the 'a'?
>
> Please go easy on me (and no RTFM!), because I have only been using
> regular expressions for about 20 years...
A lazy way is to write a pattern for the separators and hand it to
re.split(). I assume that " and " and " , " are both acceptable in
any position.
The best I've been able to do so far is below. Because split has the
annoying habit of including the matches of any groups in the pattern,
I have to throw away every second element:
>>> re.split(r"\s*(,|and)?\s*", 'whatever a, bbb, and c')[::2]
['whatever', 'a', 'bbb', '', 'c']
That empty string is there because the ", and" isn't recognised as a
single delimiter.
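One way to avoid both the empty string and the [::2] slice is sketched
below. It is only an illustration, not part of the original reply: the
name SEPARATOR and the helper split_items() are made up here. The
separator is written as "a comma or whitespace, optionally followed by
'and'", so ", and" is consumed as one delimiter, and the groups are
non-capturing, so re.split() returns only the items. The pattern also
never matches the empty string, which matters on Python 3.7 and later,
where re.split() splits on empty matches as well.

```python
import re

# Hypothetical separator pattern: a comma (with optional surrounding
# whitespace) or a run of whitespace, optionally followed by "and ".
# Both groups are non-capturing, so split() does not return the delimiters.
SEPARATOR = re.compile(r'(?:\s*,\s*|\s+)(?:and\s+)?')

def split_items(text):
    """Split a human-readable list on commas, whitespace and 'and'."""
    return SEPARATOR.split(text)

print(split_items('whatever a, bbb, and c'))
# ['whatever', 'a', 'bbb', 'c']
```

Like the pattern above, this treats plain whitespace as a separator, so
the leading 'whatever' comes back as an item and multi-word items would
be broken apart.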
A parsing package might give you better results.
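Short of a parsing package, re.findall() may be enough for the
original question. A capturing group inside a repetition keeps only
its last match, which is why .groups() never shows the 'a'. The sketch
below drops the outer repetition and lets findall() collect every
match instead. The helper name list_items() and the added \b before
"and" are assumptions of this example, not something from the thread.

```python
import re

# Same two alternatives as the original pattern, without the outer (?:...)+.
# \b keeps "and" from matching at the end of a longer word such as "band".
ITEM = re.compile(r'(\w+),|\band (\w+)')

def list_items(text):
    """Return the words followed by a comma, plus a final word after 'and'."""
    # With two groups, findall() yields (group1, group2) tuples in which
    # the group that did not take part in the match is an empty string.
    return [first or second for first, second in ITEM.findall(text)]

print(list_items('whatever a, bbb, and c'))
# ['a', 'bbb', 'c']
```

As with the original pattern, a word is only picked up if a comma
follows it or "and" precedes it, so 'a, bbb and c' would lose the 'bbb'.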
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
PyCon 2011 Atlanta March 9-17 http://us.pycon.org/
See Python Video! http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/