How to write simple code to match strings?
beginner
zyzhu2000 at gmail.com
Wed Dec 30 02:07:15 EST 2009
Hi Steve,
On Dec 30, 12:01 am, Steven D'Aprano <st... at REMOVE-THIS-cybersource.com.au> wrote:
> On Tue, 29 Dec 2009 21:01:05 -0800, beginner wrote:
> > Hi All,
>
> > I run into a problem. I have a string s that can be a number of
> > possible things. I use a regular expression code like below to match and
> > parse it. But it looks very ugly. Also, the strings are literally
> > matched twice -- once for matching and once for extraction -- which
> > seems to be very slow. Is there any better way to handle this?
>
> The most important thing you should do is to put the regular expressions
> into named variables, rather than typing them out twice. The names
> should, preferably, describe what they represent.
>
> Oh, and you should use raw strings for regexes. In this particular
> example, I don't think it makes a difference, but if you ever modify the
> strings, it will!
>
> You should get rid of the unnecessary double calls to match. That's just
> wasteful. Also, since re.match tests the start of the string, you don't
> need the leading ^ regex (but you do need the $ to match the end of the
> string).
>
> You should also fix the syntax error, where you have "elif s=='-'"
> instead of "elif s='-'".
>
> You should consider putting the cheapest test(s) first, or even moving
> the expensive tests into a separate function.
>
> And don't be so stingy with spaces in your source code, it helps
> readability by reducing the density of characters.
>
> So, here's my version:
>
> import re
> import dateutil.parser
>
> def _re_match_items(s):
>     # Set up some regular expressions.
>     COMMON_RE = r'\$?([-+]?[0-9,]*\.?[0-9,]+)'
>     FLOAT_RE = COMMON_RE + '$'
>     BRACKETED_FLOAT_RE = r'\(' + COMMON_RE + r'\)$'
>     DATE_RE = r'\d{1,2}-\w+-\d{1,2}$'
>     mo = re.match(FLOAT_RE, s)  # "mo" short for "match object"
>     if mo:
>         return float(mo.group(1).replace(',', ''))
>     # Otherwise mo will be None and we go on to the next test.
>     mo = re.match(BRACKETED_FLOAT_RE, s)
>     if mo:
>         return -float(mo.group(1).replace(',', ''))
>     if re.match(DATE_RE, s):
>         return dateutil.parser.parse(s, dayfirst=True)
>     raise ValueError("bad string can't be matched")
>
> def convert_data_item(s):
>     if s == '-':
>         return None
>     else:
>         try:
>             return _re_match_items(s)
>         except ValueError:
>             print("Unrecognized format %s" % s)
>             return s
>
> Hope this helps.
>
> --
> Steven
This definitely helps.
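To check my understanding of the point about re.match and the trailing $, here is a minimal sketch. It reuses the COMMON_RE pattern from the quoted code; the names loose, strict and value and the sample strings are just made up for illustration.

```python
import re

# Same number pattern as COMMON_RE in the quoted code.
COMMON_RE = r'\$?([-+]?[0-9,]*\.?[0-9,]+)'

# re.match anchors at the start of the string only, so without a
# trailing $ a string with junk after the number still matches.
loose = re.match(COMMON_RE, '1,234.5abc')

# Appending $ forces the match to run to the end of the string,
# so the same input is now rejected (re.match returns None).
strict = re.match(COMMON_RE + '$', '1,234.5abc')

# A clean input matches once, and the same match object is reused
# for extraction, so the string is not matched twice.
mo = re.match(COMMON_RE + '$', '$1,234.5')
value = float(mo.group(1).replace(',', ''))
```

So the leading ^ really is redundant with re.match, but the $ is what rejects trailing junk.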
I don't know if it should be s=='-' or s='-'. I thought == means equal
and = means assignment?
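As far as I can tell, a quick experiment supports that reading: a single = inside an if test is rejected by the compiler, while == compares. This is only a small sketch; the names is_dash and assignment_in_if_is_valid are mine, not from the thread.

```python
s = '-'                  # single = assigns the string to s
is_dash = (s == '-')     # double == compares and gives True or False

# Compile (without running) an if statement that uses a single =.
# Python refuses it with a SyntaxError.
try:
    compile("if s = '-': pass", "<example>", "exec")
    assignment_in_if_is_valid = True
except SyntaxError:
    assignment_in_if_is_valid = False
```

So the test in convert_data_item would need s == '-' to run at all.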
Thanks again,
G