idiom for RE matching
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Tue Jul 24 00:01:08 EDT 2007
On Tue, 24 Jul 2007 00:23:46 -0300, Gordon Airporte <JHoover at fbi.gov>
wrote:
> mik3l3374 at gmail.com wrote:
>> If your search is not overly complicated, I think a regexp is not
>> needed. If you want, you can post a sample of what you want to search
>> for, and some sample input.
>
> I'm afraid it's pretty complicated :-). I'm doing analysis of hand
> histories that online poker sites leave for you. Here's one hand of a
> play money ring game:
>
>
> Full Tilt Poker Game #2042984473: Table Play Chip 344 - 10/20 - Limit
> Hold'em - 18:07:20 ET - 2007/03/22
> Seat 1: grandmarambo (1,595)
> Seat 4: justnoldfoolm (2,430)
> justnoldfoolm posts the small blind of 5
> rickrn posts the big blind of 10
> The button is in seat #1
> *** HOLE CARDS ***
> Dealt to moi [Jd 2c]
> justnoldfoolm bets 10
> [more sample lines]
>
> So I'm picking out all kinds of info about my cards, my stack, my
> betting, my position, board cards, other people's cards, etc. For
> example, this pattern picks out which player bet and how much:
>
> betsRe = re.compile(r'^(.*) bets ([\d,]*)')
>
> I have 13 such patterns. The files I'm analyzing are just a session's
> worth of histories like this, separated by \n\n\n. All of this
> information needs to be organized by hand or by when it happened in a
> hand, so I can't just run patterns over the whole file or I'll lose
> context.
> (Of course, in theory I could write a single monster expression that
> would chop it all up properly and organize by context, but it would be
> next to impossible to write/debug/maintain.)
But you don't HAVE to use a regular expression. For such simple and
predictable input, using str.partition or an 'xxx' in string test is
around 4x faster:
import re

# Raw string, so that \d reaches the regex engine unchanged.
betsRe = re.compile(r'^(.*) bets ([\d,]*)')

def test_partition(line):
    # partition returns (line, '', '') when " bets " is absent,
    # so an empty separator means "no match" (implicitly returns None).
    who, bets, amount = line.partition(" bets ")
    if bets:
        return who, amount

def test_re(line):
    r = betsRe.match(line)
    if r:
        return r.group(1), r.group(2)

line1 = "justnoldfoolm bets 10"
assert test_re(line1) == test_partition(line1) == ("justnoldfoolm", "10")
line2 = "Uncalled bet of 20 returned to justnoldfoolm"
assert test_re(line2) == test_partition(line2) == None
py> timeit.Timer("test_partition(line1)", "from __main__ import *").repeat()
<timeit-src>:2: SyntaxWarning: import * only allowed at module level
[1.1922188434563594, 1.2086988709458808, 1.1956522407177488]
py> timeit.Timer("test_re(line1)", "from __main__ import *").repeat()
<timeit-src>:2: SyntaxWarning: import * only allowed at module level
[5.2871529761464018, 5.2763971398599523, 5.2791986132315714]
As is often the case, a regular expression is NOT the right tool for this
job.
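Applied to the original problem, the same idea scales to a whole session: split the text into hands on "\n\n\n" (as described in the question), then try a short list of substring-based parsers on each line, keeping each hand's events together so that context is not lost. A minimal sketch; the parser functions, the PARSERS list and the tuple records are hypothetical, and only three of the thirteen patterns are shown:

```python
# Hypothetical parsers: each returns a (kind, player, amount) tuple or None.
def parse_small_blind(line):
    who, sep, amount = line.partition(" posts the small blind of ")
    if sep:
        return ("small_blind", who, amount)

def parse_big_blind(line):
    who, sep, amount = line.partition(" posts the big blind of ")
    if sep:
        return ("big_blind", who, amount)

def parse_bet(line):
    who, sep, amount = line.partition(" bets ")
    if sep:
        return ("bet", who, amount)

# Tried in order; the first parser that recognizes a line wins.
PARSERS = [parse_small_blind, parse_big_blind, parse_bet]

def parse_session(text):
    """Split a session into hands and parse each hand's lines in order."""
    hands = []
    for raw_hand in text.split("\n\n\n"):
        events = []
        for line in raw_hand.splitlines():
            for parser in PARSERS:
                event = parser(line)
                if event is not None:
                    events.append(event)
                    break
        if events:
            hands.append(events)
    return hands

session = (
    "justnoldfoolm posts the small blind of 5\n"
    "rickrn posts the big blind of 10\n"
    "justnoldfoolm bets 10\n"
    "\n\n"
    "rickrn bets 20\n"
)
for hand in parse_session(session):
    print(hand)
```

Lines that no parser recognizes (headers, seat lines and so on) are simply skipped here; in practice each of the thirteen patterns would get its own small parser.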
--
Gabriel Genellina