Newbie Question: Regular Expressions
gbreed at cix.compulink.co.uk
Thu Jul 12 12:42:09 EDT 2001
In article <mailman.994953021.32263.python-list at python.org>,
fett at tradersdata.com () wrote:
> I have a really dumb program that I would like to make smarter. I need
> to take a file on my hard drive and filter out everything except for the
> standings which are written in it. I have tried to use regular
> expressions with no success, but I still think that they are probably
> the best way. I created the following simple fix, but it is unreliable
> if the data changes position.
>
>
> input = open('rawdata', 'r')
> S = input.read()
> print S[4021:6095]
>
> Output :
> League Standings
> American League
> EAST W L PCT GB HOME ROAD EAST CENT WEST NL L10 STRK
> Red Sox 43 29 .597 - 23-15 20-14 23-13 8-7 6-6 6-3 6-4 L2
> Yankees 41 31 .569 2.0 21-15 20-16 19-11 12-9 5-7 5-4 6-3 W2
> Blue Jays 35 38 .479 8.5 18-20 17-18 14-13 6-7 11-13 4-5 5-5 W3
> Orioles 34 39 .466 9.5 20-20 14-19 15-17 9-12 6-5 4-5 5-5 L1
> ........( it continues with all the standings)
Even without regular expressions, I think input.readlines()[4:] or the
like would work, and be simpler than what you do now.
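A minimal sketch of that idea, assuming Python 3 and using io.StringIO as a stand-in for open('rawdata', 'r'). The helper name lines_after and the skip count of 3 are illustrative only; the right count depends on how many lines precede the team rows in the real file.

```python
import io

def lines_after(fileobj, skip):
    """Return the lines of fileobj that follow the first `skip` lines."""
    return fileobj.readlines()[skip:]

# Stand-in for the real file; the content is a shortened copy of the
# output quoted above.
sample = io.StringIO(
    "League Standings\n"
    "American League\n"
    "EAST W L PCT GB HOME ROAD EAST CENT WEST NL L10 STRK\n"
    "Red Sox 43 29 .597 - 23-15 20-14 23-13 8-7 6-6 6-3 6-4 L2\n"
    "Yankees 41 31 .569 2.0 21-15 20-16 19-11 12-9 5-7 5-4 6-3 W2\n"
)

# Skip the two titles and the column header (an assumed count).
team_lines = lines_after(sample, 3)
for line in team_lines:
    print(line, end="")
```

Like the character-offset slice, this still breaks if the number of leading lines changes, which is where the regular expression helps.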
re.findall('((?:[A-Z]\w+ ){1,2}[-0-9. ]+\w\d)', S) does the trick on this
data.
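As a quick check, here is that call run against the lines quoted above. This is only a sketch, assuming Python 3 and assuming the whitespace in the real file looks like the quoted sample.

```python
import re

# Sample text copied from the quoted output (whitespace as shown there).
S = """League Standings
American League
EAST W L PCT GB HOME ROAD EAST CENT WEST NL L10 STRK
Red Sox 43 29 .597 - 23-15 20-14 23-13 8-7 6-6 6-3 6-4 L2
Yankees 41 31 .569 2.0 21-15 20-16 19-11 12-9 5-7 5-4 6-3 W2
Blue Jays 35 38 .479 8.5 18-20 17-18 14-13 6-7 11-13 4-5 5-5 W3
Orioles 34 39 .466 9.5 20-20 14-19 15-17 9-12 6-5 4-5 5-5 L1
"""

# Raw string so the backslashes reach the re module unchanged.
PATTERN = r'((?:[A-Z]\w+ ){1,2}[-0-9. ]+\w\d)'

# One string per team row; the titles and column header are skipped
# because no run of digits, dots, dashes and spaces follows their words.
rows = re.findall(PATTERN, S)
for row in rows:
    print(row)
```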
(?:[A-Z]\w+ )
matches a capital letter followed by one or more word characters (letters,
digits or underscore) and then a space, without capturing it as a group.
Perhaps it should be (?:[A-Z][a-z]+ )
{1,2}
matches 1 or 2 such words, so a team with a three-word name would not be
matched in full
[-0-9. ]+
matches one or more numerals, -, . or spaces. That covers the stuff in
the middle. You may like to make it more specific.
\w\d
then an alphanumeric followed by a digit to end. If the first character
is always a capital letter, it could be [A-Z]\d and if it's always W or L,
[WL]\d
)
and return the whole match as a group.
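To show what the tighter alternatives buy you, here is a sketch comparing the original pattern with one built from the [A-Z][a-z]+ and [WL]\d variants. It assumes Python 3; the names LOOSE and STRICT and the "Total ..." line are made up for illustration and do not come from the real data.

```python
import re

LOOSE = r'((?:[A-Z]\w+ ){1,2}[-0-9. ]+\w\d)'
# Words must be a capital followed by lowercase letters, and the line
# must end in W or L plus a digit (the streak column).
STRICT = r'((?:[A-Z][a-z]+ ){1,2}[-0-9. ]+[WL]\d)'

real_row = "Red Sox 43 29 .597 - 23-15 20-14 23-13 8-7 6-6 6-3 6-4 L2"
made_up = "Total 10 20 A1"  # hypothetical non-standings line

# Both patterns accept the real row.
loose_real = re.findall(LOOSE, real_row)
strict_real = re.findall(STRICT, real_row)

# Only the loose pattern accepts the made-up line.
loose_other = re.findall(LOOSE, made_up)
strict_other = re.findall(STRICT, made_up)

print(loose_real == strict_real)
print(loose_other, strict_other)
```

The stricter form is safer against stray lines, at the cost of rejecting team names that contain digits or other capitals.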
> Also could you tell me if it's possible to download the data from the
> web-page in python so that it doesn't even have to deal with opening the
> file.
Sure is! Check out the urllib module.
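A minimal sketch, assuming Python 3, where the download function lives in urllib.request (in the Python of this post's era it was urllib.urlopen). The helper name fetch_text, the UTF-8 default, and the URL in the comment are placeholders, not the real standings page.

```python
import urllib.request

def fetch_text(url, encoding="utf-8"):
    """Download url and return its body decoded as text."""
    with urllib.request.urlopen(url) as response:
        # Undecodable bytes are replaced rather than raising an error.
        return response.read().decode(encoding, errors="replace")

# Hypothetical usage; substitute the address of the real page:
# S = fetch_text("http://example.com/standings.html")
```

The returned string can then be passed to re.findall in place of the text read from the file.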
Graham