[Tutor] Regexp with multiple patterns in Python

Kent Johnson kent37 at tds.net
Tue Aug 16 13:43:43 CEST 2005


Kristian Evensen wrote:
> What I want to do is to check for two patterns to make sure all 
> occurrences of pattern1 and pattern2 come in the same order as they do 
> in the file I parse. If it contains a number of computer-games I would 
> like the output to look something like this:
> 
> PC, Battlefield, Battlefield2
> 
> PS2, Battlefield 2: Modern Combat.
> 
>  
> 
> The file is constructed somewhat similar to this:
> 
> PC
>             Battlefield, Battfiled2
> PS2
>             Battlefield 2: Modern Combat

Are you trying to *check* that the data is in this format, or are you trying to read this format and output a different format?

If the data is really this simple you can just iterate over the lines in pairs:
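with open('myfile.txt') as f:
    fi = iter(f)
    for line in fi:
        # Each platform line is followed by a line of titles; default to ''
        # if the file ends right after a platform line.
        nextline = next(fi, '')
        print('%s, %s' % (line.strip(), nextline.strip()))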

You could add code to the loop to check that the data is in the expected format.
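
As a rough sketch of such a check: this assumes, going only by the sample in the question, that a platform name starts in the first column and that the titles line after it is indented. The function name read_pairs and the in-memory sample list are made up for illustration; with a real file you would pass the open file object instead.

```python
def read_pairs(lines):
    """Pair each platform line with the titles line that follows it.

    Assumed layout (guessed from the sample in the question): a platform
    line starts in column 0, and the next line, holding the titles, is
    indented. Anything else raises ValueError.
    """
    pairs = []
    it = iter(lines)
    for line in it:
        if not line.strip():
            continue  # ignore blank lines between records
        if line[0].isspace():
            raise ValueError('expected a platform name, got %r' % line)
        titles = next(it, '')  # '' if the input ends after a platform line
        if not titles.strip() or not titles[0].isspace():
            raise ValueError('expected indented titles after %r' % line.strip())
        pairs.append((line.strip(), titles.strip()))
    return pairs


sample = [
    'PC\n',
    '            Battlefield, Battlefield2\n',
    'PS2\n',
    '            Battlefield 2: Modern Combat\n',
]
for platform, titles in read_pairs(sample):
    print('%s, %s' % (platform, titles))
```

If your real data has blank lines inside a record or several title lines per platform, the check would need to be loosened accordingly.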

> Using the following expression (and re.findall) I get somewhat closer:
> 
> pattern8 = re.compile(r'search.asp\?title=battlefield.*?><.*?>(PCCD|XBOX 
> 360|XBOX|PLAYSTATION PSP|PLAYSTATION 2) - 
> TITLE<|game.asp\?id=(\d+).*?><.*?><.*?>(.*?)<')

From this regex it looks like the data is actually in XML. You might want to use an XML parser like ElementTree or BeautifulSoup to parse the file, then extract the data from the resulting tree.
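
To give an idea of what the parser approach might look like, here is a minimal sketch using ElementTree from the standard library. Since I have not seen the actual file, the markup in SAMPLE is entirely made up, loosely modelled on the URLs in your regex, and group_titles is just a name I picked. It also assumes the input is well-formed; if it is not, ElementTree will refuse to parse it and a more forgiving parser such as BeautifulSoup would be the better fit.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the real page was not posted, so this structure is a
# guess based only on the URLs that appear in the regex.
SAMPLE = """
<div>
  <a href="search.asp?title=battlefield&amp;p=1"><b>PCCD - TITLE</b></a>
  <a href="game.asp?id=101"><b>Battlefield</b></a>
  <a href="game.asp?id=102"><b>Battlefield 2</b></a>
  <a href="search.asp?title=battlefield&amp;p=2"><b>PLAYSTATION 2 - TITLE</b></a>
  <a href="game.asp?id=201"><b>Battlefield 2: Modern Combat</b></a>
</div>
"""


def group_titles(xml_text):
    """Return [(platform, [title, ...]), ...] in document order."""
    root = ET.fromstring(xml_text.strip())
    groups = []
    for link in root.iter('a'):  # iter() visits elements in document order
        href = link.get('href', '')
        text = ''.join(link.itertext()).strip()
        if href.startswith('search.asp'):
            # A platform heading starts a new group.
            groups.append((text.replace(' - TITLE', ''), []))
        elif href.startswith('game.asp') and groups:
            # A game link belongs to the most recent platform heading.
            groups[-1][1].append(text)
    return groups


for platform, titles in group_titles(SAMPLE):
    print('%s, %s' % (platform, ', '.join(titles)))
```

Because the tree is walked in document order, each game ends up under the platform heading that precedes it, which is the ordering you were trying to recover with findall.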

If neither of these suggestions helps, please post a sample of your actual data, the actual results you want, and the code you have so far.

Kent


