[Tutor] parsing a string

Kent Johnson kent_johnson at skillsoft.com
Sun Oct 10 20:40:39 CEST 2004


You can do this with the csv module or with regular expressions:

import csv, re

# Data in a list so csv.reader can iterate over it
data = [ '1 1997 2 "Henrik Larsson"' ]

r = csv.reader(data, delimiter=' ')
for row in r:
    print(row)  # prints ['1', '1997', '2', 'Henrik Larsson']


# Regular expression to match three groups of digits separated by
# whitespace, then whatever is between the quotes
lineRe = re.compile(r'(\d+)\s+(\d+)\s+(\d+)\s+"(.*)"')
match = lineRe.search(data[0])
print(match.group(1, 2, 3, 4))  # prints ('1', '1997', '2', 'Henrik Larsson')


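The csv version might be handier if the data is in a file or file-like 
object, because it expects to iterate over the input. It also works 
unchanged if the quotes are optional, as long as an unquoted value 
contains no spaces (the space is the delimiter, so an unquoted 
two-word name would come back as two fields).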
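As a rough sketch of the file-like case, the snippet below feeds csv.reader 
an in-memory io.StringIO object instead of a list. It is written for 
Python 3; the second input line and the name "sample" are made up for 
illustration, and a real file object from open() should behave the same way.

```python
import csv
import io

# Hypothetical in-memory "file": one quoted name and one unquoted,
# single-word name. The second line is invented sample data.
sample = io.StringIO('1 1997 2 "Henrik Larsson"\n2 1998 5 Larsson\n')

# csv.reader iterates over the file-like object line by line
rows = [row for row in csv.reader(sample, delimiter=' ')]
for row in rows:
    print(row)
```

Both lines come back as four-item lists, because the unquoted name contains 
no spaces.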

The regex version might be better if you get the strings one at a time. If 
the quotes are optional you should change the regex to something like this:
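r'(\d+)\s+(\d+)\s+(\d+)\s+"?([^"]*)"?'
(using [^"]* rather than .* here, because a greedy .* would swallow the 
closing quote before the optional "? gets a chance to match it)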
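Here is a minimal sketch of the one-string-at-a-time approach with optional 
quotes, written for Python 3. The helper name parse_line is hypothetical, 
and the pattern uses [^"]* for the last group so that a closing quote, if 
present, is not captured; it assumes the name is the last field and has no 
embedded double quotes.

```python
import re

# Assumption: three integer fields, then a name that may or may not be
# wrapped in double quotes and contains no embedded double quotes.
line_re = re.compile(r'(\d+)\s+(\d+)\s+(\d+)\s+"?([^"]*)"?')

def parse_line(line):
    """Return the four fields as a list, or None if the line does not match."""
    match = line_re.search(line)
    return list(match.groups()) if match else None

quoted = parse_line('1 1997 2 "Henrik Larsson"')
unquoted = parse_line('1 1997 2 Henrik Larsson')
print(quoted)
print(unquoted)
```

Unlike the csv version, this keeps an unquoted two-word name together as a 
single field, since everything after the third number goes into the last 
group.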

Kent

At 08:00 PM 10/10/2004 +0200, L&L wrote:
>Hi All,
>
>Suppose I have a string that looks like this:
>
>1 1997 2 "Henrik Larsson"
>
>I want to convert the string to a list, with four members. Is there an 
>easy way to do this? (The hard way would be to find all quotes, save to a 
>separate string the area between the quotes, remove this part from the 
>original string, use string.split, and put the string back together.)
>
>Thanks.


