Using csv.DictReader with \r\n in the middle of fields
Neil Cerutti
neilc at norwich.edu
Wed Oct 13 11:01:31 EDT 2010
On 2010-10-13, pstatham <pstatham at sefas.com> wrote:
> Hopefully this will interest some, I have a csv file (can be
> downloaded from http://www.paulstathamphotography.co.uk/45.txt) which
> has five fields separated by ~ delimiters. To read this I've been
> using a csv.DictReader which works in 99% of the cases. Occasionally
> however the description field has errant \r\n characters in the middle
> of the record. This causes the reader to assume it's a new record and
> try to read it.
Here's an alternative idea. Working with the csv module for this job
is too difficult for me. ;)
import re

record_re = "(?P<PROGTITLE>.*?)~(?P<SUBTITLE>.*?)~(?P<EPISODE>.*?)~(?P<DESCRIPTION>.*?)~(?P<DATE>.*?)\n(.*)"

def parse_file(fname):
    with open(fname) as f:
        data = f.read()
    m = re.match(record_re, data, flags=re.M | re.S)
    while m:
        yield m.groupdict()
        m = re.match(record_re, m.group(6), flags=re.M | re.S)

for record in parse_file('45.txt'):
    print(record)
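
The loop above re-matches against the remaining text each time, which
copies the rest of the file once per record. A rough sketch of the same
idea using re.finditer follows. RECORD_RE and parse_text are names made
up for this sketch, and it assumes that no field contains a ~ and that
the date field never contains a newline. It also accepts a final record
that has no trailing newline.

```python
import re

# Hypothetical variant of the pattern: the same five named fields, but a
# record ends at a newline or at end of input, and there is no "rest" group.
# Assumes no field contains "~" and DATE contains no newline.
RECORD_RE = re.compile(
    r"(?P<PROGTITLE>[^~]*)~(?P<SUBTITLE>[^~]*)~(?P<EPISODE>[^~]*)"
    r"~(?P<DESCRIPTION>[^~]*)~(?P<DATE>[^~\n]*)(?:\n|\Z)"
)

def parse_text(data):
    """Yield one dict per record found in the string `data`."""
    # finditer resumes where the previous match ended, so the remaining
    # text is never copied.
    for m in RECORD_RE.finditer(data):
        yield m.groupdict()

sample = (
    "A~B~1~line one\nline two~2010-10-13\n"
    "C~D~2~plain~2010-10-14"
)
records = list(parse_text(sample))
for record in records:
    print(record)
```

To use it on the file, read the contents with open(fname) in text mode as
before and pass the resulting string to parse_text.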
--
Neil Cerutti