Using csv.DictReader with \r\n in the middle of fields

Neil Cerutti neilc at norwich.edu
Wed Oct 13 11:01:31 EDT 2010


On 2010-10-13, pstatham <pstatham at sefas.com> wrote:
> Hopefully this will interest some, I have a csv file (can be
> downloaded from http://www.paulstathamphotography.co.uk/45.txt) which
> has five fields separated by ~ delimiters. To read this I've been
> using a csv.DictReader which works in 99% of the cases. Occasionally
> however the description field has errant \r\n characters in the middle
> of the record. This causes the reader to assume it's a new record and
> try to read it.

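Here's an alternative idea: skip the csv module and match one whole
record at a time with a regular expression, so that line breaks inside
the description are just part of the field. Working with the csv module
for this job is too difficult for me. ;)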

import re

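# One record: five ~-separated fields ending at a newline. The last,
# unnamed group captures the rest of the file for the next pass.
record_re = "(?P<PROGTITLE>.*?)~(?P<SUBTITLE>.*?)~(?P<EPISODE>.*?)~(?P<DESCRIPTION>.*?)~(?P<DATE>.*?)\n(.*)"

def parse_file(fname):
    with open(fname) as f:
        data = f.read()
        # re.S lets "." match line breaks, so DESCRIPTION can span them.
        m = re.match(record_re, data, flags=re.M | re.S)
        while m:
            yield m.groupdict()
            # Match again on the remainder of the file (group 6).
            m = re.match(record_re, m.group(6), flags=re.M | re.S)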

for record in parse_file('45.txt'):
    print(record)
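
A possible variation on the same idea, offered only as a rough sketch:
re.finditer can walk the text once instead of re-matching on the
remainder, and an end-of-text alternative picks up a final record that
has no trailing newline. The names RECORD_RE and parse_text are made up
for this example, and it assumes Python 3, exactly five fields per
record, and no "~" inside any field.

```python
import re

# Assumption: five ~-separated fields per record, and no field contains "~".
# [^~]* may cross line breaks, so a DESCRIPTION with a stray \r\n stays whole.
# DATE stops at the first line break; the record then ends at \r\n, \n, or
# the end of the text.
RECORD_RE = re.compile(
    r"(?P<PROGTITLE>[^~]*)~(?P<SUBTITLE>[^~]*)~(?P<EPISODE>[^~]*)~"
    r"(?P<DESCRIPTION>[^~]*)~(?P<DATE>[^~\r\n]*)(?:\r?\n|\Z)"
)

def parse_text(data):
    """Yield one dict of named fields per record found in data."""
    for m in RECORD_RE.finditer(data):
        yield m.groupdict()

def parse_file(fname):
    # newline="" keeps the original \r\n sequences instead of translating them.
    with open(fname, newline="") as f:
        yield from parse_text(f.read())
```

Calling list(parse_file('45.txt')) would then collect the records as
dicts, much like the loop above. Blank lines between records are not
handled specially here; they would end up at the start of the next
PROGTITLE.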

-- 
Neil Cerutti


