Re: Help splitting CSV data

Roy Smith roy at
Mon Jan 21 01:00:50 CET 2013

In article <3e1e8567-b9f4-446a-8a59-75f45367d2ac at>,
 Garry <ggkraemer at> wrote:

> Actual data:
> [F0244],[I0690],[I0354],1916-06-08,"Neely's Landing, Cape Gir. Co, MO",,0x0a
> [F0245],[I0692],[I0355],1919-09-04,"Cape Girardeau Co, MO",,0x0a
> code snippet follows:
> import os
> import re
> #I'm using the following regex in an attempt to decode the data:

First suggestion, don't try to parse CSV data with regex.  I'm a huge 
regex fan, but it's just the wrong tool for this job.  Use the built-in 
csv module.  Or, if you want something fancier, use read_csv() from
pandas.
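For example, here is a minimal sketch of the csv module applied to the two sample rows above. It assumes the trailing "0x0a" in the quoted data stands for the line-feed byte that ends each record, and it reads from an in-memory string (io.StringIO) so it runs without a file:

```python
import csv
import io

# The two sample records from the question. Assumption: the trailing "0x0a"
# in the post is the line-feed byte (hex 0A), written here as "\n".
DATA = (
    '[F0244],[I0690],[I0354],1916-06-08,"Neely\'s Landing, Cape Gir. Co, MO",,\n'
    '[F0245],[I0692],[I0355],1919-09-04,"Cape Girardeau Co, MO",,\n'
)


def parse_records(text):
    """Split CSV text into a list of rows, each row a list of strings."""
    # csv.reader keeps the quoted field together despite the commas inside
    # it, and strips the surrounding double quotes.
    return list(csv.reader(io.StringIO(text)))


rows = parse_records(DATA)
for row in rows:
    print(row)
```

With a real file, open it with newline="" and hand the file object straight to csv.reader, e.g. `with open("data.csv", newline="") as f: rows = list(csv.reader(f))`, where "data.csv" is just a placeholder name.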

Second, when you use regexes, *always* use raw strings for the pattern:

RegExp2 = r'....'
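A small illustration of why: in a normal string literal, Python processes backslash escapes before re ever sees the pattern, so "\b" turns into a backspace character instead of a word boundary. Sequences like "\d" and "\[" only survive because Python doesn't recognize them as escapes, and recent Python versions warn about them. The search text here is made up:

```python
import re

text = "the word cat here"

# Without the r prefix, "\b" is a backspace character (\x08), so this
# pattern looks for a backspace on each side of "cat" and finds nothing.
plain = re.search("\bcat\b", text)

# With a raw string the backslash survives, and re reads \b as a word boundary.
raw = re.search(r"\bcat\b", text)

print(plain)        # None
print(raw.group())  # cat
```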

Lastly, take a look at the re.VERBOSE flag.  It lets you write monster 
regexes split up into several lines.  Between re.VERBOSE and raw 
strings, it can make the difference between line noise like this:

> RegExp2 = "^(\[[A-Z]\d{1,}\])\,(\[[A-Z]\d{1,}\])\,(\[[A-Z]\d{1,}\])\,(\d{,4}\-\d{,2}\-\d{,2})\,(.*|\".*\")\,(.*|\".*\")\,(.*|\".*\")"

and something that mere mortals can understand.
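For instance, here is a sketch of the regex quoted above rewritten with re.VERBOSE. It is meant to match exactly the same things; outside a character class, ",", "-" and '"' are already literal, so the original backslashes before them aren't needed. The group comments are my reading of the sample data, not something the original post states:

```python
import re

# Same pattern as RegExp2 in the question, laid out with re.VERBOSE:
# whitespace in the pattern is ignored and "#" starts a comment.
RECORD = re.compile(r"""
    ^
    (\[[A-Z]\d{1,}\])        # 1: bracketed ID, e.g. [F0244]
    ,
    (\[[A-Z]\d{1,}\])        # 2: bracketed ID, e.g. [I0690]
    ,
    (\[[A-Z]\d{1,}\])        # 3: bracketed ID, e.g. [I0354]
    ,
    (\d{,4}-\d{,2}-\d{,2})   # 4: date, e.g. 1916-06-08
    ,
    (.*|".*")                # 5: free text, possibly quoted
    ,
    (.*|".*")                # 6: free text (empty in the sample rows)
    ,
    (.*|".*")                # 7: free text (empty in the sample rows)
    """, re.VERBOSE)

# First sample record, with the trailing line feed left off.
line = '[F0244],[I0690],[I0354],1916-06-08,"Neely\'s Landing, Cape Gir. Co, MO",,'
match = RECORD.match(line)
print(match.groups())
```

Even laid out like this, the regex still leaves the double quotes on group 5, which is one more thing the csv module takes care of for you.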
