RE Help splitting CVS data
Roy Smith
roy at panix.com
Sun Jan 20 19:00:50 EST 2013
In article <3e1e8567-b9f4-446a-8a59-75f45367d2ac at googlegroups.com>,
Garry <ggkraemer at gmail.com> wrote:
> Actual data:
> [F0244],[I0690],[I0354],1916-06-08,"Neely's Landing, Cape Gir. Co, MO",,0x0a
> [F0245],[I0692],[I0355],1919-09-04,"Cape Girardeau Co, MO",,0x0a
>
> code snippet follows:
>
> import os
> import re
> #I'm using the following regex in an attempt to decode the data:
First suggestion: don't try to parse CSV data with regex. I'm a huge
regex fan, but it's just the wrong tool for this job: your quoted place
names contain commas, and handling that by hand gets ugly fast. Use the
built-in csv module (http://docs.python.org/2/library/csv.html). Or, if
you want something fancier, read_csv() from pandas
(http://tinyurl.com/ajxdxjm).
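
Here is a minimal sketch of the csv approach, using the two sample rows
from your post. I'm assuming the trailing 0x0a just stands for the
newline, so I've left it out, and I'm feeding the reader from an
in-memory string instead of your real file. It is written for Python 3;
the variable names are mine.

```python
import csv
import io

# The two rows quoted above, minus the trailing 0x0a marker
# (assumed to be a plain newline in the real file).
sample = (
    '[F0244],[I0690],[I0354],1916-06-08,"Neely\'s Landing, Cape Gir. Co, MO",,\n'
    '[F0245],[I0692],[I0355],1919-09-04,"Cape Girardeau Co, MO",,\n'
)

# csv.reader accepts any iterable of lines; StringIO stands in for a file.
rows = list(csv.reader(io.StringIO(sample)))

for row in rows:
    # Each row is a list of strings; the quoted place name stays in one
    # field even though it contains commas.
    print(row)
```

With a real file you would pass the open file object to csv.reader()
instead of the StringIO wrapper.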
Second, when you use regexes, *always* use raw strings around the
pattern, so the backslashes reach the regex engine untouched:
RegExp2 = r'....'
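
To see why it matters, here is a small illustration (my own example,
not taken from your code). In a normal string literal, \b is the
backspace character; in a raw string it stays as backslash-b, which the
re module reads as a word boundary.

```python
import re

plain = "\bword\b"   # Python turns each \b into a backspace character
raw = r"\bword\b"    # the backslashes survive, so re sees word boundaries

text = "a word here"

match_plain = re.search(plain, text)  # looks for backspace, 'word', backspace
match_raw = re.search(raw, text)      # looks for 'word' as a whole word

print(len(plain), len(raw))
print(match_plain, match_raw)
```

The two patterns look identical on screen but are different strings,
and only the raw one finds the word.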
Lastly, take a look at the re.VERBOSE flag. It lets you write monster
regexes split up into several lines. Between re.VERBOSE and raw
strings, it can make the difference between line noise like this:
> RegExp2 =
> "^(\[[A-Z]\d{1,}\])\,(\[[A-Z]\d{1,}\])\,(\[[A-Z]\d{1,}\])\,(\d{,4}\-\d{,2}\-\d
> {,2})\,(.*|\".*\")\,(.*|\".*\")\,(.*|\".*\")"
and something that mere mortals can understand.
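
For instance, something along these lines. This is a sketch, not a
drop-in replacement for your pattern: I've assumed the date is always
YYYY-MM-DD and that the last two fields never contain commas, and the
name RECORD is just mine. The csv module is still the better tool for
this data.

```python
import re

RECORD = re.compile(r"""
    ^
    (\[[A-Z]\d+\]) ,          # first bracketed ID, e.g. [F0244]
    (\[[A-Z]\d+\]) ,          # second bracketed ID
    (\[[A-Z]\d+\]) ,          # third bracketed ID
    (\d{4}-\d{2}-\d{2}) ,     # date, assumed to be YYYY-MM-DD
    ("[^"]*"|[^,]*) ,         # place, optionally double-quoted
    ([^,]*) ,                 # two trailing fields, empty in the sample
    ([^,]*)
    $
""", re.VERBOSE)

line = '[F0244],[I0690],[I0354],1916-06-08,"Neely\'s Landing, Cape Gir. Co, MO",,'

m = RECORD.match(line)
if m:
    print(m.groups())
```

Under re.VERBOSE, whitespace in the pattern is ignored and everything
from # to the end of the line is a comment, so each piece can sit on
its own line with a note saying what it is for. Note that the quoted
place name comes back with its double quotes still attached.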