RE Help splitting CVS data

Garry ggkraemer at gmail.com
Sun Jan 20 23:04:39 CET 2013


I'm trying to manipulate family tree data using Python.
I'm using linux and Python 2.7.3 and have data files saved as Linux formatted cvs files
The data appears in this format:

Marriage,Husband,Wife,Date,Place,Source,Note0x0a
Note: the Source field or the Note field can contain quoted data (same as the Place field)

Actual data:
[F0244],[I0690],[I0354],1916-06-08,"Neely's Landing, Cape Gir. Co, MO",,0x0a
[F0245],[I0692],[I0355],1919-09-04,"Cape Girardeau Co, MO",,0x0a

code snippet follows:

import os
import re
#I'm using the following regex in an attempt to decode the data:
RegExp2 = "^(\[[A-Z]\d{1,}\])\,(\[[A-Z]\d{1,}\])\,(\[[A-Z]\d{1,}\])\,(\d{,4}\-\d{,2}\-\d{,2})\,(.*|\".*\")\,(.*|\".*\")\,(.*|\".*\")"
#
line = "[F0244],[I0690],[I0354],1916-06-08,\"Neely's Landing, Cape Gir. Co, MO\",,"
#
(Marriage,Husband,Wife,Date,Place,Source,Note) = re.split(RegExp2,line)
#
#However, this does not decode the 7 fields.
# The following error is displayed:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
#
# When I use xx the fields apparently get unpacked.
xx = re.split(RegExp2,line)
#
>>> print xx[0]

>>> print xx[1]
[F0244]
>>> print xx[5]
"Neely's Landing, Cape Gir. Co, MO"
>>> print xx[6]

>>> print xx[7]

>>> print xx[8]

Why is there an extra NULL field before and after my record contents?
I'm stuck, comments and solutions greatly appreciated.

Garry 




More information about the Python-list mailing list