Strange re problem

John Machin sjmachin at lexicon.net
Fri Jun 20 14:06:57 CEST 2008


On Jun 20, 9:01 pm, TYR <a.harrow... at gmail.com> wrote:
> OK, this ought to be simple. I'm parsing a large text file (originally
> a database dump) in order to process the contents back into a SQLite3
> database. The data looks like this:
>
> 'AAA','PF',-17.416666666667,-145.5,'Anaa, French Polynesia','Pacific/
> Tahiti','Anaa';'AAB','AU',-26.75,141,'Arrabury, Queensland,
> Australia','?','?';'AAC','EG',31.133333333333,33.8,'Al Arish,
> Egypt','Africa/Cairo','El Arish International';'AAE','DZ',
> 36.833333333333,8,'Annaba','Africa/Algiers','Rabah Bitat';
>
> which goes on for another 308 lines.

308 lines or 308 rows? Another way of asking the same question: do you
have line terminators like \n or \r\n or \r in your file? If so, you
will need to do something like this:

rows = open('myfile', 'rb').read().replace('\r\n', '').split(';')
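
The one-liner above strips only '\r\n'. A minimal sketch that covers all three terminator styles is shown here; the function name split_rows is hypothetical, and it assumes the whole file fits in memory and that no ';' occurs inside the data:

```python
def split_rows(text):
    """Split a dump into rows on ';' after removing line terminators."""
    # Remove '\r\n' first so a Windows line ending is not handled twice,
    # then remove any lone '\r' or '\n' that is left.
    flat = text.replace('\r\n', '').replace('\r', '').replace('\n', '')
    # A trailing ';' produces an empty last piece; drop empty pieces.
    return [row for row in flat.split(';') if row]


sample = "'AAA','PF';\r\n'AAB','AU';\n'AAC','EG';\r"
rows = split_rows(sample)
```

Deleting the terminators outright, as the one-liner does, also rejoins any field that was wrapped across two lines in the file.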

> As keen and agile minds will no
> doubt spot, the rows are separated by a ; so it should be simple to
> parse it using a regex. So, I establish a db connection and cursor,
> create the table, and open the source file.
>
> Then we do this:
>
> f = file.readlines()
> biglist = re.split(';', f)
>
> and then iterate over the output from re.split(), inserting each set
> of values into the db,

Where we left off, you had a list of rows. Each row will be a string
like:
'AAB','AU',-26.75,141,'Arrabury, Queensland,
Australia','?','?'

How do you propose to parse that string into a "set of values"?  Can
you rely on there being data commas only in the 5th field, or do you need
a general solution? What if (as Peter remarked) there is a ';' in the
data? What if there's a "'" in the data (think O'Hare)?
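
One possible general approach, sketched under assumptions: the standard csv module can treat "'" as the quote character, so commas inside a quoted field stay in that field. The helper name parse_row is hypothetical, the field positions of lat and long (3rd and 4th) are taken from the sample data, and the sketch assumes an embedded quote is written doubled (as in 'O''Hare'). The dump may escape quotes differently, so check that before relying on this.

```python
import csv


def parse_row(row):
    """Parse one row string into a list of field values."""
    # quotechar="'" makes the reader honour single-quoted fields, so
    # 'Arrabury, Queensland, Australia' comes back as a single field.
    fields = next(csv.reader([row], quotechar="'"))
    # Everything arrives as a string; lat and long (fields 3 and 4 in
    # the sample data) need an explicit conversion to float.
    fields[2] = float(fields[2])
    fields[3] = float(fields[3])
    return fields


row = "'AAB','AU',-26.75,141,'Arrabury, Queensland, Australia','?','?'"
values = parse_row(row)
```

This does not help with a ';' inside the data: if the rows were produced by a plain split on ';', such a row is already cut in two before it gets here.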

> and finally close the file and commit
> transactions. But instead, I get this error:
>
> Traceback (most recent call last):
>   File "converter.py", line 12, in <module>
>     biglist = re.split(';', f)
>   File "/usr/lib/python2.5/re.py", line 157, in split
>     return _compile(pattern, 0).split(string, maxsplit)
> TypeError: expected string or buffer
>
> Is this because the lat and long values are integers rather than
> strings? (If so, any ideas?)

At the stage where it blew up, you didn't even have rows, let alone
fields, let alone worries about converting your lat and long fields
from string to float (not integer!).
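
To make the cause of the traceback concrete: readlines() returns a list of strings, while re.split() expects a single string. The sketch below uses a hard-coded list (made-up sample rows) as a stand-in for the readlines() result, so it runs without the original file:

```python
import re

# Stand-in for what file.readlines() returns: a list, one string per line.
lines = ["'AAA','PF',-17.4,-145.5;\n", "'AAB','AU',-26.75,141;\n"]

try:
    re.split(';', lines)   # passing the list, as in converter.py line 12
    failed = False
except TypeError:
    failed = True          # the same kind of failure as in the traceback

# re.split() wants one string: join the lines, or use read() in the
# first place instead of readlines().
text = ''.join(lines)
rows = [r.strip() for r in re.split(';', text) if r.strip()]
```

For a fixed separator like ';', the plain string method text.split(';') does the same job without a regex.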

HTH,
John


