Reading plain text file database tables
Gordon McMillan
gmcm at hypernet.com
Thu Sep 9 08:26:32 EDT 1999
I think you'll find a regex that does this on Hans Nowak's snippets
page, wherever that is at the moment (he just posted a notice that
it had moved in the last few days).
Just found it at http://www.hvision.nl/~ivnowa/snippets/source/67.py
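
On the [^"\\] question quoted below: the character class does indeed
match only one character, and it is the other branch of the alternation,
\\. , that consumes a backslash together with the character after it.
So \" is eaten as a pair and never ends the string. Here is a rough
sketch (not the snippet from the page above) of extending Stephan's
pattern to split a whole line into fields. The names FIELD, split_fields
and read_table are made up for illustration, and it assumes fields are
separated by whitespace and that backslash is the only escape character.

```python
import re

# Hypothetical helper names; the pattern is Stephan's quoted-string regex
# with a second alternative added for bare (unquoted) tokens.
#   [^"\\]  one ordinary character (neither quote nor backslash)
#   \\.     a backslash plus the character after it, e.g. \" or \\
FIELD = re.compile(r'"((?:[^"\\]|\\.)*)"|(\S+)')


def split_fields(line):
    """Split one line into fields, honouring "..." with backslash escapes."""
    fields = []
    for match in FIELD.finditer(line):
        quoted, bare = match.group(1), match.group(2)
        if quoted is not None:
            # Strip the escaping backslashes: \" becomes ", \\ becomes \
            fields.append(re.sub(r'\\(.)', r'\1', quoted))
        else:
            fields.append(bare)
    return fields


def read_table(lines):
    """Turn an iterable of text lines into a list of lists, skipping blanks."""
    return [split_fields(line) for line in lines if line.strip()]


if __name__ == "__main__":
    print(split_fields(r'"Peter\" Thomson" 25 36'))
```

An unterminated quote is not treated as an error here; it simply falls
through to the bare-token branch. For comma- or tab-separated files the
\S+ branch would have to be changed to match up to the next delimiter.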
Li Dongfeng wrote:
> Thanks for the direction. I'm also
> trying to solve this with RE. But in order
> to extract "ab \" cd" like structure,
> your [^"\\] seems puzzling, because anything
> inside [] represents only one character, not
> the two-character sequence '\"'. Any real
> worked out result?
>
> Stephan Houben wrote:
> >
> > On Thu, 09 Sep 1999 14:15:38 +0800, Li Dongfeng
> > <mavip5 at inet.polyu.edu.hk> wrote:
> >
> > >
> > >Do we have a module to read plain text file
> > >database tables?
> > >
> > >All the data management software, e.g. Excel,
> > >dBase, SAS, etc., supports reading and writing a table
> > >from/to a plain text file; fields can be separated
> > >by their column position, by spaces, by tabs,
> > >by commas, etc.
> > >
> > >How can we read this kind of file into a matrix-like
> > >structure (list of lists)? I have written one reading
> > >files with fixed-width fields. For delimited files,
> > >simply using string.split works most of the time, but
> > >fails reading lines like
> > >
> > > "Peter Thomson" 25 36
> > >
> > >or even
> > >
> > > "Peter\" Thomson" 25 36
> > >
> > >I think this is a common task, so maybe someone has
> > >already given a very good solution.
> >
> > Use regular expressions.
> > A regular expression that matches strings within "..",
> > while escaping " with \, is:
> >
> > r = re.compile(r'"([^"\\]|(\\.))*"')
> >
> > Extend this to get what you want.
> > If you're very worried about speed, create
> > a lexer using (f)lex, and then call the generated
> > C code from python.
> >
> > Greetings,
> >
> > Stephan
>
- Gordon