help splitting strings

Sean 'Shaleh' Perry shalehperry at attbi.com
Thu Apr 18 12:33:21 EDT 2002


On 18-Apr-2002 Tony Case wrote:
> Hi all
>       I'm pretty new to Python which probably explains my problems,
> I'm trying to read data from a number of comma separated variable
> files.  These are fairly large, 5000 lines plus.
>       I'm reading the files and using split to separate the
> individual values, some numeric some string.  The problem is that a
> number of the lines have commas and double quotes embedded in the
> strings so there are more elements in the list returned from split
> than there should be, in one file there are 400 + lines like this so
> it's not easy to edit the original file.
>       Is there a programmatic way of dealing with this?
> The input files are in the form :-
> int,int,int,"string","string",int,"string"
> etc.
> I tried using the re.findall and re.split but didn't get very far.
>       TIA     T Case

My first thought is something like this:

re.compile(r'^(\d+),(\d+),(\d+),((?:"[^"]+")|(?:[^,]+)),((?:"[^"]+")|(?:[^,]+)),(\d+),((?:"[^"]+")|(?:[^,]+))$')

\d+ matches one or more digits.  The ugly part is the string matching.
"[^"]+" means match a double quote, then one or more non-double-quote
characters, followed by a closing double quote.  This is wrapped in (?:) so
that the chunk does not form a capturing group of its own; only the outer
parentheses capture.  The same goes for the alternative, which is simply a
run of anything but commas.
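
As a rough sketch of how the pattern might be applied line by line (the
names STRING_FIELD, LINE_RE and parse_line and the sample line are made up
for illustration, and the field layout is assumed to be exactly the
int,int,int,"string","string",int,"string" form from the question):

```python
import re

# One string field: either "..." (commas allowed inside the quotes)
# or a bare run of characters that contains no comma.
STRING_FIELD = r'((?:"[^"]+")|(?:[^,]+))'

# Assumed layout: int,int,int,"string","string",int,"string"
LINE_RE = re.compile(
    r'^(\d+),(\d+),(\d+),' + STRING_FIELD + ',' + STRING_FIELD +
    r',(\d+),' + STRING_FIELD + '$'
)

def parse_line(line):
    """Return the seven fields of one line, or None if it does not match."""
    match = LINE_RE.match(line.rstrip('\r\n'))
    if match is None:
        return None
    fields = list(match.groups())
    for i in (0, 1, 2, 5):          # numeric columns
        fields[i] = int(fields[i])
    for i in (3, 4, 6):             # string columns: drop surrounding quotes
        fields[i] = fields[i].strip('"')
    return fields

sample = '1,2,3,"Smith, John","plain",42,"a, b, c"'  # hypothetical input line
print(parse_line(sample))
```

Lines that do not fit the pattern come back as None, so they can be
collected and inspected rather than silently mis-split.  Note that "[^"]+"
does not match an empty "" field, and a quoted field containing both a
comma and a double quote is not handled by this pattern.  In current Python
versions the standard csv module (csv.reader) parses quoted fields and may
be simpler than a hand-written pattern.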




