help splitting strings
Sean 'Shaleh' Perry
shalehperry at attbi.com
Thu Apr 18 12:33:21 EDT 2002
On 18-Apr-2002 Tony Case wrote:
> Hi all
> I'm pretty new to Python which probably explains my problems,
> I'm trying to read data from a number of comma separated variable
> files. These are fairly large, 5000 lines plus.
> I'm reading the files and using split to separate the
> individual values, some numeric some string. The problem is that a
> number of the lines have commas and double quotes embedded in the
> strings so there are more elements in the list returned from split
> than there should be, in one file there are 400 + lines like this so
> it's not easy to edit the original file.
> Is there a programmatic way of dealing with this?
> The input files are in the form :-
> int,int,int,"string","string",int,"string"
> etc.
> I tried using the re.findall and re.split but didn't get very far.
> TIA T Case
My first thought is something like this:
re.compile(r'^(\d+),(\d+),(\d+),((?:"[^"]+")|(?:[^,]+)),((?:"[^"]+")|(?:[^,]+)),(\d+),((?:"[^"]+")|(?:[^,]+))$')
\d+ matches one or more digits. The ugly part is the string matching. "[^"]+"
means match a double quote, then one or more non-double-quote chars, followed by
a closing double quote. This is wrapped in (?:) so that the chunk is grouped
without being captured on its own; the outer parentheses capture the whole
field. Same goes for the alternative, [^,]+, which is simply a run of anything
but commas.
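
To make that concrete, here is a rough sketch of how the pattern might be applied line by line. The names FIELD, LINE_RE and split_line, and the sample line, are made up for illustration; the field layout is the one from the question. Treat it as a starting point under those assumptions rather than a finished parser.

```python
import re

# One string field: either a double-quoted string (commas allowed inside)
# or a bare run of non-comma characters. FIELD is a hypothetical helper name.
FIELD = r'((?:"[^"]+")|(?:[^,]+))'

# Layout from the question: int,int,int,"string","string",int,"string"
LINE_RE = re.compile(
    r'^(\d+),(\d+),(\d+),' + FIELD + ',' + FIELD + r',(\d+),' + FIELD + '$'
)

def split_line(line):
    """Return the seven captured fields, or None if the line does not match."""
    match = LINE_RE.match(line.rstrip('\n'))
    if match is None:
        return None
    return list(match.groups())

# Made-up sample line with a comma inside a quoted string.
sample = '1,2,3,"hello, world","plain",4,"last"'
fields = split_line(sample)
```

The captured string fields keep their surrounding double quotes, so strip them afterwards if you don't want them. Lines that do not fit the layout come back as None, which makes it easy to collect the odd ones and look at them by hand.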