help splitting strings
Michael Gilfix
mgilfix at eecs.tufts.edu
Thu Apr 18 12:34:03 EDT 2002
You definitely want to use the re module and findall
here. That way you can have tighter control over what
constitutes a field. The difficulty here is crafting your
regular expression so it matches what you want. Check out
http://www.python.org/doc/current/lib/re-syntax.html for a description
of the regular expression syntax.
You'll probably want to code something like this:
import re
text = "..."  # line to parse
field_re = re.compile (r' <expression here> ')
# findall returns a list of matches (empty if nothing matched),
# not a match object, so there is no None check or .groups () call
my_fields = field_re.findall (text)
As for the regular expression, you'll probably want a look ahead to
check if there are commas within strings, something like:
expr = r'(".*,.*")?,'
I doubt that'll work but it'll give you an idea anyway :)
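For what it's worth, here is a minimal sketch of one pattern that should cope with your sample format. It assumes (I haven't seen your files) that a double quote embedded in a string is written as two double quotes, as many CSV exporters do. FIELD_RE and split_fields are just names made up for illustration. The look ahead after the closing quote checks that a quoted field really ends at a comma or at the end of the line:

```python
import re

# Assumed field grammar (not confirmed by the original files): a field is
# either a double-quoted string, in which a literal quote is written as "",
# or a run of characters containing no comma.
FIELD_RE = re.compile(r'''
    (?:^|,)                      # start of the line or a separating comma
    (?:
        "((?:[^"]|"")*)"(?=,|$)  # group 1: quoted field, may contain commas
      |
        ([^,]*)                  # group 2: unquoted field
    )
''', re.VERBOSE)


def split_fields(line):
    """Return the fields of one comma-separated line as a list of strings."""
    fields = []
    # findall yields one (quoted, unquoted) tuple per field; the group that
    # did not take part in the match comes back as an empty string.
    for quoted, unquoted in FIELD_RE.findall(line.rstrip('\r\n')):
        if quoted:
            # Undo the assumed "" escaping inside quoted fields.
            fields.append(quoted.replace('""', '"'))
        else:
            fields.append(unquoted)
    return fields


if __name__ == '__main__':
    sample = '1,2,3,"a,b","say ""hi""",4,"end"'
    print(split_fields(sample))
```

Every field comes back as a string, so the numeric columns still need int () afterwards.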
-- Mike
On Thu, Apr 18 @ 16:35, Tony Case wrote:
> Hi all
> I'm pretty new to Python which probably explains my problems,
> I'm trying to read data from a number of comma separated variable
> files. These are fairly large, 5000 lines plus.
> I'm reading the files and using split to separate the
> individual values, some numeric some string. The problem is that a
> number of the lines have commas and double quotes embedded in the
> strings so there are more elements in the list returned from split
> than there should be, in one file there are 400 + lines like this so
> it's not easy to edit the original file.
> Is there a programmatic way of dealing with this?
> The input files are in the form :-
> int,int,int,"string","string",int,"string"
> etc.
> I tried using the re.findall and re.split but didn't get very far.
> TIA T Case
> --
> http://mail.python.org/mailman/listinfo/python-list
`-> (tony)
--
Michael Gilfix
mgilfix at eecs.tufts.edu
For my gpg public key:
http://www.eecs.tufts.edu/~mgilfix/contact.html