help splitting strings

Michael Gilfix mgilfix at eecs.tufts.edu
Thu Apr 18 12:34:03 EDT 2002


  You definitely want to use the re module and findall here.  That
way you have tighter control over what constitutes a field.  The
difficulty is crafting a regular expression that matches exactly what
you want.  Check out
http://www.python.org/doc/current/lib/re-syntax.html for a description
of the regular expression syntax.

  You'll probably want to code something like this:

   import re

   text = '1,2,3,"a, b","c",4,"d"'   # line to parse (sample value)

   field_re = re.compile(r' <expression here> ')
   my_fields = field_re.findall(text)   # a list of matches, not a match object

   if my_fields:   # an empty list means nothing matched
     print(my_fields)
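
  One thing worth knowing when adapting this: findall returns a plain
list rather than a match object, and the shape of its items depends on
how many groups the pattern contains.  A small sketch, with made-up
patterns and sample data that are not from the original files:

```python
import re

sample = "a=1, b=2"  # made-up data, purely for illustration

no_groups = re.findall(r"\w=\d", sample)       # list of whole matches
one_group = re.findall(r"\w=(\d)", sample)     # list of that group's text
two_groups = re.findall(r"(\w)=(\d)", sample)  # list of tuples, one per match
no_match = re.findall(r"z=\d", sample)         # empty list, never None

print(no_groups, one_group, two_groups, no_match)
```

  So there is no need to call groups() on the result; you iterate over
the list directly.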

  As for the regular expression, you'll probably want a lookahead to
check whether there are commas within the quoted strings.

  Something like:

       expr = r'(".*,.*")?,'

  I doubt that'll work as written (the greedy .* will happily run past
the closing quote into later fields), but it'll give you an idea
anyway :)
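
  For what it's worth, here is one way the idea could be fleshed out.
Treat it as a sketch under assumptions rather than a tested answer for
your files: it assumes that a double quote embedded in a string is
written as two double quotes (""), that every quoted string is properly
closed, and that you run a recent Python 3.  The names FIELD_RE and
split_fields are made up for this example.  Instead of a lookahead it
uses an alternation that tries a quoted string first and falls back to
a run of non-comma characters.

```python
import re

# A field is either a double-quoted string (which may contain commas and
# doubled "" quotes) or a run of non-comma characters.  Each field is
# preceded by the start of the line or by a comma.
FIELD_RE = re.compile(r'(?:^|,)("(?:[^"]|"")*"|[^,]*)')


def split_fields(line):
    """Split one comma-separated line, honouring double-quoted strings."""
    fields = []
    for raw in FIELD_RE.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            # Drop the surrounding quotes and undo the assumed "" escaping.
            raw = raw[1:-1].replace('""', '"')
        fields.append(raw)
    return fields


print(split_fields('1,2,3,"Smith, John","say ""hi""",4,"end"'))
```

  Every field comes back as a string, so the numeric ones still need an
int() call afterwards.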

            -- Mike

On Thu, Apr 18 @ 16:35, Tony Case wrote:
> Hi all
> 	I'm pretty new to Python which probably explains my problems,
> I'm trying to read data from a number of comma separated variable
> files.  These are fairly large, 5000 lines plus.
> 	I'm reading the files and using split to separate the
> individual values, some numeric some string.  The problem is that a
> number of the lines have commas and double quotes embedded in the
> strings so there are more elements in the list returned from split
> than there should be, in one file there are 400 + lines like this so
> it's not easy to edit the original file.
> 	Is there a programmatic way of dealing with this?
> The input files are in the form :-
> int,int,int,"string","string",int,"string"
> etc.
> I tried using the re.findall and re.split but didn't get very far.
> 	TIA	T Case
> -- 
> http://mail.python.org/mailman/listinfo/python-list
`-> (tony)

-- 
Michael Gilfix
mgilfix at eecs.tufts.edu

For my gpg public key:
http://www.eecs.tufts.edu/~mgilfix/contact.html
