split encloser
Jason Tiller
jtiller at sjm.com
Thu Apr 3 19:45:12 EST 2003
Hi, "Aussie", :)
On 3 Apr 2003 aussie2010 at yahoo.com wrote:
> string.split() takes a delimiter and works fine as long as the
> delimiter isn't part of the data fields. But frequently they are.
> e.g. 'John Doe,135 South Main St.,#122, Springfield, Iowa' or
> ' so long goodbye see ya'
> Because the fields can contain the delimiter in some cases, an
> encloser is usually used (typically "") to handle those fields.
> The above strings would be written:
>
> 'John Doe,"135 South Main St., #122", Springfield, Iowa'
> and
> '"so long" goodbye "see ya"'
> I don't understand regular expressions but I was wondering if anyone
> that did knew of a way to get re.split() to handle "enclosers" as
> used above.
Hmm. I am not yet a knowledgable user of Python's regex features, but
I *do* know Perl's pretty well. With Perl, you might split up your
fields like this:
my $data = "Tiller, Jason, \"39177 Sundale Dr., Apt. 12\", Fremont, CA";
while( $data =~ /\G(?:\"(.+?)\",? ?)|\G(.+?), ?/g ) {
my $field = $1 || $2;
print "$field\n";
}
This says (essentially):
Starting from the previous match point,
Look for either:
A series of characters surrounded by the "enclosers" or
A series of characters up to the split character
If found:
Set our field to which of the matches succeeded
Set the previous match point
Else:
Exit
The RE breaks down like this:
/\G # Anchored to the previous match location:
(?: # Don't capture the following group.
\" # Find an encloser
(.+?) # followed by a set of any characters (non-greedy)
\" # bounded by another encloser
,? ?) # and potentially a split character and a space
| # OR
\G # Again achored at the previous match location
(.+?) # Find a set of any characters (non-greedy)
, ? # That end with the split character and possibly a space
/gx # Do this until the match fails and ignore whitespace in
# the pattern
>From what I was reading in a modern edition (2.0) of "Programming
Python," Python has adopted many of Perl's RE enhancements
(non-capturing subexpressions, non-greedy quantifiers, additional
anchors, etc.), so the pattern may work in Python without much effort.
I dunno.
Good luck! I hope this pattern at least gives you a starting point
for implementing the split() in Python.
---Jason
Sonos Handbell Ensemble
More information about the Python-list
mailing list