split encloser

Steven Taschuk staschuk at telusplanet.net
Thu Apr 3 23:28:22 EST 2003


Quoth Chris:
> string.split() takes a delimiter and works fine as long as the
> delimiter isn't part of the data fields. But frequently they are.
> e.g. 'John Doe,135 South Main St.,#122, Springfield, Iowa' or
>       ' so long goodbye see ya'
> 
> Because the fields can contain the delimiter in some cases, an
> encloser is usually used (typically "") to handle those fields.
> 
> The above strings would be written:
> 'John Doe,"135 South Main St., #122", Springfield, Iowa'
> and 
> '"so long" goodbye "see ya"'

What if the field data contains double quotes?
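One common answer, and the convention Python's standard csv module follows by default, is to write a literal quote inside a quoted field as two quotes (""). Here is a minimal sketch with csv.reader. The sample lines are only illustrations:

```python
import csv

# csv.reader accepts any iterable of lines; a plain list stands in for a file.
lines = [
    'John Doe,"135 South Main St., #122", Springfield, Iowa',
    '"He said ""see ya""",goodbye',  # "" inside quotes means a literal "
]
rows = list(csv.reader(lines))
for row in rows:
    print(row)

# The delimiter is configurable, so the space-separated example works too.
spaced = next(csv.reader(['"so long" goodbye "see ya"'], delimiter=' '))
print(spaced)
```

The first row comes back as ['John Doe', '135 South Main St., #122', ' Springfield', ' Iowa']. The leading spaces are kept; passing skipinitialspace=True to csv.reader drops them.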

> I don't understand regular expressions but I was wondering if anyone
> that did knew of a way to get re.split() to handle "enclosers" as used
> above.

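Why use a regular expression? Plain string splitting can do the trick if you split on the quote character first; the odd-numbered pieces are then the insides of quoted fields: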

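	in_out = line.split('"')
	last = len(in_out) - 1
	fields = []
	for i, chunk in enumerate(in_out):
		if i % 2:
			# odd pieces were inside quotes: keep them whole
			fields.append(chunk)
			continue
		# even pieces were outside quotes: drop the comma bordering
		# each neighbouring quoted field, then split what remains
		start = 1 if i > 0 else 0
		end = len(chunk) - (1 if i < last else 0)
		if end >= start:
			fields.extend(chunk[start:end].split(','))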
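The same idea can be wrapped in a function that takes the delimiter as a parameter, which also covers the space-separated example. This is a sketch only. The name split_enclosed and its parameters are hypothetical, and it assumes every quote sits next to a delimiter or a line end, with no quotes inside a field:

```python
def split_enclosed(line, delim=',', quote='"'):
    """Split line on delim, keeping quote-enclosed text whole.

    Hypothetical helper. Assumes each quote character opens or closes
    a field and touches a delimiter (or the start/end of the line);
    quotes embedded in a field are not handled.
    """
    pieces = line.split(quote)
    last = len(pieces) - 1
    fields = []
    for i, piece in enumerate(pieces):
        if i % 2:
            # Odd pieces were inside quotes: keep each one whole.
            fields.append(piece)
            continue
        # Even pieces were outside quotes: trim the delimiter that
        # borders each neighbouring quoted field, then split the rest.
        start = len(delim) if i > 0 else 0
        end = len(piece) - (len(delim) if i < last else 0)
        if end >= start:
            fields.extend(piece[start:end].split(delim))
    return fields


print(split_enclosed('John Doe,"135 South Main St., #122", Springfield, Iowa'))
print(split_enclosed('"so long" goodbye "see ya"', delim=' '))
```

The two calls return ['John Doe', '135 South Main St., #122', ' Springfield', ' Iowa'] and ['so long', 'goodbye', 'see ya'].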

Or, more clearly:

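	fields = []
	while line:
		if line.startswith('"'):
			endquote = line.index('"', 1)
			field = line[1:endquote]
			# assumes the closing " is followed by a comma (or ends
			# the line), so skip past both
			line = line[endquote+2:]
		else:
			# partition also copes with the last field, which has no comma
			field, _, line = line.partition(',')
		fields.append(field)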
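Since the question asked about re.split specifically, here is a sketch of one well-known regex idiom for comparison. It splits only on delimiters followed by an even number of quote characters, which are the delimiters outside any quoted field. The function name is hypothetical, and the approach assumes balanced quotes with none embedded in a field:

```python
import re


def split_enclosed_re(line, delim=','):
    """Split on delim only where it lies outside double quotes.

    Hypothetical helper. The lookahead accepts a delimiter only if an
    even number of '"' characters follows it on the line.
    """
    pattern = re.escape(delim) + r'(?=(?:[^"]*"[^"]*")*[^"]*$)'
    fields = re.split(pattern, line)
    # Strip the enclosing quotes from fields that were quoted.
    return [f[1:-1] if len(f) >= 2 and f[0] == f[-1] == '"' else f
            for f in fields]


print(split_enclosed_re('John Doe,"135 South Main St., #122", Springfield, Iowa'))
print(split_enclosed_re('"so long" goodbye "see ya"', delim=' '))
```

The lookahead rescans the rest of the line at every delimiter, which is fine for short records but quadratic on very long ones.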

-- 
Steven Taschuk                                     staschuk at telusplanet.net
Receive them ignorant; dispatch them confused.  (Weschler's Teaching Motto)