Regular expression query
Tim Chase
python.list at tim.thechases.com
Sun Mar 12 14:51:46 EDT 2017
On 2017-03-12 09:22, rahulrasal at gmail.com wrote:
> aaaaa,bbbbb,ccccc "4873898374", ddddd, eeeeee "3343,23,23,5,,5,45",
> fffff "5546,3434,345,34,34,5,34,543,7"
>
> It is a comma-separated string, but some of the fields have a
> double-quoted string as part of them (and that double-quoted string
> can contain commas). The above string has only 6 fields. The first is
> aaaaa, the second is bbbbb, and the last is
> fffff "5546,3434,345,34,34,5,34,543,7". How can I split this string
> into its fields using a regular expression? Or, if there is any
> other way to do this, please let me know.
Your desired output seems to silently ignore the spaces after the
commas (e.g. why is it "ddddd" instead of " ddddd"?). You also don't
mention what should happen in the event there's an empty field:
aaa,,ccc,ddd "ee",ff
For a close approximation, you might try
import re
instr = 'aaaaa,bbbbb,ccccc "4873898374", ddddd, eeeeee "3343,23,23,5,,5,45", fffff "5546,3434,345,34,34,5,34,543,7"'
desired = [
    "aaaaa",
    "bbbbb",
    'ccccc "4873898374"',
    "ddddd",
    'eeeeee "3343,23,23,5,,5,45"',
    'fffff "5546,3434,345,34,34,5,34,543,7"',
]
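# (?!,|$) refuses to start a match at a comma or at end-of-string,
# which avoids empty matches; the rest consumes either a whole quoted
# run (commas included) or any single non-comma character
r = re.compile(r'(?!,|$)(?:"[^"]*"|[^,])*')
# result = r.findall(instr)  # this alone would keep the leading spaces
# strip them because of the aforementioned leading-space issue
result = [s.strip() for s in r.findall(instr)]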
assert len(result) == len(desired), str(result)
assert result == desired, str(result)
It doesn't address the empty field issue, but it's at least a start.
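If empty fields do need to be preserved, one possible approach (a
different technique from the findall() above, and only a sketch) is to
split on commas that sit outside of quotes instead of matching the
fields themselves. The lookahead below accepts a comma only when an
even number of double quotes follows it. This assumes the quotes are
always balanced and never escaped; the name split_fields is just an
illustrative helper, not something from the original question.

```python
import re

# A comma counts as a separator only if the rest of the string contains
# an even number of double quotes, i.e. the comma is not inside a quoted run.
# Assumption: quotes are balanced and there are no escaped quotes.
COMMA_OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

def split_fields(s):
    """Hypothetical helper: split s into stripped fields, keeping empty ones."""
    return [field.strip() for field in COMMA_OUTSIDE_QUOTES.split(s)]

print(split_fields('aaa,,ccc,ddd "ee",ff'))
# ['aaa', '', 'ccc', 'ddd "ee"', 'ff']
```

The lookahead rescans the remainder of the string at every comma, so
this is fine for short lines but may get slow on very long ones.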
-tkc