Yet another "split string by spaces preserving single quotes" problem
Tim Chase
python.list at tim.thechases.com
Mon May 14 20:23:50 EDT 2012
On 05/13/12 16:14, Massi wrote:
> Hi everyone,
> I know this question has been asked thousands of times, but in my case
> I have an additional requirement to be satisfied. I need to handle
> substrings in the form 'string with spaces':'another string with
> spaces' as a single token; I mean, if I have this string:
>
> s = "This is a 'simple test':'string which' shows 'exactly my' problem"
>
> I need to split it as follows (the single quotes must be maintained in
> the split list):
The "quotes must be maintained" bit is what makes this different
from most common use-cases. Without that condition, shlex.split()
from the standard library does everything else you need, including
keeping adjacent quoted pieces like 'a':'b' together as one token.
Alternatively, one might try hacking csv.reader() to do the splitting
for you, though I had less luck with that than with shlex.
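For comparison, here is a quick check of what shlex.split() does with the example string (a sketch in Python 3 syntax, with s copied from the question). The 'simple test':'string which' run does come back as one token, but the quotes are gone:

```python
import shlex

s = "This is a 'simple test':'string which' shows 'exactly my' problem"

# shlex.split() works in POSIX mode: quoted pieces that touch each other
# are merged into a single token, and the quote characters are removed.
tokens = shlex.split(s)
print(tokens)
# ['This', 'is', 'a', 'simple test:string which', 'shows', 'exactly my', 'problem']
```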
> Up to now I have written some ugly code which uses regular
> expression:
>
> splitter = re.compile("(?=\s|^)('[^']+') | ('[^']+')(?=\s|$)")
You might try
import re
r = re.compile(r"""(?:'[^']*'|"[^"]*"|[^'" ]+)+""")
print(r.findall(s))
which seems to match your desired output. It doesn't currently
handle tabs, but by breaking it out, it's easy to modify (and it may
help you understand what it's doing):
>>> import re
>>> single_quoted = "'[^']*'"
>>> double_quoted = '"[^"]*"'
>>> other = """[^'" \t]+""" # added a "\t" tab here
>>> matches = '|'.join((single_quoted, double_quoted, other))
>>> regex = r'(?:%s)+' % matches
>>> r = re.compile(regex)
>>> r.findall(s)
['This', 'is', 'a', "'simple test':'string which'", 'shows', "'exactly my'", 'problem']
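If you'd rather have the regex document itself, the same alternation can be written with re.VERBOSE. This is a sketch of my own variation, not something tested against your real data: it swaps the literal space/tab for \s so that newlines and other whitespace also separate tokens, and split_preserving_quotes is just a hypothetical helper name.

```python
import re

# A token is one or more runs of single-quoted text, double-quoted text,
# or anything that is neither a quote nor whitespace, glued together
# with no whitespace in between.
TOKEN_RE = re.compile(r"""
    (?:
        '[^']*'        # single-quoted run, quotes kept
      | "[^"]*"        # double-quoted run, quotes kept
      | [^'"\s]+       # unquoted run: no quotes and no whitespace at all
    )+
""", re.VERBOSE)


def split_preserving_quotes(text):
    """Split on any whitespace, keeping quoted runs (and their quotes) intact."""
    return TOKEN_RE.findall(text)


s = "This is a 'simple test':'string which' shows 'exactly my' problem"
print(split_preserving_quotes(s))
print(split_preserving_quotes("tab\tseparated\n'and multi\nline'"))
```

One thing to be aware of with this approach (and with the compact pattern above): an unbalanced quote, as in "it's", can't be matched by any alternative, so findall() silently skips it and split_preserving_quotes("it's fine") gives ['it', 's', 'fine']. If that can happen in your input, you'd want to check for unmatched quotes separately.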
Hope this helps,
-tkc