[Python-Dev] New string method - splitquoted

Heiko Wundram me+python at modelnine.org
Thu May 18 08:59:26 CEST 2006


On Thursday 18 May 2006 06:06, Dave Cinege wrote:
> This is useful, but possibly better put into practice as a separate
> method??

I personally don't think it's particularly useful, at least not in the
special case that your patch tries to address.

1) Generally, you won't only have one character that does quoting, but 
several. Think of the Python syntax, where you have ", ', """ and ''', which 
all behave slightly differently. The logic for " and ' is simple enough to 
implement (basically that's what your patch does, and I'm sure it's easy 
enough to extend it to accept a range of characters as quote characters), but if you
have more complicated quoting operators (such as """), are you sure it's 
sensible to implement the logic in split()?
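
To illustrate point 1, here is a minimal sketch using the standard library's
shlex module, which already accepts both " and ' as quote characters under
shell-like rules. It is not what the patch does, and it knows nothing about
Python's """ and ''' forms; the sample string is made up for illustration.

```python
import shlex

# shlex.split() follows POSIX shell quoting: both ' and " delimit a word,
# and the quote characters themselves are stripped from the result.
sample = 'say "hello world" and \'good bye\''
parts = shlex.split(sample)
print(parts)  # ['say', 'hello world', 'and', 'good bye']
```

Anything beyond these two single-character quotes would need a different
parser, which is the point of the question above.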

2) What should the result of "this is a \"test string".split(None,-1,'"') be? 
An exception (ParseError)? Silently ignoring the missing delimiter, and 
returning ['this','is','a','test string']? Ignoring the delimiter altogether, 
returning ['this','is','a','"test','string']? I don't think there's one case 
to satisfy all here...
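
The three candidate behaviours from point 2 can be sketched as follows. The
helper split_quoted() and its on_unclosed parameter are hypothetical names
invented for this example; they are not part of the patch or of str.split().
For comparison, shlex.split() takes the first option and raises ValueError.

```python
def split_quoted(text, quote='"', on_unclosed="error"):
    """Hypothetical helper: split on whitespace, honouring one quote character.

    on_unclosed selects the policy for a quote that is never closed:
      "error"   -> raise ValueError
      "close"   -> act as if the quote were closed at the end of the input
      "literal" -> treat the unmatched quote as an ordinary character
    """
    if on_unclosed not in ("error", "close", "literal"):
        raise ValueError("unknown policy: %r" % (on_unclosed,))
    # Under the "literal" policy an odd number of quotes means the last one
    # has no partner, so it is excluded from quote handling below.
    unmatched = -1
    if on_unclosed == "literal" and text.count(quote) % 2:
        unmatched = text.rindex(quote)

    words, current, in_quote, has_token = [], [], False, False
    for i, ch in enumerate(text):
        if ch == quote and i != unmatched:
            in_quote = not in_quote
            has_token = True  # so that "" yields an empty word
        elif ch.isspace() and not in_quote:
            if has_token:
                words.append("".join(current))
                current, has_token = [], False
        else:
            current.append(ch)
            has_token = True
    if in_quote and on_unclosed == "error":
        raise ValueError("missing closing quote")
    if has_token:
        words.append("".join(current))
    return words


text = "this is a \"test string"
print(split_quoted(text, on_unclosed="close"))
print(split_quoted(text, on_unclosed="literal"))
```

Each policy is defensible for some caller, which is why a single hard-wired
choice inside split() looks hard to justify.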

3) What about escapes of the delimiter? Your current patch doesn't address 
them at all (AFAICT) at the moment, but what should the escaping character 
be? Should "escape processing" take place, i.e. what should the result
of "this is a \\\"delimiter \\test".split(None,-1,'"') be?
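
A minimal sketch of the two readings in point 3, assuming the backslash as
escape character. The function split_escaped() and its process_escapes flag
are hypothetical, made up for this example only.

```python
def split_escaped(text, quote='"', escape="\\", process_escapes=True):
    """Hypothetical helper: quote-aware whitespace split with an escape char.

    An escaped character never acts as a quote or as a separator.
    process_escapes=True drops the escape character from the result;
    process_escapes=False keeps it verbatim.
    An unclosed quote is silently closed at the end of the input.
    """
    words, current, in_quote, has_token = [], [], False, False
    chars = iter(text)
    for ch in chars:
        if ch == escape:
            nxt = next(chars, "")  # empty if the escape is the last character
            current.append(nxt if process_escapes else ch + nxt)
            has_token = True
        elif ch == quote:
            in_quote = not in_quote
            has_token = True
        elif ch.isspace() and not in_quote:
            if has_token:
                words.append("".join(current))
                current, has_token = [], False
        else:
            current.append(ch)
            has_token = True
    if has_token:
        words.append("".join(current))
    return words


text = "this is a \\\"delimiter \\test"
print(split_escaped(text, process_escapes=True))
print(split_escaped(text, process_escapes=False))
```

With escape processing the words come back as '"delimiter' and 'test';
without it the backslashes survive in the output. Neither is obviously the
right default.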

Don't get me wrong, I personally find this functionality very, very 
interesting (I'm +0.5 on adding it in some way or another), especially as a 
part of the standard library (not necessarily as an extension to .split()).

But there's quite a lot of semantic stuff to get right before you can 
implement it properly; see the complexity of the csv module, where you have 
to define pretty much all of this in the dialect you use to parse the csv 
file...
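
For comparison, this is roughly how much has to be spelled out to get
quote-aware splitting on spaces from the csv module. The dialect name
"spacequoted" and the parameter values are assumptions chosen for this
example, not a recommendation.

```python
import csv

# Every semantic decision is an explicit dialect parameter.
csv.register_dialect(
    "spacequoted",
    delimiter=" ",          # split on single spaces
    quotechar='"',          # one quote character only
    escapechar="\\",        # backslash escapes the next character
    doublequote=False,      # "" inside a quoted field is not an escaped quote
    skipinitialspace=True,  # ignore spaces directly after a delimiter
)

lines = ['this is a "test string"']
row = next(csv.reader(lines, dialect="spacequoted"))
print(row)  # ['this', 'is', 'a', 'test string']
```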

Why not write up a PEP?

--- Heiko.

