Re: [Python-checkins] CVS: distutils/distutils util.py,1.36,1.37
Would the "shlex" module be helpful here? It is in the standard library and is (well?) maintained by ESR. It could help reduce the code inside distutils.

[ I've always questioned the need for distutils' own "copy file" functions and whatnot... seems there is a bit of duplication occurring... ]

Cheers,
-g

On Sat, Jun 24, 2000 at 01:40:05PM -0700, Greg Ward wrote:
Update of /cvsroot/python/distutils/distutils
In directory slayer.i.sourceforge.net:/tmp/cvs-serv28287

Modified Files:
    util.py
Log Message:
Added 'split_quoted()' function to deal with strings that are quoted in Unix shell-like syntax (eg. in Python's Makefile, for one thing -- now that I have this function, I'll probably allow quoted strings in config files too).
Index: util.py
===================================================================
RCS file: /cvsroot/python/distutils/distutils/util.py,v
retrieving revision 1.36
retrieving revision 1.37
diff -C2 -r1.36 -r1.37
*** util.py 2000/06/18 15:45:55 1.36
--- util.py 2000/06/24 20:40:02 1.37
***************
*** 167,168 ****
--- 167,235 ----
      return error

+
+
+ # Needed by 'split_quoted()'
+ _wordchars_re = re.compile(r'[^\\\'\"\ ]*')
+ _squote_re = re.compile(r"'(?:[^'\\]|\\.)*'")
+ _dquote_re = re.compile(r'"(?:[^"\\]|\\.)*"')
+
+ def split_quoted (s):
+     """Split a string up according to Unix shell-like rules for quotes and
+     backslashes. In short: words are delimited by spaces, as long as those
+     spaces are not escaped by a backslash, or inside a quoted string.
+     Single and double quotes are equivalent, and the quote characters can
+     be backslash-escaped. The backslash is stripped from any two-character
+     escape sequence, leaving only the escaped character. The quote
+     characters are stripped from any quoted string. Returns a list of
+     words.
+     """
+
+     # This is a nice algorithm for splitting up a single string, since it
+     # doesn't require character-by-character examination. It was a little
+     # bit of a brain-bender to get it working right, though...
+
+     s = string.strip(s)
+     words = []
+     pos = 0
+
+     while s:
+         m = _wordchars_re.match(s, pos)
+         end = m.end()
+         if end == len(s):
+             words.append(s[:end])
+             break
+
+         if s[end] == ' ':               # unescaped, unquoted space: now
+             words.append(s[:end])       # we definitely have a word delimiter
+             s = string.lstrip(s[end:])
+             pos = 0
+
+         elif s[end] == '\\':            # preserve whatever is being escaped;
+                                         # will become part of the current word
+             s = s[:end] + s[end+1:]
+             pos = end+1
+
+         else:
+             if s[end] == "'":           # slurp singly-quoted string
+                 m = _squote_re.match(s, end)
+             elif s[end] == '"':         # slurp doubly-quoted string
+                 m = _dquote_re.match(s, end)
+             else:
+                 raise RuntimeError, \
+                       "this can't happen (bad char '%c')" % s[end]
+
+             if m is None:
+                 raise ValueError, \
+                       "bad string (mismatched %s quotes?)" % s[end]
+
+             (beg, end) = m.span()
+             s = s[:beg] + s[beg+1:end-1] + s[end:]
+             pos = m.end() - 2
+
+         if pos >= len(s):
+             words.append(s)
+             break
+
+     return words
+
+ # split_quoted ()
-- Greg Stein, http://www.lyra.org/
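For readers who want to try the checked-in function, here is a rough port of 'split_quoted()' to current Python. It is only a sketch: it assumes Python 3, swaps the 'string' module calls for str methods and the old raise syntax for the call form, and otherwise keeps the logic of revision 1.37 as quoted in this message. Like that revision, it treats only the space character as a word delimiter.

```python
import re

# The same three patterns as in the check-in: a run of ordinary characters,
# a single-quoted string, and a double-quoted string.
_wordchars_re = re.compile(r'[^\\\'\"\ ]*')
_squote_re = re.compile(r"'(?:[^'\\]|\\.)*'")
_dquote_re = re.compile(r'"(?:[^"\\]|\\.)*"')


def split_quoted(s):
    """Split 's' into words using shell-like quote and backslash rules."""
    s = s.strip()
    words = []
    pos = 0

    while s:
        # Skip over everything that is not a space, quote, or backslash.
        m = _wordchars_re.match(s, pos)
        end = m.end()
        if end == len(s):
            words.append(s[:end])
            break

        if s[end] == ' ':
            # Unescaped, unquoted space: the current word is complete.
            words.append(s[:end])
            s = s[end:].lstrip()
            pos = 0
        elif s[end] == '\\':
            # Drop the backslash; the escaped character stays in the word.
            s = s[:end] + s[end + 1:]
            pos = end + 1
        else:
            if s[end] == "'":
                m = _squote_re.match(s, end)
            elif s[end] == '"':
                m = _dquote_re.match(s, end)
            else:
                raise RuntimeError("this can't happen (bad char %r)" % s[end])

            if m is None:
                raise ValueError("bad string (mismatched %s quotes?)" % s[end])

            # Remove the two quote characters and resume after the string.
            (beg, end) = m.span()
            s = s[:beg] + s[beg + 1:end - 1] + s[end:]
            pos = m.end() - 2

        if pos >= len(s):
            words.append(s)
            break

    return words
```

Calling split_quoted('gcc -DNAME="foo bar" -O2') keeps the quoted text together as part of a single word, and an unterminated quote raises ValueError.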
On 24 June 2000, Greg Stein said:
Would the "shlex" module be helpful here? It is in the standard library and is (well?) maintained by ESR. It could help reduce the code inside distutils.
I looked at "shlex", but didn't like the fact that it 1) does character-by-character analysis of input, and 2) requires a file-like object. Just a performance concern, really.
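For comparison, a small sketch of what the shlex route looks like on later Python versions. This assumes a Python recent enough to have shlex.split(), which takes a plain string; it was not available in the releases discussed in this thread, so the file-like-object objection above applied at the time. The wrapper name is made up for illustration.

```python
import shlex


def split_with_shlex(s):
    # Hypothetical wrapper: shlex.split() uses POSIX-style rules by default,
    # so quotes are removed and a backslash escapes the next character.
    return shlex.split(s)
```

On simple inputs such as 'gcc -DNAME="foo bar" -O2' this gives the same word list as the regex-based approach, though the two are not guaranteed to agree on every edge case.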
[ I've always questioned the need for distutils' own "copy file" functions and whatnot... seems there is a bit of duplication occurring... ]
Two reasons for that: bugs in the standard library versions, and missing features in the standard library versions. I think the first argument goes away now that I've given up on 1.5.1 compatibility (shutil.py was really broken in 1.5.1), but the fact remains that the copy functions in shutil.py don't have a dry_run option, don't have a verbose option, don't have a preserve_times option, don't have a preserve_symlinks option, etc. All of these things are somewhere between useful and necessary.

I'm always open for ideas on reducing the amount of code in the Distutils; it really is getting ridiculous. It cracked 10k lines of code+comments+doc this weekend -- about 5300 lines of straight code, I think. Anyways, the basic required functionality is now in place, so I'm open to clever refactoring/reduction/simplification patches.

        Greg
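To make the missing-features point concrete, here is a minimal sketch of a copy helper that layers dry_run, verbose and preserve_times on top of shutil. The function name, signature and return value are invented for illustration and assume Python 3; this is not the Distutils code.

```python
import os
import shutil


def copy_file_sketch(src, dst, preserve_times=True, verbose=False,
                     dry_run=False):
    """Hypothetical helper: copy 'src' to 'dst' with a few extra options.

    Returns (dst, copied), where 'copied' is False on a dry run.
    """
    if verbose:
        print("copying %s -> %s" % (src, dst))

    if dry_run:
        # Report what would happen, but leave the filesystem untouched.
        return dst, False

    shutil.copyfile(src, dst)

    if preserve_times:
        # Carry the access and modification times over from the source.
        st = os.stat(src)
        os.utime(dst, (st.st_atime, st.st_mtime))

    return dst, True
```

With dry_run=True the helper only reports what it would do, which is the kind of behaviour the plain shutil copy functions do not offer.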