[IPython-dev] Patch for paren completion glitch

Thu Jul 22 02:42:10 EDT 2004

Ville Vainio wrote:
> I managed to make "(foo)bar" completion work for filenames, without 
> breaking completion for space characters (or anything else I could try 
> out quickly). It seems to work on Linux at least. I haven't tried it on 
> Windows yet; I thought it might be a good idea to show the patch for 
> some quick feedback before that.
> 
> As you can guess it's brutally hackish, but I figure that's the way it 
> goes with readline ;-).
> 
> It should be trivial to add the functionality for other delimiter chars. 
> I used shlex.split directly; it should be changed to shlex_split from 
> magic for backwards compatibility, but I thought I'd do that later on...

It's pretty ugly, but I doubt much better is achievable given the problem at 
hand.  I am, however, a bit concerned about performance given that this stuff 
gets called for _every_ filename in the completions match, which can be a lot 
in a big directory.  People expect tab-completion to be near-instantaneous, 
and I'd like to keep it that way.  In particular, I think that

def unprotect_filename(s):
    # Drop each escaping backslash and keep the character that follows it.
    chs = []
    in_escape = False
    for ch in s:
        if in_escape:
            chs.append(ch)
            in_escape = False
            continue
        if ch == '\\':
            in_escape = True
            continue
        chs.append(ch)

    return "".join(chs)

is essentially:

# Alternative unprotect_filename
# About 5 times faster than the original
unprotect_filename2 = lambda s: s.replace('\\', '')

Am I right?  If that's the case, it can (and should) be explicitly inlined, 
since function call overhead in Python is severe.  Even as a function, the 
second form is about 5 times faster for short strings, which is significant. 
Once inlined, the payoff will be even bigger.
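
A minimal sketch of how the "Am I right?" question can be checked by hand. 
The two functions are the loop version and the replace() one-liner from this 
message; the sample strings are made up for illustration.  The two agree 
whenever a backslash escapes some other character, but they differ on an 
escaped backslash (two backslashes in a row): the loop keeps one of them, 
while replace() removes both.

```python
def unprotect_filename(s):
    # Loop version: drop each escaping backslash, keep the char after it.
    chs = []
    in_escape = False
    for ch in s:
        if in_escape:
            chs.append(ch)
            in_escape = False
            continue
        if ch == '\\':
            in_escape = True
            continue
        chs.append(ch)
    return "".join(chs)


def unprotect_filename2(s):
    # One-liner version: remove every backslash.
    return s.replace('\\', '')


# Illustrative inputs (not taken from the attached test file).
samples = [r'foo\ bar', r'\(foo\)bar', 'plain', 'trailing\\']
for s in samples:
    assert unprotect_filename(s) == unprotect_filename2(s)

# Edge case: a literal backslash written as an escaped pair.
escaped_backslash = 'a\\\\b'          # the 4 characters: a \ \ b
print(repr(unprotect_filename(escaped_backslash)))
print(repr(unprotect_filename2(escaped_backslash)))
```

Whether the edge case matters depends on whether filenames containing a 
literal backslash need to complete correctly.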

I'm attaching a file which tests that these two indeed return identical 
results for a bunch of random tests.  I also checked protect_filename, and 
could manage only very minor improvements by using a string instead of a list 
for the 'in' check: 'char in string' is faster than 'char in list_of_chars'.
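
A rough sketch of the string-versus-list membership comparison described 
above.  The set of characters to escape and both function bodies are 
assumptions made for illustration; the protect_filename in the actual patch 
may look different.  Timings vary by machine and Python version, so none are 
quoted here.

```python
import timeit

# Hypothetical characters to escape; the real patch may use a different set.
PROTECT_LIST = [' ', '(', ')']
PROTECT_STR = ' ()'


def protect_with_list(s):
    # Membership test against a list of one-character strings.
    return "".join([('\\' + ch) if ch in PROTECT_LIST else ch for ch in s])


def protect_with_str(s):
    # Same logic, membership test against a plain string.
    return "".join([('\\' + ch) if ch in PROTECT_STR else ch for ch in s])


name = 'my (old) file.txt'
assert protect_with_list(name) == protect_with_str(name)

# Print rough timings for both variants.
for fn in (protect_with_list, protect_with_str):
    elapsed = timeit.timeit(lambda: fn(name), number=20000)
    print(fn.__name__, round(elapsed, 4))
```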

You can use the tester at the end for other checks like the one I suggest below.

It would be worth also checking if this:

+            lsplit = shlex.split(lbuf[:self.readline.get_endidx()])[-1]

is faster when done with a regexp instead of shlex.split (the latter is HUGE, 
so I expect it to be pretty slow).
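
One possible shape for the regexp alternative, as a sketch only.  The 
pattern and the helper names are assumptions, not part of the patch.  It 
grabs the last whitespace-delimited token while treating a backslash as 
escaping the next character.  Unlike shlex.split it does not understand 
quotes and it leaves the backslashes in place, so it is not a drop-in 
replacement; it is only meant as something to time against shlex.split.

```python
import re
import shlex

# Last run of non-whitespace characters at the end of the line, where a
# backslash escapes the following character (so "\ " stays in the token).
# Quoting is deliberately not handled.
_LAST_TOKEN = re.compile(r'(?:\\.|[^\s\\])+$')


def last_token_regex(line):
    # Return the trailing token, still escaped, or '' if there is none.
    m = _LAST_TOKEN.search(line)
    return m.group(0) if m else ''


def last_token_shlex(line):
    # Reference behaviour: shlex.split also removes the escapes.
    parts = shlex.split(line)
    return parts[-1] if parts else ''


line = r'ls my\ dir/fi'
print(repr(last_token_regex(line)))
print(repr(last_token_shlex(line)))
```

On this input the two results differ only by the escaping backslash, which 
the regexp version keeps.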

Here:
+                matches = [text0 + protect_filename(f[len(lsplit):]) for f in m0]

the len(lsplit) should be stored in a local variable outside the list 
comprehension, since Python does not lift constants out of loops or listcomps 
(the Python compiler is absolutely primitive in the optimizations it attempts).
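
The hoisting suggested above could look roughly like this.  The names 
text0, lsplit and m0 come from the patch line; the protect_filename 
stand-in, the build_matches wrapper and the sample values are hypothetical 
and only there to make the sketch runnable.

```python
def protect_filename(s):
    # Hypothetical stand-in: backslash-escape spaces and parentheses.
    return "".join([('\\' + ch) if ch in ' ()' else ch for ch in s])


def build_matches(text0, lsplit, m0):
    # Compute the prefix length once instead of once per match.
    n = len(lsplit)
    return [text0 + protect_filename(f[n:]) for f in m0]


# Illustrative values only.
matches = build_matches('my\\ ', 'my ', ['my file.txt', 'my (old).txt'])
print(matches)
```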

I agree that it's necessary to do this correctly because people do have 
filenames with these chars in them. But since this is smack in the middle of 
the interactive loop, I really want to be sure that the code is as absolutely 
tight as possible.  Also keep in mind that there may be users out there with 
hardware much slower than yours, so coding for absolute efficiency is 
important, even if it seems to run fine on good hardware.

I'm sure we'll converge on a nice solution shortly.  Just go over every line 
with a maniac eye for optimization fine-tunings.  I've always tried to write 
ipython that way, in the code paths which lie in the middle of the interactive 
loop.  I think the fact that even with all the stuff that goes on it still 
feels reasonably responsive is a testament to the effort being worth it (and 
obviously to the quality of python's implementation).

Thanks for the work!

Best,

f
-------------- next part --------------
A non-text attachment was scrubbed...
Name: comp_strfuns.py
Type: text/x-python
Size: 2238 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/ipython-dev/attachments/20040722/1f0f61fb/attachment.py>