Overriding iadd for dictionary like objects

RunThePun ubershmekel at gmail.com
Sun Aug 30 03:56:31 EDT 2009


On Aug 29, 1:58 pm, Carl Banks <pavlovevide... at gmail.com> wrote:
> On Aug 28, 10:37 pm, Joshua Judson Rosen <roz... at geekspace.com> wrote:
>
>
>
>
>
> > Carl Banks <pavlovevide... at gmail.com> writes:
>
> > > On Aug 28, 2:42 pm, Terry Reedy <tjre... at udel.edu> wrote:
>
> > > > Carl Banks wrote:
> > > > > I don't think it needs a syntax for that, but I'm not so sure a method
> > > > > to modify a value in place with a single key lookup wouldn't
> > > > > occasioanally be useful.
>
> > > > Augmented assignment does that.
>
> > > Internally uses two lookups, one for getting, and one for setting.
>
> > > I think this is an unavoidable given Python's semantics.  Look at
> > > the traceback:
>
> > > >>> def x():
> > > ...     d['a'] += 1
> > > ...
> > > >>> dis.dis(x)
> > >   2           0 LOAD_GLOBAL              0 (d)
> > >               3 LOAD_CONST               1 ('a')
> > >               6 DUP_TOPX                 2
> > >               9 BINARY_SUBSCR
>
> > OK, there's one lookup, but...
>
> > >              10 LOAD_CONST               2 (1)
> > >              13 INPLACE_ADD
> > >              14 ROT_THREE
> > >              15 STORE_SUBSCR
> > >              16 LOAD_CONST               0 (None)
> > >              19 RETURN_VALUE
>
> > ... I don't see anything in there that retrieves the value a second time....
>
> STORE_SUBSCR has to look up the position in the hash table to store
> the value, hence the second lookup.
>
> > > > > As a workaround, if lookups are expensive,
>
> > > > But they are not. Because (C)Python is heavily based on dict name lookup
> > > > for builtins and global names and attributes, as well as overt dict
> > > > lookup, must effort has gone into optimizing dict lookup.
>
> > > The actual lookup algorithm Python dicts use is well-optimized, yes,
> > > but the dict could contain keys that have expensive comparison and
> > > hash-code calculation, in which case lookup is going to be slow.
>
> > I'll like the originator correct me if I've made a mistake, but I read
> > "lookup" as actually meaning "lookup", not "value-comparison".
>
> This has nothing to do with value comparison.  I was talking about key
> comparison, which happens when looking up a position in a hash table.
> I was the first person to use the word "lookup" in this thread and I
> specifically meant hash-table position lookup.
>
> > At least in part because the question, as it was posed, specifically
> > related to a wrapper-class (providing a mapping ("dict like") interface)
> > around a database of some sort other than Python's dict class per se.
>
> > How do the details of Python's native dict-type's internal (hashtable)
> > algorithm matter when they're explicitly /not/ being used?
>
> Well it doesn't apply specifically to the OP's problem.  I changed the
> topic a bit by making it specific to dicts.  Is that ok with you?  Was
> that not allowed?
>
> The OP can add a method like apply_to_value to his own class, but one
> can't do that for dicts.  Ergo why something like apply_to_value()
> would be useful enough in rare circumstances where lookup is very slow
> to merit a moments consideration before being rejected.
>
> (If dict did have a method like that, the OP would at least know which
> method to override.)
>
> Carl Banks

First of all I'd like to say thanks for this discussion, you guys are
awesome.

I probably should have explained my problem better to begin with and I
apologize for that. So now I'll start from the top:

I made a DictMixin where the keys are filenames and the values are the
file contents. It was very simple and easy to do thanks to DictMixin.

For example this code writes "abc" in a file named "temp.txt" and
prints the contents of the file named "swallow", these files are
looked up/created/deleted in the directory "spam":
>>> d = FilesDict('spam')
>>> d['temp.txt'] = 'abc'
>>> print(d['swallow'])

This was very convenient for me because I wanted to use a simple DB
which could be read and edited by shell scripts and non-pythonistas,
without the heavy ORM. Also, if in the future an online full featured
DB would be needed, I could easily convert the DictMixin methods.  So
up to here I had a good solution.

My problem arose when I wanted to append a string to a file which
using open(..., 'ab') would have been miles more efficient because I
wouldn't have to read the entire file (__getitem__) and then write the
entire file back (__setitem__). The files are expected to be as big as
600 KB which will be appended 30 bytes at a time about 3 times a
second. Performance-wise the system would probably work without open
(..., 'ab') but it would be a real thrashing so the current solution
uses a method "AddTo" as Robert suggested, sacrificing the neat
getitem/setitem syntax.

Just so I don't leave out any information, I actually also made a
DirectoryDict for having multiple 'tables'. In this DictMixin, keys
are directory names and values are FilesDict instances. So to write
"European or African" to the file "/root/tmp/velocity" one would be
just:
>>> d = DirectoryDict("/root")
>>> d["tmp"]["velocity"] = "European or African"

So now I hope it's clearer why and how I wanted a special
__item_iadd__ for the dictionary syntax,

RunThePun



More information about the Python-list mailing list