Disable automatic interning

George Sakkis george.sakkis at gmail.com
Thu Mar 19 02:19:21 CET 2009

On Mar 18, 4:06 pm, Daniel Fetchinson <fetchin... at googlemail.com> wrote:

> > I'm working on some graph generation problem where the node identity
> > is significant (e.g. "if node1 is node2: # do something") but ideally I
> > wouldn't want to impose any constraint on what a node is (i.e. require
> > a base Node class). It's not a show stopper, but it would be
> > problematic if something broke when nodes happen to be (small)
> > integers or strings.
> But if two different nodes are both identified by, let's say the
> string 'x' then you surely consider this an error anyway, don't you?
> What's the point of identifying two different nodes by the same
> string?

In this particular problem the graph represents web surfing behavior
and in the simplest case the nodes are plain URLs. Now suppose a
session log has recorded the URL sequence [u1, u2, u1]. There are two
scenarios for the second occurrence of u1: it's either caused by a
forward action (e.g. clicking on a link to u1 from page u2) or a back
action (i.e. the user clicked the back button). If this information is
available, it makes sense to differentiate them. One way to do so is
to represent the result of every forward action with a brand-new node
and the result of a back action with an existing node. So even though
the state of the two occurrences of u1 is the same, they are not
necessarily represented by a single node.
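
As a rough sketch of that scheme (not the actual graph generator): the
Node class, the build_nodes() function and the (url, action) log format
are all hypothetical names made up for illustration, and a "back" action
is assumed to return to the page visited just before the current one.

```python
class Node(object):
    """Hypothetical wrapper: one instance per distinct visit."""
    def __init__(self, url):
        self.url = url


def build_nodes(events):
    """Map (url, action) pairs to nodes; action is 'forward' or 'back'."""
    history = []  # stack of nodes on the current browsing path
    result = []
    for url, action in events:
        if action == "back" and len(history) >= 2 and history[-2].url == url:
            history.pop()        # leave the current page
            node = history[-1]   # back action: reuse the existing node
        else:
            node = Node(url)     # forward action: brand-new node
            history.append(node)
        result.append(node)
    return result


# Same URL sequence [u1, u2, u1], two different interpretations.
forward_case = build_nodes([("u1", "forward"), ("u2", "forward"), ("u1", "forward")])
back_case = build_nodes([("u1", "forward"), ("u2", "forward"), ("u1", "back")])
```

In forward_case the first and third entries are distinct objects, while
in back_case they are the very same node, so an "is" test tells the two
scenarios apart.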

If it were always possible to make a copy of a string instance (say,
with a str.new() classmethod), then it would be sufficient to pass
"map(str.new, session_urls)" to the graph generator. Equality would still
work as before but all instances in the sequence would be guaranteed
to be unique. Thankfully, as Martin mentioned, this is easy even
without str.new(), simply by wrapping each url in an instance of a
small Node class.
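
A minimal sketch of such a wrapper, assuming nothing beyond plain
Python; the attribute name "value" and the sample session_urls list are
made up for illustration. Equality and hashing are delegated to the
wrapped object, while identity belongs to the wrapper itself.

```python
class Node(object):
    """Wrap any value so that every occurrence has its own identity."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # Equality still follows the wrapped value.
        if isinstance(other, Node):
            return self.value == other.value
        return NotImplemented

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        return hash(self.value)

    def __repr__(self):
        return "Node(%r)" % (self.value,)


session_urls = ["u1", "u2", "u1"]              # hypothetical session log
nodes = [Node(url) for url in session_urls]    # one wrapper per occurrence
```

Here nodes[0] == nodes[2] holds, but nodes[0] is nodes[2] does not,
regardless of whether the underlying strings happen to be interned.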

