dictionary comparison

Mon Aug 14 06:31:39 EDT 2000

In article <8n0tsh01lnv at news2.newsguy.com>,
  "Alex Martelli" <alex at magenta.com> wrote:
> <theoryboy at my-deja.com> wrote in message
news:8n0ebs$65n$1 at nnrp1.deja.com...
>     [snip]
> >
> > I will of course get duplicate entries. I could simply check to see
if
> > the items have already been added, but this still means n^2
comparisons,
> > not sum to n which is all I need.
>
> OK, problem is clearer now, I think.  Except that I fail to see the
fact
> that you think you only need n comparisons.  (n*n)/2, yes.  But, n?
>

No, you're right, O(n^2). I meant (n*n)/2 when I said "sum to N"

>
> Yes, the fact that you can save the list-of-keys once and for all:
>
> keys=FSA.keys()
> for istate1 in range(len(keys)-1):
>     state1=keys[istate1]
>     for istate2 in range(istate1+1, len(keys)):
>         state2=keys[istate2]
>         if FSA[state1]==FSA[state2]:
>             keep_list.append(state1)
>             remove_list.append(state2)
>
> so you're basically still operating with a list -- that of the keys.
>

I actually came up with another solution which involved taking a list of
keys at the start, using this for the inner loop, and removing items
from this list as they occur in the outer loop ie:

states = FSA.keys()
for i in FSA.keys():
	states.remove(i)
	for j in states:
		...

although I suspect your solution will be more efficient than searching
and removing from a dictionary.

[snip]
> I.e., suppose FSA has 3 states a,b,c, all with identical lists
> of transitions, which happen to come in this order.  Now with
> your code, keeplist would be [a,b] and removelist [b,c,c].  I
> think you'd want a keeplist of [a] and a removelist of [b,c],
> right?  In this case, here's a fleshed-out approach...:
>
> keys=FSA.keys()
> keepdict={}
> removedict={}
>
> for istate1 in range(len(keys)-1):
>     state1=keys[istate1]
>     if removedict.has_key(state1): continue
>     for istate2 in range(istate1+1, len(keys)):
>         state2=keys[istate2]
>         if FSA[state1]==FSA[state2]:
>             keepdict[state1]=1
>             removedict[state2]=1
>
> keeplist = keepdict.keys()
> removelist = removedict.keys()
>
> If state1 is in the removedict, so also are any other
> 'following' states whose transitionlists might be
> identical with it (they've been placed there on the
> same outerloop as state1 was); so we need check
> no more.

This is indeed a much more elegant solution than my current one, but I
am now faced with the problem that in fact simply a remove list is not
enough information to trim the FSA. The items from the remove list
cannot simply be excised, since any other state that might happen to be
pointing at them must be rerouted to one of the states in the keep list
which is equivalent to it. For example if states a and b are equivalent,
a will be added to the keep list and b to the remove list. However, when
b comes to be removed a further state d, which contains a transition to
b, must have this entry changed to instead indicate a transition to the
equivalent a.

At the moment, after every match I iterate through the entire structure
rerouting the transitions, but the FSAs often contain a large number of
identical states, and to remove them all and reroute in one go would be
preferable. If I store with the removal list a state to which
transitions to the now redundant state should be rerouted, how can I be
sure the rerouting point will not itself be a member of the list to
remove without excessive book-keeping?

If I can make this approach efficient, a number of iterations of it will
implement a DFA reduction algorithm without the more involved process
described by Hopcroft and Ullman - I have already implemented a subset
construction routine for NFA-DFA conversion.

Thanks for your help so far,

Peter

(apologies for mixed name posting - I'm not at my regular terminal for a
few months)

Sent via Deja.com http://www.deja.com/
Before you buy.