Data-structure for multiway associativity in Python
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Sun Jan 28 09:42:43 EST 2018
On Sat, 27 Jan 2018 10:01:47 -0800, qrious wrote:
> I need a data structure and a corresponding (hopefully fast) mechanism
> associated with it to do the following. While I am looking for the
> concept first, my preference for implementation of this will be in
> Python.
>
> [c1, c2,..., cn] is a list of strings (for my own implementation, but
> could be any other type for the generic problem). There will be many
> hundreds, if not thousands, of such lists with no shared member.
>
> The method getAssocList(e) will return lists of the lists for which e is
> an element.
Since you specified that there are no lists with shared members, why
bother returning a list of lists? There will only ever be a single
matching list.
data = [
[c1, c2, ..., cn],
[d1, d2, ..., dn],
# hundreds more...
[z1, z2, ..., zn],
]
def getAssocList(element):
for L in data:
if element in L:
return L
raise ValueError('not found')
For faster results, use sets {c1, c2, ..., cn} rather than lists. The
outer list still has to be a list though.
To speed it up more, we'd need to know more information about how you are
using this. For example, if the values c1, ... d1, ... etc have some sort
of relationship, you might be able to generate some kind of multiway tree
that avoids having to search all of the thousands of lists before giving
up.
Are searches going to typically hit the same set c1... over and over
again? If so, then after matching, bring it to the front of the master
list. (You might want to use a deque for that.)
> Here a hash may be a way to go, but need help in figuring it out. Also,
> can there be a faster and more memory efficient solution other than
> hashes?
Probably not, not unless you have some sort of internal structure you can
take advantage of. For example, if all the strings in any group start
with the same letter, then you can dispatch directly to the right list:
data = {'c': {c1, c2, c3, ..., cn},
'd': {d1, ... } # and so on...
}
def getAssocList(element):
if element in data[element[0]]:
return L
raise ValueError('not found')
But if there is no structure at all, and the elements in each list are
completely arbitrary, then there is nothing better than a linear search
through the entire collection of lists, looking at every single element
until you've matched what you're after.
But your description is pretty vague and for all I know what you actually
want is already a solved problem. Can you give more detail and cut-down
example of your data set? Say, two or three values per list, and two or
three lists.
--
Steve
More information about the Python-list
mailing list