Storing a big amount of path names
Rob Gaddi
rgaddi at highlandtechnology.invalid
Thu Feb 11 21:17:49 EST 2016
Tim Chase wrote:
> On 2016-02-12 00:31, Paulo da Silva wrote:
>> What is the best (shortest memory usage) way to store lots of
>> pathnames in memory where:
>>
>> 1. Path names are pathname=(dirname,filename)
>> 2. There are many different dirnames, but far fewer than pathnames
>> 3. dirnames are in general long (many chars)
>>
>> The idea is to share the common dirnames.
>
> Well, you can create a dict that has dirname->list(filenames) which
> will reduce the dirname to a single instance. You could store that
> dict in the class, shared by all of the instances, though that starts
> to pick up a code-smell.
>
> But unless you're talking about an obscenely large number of
> dirnames & filenames, or a severely resource-limited machine, just
> use the default built-ins. If you start to push the boundaries of
> system resources, then I'd try the "anydbm" module or use the
> "shelve" module to marshal them out to disk. Finally, you *could*
> create an actual sqlite database on disk if size really does exceed
> reasonable system specs.
>
> -tkc
>
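[A minimal sketch of the dict-of-lists idea above; the paths here are
made up purely for illustration.]

```python
import os
from collections import defaultdict

# Hypothetical input paths, just for demonstration.
paths = [
    "/usr/share/doc/python3/README",
    "/usr/share/doc/python3/NEWS",
    "/usr/share/doc/gcc/README",
]

# Map each dirname to the list of filenames under it, so every
# dirname string is stored once and shared by all its filenames.
by_dir = defaultdict(list)
for p in paths:
    dirname, filename = os.path.split(p)
    by_dir[dirname].append(filename)

print(by_dir["/usr/share/doc/python3"])  # ['README', 'NEWS']
```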
Probably more memory efficient to make a list of lists, and just declare
that element[0] of each list is the dirname. That way you're not
wasting memory on the unused entries of the hashtable.
But unless the OP has both a) upwards of a million entries and b) let's
say at least 20 filenames per dirname, it's not worth doing.
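[One way to sketch that list-of-lists layout, assuming the paths can be
sorted first so each directory's files group together; the example
paths are invented.]

```python
import os
from itertools import groupby

# Hypothetical paths; sorting puts files from the same dir adjacent.
paths = sorted(["/a/b/x", "/a/b/y", "/c/z"])

# element[0] of each inner list is the dirname; the rest are filenames.
# No hashtable is kept around afterwards, just the lists themselves.
entries = [
    [d] + [os.path.basename(p) for p in grp]
    for d, grp in groupby(paths, key=os.path.dirname)
]

print(entries)  # [['/a/b', 'x', 'y'], ['/c', 'z']]
```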
Now, if you do really have a million entries, one thing that would help
with memory is setting __slots__ for MyFile rather than letting it
create an instance dictionary for each one.
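[Roughly what that looks like; the attribute names on MyFile are
guesses, since the OP's class wasn't shown.]

```python
class MyFile:
    # With __slots__ set, instances get fixed attribute storage
    # instead of a per-instance __dict__, which saves real memory
    # when you have a million of them.
    __slots__ = ("dirname", "filename")

    def __init__(self, dirname, filename):
        self.dirname = dirname
        self.filename = filename

f = MyFile("/usr/share/doc", "README")
# Slotted instances carry no per-instance dictionary:
print(hasattr(f, "__dict__"))  # False
```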
--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.