Suggesting feature for python - Big Set, Big Dictionary

Hi, (This is great language, sincere thanks to all!) I took a quick glance of https://devguide.python.org/langchanges/ (20.3. Suggesting new features and language changes), while writing this post. Please ignore this, if similar proposals are discussed earlier. Suggestion: 1. Big Set, Big Dictionary - During design time, i am considering this question. With limited RAM (example 4 GB), how can I store a dictionary/set of size greater than RAM. Is it possible to introduce Big Set / Big Dictionary which stores data in hard disk (based on some partition mechanism - like big data)? (i.e.Big Set/Big Dictionary with set access mechanism identical to current set and dictionary). Background: I am working on machine learning NLP and would like to use such feature in data generator. Regards, Vijay <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail> Virus-free. www.avast.com <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail> <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>

On Wed, Mar 24, 2021 at 5:35 AM Vijay Patel <vijay96238@gmail.com> wrote:
How would it be stored on disk? Would it be in some sort of database? If so, check out a database access module, and maybe some sort of access layer that gives you a set-like or dict-like interface. Flat files? It probably exists on PyPI, but you could easily make a set-like or dict-like interface to something that uses simple file storage. Ultimately, it's going to be a lot of tradeoffs that can't be made as part of the language, as they're heavily tied to the exact way you're going to use this. ChrisA

23.03.21 07:55, Vijay Patel пише:
Look at the shelve module. It provides a dict-like class with permanent storage. It can be used to work with data larger than the RAM volume if you sync the cache at times. It has limitations: keys can only be strings. It would be not easier to support arbitrary objects as keys because the pickle protocol does not guarantee that equal values are serialized to the same data. But if you need to work with non-string keys, you can write your own wrapper which serializes your keys to strings. In any case you can take that implementation as an example and write your own implementation which suits your needs. I do not think it will be added in the stdlib, as it looks pretty niche solution, but you can publish it on PyPI.

On Wed, Mar 24, 2021 at 5:35 AM Vijay Patel <vijay96238@gmail.com> wrote:
How would it be stored on disk? Would it be in some sort of database? If so, check out a database access module, and maybe some sort of access layer that gives you a set-like or dict-like interface. Flat files? It probably exists on PyPI, but you could easily make a set-like or dict-like interface to something that uses simple file storage. Ultimately, it's going to be a lot of tradeoffs that can't be made as part of the language, as they're heavily tied to the exact way you're going to use this. ChrisA

23.03.21 07:55, Vijay Patel пише:
Look at the shelve module. It provides a dict-like class with permanent storage. It can be used to work with data larger than the RAM volume if you sync the cache at times. It has limitations: keys can only be strings. It would be not easier to support arbitrary objects as keys because the pickle protocol does not guarantee that equal values are serialized to the same data. But if you need to work with non-string keys, you can write your own wrapper which serializes your keys to strings. In any case you can take that implementation as an example and write your own implementation which suits your needs. I do not think it will be added in the stdlib, as it looks pretty niche solution, but you can publish it on PyPI.
participants (4)
-
Chris Angelico
-
Greg Ewing
-
Serhiy Storchaka
-
Vijay Patel