How to turn a string into a list of integers?

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun Sep 7 20:00:32 CEST 2014


Roy Smith wrote:

> In article <mailman.13850.1410102277.18130.python-list at python.org>,
>  Chris Angelico <rosuav at gmail.com> wrote:
> 
>> You can't store a list in memory; what you store is a set of bits
>> which represent some metadata and a bunch of pointers.
> 
> 
> Well, technically, what you store is something which has the right
> behavior.  If I wrote:

No. Chris is right: what you store is bits, because that is how memory in
modern hardware works. Some old Soviet era mainframes used trits, trinary
digits, instead of bits, but that's just a footnote in the history of
computing. We're talking implementation here, not interface. The
implementation of a list in memory is bits, not trits, nor holograms,
fragments of DNA, or nanometre-sized carbon rods. Some day there may be
computers with implementations not based on bits, but this is not that day.

[Aside: technically, very few if any modern memory chips support direct
access to individual bits, only to words, which these days are usually
8-bit bytes. At the hardware level, the bits may not even be implemented as
individual on/off two-state switches. So technically we should be talking
about bytes rather than bits, since that's the interface which the memory
chip provides. What it does internally is up to the chip designer.]


> my_huffman_coded_list = [0] * 1000000
> 
> I don't know of anything which requires Python to actually generate a
> million 0's and store them somewhere (even ignoring interning of
> integers).  

That's a tricky question. There's not a lot of wiggle-room for what a
compliant implementation of Python can do here, but there is some. The
language requires that:

- the expression must generate a genuine built-in list of length 
  exactly one million;

- even if you monkey-patch builtins and replace list with something
  else, you still get a genuine list, not the monkey-patched version;

- all million items must refer to the same instance;

- regardless of whether that instance is cached (like 0) or not;

- reading and writing to any index must be O(1);

- although I guess it would be acceptable if that was O(1) amortised.

(Not all of these requirements are documented per se, some are taken from
the reference implementation, CPython, and some from comments made by
Guido, e.g. he has stated that he would consider it unacceptable for a
Python implementation to use a linked list instead of an array for lists,
because of the performance implementations.)

Beyond that, an implementation could do anything it likes. If the
implementer can come up with some clever way to have [0]*1000000 use less
memory while not compromising the above, they can do so.


> As long as it generated an object (perhaps a subclass of 
> list) which responded to all of list's methods the same way a real list
> would, it could certainly build a more compact representation.

However a subclass of list would not be acceptable. 


-- 
Steven




More information about the Python-list mailing list