Python usage numbers
roy at panix.com
Mon Feb 13 02:11:04 CET 2012
In article <mailman.5750.1329094801.27778.python-list at python.org>,
Chris Angelico <rosuav at gmail.com> wrote:
> The advantage, though, is that you can always know how many bytes to
> read for X characters. In ASCII, you allocate 80 bytes of storage and
> you can store 80 characters. In UTF-8, if you want an 80-character
> buffer, you can probably get away with allocating 240 bytes...
> but maybe not. In UTF-32, it's easy - just allocate 320 bytes and you
> know you can store them. Also, you know exactly where the 17th
> character is; in UTF-8, you have to count. That's a huge advantage for
> in-memory strings; but is it useful on disk, where (as likely as not)
> you're actually looking for lines, which you still have to scan for?
> I'm thinking not, so it makes sense to use a smaller disk image than
> UTF-32 - fewer total bytes means fewer sectors to read/write, which
> translates fairly directly into performance.
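
To make the size and indexing trade-off in the quoted text concrete, here
is a minimal sketch. The sample characters and the helper names
(nth_char_utf32, nth_char_utf8) are made up for illustration and are not
from the thread; it only shows that UTF-32 allows a direct offset lookup
while UTF-8 has to be scanned from the start.

```python
# Illustrative samples: ASCII, Latin-1 range, rest of the BMP, and beyond the BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    utf8 = ch.encode("utf-8")
    utf32 = ch.encode("utf-32-le")  # the -le codec does not prepend a BOM
    print(f"U+{ord(ch):04X}: {len(utf8)} byte(s) in UTF-8, {len(utf32)} in UTF-32")


def nth_char_utf32(buf: bytes, n: int) -> str:
    """Return character n (0-based) of a UTF-32-LE buffer by direct offset."""
    if n < 0 or 4 * n + 4 > len(buf):
        raise IndexError(n)
    return buf[4 * n : 4 * n + 4].decode("utf-32-le")


def nth_char_utf8(buf: bytes, n: int) -> str:
    """Return character n (0-based) of a UTF-8 buffer by scanning from the start."""
    count = -1      # index of the character whose lead byte was seen last
    start = None    # byte offset where character n begins, once found
    for i, b in enumerate(buf):
        if b & 0xC0 != 0x80:        # not a continuation byte: a new character starts here
            count += 1
            if count == n:
                start = i
            elif count == n + 1:    # next character begins, so character n ends at i
                return buf[start:i].decode("utf-8")
    if start is None:
        raise IndexError(n)
    return buf[start:].decode("utf-8")  # character n was the last one in the buffer
```

Note that a code point outside the Basic Multilingual Plane takes 4 bytes
in UTF-8, so a worst-case 80-character buffer needs 320 bytes there too;
240 bytes is only enough if every character is within the BMP.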
You might just write files compressed. My guess is that a typical
gzipped UTF-32 text file will be smaller than the same data stored as