string storage [was: Re: imaplib: is this really so unwieldy?]
Alan Gauld
alan.gauld at yahoo.co.uk
Wed May 26 19:17:10 EDT 2021
On 26/05/2021 22:15, Tim Chase wrote:
> If you don't decode it upon reading it in, it should still be 100MB
> because it's a stream of encoded bytes.
I usually convert them to utf8.
> You don't specify what you then do with this humongous string,
Mainly I search for regex patterns which can span multiple lines.
I could chunk it up if memory was an issue but a single read is
just more convenient. Up until now it hasn't been an issue and
to be honest I don't often hit multi-byte characters so mostly
it will be single byte character strings.
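
For illustration, the kind of search I mean looks roughly like this. It is only a sketch: the function name, the marker words and the sample text are made up for the example, and the real patterns vary from file to file.

```python
import re

def find_spanning(text, start_word, end_word):
    """Return every shortest match that starts with start_word and ends
    with end_word, even when the two are on different lines."""
    # re.DOTALL lets "." match newlines, so a match can span lines.
    pattern = re.compile(
        re.escape(start_word) + r".*?" + re.escape(end_word),
        re.DOTALL,
    )
    return [m.group(0) for m in pattern.finditer(text)]

# Hypothetical sample standing in for a whole file read in one go.
sample = "alpha begin one\ntwo end beta\nbegin three end"
matches = find_spanning(sample, "begin", "end")
print(matches)
```

With the whole file in one string, a match can cross any number of line breaks. A chunked read would need extra care for matches that straddle a chunk boundary.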
They are mostly research papers and such from my university days
written on a Commodore PET and various early DOS computers with
weird long-lost word processors. Over the years they've been
exported/converted/reimported and then re-exported several times.
A very few have embedded text or "graphics"/equations which might
have some Unicode characters but it's not a big issue for me in practice.
I was thinking more of the kind of scenario where big strings
might become a problem if they suddenly consume 4x the storage
you expect.
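
A rough way to see the effect, assuming CPython 3.3 or later, where (as far as I understand it) a str is stored with 1, 2 or 4 bytes per character depending on the widest character it contains. The helper name and the sample characters are just for illustration, and the exact sizes include a header that varies between versions.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character by comparing two strings made of
    the same character, so the fixed object header cancels out."""
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

ascii_width = bytes_per_char("a")           # plain ASCII
bmp_width = bytes_per_char("\u20ac")        # euro sign, needs 2 bytes
astral_width = bytes_per_char("\U0001F600") # emoji, needs 4 bytes

# One wide character is enough to widen the whole string.
narrow = sys.getsizeof("a" * 1000)
widened = sys.getsizeof("a" * 1000 + "\U0001F600")

print(ascii_width, bmp_width, astral_width)
print(narrow, widened)
```

So a 100MB file of mostly single-byte text could need around 400MB as a str if even one character outside the Basic Multilingual Plane turns up in it.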
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos