Serhiy Storchaka writes:
I think representing bytes as an array of ints was a good decision. If you need indexing to return a substring, you should use str instead. It is also memory efficient thanks to PEP 393.
You can do this by using latin-1 as the codec, but that's pretty unpleasant, because of the risk of combining with another str and getting mojibake.

I have long thought that it would be interesting to have a codec and an extension to PEP 393 that gives "asciibytes" behavior. That is, the codec simply slops the bytes into the 8-bit storage of a string, but when joined with another string the result types are:

    asciibytes    other arg     result
    has 8bit      type          type
    ----------    ----------    ---------------------------------------
    yes           pure ascii    asciibytes
    yes           asciibytes    asciibytes
    yes           other str     str with 8bit bytes from asciibytes
                                encoded as PEP 383 surrogateescape
                                (note: promotes latin1 to 2-byte-wide)
    no            whatever      whatever

I think Nick actually had a module that worked pretty much like this, but he never pushed it. I've never had time to reason out the possible failure modes, though, or the performance issues. And it's not an itch I personally need to scratch.

I believe (but haven't proved) that the failure modes with the above operation table are the same as for str containing PEP 383 surrogates. I'm not sure what other issues you might run into.

Also, I'm not sure it's reasonable to have an asciibytes with no 8bit bytes.

Steve
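
P.S. To make the latin-1 risk and the PEP 383 alternative concrete, here is a minimal sketch using only the stock codecs. Nothing in it implements the hypothetical "asciibytes" type; it only shows the two existing behaviors the table is built from, and the variable names are mine.

```python
# Some bytes whose encoding we pretend not to know (they happen to be UTF-8).
raw = "café".encode("utf-8")              # b'caf\xc3\xa9'

# latin-1 maps every byte to the code point with the same value, so the
# round trip through str is lossless...
as_latin1 = raw.decode("latin-1")
assert as_latin1.encode("latin-1") == raw

# ...but the result is an ordinary str, so nothing stops it from being
# joined with real text. The joined value is mojibake, and it can no
# longer be turned back into bytes with latin-1.
mixed = as_latin1 + " costs 3 €"
try:
    mixed.encode("latin-1")
    latin1_round_trip_ok = True
except UnicodeEncodeError:
    latin1_round_trip_ok = False

# PEP 383: decode as ASCII and smuggle each non-ASCII byte through as a
# lone surrogate in the range U+DC80..U+DCFF.
escaped = raw.decode("ascii", errors="surrogateescape")
assert escaped.encode("ascii", errors="surrogateescape") == raw

# Joined with real text, the escaped bytes stay distinguishable from the
# text's own characters and come back out unchanged on encoding.
joined = escaped + " costs 3 €"
recovered = joined.encode("utf-8", errors="surrogateescape")
```

The surrogates sit above U+00FF, which is why the "other str" row in the table notes the promotion from 1-byte to 2-byte-wide storage.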