[Python-Dev] Python3 "complexity"
chris.barker at noaa.gov
Fri Jan 10 00:53:35 CET 2014
On Thu, Jan 9, 2014 at 3:14 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
> Sorry, I was too short with my example. My use case is binary files, with
> ASCII metadata and binary metadata, as well as ASCII-encoded numeric
> values, binary-coded numeric values, ASCII-encoded boolean values, and
> who-knows-what-(before checking the in-band metadata)-encoded text. I have
> to process all of it, and before we say "It's just a documentation issue" I
> want to make sure it /is/ just a documentation issue.
As I am coming to understand it -- yes, using latin-1 would let you work
with all that. You could decode the binary data using latin-1, which would
give you a unicode string (a plain str in py3), which would:
1) act like ascii for ascii values, for the normal string operations,
search, replace, etc, etc...
2) have a 1:1 mapping of indexes to bytes in the original.
3) be not-too-bad for memory and other performance (as I understand it, py3
has had, since 3.3, a compact unicode implementation that stores one byte per
character when every code point is below 256)
4) preserve the binary data that was not directly touched.
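
A minimal sketch of points 1, 2 and 4, as I understand them. The record layout
and the names here (raw, text, NAME=) are made up for illustration and are not
from Ethan's actual files:

```python
# Hypothetical record: ASCII metadata followed by arbitrary binary bytes.
raw = b"NAME=test;" + bytes([0x00, 0xFF, 0x80, 0x7F])

# latin-1 maps every byte value 0-255 to the code point with the same number,
# so this decode cannot fail, whatever the bytes are.
text = raw.decode("latin-1")

# 1) The ASCII part behaves like ordinary text.
name_start = text.find("NAME=") + len("NAME=")
name_end = text.index(";")
name = text[name_start:name_end]

# 2) String indexes line up with byte offsets in the original.
same_length = len(text) == len(raw)
offsets_match = all(ord(ch) == b for ch, b in zip(text, raw))

# 4) Binary data that was not touched survives an edit plus re-encode.
edited = text.replace("test", "demo")
round_tripped = edited.encode("latin-1")
```

Note that the 1:1 index mapping only holds as long as any edits keep the
length the same, as the same-length replace above does.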
Though you'd still have to encode() back to bytes (with latin-1 again) to get
chunks that could be used as binary -- i.e. passed to the struct module, or to
a frombytes() or frombuffer() function of, say, numpy or PIL or something...
But I'm no expert....
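
For what it's worth, here is a rough sketch of that last step using the struct
module. The layout (a 4-byte ASCII tag followed by a little-endian uint32 and
float64) is an assumption invented for the example:

```python
import struct

# Hypothetical layout: 4-byte ASCII tag, then a little-endian uint32 and float64.
raw = b"DATA" + struct.pack("<Id", 42, 1.5)
text = raw.decode("latin-1")

# Text operations work on the ASCII part of the decoded string.
tag = text[:4]

# struct wants a bytes-like object, so the binary chunk has to be encoded
# back with the same codec before unpacking.
payload = text[4:].encode("latin-1")
count, value = struct.unpack("<Id", payload)
```

The encode step recovers exactly the original bytes, so payload here is the
same as raw[4:].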
Christopher Barker, Ph.D.
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov