Python 3 how to convert a list of bytes objects to a list of strings?
Chris Green
cl at isbd.net
Fri Aug 28 07:26:07 EDT 2020
Cameron Simpson <cs at cskk.id.au> wrote:
> On 28Aug2020 08:56, Chris Green <cl at isbd.net> wrote:
> >Stefan Ram <ram at zedat.fu-berlin.de> wrote:
> >> Chris Angelico <rosuav at gmail.com> writes:
> >> >But this is a really good job for a list comprehension:
> >> >sss = [str(word) for word in bbb]
> >>
> >> Are you all sure that "str" is really what you all want?
> >>
> >Not absolutely; you have no doubt been following other threads related
> >to this one. :-)
>
> It is almost certainly not what you want. You want some flavour of
> bytes.decode. If the BytesParser doesn't cope, you may need to parse the
> headers as some kind of text (eg ISO8859-1) until you find a
> content-transfer-encoding header (which still applies only to the body,
> not the headers).
>
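A minimal sketch of the "parse the headers as ISO-8859-1" fallback, assuming the whole raw message is already one bytes object and that the first blank line separates headers from body. `headers_as_latin1` is a hypothetical helper, not part of the `email` package, and it does not unfold continuation lines.

```python
def headers_as_latin1(raw: bytes) -> list[str]:
    """Hypothetical fallback: decode only the header block as ISO-8859-1.

    ISO-8859-1 maps every byte value 0-255 to a character, so this decode
    cannot raise, whatever 8-bit junk a sender put in the headers.
    """
    # Accept both CRLF (as sent over POP3) and bare LF line endings.
    normalized = raw.replace(b"\r\n", b"\n")
    # Headers end at the first empty line; the body is not touched.
    header_block, _sep, _body = normalized.partition(b"\n\n")
    return header_block.decode("iso-8859-1").split("\n")


raw = b"Subject: Price \xa35\r\nFrom: someone@example.com\r\n\r\nBody\r\n"
print(headers_as_latin1(raw))  # ['Subject: Price £5', 'From: someone@example.com']
```

Once a Content-Type charset is known the body can then be decoded properly; the headers themselves would still need RFC 2047 handling for any encoded-words.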
> >> |>>> b = b"b"
> >> |>>> str( b )
> >> |"b'b'"
> >>
> >> Maybe try to /decode/ the bytes?
> >>
> >> |>>> b.decode( "ASCII" )
> >> |'b'
> >>
> >>
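The same point applied to a whole list rather than one value: `str()` gives the repr of each bytes object, while `decode()` gives text. The list `bbb` below is made-up sample data.

```python
# bbb stands in for the list of bytes objects discussed in the thread.
bbb = [b"Subject: hello", b"From: someone@example.com"]

# str() on bytes returns the repr: the b prefix and quotes end up in the text.
as_repr = [str(word) for word in bbb]

# bytes.decode() interprets the bytes in the named encoding and returns str.
as_text = [word.decode("ascii") for word in bbb]

print(as_repr[0])  # b'Subject: hello'
print(as_text[0])  # Subject: hello
```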
> >Therein lies the problem, the incoming byte stream *isn't* ASCII, it's
> >an E-Mail message which may, for example, have UTF-8 or other encoded
> >characters in it. Hopefully it will have an encoding given in the
> >header, but that's only if the sender is 'well behaved'; one needs to
> >be able to handle almost anything and it must be done without 'manual'
> >interaction.
>
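One way to meet the "handle almost anything without manual interaction" requirement, sketched under assumptions: try the charset the sender declared (if any), then UTF-8, then ISO-8859-1, which accepts every byte sequence and so cannot fail. `decode_tolerantly` is a hypothetical name. The trade-off is that a wrong declared charset yields mojibake rather than an exception (a UTF-8 "£" read as ISO-8859-1 comes out as "Â£").

```python
from typing import Optional


def decode_tolerantly(data: bytes, declared: Optional[str] = None) -> str:
    """Hypothetical helper: bytes of uncertain encoding -> str, never raising."""
    candidates = ([declared] if declared else []) + ["utf-8"]
    for encoding in candidates:
        try:
            return data.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            # LookupError: unknown or bogus charset name from the header.
            # UnicodeDecodeError: the bytes are not valid in that charset.
            continue
    # ISO-8859-1 maps every byte value to a character, so this cannot raise.
    return data.decode("iso-8859-1")


print(decode_tolerantly(b"\xa35"))                  # £5 (UTF-8 fails, Latin-1 used)
print(decode_tolerantly(b"abc", "no-such-charset"))  # abc
```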
> POP3 is presumably handing you bytes containing a message. If the Python
> email.BytesParser doesn't handle it, stash the raw bytes _elsewhere_ in
> a distinct file in some directory.
>
>     with open('evil_msg_bytes', 'wb') as f:
>         for bs in bbb:
>             f.write(bs)
>
> No interpretation required, since parsing failed. Then you can start
> dealing with these exceptions. _Do not_ write unparsable messages into
> an mbox!
>
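A sketch of the parse-or-stash approach, assuming the message has already been reassembled into one bytes object. `BytesParser` rarely raises on malformed input; it records problems in `msg.defects` instead, so this hypothetical helper treats any recorded defect as "didn't cope". That is a stricter test than strictly necessary and is a judgment call.

```python
from typing import Optional

from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def parse_or_quarantine(raw: bytes, quarantine_path: str) -> Optional[EmailMessage]:
    """Hypothetical helper: return the parsed message, or write the untouched
    raw bytes to quarantine_path and return None if parsing went wrong."""
    try:
        msg = BytesParser(policy=policy.default).parsebytes(raw)
    except Exception:  # deliberately broad: any parser failure means "stash it"
        msg = None
    if msg is None or msg.defects:
        # Binary mode, no decoding: the file holds exactly the bytes received.
        with open(quarantine_path, "wb") as f:
            f.write(raw)
        return None
    return msg
```

For example, a message claiming `Content-Type: multipart/mixed` with no boundary gets a `NoBoundaryInMultipartDefect` and is written to the quarantine file instead of the mbox.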
Maybe I shouldn't, but Python 2 has been managing to do so for several
years without any issues. I know I *could* put the exceptions in a
bucket somewhere and deal with them separately but I'd really rather
not.
At present (with the Python 2 code still installed) it all 'just works'
and the absolute worst corruption I ever see in an E-Mail is things
like accented characters missing altogether or £ signs coming out as a
funny-looking string. Neither of these really makes the message
unintelligible.
Are we saying that Python 3 really can't be made to handle things
'tolerantly' like Python 2 used to?
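For what it's worth, Python 3 can be just as tolerant at the storage step: `mailbox.Mailbox.add()` accepts a bytes object, so the raw message can be appended to an mbox without ever being decoded. A minimal sketch, assuming the message arrives as the list of lines returned by `poplib.POP3.retr()` (which strips the line endings); `append_raw_to_mbox` is a hypothetical helper name.

```python
import mailbox


def append_raw_to_mbox(mbox_path: str, lines: list[bytes]) -> int:
    """Hypothetical helper: append one message, given as bytes lines without
    terminators, to an mbox file without decoding it. Returns the new key."""
    raw = b"\n".join(lines) + b"\n"
    box = mailbox.mbox(mbox_path)  # the file is created if it does not exist
    try:
        box.lock()
        key = box.add(raw)  # bytes are written as-is; no charset involved
        box.flush()
    finally:
        box.close()  # close() flushes and releases the lock if it is held
    return key
```

Because nothing is decoded, a stray ISO-8859-1 £ byte in an otherwise UTF-8 message is stored exactly as received, which is essentially what code built on Python 2 `str` was doing.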
--
Chris Green
More information about the Python-list mailing list