How do I do this in Python 3 (string.join())?
Chris Green
cl at isbd.net
Thu Aug 27 09:36:01 EDT 2020
Cameron Simpson <cs at cskk.id.au> wrote:
> On 27Aug2020 09:16, Chris Green <cl at isbd.net> wrote:
> >Cameron Simpson <cs at cskk.id.au> wrote:
> >> But note: joining bytes like strings is uncommon, and may indicate that
> >> you should be working in strings to start with. Eg you may want to
> >> convert popmsg from bytes to str and do a str.join anyway. It depends on
> >> exactly what you're dealing with: are you doing text work, or are you
> >> doing "binary data" work?
> >>
> >> I know many network protocols are "bytes-as-text", but that is
> >> accomplished by implying an encoding of the text, eg as ASCII, where
> >> characters all fit in single bytes/octets.
> >>
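To illustrate the point about an implied encoding, here is a minimal sketch. The sample header line is made up; it only shows that the same bytes give different text depending on which encoding is assumed.

```python
# A made-up header line, encoded the way it might travel on the wire.
wire = "Subject: café".encode("utf-8")

# Decoding with the encoding that was actually used recovers the text.
right = wire.decode("utf-8")

# Decoding with a wrong single-byte encoding "works" but yields mojibake.
wrong = wire.decode("latin-1")

# Assuming plain ASCII fails outright, because 0xC3 is not an ASCII byte.
try:
    wire.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(right)  # Subject: café
print(wrong)  # Subject: cafÃ©
```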
> >Yes, I realise that making everything a string before I start might be
> >the 'right' way to do things but one is a bit limited by what the mail
> >handling modules in Python provide.
>
> I do ok, though most of my message processing happens to messages
> already landed in my "spool" Maildir by getmail. My setup uses getmail
> to get messages with POP into a single Maildir, and then I process the
> message files from there.
>
Most of my mail is delivered by SMTP; I run a Postfix SMTP *server*
on my desktop machine which stays on permanently.

The POP3 processing is solely to collect E-Mail that ends up in the
'catchall' mailbox on my hosting provider. It empties the POP3
catchall mailbox, checks for anything that *might* be for me or other
family members then just deletes the rest.

> >E.g. in this case the only (well the only ready made) way to get a
> >POP3 message is using poplib and this just gives you a list of lines
> >made up of "bytes as text" :-
> >
> > popmsg = pop3.retr(i+1)
>
> Ok, so you have bytes? You need to know.
>
The documentation says (and it's exactly the same for Python 2 and
Python 3):-

    POP3.retr(which)
        Retrieve whole message number which, and set its seen flag. Result
        is in form (response, ['line', ...], octets).

Which isn't amazingly explicit unless 'line' implies a string.
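
For what it's worth, in Python 3 poplib hands back each line as a bytes object, so the join needs a bytes separator. A minimal sketch, assuming a hand-built tuple in the shape retr() returns rather than a real server (the message content and the octets value are made up):

```python
# Stand-in for what pop3.retr(i+1) returns in Python 3:
# (response, [line, ...], octets), each line being bytes with no line ending.
popmsg = (
    b"+OK",
    [b"From: someone@example.com", b"Subject: test", b"", b"Hello"],
    53,  # illustrative only, not computed from the lines above
)

response, lines, octets = popmsg

# The separator must be bytes too; "\n".join(lines) raises TypeError here.
raw = b"\n".join(lines) + b"\n"

print(raw)
```

The result is still bytes, so anything downstream has to either accept bytes or decode them with a known encoding.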
> >I join the lines to feed them into mailbox.mbox() to create a mbox I
> >can analyse and also a message which can be sent using SMTP.
> >
> >Should I be converting to string somewhere?
>
> I have not used poplib, but the Python email modules have a BytesParser,
> which gets you a Message object; I would feed the poplib bytes to that
> to parse the received message. A Message object can then be transcribed
> as text via its .as_string method. Or you can do other things with it.
>
> I think my main points are:
>
> - know whether you're using bytes (uninterpreted data) or text (strings
> of _characters_); treating bytes _as_ text implies an encoding, and
> when that assumption is incorrect you get mojibake[1]
>
> - look at the email modules' parsers, which return Messages, a
> representation of the message in a structure (so that MIME subparts
> etc are correctly broken out, and the character sets are _known_, post
> parse)
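
A minimal sketch of the BytesParser route suggested above, assuming the lines have already been joined into one bytes object. The sample message is made up, and the use of policy.default is my choice here, not something from the thread:

```python
from email import policy
from email.parser import BytesParser

# Made-up message, as bytes, with a quoted-printable UTF-8 body.
raw = (
    b"From: someone@example.com\r\n"
    b"Subject: test\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"caf=C3=A9\r\n"
)

# Parse the bytes into a Message object; no manual decoding needed.
msg = BytesParser(policy=policy.default).parsebytes(raw)

subject = msg["Subject"]   # header value as text
body = msg.get_content()   # body decoded using the declared charset
text = msg.as_string()     # whole message transcribed as a str

print(subject)
print(body)
```

The point is that the parser works out the character set from the message itself, so the bytes-to-text step happens in one well-defined place.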
OK, thanks Cameron.
--
Chris Green