How to manage accented characters in mail header?
Chris Green
cl at isbd.net
Sat Jan 4 14:07:57 EST 2025
Stefan Ram <ram at zedat.fu-berlin.de> wrote:
> Chris Green <cl at isbd.net> wrote or quoted:
> From: =?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= <sebastien.crignon at amvs.fr>
>
> In Python, when you use decode_header from the email.header
> module, it returns a list of parts, where each part is a tuple
> of (bytes or str, charset). To combine these sections into one
> string, you’ll want to loop through the list, decode each piece
> to str (if it is still bytes), and then join them together.
> Here’s a straightforward example of how to pull this off:
>
> from email.header import decode_header
>
> # Example header
> header_example = \
>     'From: =?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= <sebastien.crignon at amvs.fr>'
>
> # Decode the header
> decoded_parts = decode_header(header_example)
>
> # Kick off an empty list for the decoded strings
> decoded_strings = []
>
> for part, charset in decoded_parts:
>     if isinstance(part, bytes):
>         # Decode the bytes to a string using the charset
>         decoded_string = part.decode(charset or 'utf-8')
>     else:
>         # If it’s already a string, just roll with it
>         decoded_string = part
>     decoded_strings.append(decoded_string)
>
> # Join the parts into a single string
> final_string = ''.join(decoded_strings)
>
> print(final_string)  # From: Sébastien Crignon <sebastien.crignon at amvs.fr>
>
> Breakdown
>
> decode_header(header_example): This line takes your email header
> and breaks it down into a list of (bytes, charset) tuples.
>
> Looping through decoded_parts: You check whether each part is
> bytes. If it is, you decode it using its charset (defaulting to
> 'utf-8' when the charset is None).
>
> Appending Decoded Strings: You toss each decoded part into a list.
>
> Joining Strings: Finally, you use ''.join(decoded_strings) to glue
> all the decoded strings into a single, coherent piece.
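For what it's worth, the standard library can also do the loop-and-join in one step: email.header.make_header() accepts the list that decode_header() returns, and calling str() on the resulting Header object gives the decoded text. A minimal sketch, assuming the header value is a plain str; the address someone@example.com is a hypothetical placeholder, not the one from the original message:

```python
from email.header import decode_header, make_header

# Hypothetical header value: the encoded display name is the one from the
# message above, the address is a placeholder.
raw = '=?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= <someone@example.com>'

# decode_header() splits the value into (bytes, charset) pairs;
# make_header() rebuilds a Header object from those pairs, and str()
# renders it as a single Unicode string.
decoded = str(make_header(decode_header(raw)))
print(decoded)
```

This keeps the spacing between encoded and unencoded chunks the way the email package itself would render it, so there is no manual joining to get wrong.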
>
> Just a Heads Up
>
> Keep an eye out for cases where the charset is None (that chunk
> was not encoded). In those cases, fall back to 'utf-8' or another
> safe default, as the loop above does with charset or 'utf-8'.
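One case the charset or 'utf-8' fallback does not cover is a charset label that Python doesn't recognise (a typo or a vendor-specific name in the encoded word): bytes.decode() then raises LookupError, and malformed bytes raise UnicodeDecodeError. Below is a hedged sketch of a more defensive version; the helper names decode_part and decode_header_value are hypothetical, and replacing undecodable bytes with U+FFFD is a design choice, not something the email package requires:

```python
from email.header import decode_header


def decode_part(part, charset):
    """Turn one (part, charset) pair from decode_header() into str.

    Hypothetical helper: unknown charset names fall back to UTF-8, and
    undecodable bytes become U+FFFD instead of raising.
    """
    if isinstance(part, str):
        # decode_header() returns str parts only when nothing was encoded.
        return part
    try:
        return part.decode(charset or 'utf-8', errors='replace')
    except LookupError:
        # The charset label is not a codec Python knows about.
        return part.decode('utf-8', errors='replace')


def decode_header_value(value):
    """Decode every chunk of a header value and join the results."""
    return ''.join(decode_part(p, cs) for p, cs in decode_header(value))


# Hypothetical input with a bogus charset label; the bytes are valid UTF-8.
print(decode_header_value('=?x-bogus?B?U8OpYmFzdGllbg==?= Crignon'))
```

Falling back to UTF-8 is only a guess when the label is unknown; with errors='replace' the worst case is some replacement characters rather than an exception.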
>
Thanks, I think! :-)
Is there a simple[r] way to extract just the 'real' address between
the <>? That's all I actually need. I think it has to be the last
chunk of the From: header, doesn't it?
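On the "last chunk" point: not necessarily. A From: value may also be a bare address with no angle brackets, or use the older trailing-comment form where the address comes first, so taking the last <...> chunk is fragile. email.utils.parseaddr() handles these forms, and it can be given the raw, still-encoded value because RFC 2047 does not allow encoded-words inside the address itself. A minimal sketch; the sample values and someone@example.com are hypothetical placeholders. The second half assumes the whole message is available as text and uses policy.default, whose parsed From header exposes Address objects with the display name already decoded:

```python
import email
from email import policy
from email.utils import parseaddr

# Hypothetical From: values; the first reuses the encoded display name
# from the message above with a placeholder address.
samples = [
    '=?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= <someone@example.com>',
    'someone@example.com',
    'someone@example.com (Someone)',
]

for value in samples:
    # parseaddr() returns a (display_name, address) tuple; the display
    # name is left undecoded, but the address never needs decoding.
    name, addr = parseaddr(value)
    print(addr)

# Alternative when parsing a whole (hypothetical) message with the
# modern email policy: header objects do the decoding for you.
raw_message = (
    'From: =?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= <someone@example.com>\n'
    'Subject: test\n'
    '\n'
    'body\n'
)
msg = email.message_from_string(raw_message, policy=policy.default)
sender = msg['From'].addresses[0]
print(sender.display_name, sender.addr_spec)
```

If only the address is needed, parseaddr(value)[1] is enough; the policy-based route is handy when the display name is wanted too.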
--
Chris Green
More information about the Python-list mailing list