[Mailman-Users] Re: What is character set of the log files?

June 1, 2020


Bernd Petrovitsch writes:
...
On 27/05/2020 01:08, Mark Sapiro wrote:
...

Basically unknown. For the most part, log files are us-ascii, but some
entries contain user-entered data such as names or (malformed) email

If the user enters his name in an HTML form with e.g. German umlauts,
it should be UTF-8 encoded, no?

That depends on a lot of things.  "Should," yes, but things don't
always turn out that way even if both server and client intend to be
"reasonable".

But usually the most "interesting" messages in the logs are when
behavior is *not* reasonable!  There's no way to enforce reasonable
behavior on the client.  Spammers and teenage wannabe programmers are
not known for respecting RFCs; if they're accessing servers with
scripts, anything could happen through carelessness.  And actively
malicious actors might be trying to exploit a vulnerability by
declaring one thing in the header and sending something else in the
body.
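
As a rough illustration of that last case, here is a minimal Python
sketch that checks whether a body actually decodes under the charset
its header declares.  The name decodes_as_declared and its parameters
are hypothetical; nothing here is taken from Mailman itself.

```python
def decodes_as_declared(body: bytes, declared_charset: str) -> bool:
    """Return True if `body` decodes cleanly under `declared_charset`."""
    try:
        body.decode(declared_charset)
    except (UnicodeDecodeError, LookupError):
        # UnicodeDecodeError: the bytes are not valid in that charset.
        # LookupError: the declared charset name is unknown to Python.
        return False
    return True


latin1_body = "Grüße".encode("latin-1")

print(decodes_as_declared(latin1_body, "utf-8"))    # False
print(decodes_as_declared(latin1_body, "latin-1"))  # True
```

Note that a check like this only catches some mismatches: every byte
sequence decodes under latin-1, so a wrong "latin-1" declaration would
pass unnoticed.
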

So it is an interesting question how to specify that encoding.  It
depends on several components of the system.

Of course Mailman could decide to provide a printable UTF-8
representation of anything that's not UTF-8, marking it somehow, but
since all of our message components are ASCII, is it worth going to
that trouble, complicating and slowing down logging?
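
To make the idea concrete, a minimal Python sketch of such a marked,
printable representation might look like the following.  The name
printable_utf8 is hypothetical, and this is not a description of what
Mailman does; it only relies on the standard "backslashreplace" error
handler, which Python supports for decoding since 3.5.

```python
def printable_utf8(raw: bytes) -> str:
    """Return text that can be written to a UTF-8 log.

    Valid UTF-8 passes through unchanged; any byte that is not part of
    a valid UTF-8 sequence is shown as a hex escape so it stays visible.
    """
    return raw.decode("utf-8", errors="backslashreplace")


name_utf8 = "Jürgen".encode("utf-8")
name_latin1 = "Jürgen".encode("latin-1")

print(printable_utf8(name_utf8))    # Jürgen
print(printable_utf8(name_latin1))  # J\xfcrgen
```

The marking is not unambiguous, since a literal backslash in valid
input is passed through as-is, and control characters are not escaped
either, so a real implementation would need more care (and more time
per log entry).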

Stephen J. Turnbull