
Bernd Petrovitsch writes:
On 27/05/2020 01:08, Mark Sapiro wrote:
Basically unknown. For the most part, log files are us-ascii, but some entries contain user-entered data such as names or (malformed) email
If the user enters his name in an HTML form with e.g. German umlauts, it should be UTF-8 encoded, shouldn't it?
That depends on a lot of things. "Should," yes, but things don't always turn out that way even if both server and client intend to be "reasonable".
But usually the most "interesting" messages in the logs are when behavior is *not* reasonable! There's no way to enforce reasonable behavior on the client. Spammers and teen-age wannabe programmers are not known for respecting RFCs; if they're accessing servers with scripts, anything could happen through carelessness. And actively malicious actors might be trying to exploit a vulnerability by declaring one thing in the header and sending something else in the body.
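To make the header/body mismatch concrete, here is a minimal sketch in Python. The name `matches_declared` is hypothetical and not part of Mailman; it only checks whether a body actually decodes under the charset the client declared.

```python
def matches_declared(body: bytes, charset: str) -> bool:
    """Return True if `body` decodes cleanly under the declared charset.

    Hypothetical helper for illustration only.
    """
    try:
        body.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # UnicodeDecodeError: the bytes are not valid in that charset.
        # LookupError: the declared charset name is unknown to Python.
        return False
    return True


# A Latin-1 encoded name sent with a "charset=utf-8" declaration.
latin1_name = "Müller".encode("latin-1")
print(matches_declared(latin1_name, "utf-8"))
```

Note that a successful decode proves little for single-byte charsets such as Latin-1, which accept any byte sequence, so a check like this can only catch some mismatches.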
So it is an interesting question how to specify that encoding. It depends on several components of the system.
Of course Mailman could decide to provide a printable UTF-8 representation of anything that's not UTF-8, marking it somehow, but since all of our message components are ASCII, is it worth going to that trouble, complicating and slowing down logging?
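For illustration, such a marked, printable representation could look roughly like the sketch below. This is not Mailman's logging code; the function name `loggable` and the `[non-utf8]` marker are assumptions made up for the example. It relies on Python's standard `backslashreplace` error handler for decoding.

```python
def loggable(raw: bytes) -> str:
    """Return text that is safe to write to a UTF-8 log file.

    Hypothetical helper; the "[non-utf8] " marker is an arbitrary choice.
    """
    try:
        # Valid UTF-8 (which includes plain ASCII) passes through unchanged.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Invalid bytes become \xNN escapes, and the prefix marks the entry
        # so a reader knows the original data was not UTF-8.
        return "[non-utf8] " + raw.decode("utf-8", errors="backslashreplace")


print(loggable("Müller".encode("utf-8")))
print(loggable("Müller".encode("latin-1")))
```

The cost in question is visible here: every user-supplied value gets at least one decode attempt, and a second one when the first fails, even though the fixed parts of the log messages are ASCII anyway.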