What is the character set of the log files?
Hi!
What is the character set coding for the log files, please? I'm using MM 2.1.29
I'm asking because a list admin wants to view them via a web browser and I need to configure Apache.
Thanks.
Brett
On 5/26/20 4:30 PM, Brett Delmage wrote:
Hi!
What is the character set coding for the log files, please? I'm using MM 2.1.29
Basically unknown. For the most part, log files are us-ascii, but some entries contain user entered data such as names or (malformed) email addresses that might contain non-ascii and might be encoded in the character set of the list's preferred language or something else.
I'm asking because a list admin wants to view them via a web browser and I need to configure Apache.
I have done this with nothing special in Apache. I just put a symlink to the respective log(s) in a browser accessible place giving it a .txt extension, e.g. vette.txt -> /var/lib/mailman/logs/vette.log.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
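To see how much non-ASCII data a given log actually contains, a minimal Python sketch along the following lines could help. The function name `non_ascii_lines` is hypothetical and not part of Mailman; the file is read as raw bytes, so no character set has to be assumed.

```python
def non_ascii_lines(path):
    """Return (line_number, raw_bytes) for every line holding a byte >= 0x80."""
    hits = []
    # Binary mode: nothing is decoded, so no charset needs to be guessed.
    with open(path, "rb") as log:
        for number, raw in enumerate(log, start=1):
            if any(byte > 0x7F for byte in raw):
                hits.append((number, raw.rstrip(b"\r\n")))
    return hits
```

Called on a log path such as the one in the example above, an empty result would mean that file is pure us-ascii and any charset declaration will render it the same way.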
On Tue, 26 May 2020, Mark Sapiro wrote:
What is the character set coding for the log files, please? I'm using MM 2.1.29
Basically unknown. For the most part, log files are us-ascii, but some entries contain user entered data such as names or (malformed) email addresses that might contain non-ascii and might be encoded in the character set of the list's preferred language or something else.
Thanks. I thought it might be something like that. It's not critical for me, as the list admin / moderator just needs to look at the logs sometimes for possible troubleshooting. If the log files don't all render perfectly, it's not a major issue.
A few weeks ago his MSP started bouncing his list emails. He noticed after he was suspended for excessive bounces. While the problem could not (or would not?) be explained, I suspect he tagged a list "via" message (or many) as spam, causing his MSP to block the list address. But who knows.
I have done this with nothing special in Apache. I just put a symlink to the respective log(s) in a browser accessible place giving it a .txt extension, e.g. vette.txt -> /var/lib/mailman/logs/vette.log.
Thanks for the tip.
Prior to my inquiry, I ended up with this Apache config stanza to permit viewing all the log files using Apache autoindex, allowing Apache and the web browser to do the work.
Adding "AddEncoding x-gzip .gz" allows direct viewing of the compressed (rotated) log files in the log directory. Adding "ForceType text/plain" avoids the need for the .txt extension for MIME setting. I was also going to set the charset, but being as it's random... :-)
# Mailman logs
Alias /logs /var/log/mailman/
<Directory /var/log/mailman/>
    AuthType Basic
    AuthName "Mailman admins"
    AuthUserFile "/home/.../passwd"
    Require valid-user
    # Enable autoindex!
    DirectoryIndex disabled
    AddEncoding x-gzip .gz
    ForceType text/plain
    Options FollowSymlinks Indexes
    AllowOverride None
</Directory>
I also created a subdirectory "latest" in /logs with symlinks to the current versions of the log files. Hmm, I think I could probably do a virtual location in Apache itself for this.
Thanks for your incredible support for Mailman users over the past 20 years, Mark! I posted this message, then told my partner (another list and Linux user for 20 years) that I expected you might be responding at dinner -- and here you are!
Brett
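The "latest" directory of symlinks described above could be maintained with a small script. What follows is only a sketch under stated assumptions: the function `link_latest` and its parameters are hypothetical, and the caller supplies the log file names explicitly because the thread does not list them.

```python
import os

def link_latest(log_dir, latest_dir, names):
    """Create or refresh symlinks in latest_dir pointing at current logs in log_dir."""
    os.makedirs(latest_dir, exist_ok=True)
    linked = []
    for name in names:
        source = os.path.join(log_dir, name)
        if not os.path.isfile(source):
            continue  # this log has not been written yet; skip it
        target = os.path.join(latest_dir, name)
        if os.path.islink(target):
            os.unlink(target)  # drop a stale link before recreating it
        os.symlink(source, target)
        linked.append(target)
    return linked
```

Apache only serves such links when symlink following is allowed for the directory, which the `Options FollowSymlinks` line in the stanza above already provides.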
Mark Sapiro writes:
On 5/26/20 4:30 PM, Brett Delmage wrote:
Hi!
What is the character set coding for the log files, please? I'm using MM 2.1.29
Basically unknown. For the most part, log files are us-ascii,
I would consider declaring ISO-8859-1, ISO-8859-15, or Windows-1252. All contain mappings for all 256 octets, so you will never get a decoding error. US users may prefer Windows-1252, since it's the main 8-bit encoding for them which has "smart quotes" and the like, and the 8-bit control character area is mapped to graphic characters which are less likely to upset terminals. ISO-8859-15 may be more popular in Western Europe since it is a variant of ISO-8859-1 with the EURO SIGN.
Steve
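The "never get a decoding error" property can be checked per codec. The sketch below uses Python's codec names, which is an assumption about tooling rather than anything Apache or a browser runs; the helper `undecodable_bytes` is hypothetical. One nuance it exposes: Python's `cp1252` codec leaves five octets undefined, whereas browsers follow the WHATWG Encoding Standard, in which windows-1252 maps every octet.

```python
def undecodable_bytes(encoding):
    """Return the single-byte values that the named Python codec refuses to decode."""
    bad = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            bad.append(value)
    return bad

for name in ("iso-8859-1", "iso-8859-15", "cp1252"):
    print(name, [hex(v) for v in undecodable_bytes(name)])
```

So when post-processing logs in Python rather than viewing them in a browser, `iso-8859-1` is the safer choice of the three if the goal is a decode that cannot fail.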
Hi!
On 27/05/2020 01:08, Mark Sapiro wrote:
On 5/26/20 4:30 PM, Brett Delmage wrote: [...]
What is the character set coding for the log files, please? I'm using MM 2.1.29
Basically unknown. For the most part, log files are us-ascii, but some entries contain user entered data such as names or (malformed) email
If the user enters his name in an HTML form with e.g. German umlauts, it should be UTF-8 encoded, no?
MfG, Bernd
There is no cloud, just other people's computers. -- https://static.fsf.org/nosvn/stickers/thereisnocloud.svg
Bernd Petrovitsch writes:
On 27/05/2020 01:08, Mark Sapiro wrote:
Basically unknown. For the most part, log files are us-ascii, but some entries contain user entered data such as names or (malformed) email
If the user enters his name in an HTML form with e.g. German umlauts, it should be UTF-8 encoded, no?
That depends on a lot of things. "Should," yes, but things don't always turn out that way even if both server and client intend to be "reasonable".
But usually the most "interesting" messages in the logs are when behavior is *not* reasonable! There's no way to enforce reasonable behavior on the client. Spammers and teen-age wannabe programmers are not known for respecting RFCs; if they're accessing servers with scripts, anything could happen through carelessness. And actively malicious actors might be trying to exploit a vulnerability by declaring one thing in the header and sending something else in the body.
So it is an interesting question how to specify that encoding. It depends on several components of the system.
Of course Mailman could decide to provide a printable UTF-8 representation of anything that's not UTF-8, marking it somehow, but since all of our message components are ASCII, is it worth going to that trouble, complicating and slowing down logging?
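As an illustration of what a "printable UTF-8 representation" with marking might look like, here is a minimal Python 3 sketch. It is not Mailman's logging code, and the function name `printable_utf8` is hypothetical; it relies only on the standard `backslashreplace` error handler, which is available for decoding from Python 3.5 on.

```python
def printable_utf8(raw):
    """Decode bytes as UTF-8, showing any invalid byte as a visible backslash escape."""
    # Valid UTF-8 passes through unchanged; each invalid byte becomes \xNN text.
    return raw.decode("utf-8", errors="backslashreplace")

print(printable_utf8("Müller".encode("utf-8")))       # valid UTF-8 input
print(printable_utf8("Müller".encode("iso-8859-1")))  # Latin-1 input, not valid UTF-8
```

One limitation of this marking scheme is that an escape produced here is indistinguishable from the same backslash sequence typed literally by a user, so a real implementation would need a less ambiguous marker.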
participants (5)
- Bernd Petrovitsch
- Brett Delmage
- Brett Delmage
- Mark Sapiro
- Stephen J. Turnbull