On 4/9/21 5:55 AM, Mark Dale via Mailman-Users wrote:
> In the archive's downloaded .txt (and also .gz) file, the non-ascii characters are missing and displayed as "?".
> Any advice on getting the non-ascii characters written into the archive .txt file would be gratefully received.
The message is prepared for the .txt file by the Article.as_text()
method in HyperArch.py.
To do the email address obfuscation in the message body, the method
first converts the body to unicode using the charset of the list's
language. It does this whether or not ARCHIVER_OBSCURES_EMAILADDRS is
True. After any obfuscation, it converts the body back, again using the
charset of the list's language. Both conversions use errors=replace,
which replaces any character not in the charset with, in the case of
ascii, "?".
One way to avoid this replacement would be to change the charset for English from ascii to utf-8. See https://wiki.list.org/x/15958250.
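As a rough illustration of the round trip described above, the following Python 3 sketch decodes a body with a given charset and encodes it back, both with errors="replace". It is not the actual Article.as_text() code (Mailman 2.1 runs on Python 2); the function name round_trip and the sample body are made up for the example.

```python
def round_trip(body: bytes, charset: str) -> bytes:
    """Decode and re-encode body with the same charset, replacing errors."""
    # Hypothetical stand-in for the decode/obfuscate/encode steps;
    # the obfuscation itself is omitted.
    text = body.decode(charset, "replace")
    return text.encode(charset, "replace")


# "café" encoded as utf-8; the e-acute takes two bytes.
body = "café".encode("utf-8")

ascii_result = round_trip(body, "ascii")
utf8_result = round_trip(body, "utf-8")

print(ascii_result)  # each non-ascii byte comes back as "?"
print(utf8_result)   # the utf-8 bytes survive unchanged
```

With ascii as the charset, every byte outside the ascii range is lost. With utf-8, a utf-8 encoded body passes through intact.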
This isn't a complete solution when the non-ascii characters in the
original message are encoded in something other than utf-8, e.g.
iso-8859-1, but it will probably handle most cases.
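The case that is not handled can be sketched in Python 3. Here a body encoded as iso-8859-1 is decoded as utf-8 with errors="replace", as would happen once the list's charset is utf-8. The sample string is invented for the example, and this is not Mailman's own code.

```python
# "café" as iso-8859-1: the e-acute is the single byte 0xE9,
# which is not a valid utf-8 sequence on its own.
latin1_body = "café".encode("iso-8859-1")

decoded = latin1_body.decode("utf-8", "replace")
reencoded = decoded.encode("utf-8", "replace")

# The invalid byte is replaced by U+FFFD (the replacement character),
# so the original character is still lost.
print(repr(decoded))
print(reencoded)
```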
-- 
Mark Sapiro email@example.com
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan