Re: [Mailman-Developers] [Mailman-checkins] SF.net SVN: mailman: [7858] trunk/mailman
Hi Barry,
I spent some hours playing with the new svn trunk. (I was a little bit busy because our academic year begins in April.)
bwarsaw@users.sourceforge.net wrote:
Revision: 7858
Author: bwarsaw
Date: 2006-04-16 21:08:17 -0700 (Sun, 16 Apr 2006)
ViewCVS: http://svn.sourceforge.net/mailman/?rev=7858&view=rev
Log Message:
- Convert all logging to Python's standard logging module. Get rid of all traces of our crufty old Syslog. Most of this work was purely mechanical, except for:
I get this on a fresh install of svn trunk. If you haven't seen it, you may have files left over from an old install.
    % bin/mailmanctl start
    Traceback (most recent call last):
      File "bin/mailmanctl", line 112, in ?
        from Mailman.Logging.Syslog import syslog
    ImportError: No module named Logging.Syslog
Also, if you send SIGHUP to reopen the logs, only the last reopen message is recorded, because each runner tries to reopen the log file. We may have to restart the qrunners if mailmanctl receives SIGHUP and has started new log files. We may also utilize the backupCount feature for log rotation (introducing LOG_BACKUP_COUNT in Defaults.py).
Initializing the loggers. For this, there's a new module Mailman/loginit.py (yes all modules from now on will use PEP 8 names). We can't call this 'logging.py' because that will interfere with importing the stdlib module of the same name (can you say Python 2.5 and absolute imports?).
If you want to write log messages both to the log file and to stderr, pass True to loginit.initialize(). This will turn on propagation of log messages to the parent 'mailman' logger, which is set up to print to stderr. This is how bin/qrunner works when not running as a subprocess of mailmanctl.
The driver script. I had to untwist the StampedLogger stuff and reimplement how exceptions and such get printed to log/error, because standard logging objects don't have a write() method. So we write to a cStringIO and then pass that to the logger.
SMTPDirect.py because of the configurability of the log messages. This required changing SafeDict into a dict subclass (which is better than using UserDicts anyway -- yay Python 2.3!). It's probably still possible to flummox things up if you change the name of the loggers in the SMTP_LOG_* variables in mm_cfg.py. However, the worst you can do is cause output to go to stderr and not go to a log file.
Note too that all entry points into the Mailman system must call Mailman.loginit.initialize() or the log output will go to stderr (which may occasionally be what you want). Currently all CGIs and qrunners should be working properly.
I wish I could have tested all code paths that touch the logger, but that's infeasible. I have tested this, but it's possible that there were some mistakes in the translation.
Mailman.Bouncers.BounceAPI.Stop is a singleton, but not a class instance any more.
True/False code cleanup, PEP 8 import restructuring, whitespace normalization, and copyright year updates, as appropriate.
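The propagation behaviour described in the log message can be sketched roughly as follows. This is a minimal Python 3 sketch, not the contents of Mailman/loginit.py: the initialize() signature, the logger names 'mailman.error' and 'mailman.smtp', and the in-memory buffers standing in for the log files are all assumptions made for illustration.

```python
import io
import logging


def initialize(propagate=False, console=None, files=None):
    """Hypothetical stand-in for loginit.initialize().

    Each 'mailman.<name>' logger writes to its own "file" (an in-memory
    buffer here).  With propagate=True, records also travel up to the
    parent 'mailman' logger, which prints to stderr (or to `console`).
    """
    files = {} if files is None else files
    parent = logging.getLogger('mailman')
    parent.setLevel(logging.INFO)
    for handler in list(parent.handlers):
        parent.removeHandler(handler)
    # StreamHandler(None) writes to sys.stderr.
    parent.addHandler(logging.StreamHandler(console))
    for name in ('error', 'smtp'):
        child = logging.getLogger('mailman.' + name)
        for handler in list(child.handlers):
            child.removeHandler(handler)
        buf = files.setdefault(name, io.StringIO())
        child.addHandler(logging.StreamHandler(buf))
        # This flag decides whether the parent's handler sees the record too.
        child.propagate = propagate
    return files
```

Called with propagate=True, a message logged to 'mailman.error' should land both in the per-logger buffer and on the console stream; with the default it should land only in the buffer.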
-- Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/
On Sun, 2006-04-23 at 08:30 +0900, Tokio Kikuchi wrote:
I spent some hours playing with the new svn trunk. (I was a little bit busy because our academic year begins in April.)
bwarsaw@users.sourceforge.net wrote:
Revision: 7858
Author: bwarsaw
Date: 2006-04-16 21:08:17 -0700 (Sun, 16 Apr 2006)
ViewCVS: http://svn.sourceforge.net/mailman/?rev=7858&view=rev
Log Message:
- Convert all logging to Python's standard logging module. Get rid of all traces of our crufty old Syslog. Most of this work was purely mechanical, except for:
I get this on a fresh install of svn trunk. If you haven't seen it, you may have files left over from an old install.
    % bin/mailmanctl start
    Traceback (most recent call last):
      File "bin/mailmanctl", line 112, in ?
        from Mailman.Logging.Syslog import syslog
    ImportError: No module named Logging.Syslog
Try r7871. I think I've fixed this now.
Also, if you send SIGHUP to reopen the logs, only the last reopen message is recorded, because each runner tries to reopen the log file. We may have to restart the qrunners if mailmanctl receives SIGHUP and has started new log files. We may also utilize the backupCount feature for log rotation (introducing LOG_BACKUP_COUNT in Defaults.py).
I decided not to use the RotatingFileHandler and leave file rotation to external tools like logrotate. Instead I implemented a subclass of FileHandler that allows for reopening the log files (I wonder why this isn't part of the base FileHandler).
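A reopenable handler along these lines might look like the sketch below. It is written for Python 3 and is not the code committed to Mailman: the class name ReopenFileHandler and the reopen() method are hypothetical, and it leans on FileHandler._open(), the hook the standard handler uses internally to open baseFilename. Later Python releases also ship logging.handlers.WatchedFileHandler, which reopens the file on its own when it notices the file has been moved.

```python
import logging


class ReopenFileHandler(logging.FileHandler):
    """FileHandler whose file can be closed and reopened, e.g. on SIGHUP."""

    def reopen(self):
        # Hold the handler lock so no record is emitted mid-swap.
        self.acquire()
        try:
            if self.stream is not None:
                self.stream.flush()
                self.stream.close()
            # Open baseFilename again; after an external rotation this
            # creates a fresh file at the original path.
            self.stream = self._open()
        finally:
            self.release()
```

An external tool such as logrotate would rename the old file and then signal the process, whose SIGHUP handler calls reopen() on each handler.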
One thing we may have to do though is set the log file encoding. What do you think about that?
-Barry
At 11:59 PM -0400 2006-04-23, Barry Warsaw wrote:
One thing we may have to do though is set the log file encoding. What do you think about that?
Log file encoding? I'm not sure I understand what you mean. I can think of a few different ways that could be interpreted, and I don't know for sure that any of them are the meaning you intended to convey.
Could you clarify and/or elaborate?
-- Brad Knowles, <brad@stop.mail-abuse.org>
"Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin (1706-1790), reply of the Pennsylvania Assembly to the Governor, November 11, 1755
LOPSA member since December 2005. See <http://www.lopsa.org/>.
Brad Knowles wrote:
At 11:59 PM -0400 2006-04-23, Barry Warsaw wrote:
One thing we may have to do though is set the log file encoding. What do you think about that?
Log file encoding? I'm not sure I understand what you mean. I can think of a few different ways that could be interpreted, and I don't know for sure that any of them are the meaning you intended to convey.
Could you clarify and/or elaborate?
Well, it should be a mess. :-(
Consider Mailman receiving spam from a foreign country that causes an error. Mailman may complain with a UnicodeDecodeError and spew an excerpt containing a string in an unknown charset. That is certainly not printable if no encoding is set, which means only us-ascii is accepted for the log file. Even if you pick the charset for your language (e.g. euc-jp for Japanese), you still get an error for Chinese spam.
It may be useful for the log output to use the 'replace' error handler of the encode() method.
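One way to apply the 'replace' idea is in a logging Formatter, so that whatever reaches the log file is guaranteed to be us-ascii. The sketch below is for Python 3 and is an illustration only; the class name AsciiSafeFormatter is hypothetical and nothing like it is claimed to exist in Mailman.

```python
import logging


class AsciiSafeFormatter(logging.Formatter):
    """Format a record, then replace every non-ASCII character with '?'."""

    def format(self, record):
        text = super().format(record)
        # 'replace' substitutes '?' for anything us-ascii cannot encode,
        # so writing the result to an ASCII-only log cannot fail.
        return text.encode('us-ascii', 'replace').decode('us-ascii')
```

The price is that the offending text is lost from the log, which is why the choice of log file encoding matters as well.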
-- Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/
"Tokio" == Tokio Kikuchi <tkikuchi@is.kochi-u.ac.jp> writes:
Tokio> Consider Mailman receiving spam from a foreign country that
Tokio> causes an error. Mailman may complain with a
Tokio> UnicodeDecodeError and spew an excerpt containing a string
Tokio> in an unknown charset.
This really should not happen. Mailman should trap *all* UnicodeDecodeErrors at a very low level. (You simply cannot yet count on malformed message == SPAM in all contexts. E.g., just last week the Mac users here started flaming the Windows-using administration for distributing mojibake.)
Then it should wash the message to make it safe. RFC 2047-encode any 8-bit headers, and use a base64 Content-Transfer-Encoding for any 8-bit message bodies or body parts that don't have a known, approved charset specified. Bonus points for checking that 8-bit body parts with a specified charset actually conform to it.
Finally, reraise some kind of exception that can be handled at the filtering policy level.
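The body half of this washing step can be sketched with the standard email package. This is a minimal Python 3 sketch under stated assumptions: the function name wash_body, the APPROVED_CHARSETS policy set, and the convention of passing the raw bytes in separately are all hypothetical. It does not cover RFC 2047 encoding of headers or checking that a body conforms to its declared charset.

```python
import base64
from email.message import Message

# Hypothetical site policy: charsets we are willing to pass through as-is.
APPROVED_CHARSETS = {'us-ascii', 'utf-8', 'iso-2022-jp'}


def wash_body(part, raw):
    """Store raw 8-bit bytes on `part` as base64 unless its charset is approved.

    Returns True if the part was re-encoded, False if it was left alone.
    """
    charset = part.get_content_charset()  # None when no charset is declared
    if charset in APPROVED_CHARSETS:
        return False
    # Deleting a missing header is a no-op, so this is safe either way.
    del part['Content-Transfer-Encoding']
    part['Content-Transfer-Encoding'] = 'base64'
    part.set_payload(base64.encodebytes(raw).decode('ascii'))
    return True
```

After washing, the part is pure 7-bit text on the wire, and get_payload(decode=True) still yields the original bytes for anyone who wants to inspect them.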
-- School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Ask not how you can "do" free software business; ask what your business can "do for" free software.
On Mon, 2006-04-24 at 17:12 +0900, Stephen J. Turnbull wrote:
"Tokio" == Tokio Kikuchi <tkikuchi@is.kochi-u.ac.jp> writes:
Tokio> Consider Mailman receiving spam from a foreign country that
Tokio> causes an error. Mailman may complain with a
Tokio> UnicodeDecodeError and spew an excerpt containing a string
Tokio> in an unknown charset.
This really should not happen. Mailman should trap *all* UnicodeDecodeErrors at a very low level. (You simply cannot yet count on malformed message == SPAM in all contexts. E.g., just last week the Mac users here started flaming the Windows-using administration for distributing mojibake.)
The general approach should be that /everything/ gets converted to Unicode at the boundaries of the system. In Mailman 2.1, all the Unicode and i18n stuff was bolted on afterward, which is why we've had so much pain throughout, dealing with Unicode conversions. Ideally, we'd get rid of all that for 2.2 and deal only with Unicode internally.
We may have to make modifications to the email package though, but I'm not sure. It should probably always return Unicode for everything.
Then it should wash the message to make it safe. RFC 2047-encode any 8-bit headers, and use a base64 Content-Transfer-Encoding for any 8-bit message bodies or body parts that don't have a known, approved charset specified. Bonus points for checking that 8-bit body parts with a specified charset actually conform to it.
Finally, reraise some kind of exception that can be handled at the filtering policy level.
That sounds about right. Probably the email package should convert everything to Unicode internally and place Defects on the message objects that have illegal encodings.
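For reference, the email package already records structural problems as defects on the message object rather than raising. The Python 3 sketch below only demonstrates that existing mechanism with a multipart message that lacks a boundary; a defect for illegal encodings, as proposed here, is not shown and is not assumed to exist.

```python
import email
from email import errors

# A multipart message with no boundary parameter: malformed, but parseable.
raw = "Content-Type: multipart/mixed\n\nno boundary here\n"
msg = email.message_from_string(raw)

# The parser does not raise; it attaches defect objects to the message.
has_boundary_defect = any(
    isinstance(defect, errors.NoBoundaryInMultipartDefect)
    for defect in msg.defects
)
```

A filtering policy could then inspect msg.defects and decide whether to hold, discard, or pass the message.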
-Barry
"BAW" == Barry Warsaw <barry@python.org> writes:
BAW> Ideally, we'd get rid of all that for 2.2 and deal only with
BAW> Unicode internally.
The original encoded stuff should be squirreled away somewhere for debugging and maybe spam detection, though.
BAW> We may have to make modifications to the email package
BAW> though, but I'm not sure. It should probably always return
BAW> Unicode for everything.
That would be my recommendation (modulo preserving the original headers at least, and probably the original body too, for debugging etc).
-- School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Ask not how you can "do" free software business; ask what your business can "do for" free software.
On Tue, 2006-04-25 at 23:23 +0900, Stephen J. Turnbull wrote:
"BAW" == Barry Warsaw <barry@python.org> writes:
BAW> We may have to make modifications to the email package
BAW> though, but I'm not sure. It should probably always return
BAW> Unicode for everything.
That would be my recommendation (modulo preserving the original headers at least, and probably the original body too, for debugging etc).
We still have some time to do this in the email package for Python 2.5, but not much. PEP 356 says that Python 2.5 beta 1 is scheduled for June 24th, and after that we'll be feature frozen.
Can we discuss any necessary changes on the email-sig, please?
-Barry
On Mon, 2006-04-24 at 15:19 +0900, Tokio Kikuchi wrote:
Well, it should be a mess. :-(
I'm hoping we can make it less so!
Consider Mailman receiving spam from a foreign country that causes an error. Mailman may complain with a UnicodeDecodeError and spew an excerpt containing a string in an unknown charset. That is certainly not printable if no encoding is set, which means only us-ascii is accepted for the log file. Even if you pick the charset for your language (e.g. euc-jp for Japanese), you still get an error for Chinese spam.
It may be useful for the log output to use the 'replace' error handler of the encode() method.
That's probably a good idea. Also, I'm wondering if we should allow users to set the log file encoding in Defaults.py, or whether we should force utf-8, or try to interrogate the system for the encoding value.
Basically, the logger should be as liberal as possible, just in case we let encoding problems slip through (more on that in the next follow up).
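The "force utf-8" option could look like the sketch below, which relies on the encoding argument that logging.FileHandler accepts. It is a Python 3 illustration only: the helper name make_file_handler and the format string are assumptions, and a configurable default (say, a hypothetical LOG_ENCODING setting in Defaults.py) would simply be passed in as the encoding parameter.

```python
import logging


def make_file_handler(path, encoding='utf-8'):
    """Return a FileHandler that writes its log file in `encoding`."""
    handler = logging.FileHandler(path, encoding=encoding)
    # Hypothetical line format: timestamp, process id, message.
    handler.setFormatter(
        logging.Formatter('%(asctime)s (%(process)d) %(message)s'))
    return handler
```

With utf-8 every Unicode string can be written, so the handler itself never has to fall back on replacement characters.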
-Barry
At 2:25 PM -0400 2006-04-24, Barry Warsaw wrote:
That's probably a good idea. Also, I'm wondering if we should allow users to set the log file encoding in Defaults.py, or whether we should force utf-8, or try to interrogate the system for the encoding value.
Personally, I think we should default to US-ASCII in the log files, but I can see where some people might want to select a different encoding in mm_cfg.py.
-- Brad Knowles, <brad@stop.mail-abuse.org>
"Brad" == Brad Knowles <brad@stop.mail-abuse.org> writes:
Brad> Personally, I think we should default to US-ASCII in
Brad> the log files, but I can see where some people might want to
Brad> select a different encoding in mm_cfg.py.
I really think the log files should be UTF-8. The point is to make them as ASCII as possible, but if you've got readable garbage that you want to log, it should be readable. People who lack the fonts or whatever wouldn't be able to read it anyway; people who can will be able to convert the UTF-8 to something they can use.
If the garbage doesn't seem to be readable (eg, naked 8-bit crap in the headers) then it should be BASE64'd in the logs.
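A rough sketch of that rule: log the bytes as text when they decode cleanly as UTF-8, otherwise log a Base64 rendering that can be decoded later for debugging. The function name loggable and the 'base64:' prefix are hypothetical, and "readable" is approximated here as "valid UTF-8", which is only one possible reading of the suggestion.

```python
import base64


def loggable(raw):
    """Return a string that is safe to write to a UTF-8 log file."""
    try:
        # Readable text: keep it readable.
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Naked 8-bit junk: keep it recoverable but harmless.
        return 'base64:' + base64.b64encode(raw).decode('ascii')
```

Anyone investigating a problem can strip the prefix and run the rest through base64.b64decode() to get the original bytes back.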
-- School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Ask not how you can "do" free software business; ask what your business can "do for" free software.
participants (4)
- Barry Warsaw
- Brad Knowles
- Stephen J. Turnbull
- Tokio Kikuchi