[Mailman-Users] utf-8 subjects; extended "." regexp really necessary?

Adrian Pepper arpepper at uwaterloo.ca
Tue Nov 24 16:37:26 EST 2015


(Mailman 2.1.12, some local mods, but not around topics...)

 I had a utf-8 subject I was having difficulty matching with a topic regexp.

 Eventually I concluded the subject still had newlines in it when it was
 matched against the regexp (that is, the continuation lines were not
 joined before matching), and "." would not match the newline character(s).

 So, for test purposes...

     Farmers[_ ]Weekly[\s\S\n\r]*Ac

 seemed to match my particular test subject.

 While the following did not.

     Farmers[_ ]Weekly.*Ac

 Am I correct in my conclusion that .* won't match newline characters,
 but <space-chars><not-space-chars><linefeed><carriage-return> will?
 (And also, that that is the character class I created?)
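
 For reference, the behaviour asked about can be checked outside Mailman
 with Python's standard re module. This is only a sketch: the subject
 string below is a hypothetical one built around the fragment reported in
 this post, and it does not reproduce how Mailman itself compiles topic
 patterns.

```python
import re

# Hypothetical folded subject, modelled on the fragment reported in this post.
subject = "Farmers_Weekly?=\n =?utf-8?q?_Ac"

# By default "." matches any character except a newline.
dot_pattern = re.compile(r"Farmers[_ ]Weekly.*Ac")

# The character class from the post: whitespace, non-whitespace, LF, CR.
class_pattern = re.compile(r"Farmers[_ ]Weekly[\s\S\n\r]*Ac")

# \s already covers \n and \r, so the shorter class is equivalent.
short_class = re.compile(r"Farmers[_ ]Weekly[\s\S]*Ac")

dot_matches = dot_pattern.search(subject) is not None
class_matches = class_pattern.search(subject) is not None
short_matches = short_class.search(subject) is not None

print(dot_matches, class_matches, short_matches)
```

 In Python's re syntax, \s together with \S covers every character, so the
 extra \n and \r in the class are harmless but redundant.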

 For production I might need to put [\s\S\n\r]* between every pair of
 characters after a reasonable point in the expression.  Unless I can
 enumerate the possibilities more precisely.  (Which will probably
 result in an even longer looking character class).
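
 An alternative to repeating a long character class is Python's DOTALL
 flag, which makes "." match newlines as well. The sketch below uses the
 standard re module directly with a hypothetical subject string; whether a
 Mailman topic regexp is passed through in a way that honours a leading
 inline (?s) flag is an assumption that would need testing on the list
 itself.

```python
import re

# Hypothetical folded subject, modelled on the fragment reported in this post.
subject = "Farmers_Weekly?=\n =?utf-8?q?_Ac"

# DOTALL supplied as a compile flag: "." now also matches "\n".
with_flag = re.compile(r"Farmers[_ ]Weekly.*Ac", re.DOTALL)

# The same flag written inline at the very start of the pattern.
inline = re.compile(r"(?s)Farmers[_ ]Weekly.*Ac")

flag_matches = with_flag.search(subject) is not None
inline_matches = inline.search(subject) is not None

print(flag_matches, inline_matches)
```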

 Empirically I see  ?=\n =?utf-8?q?_ after "Weekly" and before "Ac".
 (And it seems the matching is done on the incoming subject, not the
 one formatted for resending, which, with my tag and the utf-8
 of an incoming tag, pushes the expression entirely onto the second
 line, where I think the ".*" variant (or even [_ ]) would match.)
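
 To illustrate why the folded, encoded form is harder to match, the sketch
 below decodes a raw subject with the standard email.header module before
 applying the simple pattern. The raw header value is hypothetical (only
 the "?=\n =?utf-8?q?_" fragment comes from the observation above), and
 this is not a description of what Mailman's topic matching does
 internally.

```python
import re
from email.header import decode_header, make_header

# Hypothetical raw Subject value: two RFC 2047 encoded words split across
# a folded header line, as in the fragment observed above.
raw_subject = "=?utf-8?q?Farmers_Weekly?=\n =?utf-8?q?_Acad=C3=A9mie?="

# Decode the encoded words and join them into a single unicode string.
decoded = str(make_header(decode_header(raw_subject)))

pattern = re.compile(r"Farmers[_ ]Weekly.*Ac")

raw_matched = pattern.search(raw_subject) is not None
decoded_matched = pattern.search(decoded) is not None

print(repr(decoded))
print(raw_matched, decoded_matched)
```

 On the raw value the newline between the encoded words defeats ".*",
 while the decoded string has no fold left for the pattern to trip over.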

 More generally, my question applies to any potentially long subject,
 but utf-8 subjects seem to get longer more easily.

 There is no header-equivalent line in the body (it's mime anyway).


Adrian Pepper


