[Mailman-Users] utf-8 subjects; extended "." regexp really necessary?
arpepper at uwaterloo.ca
Tue Nov 24 16:37:26 EST 2015
(Mailman 2.1.12, some local mods, but not around topics...)
I had a utf-8 subject that I was having difficulty matching with a topic regexp.
Eventually I concluded that the subject still had newlines in it when it was
matched against the regexp (that is, the continuation lines were not
joined before matching), and that "." would not match the newline character(s).
So, for test purposes...
seemed to match my particular test subject.
While the following did not.
Am I correct in my conclusion that .* won't match newline characters,
but <space-chars><not-space-chars><linefeed><carriage-return> will?
(And also, that that is the character class I created?)
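For reference, the behaviour in question can be checked with Python's re module outside Mailman. This is only a sketch: the subject string is a made-up stand-in for a folded header, and it assumes the topic pattern is compiled without re.DOTALL, which is what the symptoms above suggest. Note that \s already includes both \n and \r, so [\s\S\n\r] is equivalent to the shorter [\s\S].

```python
import re

# Hypothetical folded subject: a newline plus a space at the continuation point.
subject = "Weekly\n Report"

# "." does not match "\n" unless DOTALL is in effect.
dot = re.compile(r"Weekly.*Report")
# [\s\S] means "whitespace or non-whitespace", i.e. any character at all.
anychar = re.compile(r"Weekly[\s\S]*Report")
# The inline (?s) flag turns on DOTALL, so "." matches "\n" as well.
dotall = re.compile(r"(?s)Weekly.*Report")

print(dot.search(subject) is not None)      # False
print(anychar.search(subject) is not None)  # True
print(dotall.search(subject) is not None)   # True

# Carriage return is not special to ".": only "\n" is excluded.
print(re.search(r"a.b", "a\rb") is not None)  # True
```

If the pattern field accepts inline flags, starting the expression with (?s) may be less intrusive than replacing every "." by a character class.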
For production I might need to put [\s\S\n\r]* between every pair of
characters after a reasonable point in the expression, unless I can
enumerate the possibilities more precisely (which will probably
result in an even longer-looking character class).
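One way to enumerate the possibilities more precisely is to allow an optional fold sequence between pieces of the expression instead of a catch-all class. The sketch below is an assumption-laden illustration: FOLD and fold_tolerant are hypothetical names, the subject is invented, and it only covers a fold between two utf-8 "q" encoded words of the kind reported in this post.

```python
import re

# Optional fold point between two encoded words: "?=" ends the first word,
# then a line break and leading whitespace, then "=?utf-8?q?" starts the next.
FOLD = r"(?:\?=\r?\n[ \t]+=\?utf-8\?q\?)?"

def fold_tolerant(*pieces):
    """Join literal pieces, allowing an optional fold between each pair."""
    return FOLD.join(re.escape(p) for p in pieces)

pattern = re.compile(fold_tolerant("Weekly", "_Report"), re.IGNORECASE)

# Hypothetical raw subjects, folded and unfolded.
folded = "=?utf-8?q?Weekly?=\n =?utf-8?q?_Report?="
unfolded = "=?utf-8?q?Weekly_Report?="

print(pattern.search(folded) is not None)    # True
print(pattern.search(unfolded) is not None)  # True
```

This only tolerates folds at the chosen piece boundaries. A fold can fall anywhere in a long subject, so the pieces would have to be as small as the places where a fold is plausible.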
Empirically I see ?=\n =?utf-8?q?_ after "Weekly" and before "Ac".
(And it seems the matching is done on the incoming subject, not the
one formatted for resending, which, with my tag and the utf-8 encoding
of an incoming tag, pushes the expression entirely onto the second
line, where I think the ".*" variant (or even [_ ]) would match.)
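For comparison, here is a sketch of what matching against an unfolded, decoded subject would look like, using the standard library's email.header functions. It is written as standalone Python 3 and is not a claim about what Mailman does internally. The raw header is a hypothetical example modelled on the fold sequence quoted above.

```python
import re
from email.header import decode_header, make_header

# Hypothetical raw Subject value, folded between two utf-8 "q" encoded words.
raw = "=?utf-8?q?Weekly?=\n =?utf-8?q?_Report?="

# decode_header splits the value into (bytes, charset) chunks and drops the
# whitespace between adjacent encoded words; make_header reassembles them.
decoded = str(make_header(decode_header(raw)))
print(repr(decoded))  # 'Weekly Report'

# With the fold and the encoding removed, a plain ".*" pattern is enough.
print(re.search(r"Weekly.*Report", raw) is not None)      # False
print(re.search(r"Weekly.*Report", decoded) is not None)  # True
```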
More generally, my question applies to any potentially long subject,
but utf-8 subjects seem to get longer more easily.
There is no header-equivalent line in the body (it's mime anyway).