Hi everyone and Abhilash in particular :)
I've run into a case where HyperKitty is unable to chain messages into a thread: https://wlug.mailman3.com/hyperkitty/list/wlug@lists.wlug.org/
(See messages with the subject "WLUG Meeting Feb 11th 2021! Topic: Good question!".)
It's quite a disappointment, as Gmail does show them correctly: as a single thread.
From my brief investigation, one subscriber, Robert N. Evans, seems to have the "In-Reply-To" headers stripped from his messages, which probably causes the thread to break.
I wonder if HyperKitty is able to leverage some other method to combine the thread correctly in this case?
"Good" and "bad" message examples are in the attachment.
Best regards, Danil Smirnov
On Sat, Feb 13, 2021 at 02:50:47PM +0200, Danil Smirnov wrote:
Hi everyone and Abhilash in particular :)
I've run into a case where HyperKitty is unable to chain messages into a thread: https://wlug.mailman3.com/hyperkitty/list/wlug@lists.wlug.org/
[..] It's quite a disappointment, as Gmail does show them correctly: as a single thread.
From my brief investigation, one subscriber, Robert N. Evans, seems to have the "In-Reply-To" headers stripped from his messages, which probably causes the thread to break.
I wonder if HyperKitty is able to leverage some other method to combine the thread correctly in this case?
There are two common methods to group messages into threads:
Using the "In-Reply-To:" header. (The "correct" approach. Downside: gives false negatives if users strip those headers, as you've seen.)
Using the "Subject:" header. (A heuristic approach. Downside: gives false positives if users start a new thread with the same subject as an older thread.)
I believe Gmail uses the "Subject:" header. That would explain why Gmail was able to recognise Robert N. Evans's messages as part of the thread even though they lacked the "In-Reply-To:" header.
I don't know if Hyperkitty allows threading using "Subject:" matching, but if so then that would probably solve your problem.
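To make the trade-off between the two methods concrete, here is a minimal Python sketch. It is only an illustration: the message records (plain dicts with "id", "subject" and "in_reply_to" keys) and all function names are assumptions made for this example, not HyperKitty's or Gmail's actual code.

```python
import re
from collections import defaultdict

# Strips leading "Re:", "Fw:" and "Fwd:" prefixes (an assumed, simplified rule).
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Lower-case the subject and drop reply/forward prefixes."""
    return _PREFIX.sub("", subject).strip().lower()


def thread_by_in_reply_to(messages):
    """Group message ids under their thread root, following In-Reply-To only."""
    parent = {m["id"]: m.get("in_reply_to") for m in messages}

    def root(msg_id):
        seen = set()
        # Walk up while the parent is a known message; 'seen' guards against loops.
        while parent.get(msg_id) in parent and msg_id not in seen:
            seen.add(msg_id)
            msg_id = parent[msg_id]
        return msg_id

    threads = defaultdict(list)
    for m in messages:
        threads[root(m["id"])].append(m["id"])
    return dict(threads)


def thread_by_subject(messages):
    """Group message ids by normalized subject (the heuristic approach)."""
    threads = defaultdict(list)
    for m in messages:
        threads[normalize_subject(m["subject"])].append(m["id"])
    return dict(threads)


messages = [
    {"id": "a", "subject": "Topic", "in_reply_to": None},
    {"id": "b", "subject": "Re: Topic", "in_reply_to": "a"},
    # A reply whose client stripped the In-Reply-To header:
    {"id": "c", "subject": "Re: Topic", "in_reply_to": None},
]
```

With this sample data, header-based threading leaves "c" in a thread of its own (the false negative), while subject-based threading puts all three messages together, as it would also do for an unrelated later thread reusing the subject "Topic" (the false positive).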
Sam
--
A: When it messes up the order in which people normally read text.
Q: When is top-posting a bad thing?

()  ASCII ribbon campaign. Please avoid HTML emails & proprietary
/\  file formats. (Why? See e.g. https://v.gd/jrmGbS ). Thank you.
Danil Smirnov writes:
From my brief investigation, one subscriber, Robert N. Evans, seems to have the "In-Reply-To" headers stripped from his messages, which probably causes the thread to break.
I wonder if HyperKitty is able to leverage some other method to combine the thread correctly in this case?
It's simply not possible to guarantee correct threading if neither References nor In-Reply-To are present.
It is possible to place a message in an approximately appropriate place by threading the threadable messages, grouping that message with threads with "the same" subject, and inserting it (and any descendants) after some message with an earlier date, but this is inherently ambiguous as that message could be a reply to *any* such message with an earlier date.
This would work well if there is a single linear thread. But it is unlikely to work at all well if several posters replied to a single message in the recent past so that there are multiple subthreads active at a given time.
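The ambiguity can be seen in a small Python sketch of this placement heuristic. Everything here is an assumption made for illustration (dict records with "id", "subject" and "date" keys, the prefix-stripping rule, and the choice of the most recent candidate as the parent); it is not code from HyperKitty.

```python
import re
from datetime import datetime

# Assumed, simplified rule for what counts as "the same" subject.
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    return _PREFIX.sub("", subject).strip().lower()


def candidate_parents(orphan, threaded):
    """All already-threaded messages with the same subject and an earlier date.

    The orphan could be a reply to *any* of them; that is the ambiguity.
    """
    key = normalize_subject(orphan["subject"])
    earlier = [
        m for m in threaded
        if normalize_subject(m["subject"]) == key and m["date"] < orphan["date"]
    ]
    return sorted(earlier, key=lambda m: m["date"])


def guess_parent(orphan, threaded):
    """Pick the most recent candidate (an arbitrary tie-breaking assumption)."""
    candidates = candidate_parents(orphan, threaded)
    return candidates[-1]["id"] if candidates else None


threaded = [
    {"id": "a", "subject": "Topic", "date": datetime(2021, 2, 13, 10, 0)},
    {"id": "b", "subject": "Re: Topic", "date": datetime(2021, 2, 13, 11, 0)},
    {"id": "c", "subject": "Re: Topic", "date": datetime(2021, 2, 13, 13, 0)},
]
orphan = {"id": "x", "subject": "RE: Topic", "date": datetime(2021, 2, 13, 12, 0)}
```

Here `candidate_parents(orphan, threaded)` returns both "a" and "b", and `guess_parent` settles on "b". In a single linear thread that guess is likely right; if "b" and another message were sibling replies to "a", it could just as easily be wrong.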
Gmail has a big advantage, since they're reading your mail, indexing it, and creating a fine-grained statistical profile. That database can probably be leveraged for better threading. Or if your posters consistently top-post, it's probably not too hard to match quoted content against the top-level content of an earlier post -- if you have both the development and the computational resources of Google. (Come to think of it, for Gmail this would probably allow them to compress their storage by 50%.) Or maybe they just got lucky.
Steve
On 2/13/21 4:50 AM, Danil Smirnov wrote:
Hi everyone and Abhilash in particular :)
I've run into a case where HyperKitty is unable to chain messages into a thread: https://wlug.mailman3.com/hyperkitty/list/wlug@lists.wlug.org/
(See messages with the subject "WLUG Meeting Feb 11th 2021! Topic: Good question!".)
It's quite a disappointment, as Gmail does show them correctly: as a single thread.
From my brief investigation, one subscriber, Robert N. Evans, seems to have the "In-Reply-To" headers stripped from his messages, which probably causes the thread to break.
As Steve notes, threading by Subject: matching has its own issues and HyperKitty makes no attempt to do that.
Where HyperKitty is deficient is that it uses only In-Reply-To: and ignores References:. This is an issue if someone sends a reply to an off-list message back to the list. In that case, HyperKitty doesn't find the In-Reply-To: message-id in the archive, so it starts a new thread, even though there may be References: message-ids in the archive.
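As a rough illustration of what a References: fallback could look like, here is a Python sketch. It is hypothetical: `parse_ids`, `find_parent` and the `archived_ids` set are names invented for this example and do not reflect HyperKitty's internals.

```python
import re

# Matches "<local@domain>"-style message ids in a header value.
_MSG_ID = re.compile(r"<[^<>\s]+>")


def parse_ids(header_value):
    """Extract message ids from an In-Reply-To: or References: header value."""
    return _MSG_ID.findall(header_value or "")


def find_parent(in_reply_to, references, archived_ids):
    """Return the id of the closest ancestor present in the archive, or None.

    In-Reply-To: is tried first. If it points at a message the archive has
    never seen (for example an off-list message), fall back to References:,
    which lists ancestors oldest first, so it is scanned from the end.
    """
    for msg_id in parse_ids(in_reply_to):
        if msg_id in archived_ids:
            return msg_id
    for msg_id in reversed(parse_ids(references)):
        if msg_id in archived_ids:
            return msg_id
    return None


archived_ids = {"<a@example.org>", "<b@example.org>"}
parent = find_parent(
    "<offlist@example.org>",
    "<a@example.org> <b@example.org>\n <offlist@example.org>",
    archived_ids,
)
```

In this example the In-Reply-To: target was never archived, so the lookup falls back to `<b@example.org>`, the nearest referenced ancestor that the archive does know about, instead of starting a new thread.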
I wonder if Hyperkitty is able to leverage some other method to combine the thread correctly in this case?
There is an article on threading at <https://www.jwz.org/doc/threading.html> and an RFC <https://www.rfc-editor.org/rfc/rfc5256.html>. These describe algorithms which are fairly complex, but if someone wanted to try to implement them in HyperKitty, we would certainly consider the implementation.
Note that even HyperKitty's simple method generally works well. It breaks down when replies to off-list messages go back to the list, when users' mail clients don't add In-Reply-To: (these are fairly rare), and when a user composes what is actually a reply as a new message.
Also note that "combine the thread correctly" is a subjective opinion, at least in some cases.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Mark Sapiro writes:
There is an article on threading at <https://www.jwz.org/doc/threading.html> and an RFC <https://www.rfc-editor.org/rfc/rfc5256.html>. These describe algorithms which are fairly complex, but if someone wanted to try to implement them in HyperKitty, we would certainly consider the implementation.
Ouch. I didn't realize we didn't use Jamie's algorithm. It's not that hard to implement[1], and it's robust and extremely efficient[2], modulo the cost of accessing message-id, in-reply-to, and references.
A robust, tested, and documented implementation sounds like a GSoC project to me. And a PyPI package, though that would be somewhat harder.
Footnotes: [1] It took me about a day to get it mostly working in Elisp, and most of the difficulty and the remaining issues were due to working around bugs in the MUA that caused uncaught exceptions in the MUA.
[2] It's multipass, but it's worst-case and average-case linear. Worst-case is linear because the line-length restriction keeps the length of references down to about 15 at most.
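For readers who have not seen the algorithm, the following Python sketch covers only its first pass (linking messages into a tree via an id table), in heavily simplified form. It is an illustration under stated assumptions: messages are dicts with "id", "references" and "in_reply_to" keys, and the later passes described in the article (pruning empty containers, grouping by subject, sorting) are left out entirely.

```python
class Container:
    """A node in the thread tree; 'message' stays None for unseen ancestors."""

    def __init__(self, msg_id):
        self.msg_id = msg_id
        self.message = None
        self.parent = None
        self.children = []


def _is_ancestor(candidate, node):
    """True if 'candidate' is 'node' itself or one of its ancestors."""
    while node is not None:
        if node is candidate:
            return True
        node = node.parent
    return False


def _link(parent, child):
    """Make 'child' a child of 'parent' unless that would create a loop."""
    if child.parent is parent or _is_ancestor(child, parent):
        return
    if child.parent is not None:
        child.parent.children.remove(child)
    child.parent = parent
    parent.children.append(child)


def build_threads(messages):
    """Return the root containers of the thread forest."""
    table = {}

    def get(msg_id):
        if msg_id not in table:
            table[msg_id] = Container(msg_id)
        return table[msg_id]

    for msg in messages:
        node = get(msg["id"])
        node.message = msg
        refs = list(msg.get("references") or [])
        in_reply_to = msg.get("in_reply_to")
        if in_reply_to and in_reply_to not in refs:
            refs.append(in_reply_to)
        prev = None
        for ref in refs:
            current = get(ref)
            # Links implied by References: never override an existing parent.
            if prev is not None and current.parent is None:
                _link(prev, current)
            prev = current
        if prev is not None:
            # The message's own last reference is treated as authoritative.
            _link(prev, node)
    return [c for c in table.values() if c.parent is None]


# "c" is referenced but was never archived, so it becomes a placeholder.
messages = [
    {"id": "d", "references": ["a", "c"], "in_reply_to": "c"},
    {"id": "b", "references": ["a"], "in_reply_to": "a"},
    {"id": "a", "references": [], "in_reply_to": None},
]
roots = build_threads(messages)
```

Even with the messages processed out of order and "c" missing from the archive, "d" ends up in the same tree as "a" and "b", hanging off an empty placeholder for "c".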
Danil Smirnov schrieb:
I wonder if Hyperkitty is able to leverage some other method to combine the thread correctly in this case?
There is no way to display a thread without threading information (in In-Reply-To: or References: headers). One can try to match by Subject and/or Date, but that is a heuristic bound to fail.
The "correct" way would be to fix the client that is erroneously [1] missing or deleting threading headers.
-thh
[1] Violating a SHOULD in RFC 5322, 3.6.4.
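For reference, here is a hedged Python sketch of how a client might fill in the threading headers of a reply along the lines of RFC 5322, section 3.6.4, using the standard library's `email.message.EmailMessage`. The helper name `make_reply` and the "Re: " subject handling are assumptions for this example, not taken from any particular mail client.

```python
from email.message import EmailMessage


def make_reply(parent, body):
    """Build a reply whose threading headers point back at 'parent'."""
    reply = EmailMessage()
    subject = str(parent["Subject"] or "")
    if not subject.lower().startswith("re:"):
        subject = "Re: " + subject
    reply["Subject"] = subject

    parent_id = str(parent["Message-ID"] or "").strip()
    if parent_id:
        # In-Reply-To: carries the parent's Message-ID.
        reply["In-Reply-To"] = parent_id
        # References: is the parent's References: plus the parent's id. If the
        # parent has no References: but has an In-Reply-To: holding exactly
        # one id, that single id is used as the starting point instead.
        ancestors = str(parent["References"] or "").split()
        if not ancestors:
            in_reply_to = str(parent["In-Reply-To"] or "").split()
            if len(in_reply_to) == 1:
                ancestors = in_reply_to
        reply["References"] = " ".join(ancestors + [parent_id])

    reply.set_content(body)
    return reply


parent = EmailMessage()
parent["Message-ID"] = "<b@example.org>"
parent["In-Reply-To"] = "<a@example.org>"
parent["Subject"] = "Topic"
parent.set_content("original text")

reply = make_reply(parent, "reply text")
```

A reply built this way carries both headers, so an archiver that looks at either In-Reply-To: or References: can place it in the right thread.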
Thomas Hochstein writes:
The "correct" way would be to fix the client that is erroneously [1] missing or deleting threading headers.
-thh
[1] Violating a SHOULD in RFC 5322, 3.6.4.
'Tis easier to forgive than to educate. -- Jon Postel, probably
It does warm my heart to see somebody else RFC-geeking out, though!
Steve
participants (5)
- Danil Smirnov
- Mark Sapiro
- Sam Kuper
- Stephen J. Turnbull
- Thomas Hochstein