Use of the public suffix list

Hi all, I noticed (from a DMARC mitigation utility that Lindsay extracted) that Mailman has its own approach to using the PSL. Of course, development must go on, and sometimes it is a waste of time to build super-duper scaffolding for a job that can be done in keeping with the KISS principle. At any rate, what is the future of DMARC lookups in Mailman?
- The specs say that "DMARC should be amended to use [a method better than PSL] as soon as it is generally available" [1]. I believe that sentence refers to RDAP, which was released more or less at the same time (March 2015) [2].
[1] https://tools.ietf.org/html/rfc7489#appendix-A.6 [2] https://datatracker.ietf.org/wg/weirds/documents/
- There are various Python packages for domain name splitting. They obviously use the PSL, but would presumably switch transparently to a better method if one became available. If Mailman used one such package, a practical advantage for users who already had the same dependency installed would be that the PSL needs updating in only one place. I found six packages.
tldextract [3] is the only one of them which caches a JSON object rather than the original textual representation of the list. It uses a frozenset. tld [4] and publicsuffixlist [5] also build a set. publicsuffix [6] and publicsuffix2 [7] build lists of nested dictionaries from all the labels. dnspy [8] builds a dictionary of FQDNs, somewhat like Mailman.
How does the time to build the structure compare with the time taken by DNS queries?
[3] https://pypi.python.org/pypi/tldextract [4] https://pypi.python.org/pypi/tld [5] https://pypi.python.org/pypi/publicsuffixlist [6] https://pypi.python.org/pypi/publicsuffix [7] https://pypi.python.org/pypi/publicsuffix2 [8] https://pypi.python.org/pypi/dnspy
- Debian distributes a publicsuffix package which ships a textual version of the list. Since stretch, it also ships a "dafsa" version (a deterministic acyclic finite state automaton). Nowadays, most C implementations (Firefox, Chromium) use dafsa. They build the structure using offsets rather than pointers, so that the blob can be defined in a source file as a literal static array of chars, which minimizes loading time. That strategy works well as long as the relevant package is upgraded more frequently than the PSL changes. Otherwise, as with libpsl, one ends up using obsolete data.
Surprisingly, the publicsuffix package itself is not upgraded as frequently as the PSL. This bug [9] is what prompted me to write this message. I guess you, as Mailman developers, have pondered this subject and I'd be interested to know what you think.
[9] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=879008
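As a rough illustration of the set-based approach mentioned above for tld and publicsuffixlist, the following is a minimal sketch of how the textual list can be loaded into sets and used to find the organizational domain (public suffix plus one label, as in RFC 7489 section 3.2). The function names and the tiny sample list are made up for this example; they are not taken from Mailman or from any of the six packages, and the wildcard and exception handling is simplified.

```python
# Hypothetical sample in PSL text format; the real list is much longer.
SAMPLE_PSL = """
// comment lines start with two slashes
com
uk
co.uk
*.ck
!www.ck
"""


def parse_psl(text):
    """Split PSL text into (rules, wildcards, exceptions) frozensets."""
    rules, wildcards, exceptions = set(), set(), set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue
        rule = line.split()[0].lower()  # a rule ends at the first whitespace
        if rule.startswith("!"):
            exceptions.add(rule[1:])
        elif rule.startswith("*."):
            wildcards.add(rule[2:])
        else:
            rules.add(rule)
    return frozenset(rules), frozenset(wildcards), frozenset(exceptions)


def public_suffix_length(labels, psl):
    """Return how many trailing labels form the public suffix."""
    rules, wildcards, exceptions = psl
    # Walk from the longest candidate to the shortest, so the rule with
    # the most labels wins; exceptions are checked before other rules.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in exceptions:
            return len(labels) - i - 1  # exception: drop the leftmost label
        if candidate in rules:
            return len(labels) - i
        parent = ".".join(labels[i + 1:])
        if parent and parent in wildcards:  # "*.parent" matches candidate
            return len(labels) - i
    return 1  # default rule "*": the last label is the suffix


def organizational_domain(domain, psl):
    """Public suffix plus one label, or None if domain is itself a suffix."""
    labels = domain.lower().rstrip(".").split(".")
    n = public_suffix_length(labels, psl)
    if len(labels) <= n:
        return None
    return ".".join(labels[-(n + 1):])


psl = parse_psl(SAMPLE_PSL)
print(organizational_domain("lists.example.co.uk", psl))
print(organizational_domain("co.uk", psl))
```

Building the sets is a single pass over the text, so in a sketch like this the cost is paid once per process (or once per cached object) rather than once per lookup.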
TIA for any reply
Ale

Alessandro Vesely writes:
I see nothing in a quick look at the RDAP spec to suggest that an organizational/administrative domain (AD) field has been defined. It seems like it's just intended to be a replacement for whois, of course allowing extensions like delegating the AD to subdomains (or however that would work -- it's not obvious to me). That presumably would either be registered in the RDAP extensions registry or as a separate RFC. I've seen no discussion of this on DMARC channels either.
Surprisingly, the publicsuffix package itself is not upgraded as frequently as the PSL.
I'm not surprised. Most users of the package won't be upgrading that frequently either, I suppose, but will rather be downloading it from the source.
In any case, this isn't a problem for Mailman to deal with; it's easy enough to access the public suffix list. A site could do that as a cron job once a day and almost all Mailman subscribers would be protected due to our "count bounces once per day" algorithm -- only sites with an extremely low bounce threshold would have a problem. I suppose there is a backscatter issue, but it's not clear to me that that is such a big deal.
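A once-a-day refresh of the kind described above could look roughly like the sketch below, meant to be run from cron. The download URL is the list's usual published location but is an assumption here (it does not appear in this thread), and the function names and the idea of passing in the fetcher are illustrative only, not Mailman's actual code.

```python
import os
import time
import urllib.request

# Assumed location of the list; adjust to whatever source the site trusts.
PSL_URL = "https://publicsuffix.org/list/public_suffix_list.dat"
MAX_AGE = 24 * 60 * 60  # one day, in seconds


def is_stale(path, max_age=MAX_AGE, now=None):
    """True if the cached copy is missing or older than max_age seconds."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path) > max_age
    except OSError:  # file does not exist or cannot be read
        return True


def download(url):
    """Fetch the raw bytes of the list over HTTP(S)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def refresh(path, url=PSL_URL, fetch=download):
    """Download the list only when the cached copy is stale.

    Returns True if a new copy was written, False if the cache was kept.
    """
    if not is_stale(path):
        return False
    data = fetch(url)
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)  # atomic swap, readers never see a partial file
    return True
```

Checking the file's age before downloading keeps the job within the "no more than once per day" guidance even if cron fires more often.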
This isn't a big deal for us at the moment, and my assessment is that it will not be one for the foreseeable future. With the exception of WePublished1.3BillionAddressBooksToSpammers!.com and WeDidToo.com, I haven't heard of anybody publishing p=reject except for domains that produce only transactional mailflows. I'm sure there are many others, but I expect that most people will be subscribing to lists with mailboxes whose domains either have their own _dmarc TXT record or have an "obvious" administrative domain, or are "p=none" per default.
Do you have a reason to believe otherwise?
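The three cases above (own _dmarc TXT record, fallback to the organizational domain, otherwise effectively "p=none") can be sketched as follows. This is a simplified reading of RFC 7489 policy discovery, not Mailman's implementation: `lookup_txt` is a hypothetical callable that returns the TXT strings found at a name (a real one would query DNS), and treating "no record" as "none" is a shortcut for illustration.

```python
def parse_dmarc(txt):
    """Parse 'v=DMARC1; p=reject; ...' into a dict, or None if not DMARC."""
    first = txt.split(";", 1)[0].replace(" ", "")
    if first != "v=DMARC1":  # the version tag must come first
        return None
    tags = {}
    for part in txt.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def discover_policy(domain, org_domain, lookup_txt):
    """Return the policy ('none', 'quarantine', 'reject') for domain."""
    domain = domain.lower().rstrip(".")
    org_domain = org_domain.lower().rstrip(".")
    # Step 1: the domain's own record.
    for txt in lookup_txt("_dmarc." + domain):
        tags = parse_dmarc(txt)
        if tags is not None:
            return tags.get("p", "none").lower()
    # Step 2: fall back to the organizational domain; its subdomain
    # policy (sp=) applies if present, otherwise its p= does.
    if org_domain != domain:
        for txt in lookup_txt("_dmarc." + org_domain):
            tags = parse_dmarc(txt)
            if tags is not None:
                return tags.get("sp", tags.get("p", "none")).lower()
    # Step 3: no record anywhere, treated here as p=none.
    return "none"


# Made-up records standing in for DNS answers.
records = {"_dmarc.example.com": ["v=DMARC1; p=reject; sp=quarantine"]}
lookup = lambda name: records.get(name, [])
print(discover_policy("mail.example.com", "example.com", lookup))
```

The organizational domain is passed in as an argument because finding it is exactly where the PSL (or whatever replaces it) comes in.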
Steve

On Thu 02/Nov/2017 03:31:46 +0100 Stephen J. Turnbull wrote:
Yes, RDAP is WHOIS with a machine-readable format. In both cases, a name retrieved from a public domain name registry (DNR) is an "organizational" domain. RFC 7483 defines a publicIds.type to recognize DNRs. That way, one could work out the PSL based on authoritative sources. To make the method practical, RDAP addresses the bootstrap problem (RFC 7484), whereby a client could learn DNR info in one shot.
To get a feel for the state of the project, compare the contents of the following URLs, neither of which is usually up to date:
https://www.iana.org/domains/root/db http://data.iana.org/rdap/dns.json
The second one is mentioned here: https://github.com/arineng/nicinfo/wiki/RDAP-FAQ
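To make the bootstrap idea concrete, here is a small sketch of how a client might pick the RDAP base URL for a domain from a file shaped like dns.json. The layout (a "services" array of [entries, URLs] pairs, matched on the longest label-wise suffix) follows RFC 7484; the sample data and the function name are invented for illustration and do not describe any real registry.

```python
import json

# Made-up data in the RFC 7484 bootstrap layout.
SAMPLE_BOOTSTRAP = """
{
  "version": "1.0",
  "services": [
    [["com", "net"], ["https://rdap.registry-a.example/"]],
    [["uk"], ["https://rdap.registry-b.example/"]]
  ]
}
"""


def rdap_base_urls(domain, bootstrap):
    """Return the RDAP base URLs for domain, or [] if none is registered."""
    # Flatten the services array into one lookup table.
    table = {}
    for entries, urls in bootstrap["services"]:
        for entry in entries:
            table[entry.lower()] = urls
    # Try the longest suffix first, then progressively shorter ones.
    labels = domain.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in table:
            return table[candidate]
    return []


bootstrap = json.loads(SAMPLE_BOOTSTRAP)
print(rdap_base_urls("www.example.com", bootstrap))
```

A name with no matching entry simply yields an empty list, which is the situation a client is in for every TLD missing from the bootstrap file.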
The PSL maintainers take the trouble to write "have your app download the list no more than once per day", so a cron job for each client would do. It is more difficult to coordinate apps so that they download a single copy for all of them, in the unlikely case that more than one app runs on the same host. That way, it becomes a potential packaging problem.
:-)
No, not really. If anything, I see less and less authenticated mail, both ham and spam. (Possibly an effect of that p=reject?) I know that it is a good thing that DMARC and PSL bring the possibility to learn domain names, but I don't know why.
Thank you for sharing your insights
Ale

Alessandro Vesely writes:
Yes, but my understanding of the discussions on the DMARC list leading to use of the PSL in RFC 7489 is that organizational domain != administrative domain, and that's why the existing whois wasn't sufficient to replace the PSL. Did I miss something?
Thank you for sharing your insights
And you too!
I will take a look at the DBOUND discussion mentioned by John, as well as tracking the progress of RDAP.
Steve
-- 
Associate Professor, Division of Policy and Planning Science
Faculty of Systems and Information, University of Tsukuba
Tennodai 1-1-1, Tsukuba 305-8573 JAPAN
http://turnbull/sk.tsukuba.ac.jp/
Email: turnbull@sk.tsukuba.ac.jp
Tel: 029-853-5175

In article <e5e88792-ba9b-5c68-d5d0-93dc435cc792@tana.it> you write:
Sorry, that is wrong. It was referring to the DBOUND working group, which failed to produce anything despite having two reasonable drafts from Casey Deccio and me.
RDAP has nothing to do with PSL style boundaries.
R's, John

participants (3): Alessandro Vesely, John Levine, Stephen J. Turnbull