
On 12.05.2008 at 23:20, Mark Sapiro wrote:
I understand what you are saying, but I wonder what the real world difference would be. As currently written, chunkify returns at most 4 partially filled chunks. Granted, 4 is significantly bigger than one, but given that the MTA is VERPing the deliveries, it may ultimately create an outgoing queue entry for each recipient anyway, so the extra 3 on the inbound side doesn't seem that significant (and it might increase parallelism in the MTA).
First of all, I just noticed that the official code does indeed
create at most 4 partially filled buckets. That's the problem when you
have to step in for someone else: the SMTPDirect.py I am working with contains 26 TLDs.
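For reference, the bucket-then-chunk behaviour Mark describes can be sketched roughly like this. It is a simplified Python approximation written for this thread, not the actual chunkify from SMTPDirect.py; the mapping `BUCKET_OF_TLD` and the names `tld` and `grouped_chunks` are hypothetical. Because each bucket is chunked on its own, every non-empty bucket can leave one partially filled chunk behind, which is where the "at most 4" comes from.

```python
# Hypothetical TLD-to-bucket mapping (not Mailman's actual table). Every
# unlisted TLD falls into bucket 0, so there are at most four buckets (0-3).
BUCKET_OF_TLD = {"com": 1, "net": 2, "org": 2, "edu": 3, "us": 3, "ca": 3}


def tld(address):
    """Return the lower-cased text after the last dot, or '' if there is none."""
    dot = address.rfind(".")
    return address[dot + 1:].lower() if dot >= 0 else ""


def grouped_chunks(recipients, max_rcpts):
    """Split recipients into TLD buckets, then chunk each bucket separately.

    Each non-empty bucket can end with one partially filled chunk, so the
    result may contain up to four partial chunks.
    """
    buckets = {}
    for rcpt in recipients:
        buckets.setdefault(BUCKET_OF_TLD.get(tld(rcpt), 0), []).append(rcpt)
    chunks = []
    for key in sorted(buckets):
        members = buckets[key]
        for start in range(0, len(members), max_rcpts):
            chunks.append(members[start:start + max_rcpts])
    return chunks
```

With four addresses under .com, .de, .org and .edu and a limit of 500, this returns four chunks of one address each, even though a single chunk would have been enough.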
Two thoughts:
1. Even with only four buckets, with a real-world distribution of
recipient addresses this means up to four times the I/O actually needed. The
ratio gets better as the number of list subscribers grows, but if
there are fewer recipients than SMTP_MAX_RCPTS, it can be exactly 1:4
(when the recipients fall into all four buckets): one chunk would do, but four are created.
2. Why split recipients the way it's done now at all? You have to
either add new buckets (i.e. new TLDs) or have all recipients outside
the hard-coded TLDs thrown into the same bucket. I could understand
it if you first built a list of the TLDs actually present and sorted by those,
though I'm not sure that's a good idea on a really large list, since you
would have to examine every recipient...
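The first point is plain arithmetic: each chunk is one more copy of the message handed to the MTA. A minimal sketch that only counts chunks for given bucket sizes compares chunking each bucket separately with chunking one flat list; the function names and bucket sizes are made up for this example.

```python
def ceil_div(n, d):
    """Integer ceiling of n / d for non-negative n and positive d."""
    return -(-n // d)


def chunk_count_grouped(bucket_sizes, max_rcpts):
    """Chunks produced when each bucket is chunked on its own."""
    return sum(ceil_div(n, max_rcpts) for n in bucket_sizes if n > 0)


def chunk_count_flat(bucket_sizes, max_rcpts):
    """Chunks produced when all recipients are chunked as one list."""
    return ceil_div(sum(bucket_sizes), max_rcpts)


def compare(bucket_sizes, max_rcpts):
    """Return (grouped, flat) chunk counts for the same recipients."""
    return (chunk_count_grouped(bucket_sizes, max_rcpts),
            chunk_count_flat(bucket_sizes, max_rcpts))
```

With four small buckets (100, 80, 60 and 40 recipients) and a limit of 500, `compare` gives `(4, 1)`; with buckets of 12100, 6100, 4100 and 2700 (25000 in total) it gives `(53, 50)`, so the relative overhead shrinks as the list grows.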
I didn't understand what you said about VERPing and outgoing queue
entries - surely any MTA will keep track of recipients on a per
message basis? As for parallelism, I think the best way to ensure fast
delivery is to make all target destinations known to the MTA as fast
as possible.
Given your 25000 member list, and assuming SMTP_MAX_RCPTS = 500, you would have at most 54 chunks (and more likely 53 or 52) instead of 50.
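Mark's chunk count can be checked with a quick bound, assuming exactly 25000 recipients spread over the four buckets with sizes $n_1,\dots,n_4$ (symbols introduced here only for illustration). Since $\lceil x \rceil \le x + 1$, each bucket adds at most one extra chunk:

$$
N_{\text{flat}} = \left\lceil \frac{25000}{500} \right\rceil = 50,
\qquad
N_{\text{grouped}} = \sum_{i=1}^{4} \left\lceil \frac{n_i}{500} \right\rceil
\le \sum_{i=1}^{4} \left( \frac{n_i}{500} + 1 \right) = 50 + 4 = 54.
$$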
In any case, if I were coding this, I would be inclined not to make it an option, but just to change chunkify so that it still grouped, but continued to fill the last chunk of a group from the next group, so there would be at most one partial chunk.
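Mark's alternative keeps recipients of the same group next to each other but lets a chunk continue into the next group. One way to sketch it is to sort by group and then slice the sorted list; the table `GROUP_OF_TLD` and the function names are hypothetical, not Mailman's actual mapping or API.

```python
# Hypothetical TLD-to-group table; unlisted TLDs share group 0.
GROUP_OF_TLD = {"com": 1, "net": 2, "org": 2, "edu": 3, "us": 3, "ca": 3}


def group_key(address):
    """Map an address to its group number via the text after the last dot."""
    dot = address.rfind(".")
    tld = address[dot + 1:].lower() if dot >= 0 else ""
    return GROUP_OF_TLD.get(tld, 0)


def grouped_fill_chunks(recipients, max_rcpts):
    """Keep each group's recipients adjacent, but let a chunk run on into
    the next group, so only the final chunk can be partially filled."""
    # sorted() is stable, so input order is preserved within each group.
    ordered = sorted(recipients, key=group_key)
    return [ordered[i:i + max_rcpts] for i in range(0, len(ordered), max_rcpts)]
```

For seven addresses and a limit of 3, this yields chunks of 3, 3 and 1 recipients.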
At the moment, I have changed the code to simply return SMTP_MAX_RCPTS
recipients per chunk, or all recipients if there are fewer than that.
Hardcoded, not configurable. As it is done now, I can't see any real
advantages, especially living outside the U.S. Either improve the sorting
algorithm (all TLDs, no partial chunks) or make it
configurable to skip sorting altogether. At least that's what I
feel would be an improvement. Have it default to flat chunking: it
saves CPU time and I/O operations, and gives the MTA's queue manager more
time to do its job.
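A sketch of the flat chunking described above, with sorting made optional. The `sort_by_tld` parameter is hypothetical (there is no such Mailman setting); when enabled, it sorts on whatever TLD each address actually has rather than on a hard-coded list. Treating a non-positive limit as "no limit" is also an assumption of this sketch.

```python
def _tld_then_domain(address):
    """Sort key: (TLD, full domain), taken from the part after the last '@'."""
    domain = address.rpartition("@")[2].lower()
    return (domain.rpartition(".")[2], domain)


def flat_chunks(recipients, max_rcpts, sort_by_tld=False):
    """Return consecutive slices of at most max_rcpts recipients.

    With sort_by_tld (hypothetical option, off by default) recipients are
    first ordered by their actual TLD, then by domain. Either way, at most
    one chunk is partially filled.
    """
    rcpts = list(recipients)
    if max_rcpts <= 0:
        # Assumption: a non-positive limit means "no limit".
        return [rcpts] if rcpts else []
    if sort_by_tld:
        rcpts.sort(key=_tld_then_domain)
    return [rcpts[i:i + max_rcpts] for i in range(0, len(rcpts), max_rcpts)]
```

Slicing a single list guarantees that only the last chunk can be partially filled, whether or not the list is sorted first, so the sorting step becomes a pure (and skippable) optimisation.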
Cheers Stefan