[Twisted-Python] module interface checking tool

So, I was a bit irritated recently because there is no way to test for private name usage in a module's namespace. '__all__' is a good start, but it is somewhat poor for a few reasons: 1) it doesn't affect import modname; modname.someprivatename at all so you may be using private interfaces without having any way of knowing. 2) Because it doesn't actually affect code people write, it often doesn't get updated when new public interfaces get added. 3) until recently it didn't even affect help(module), so users don't even know they're using a private name. So, I wrote a tool to do the access checking. My goal is to have it used for testing (solving problems 1&2 *within* twisted, at least), not as something that you'd run your production server under. sandbox/foom/private.py. Currently, it just spits out warnings because that way I can actually run through all the twisted tests, but that's controllable as a flag. Ideally every module in twisted would define __all__ and we could run the entire testsuite with access control enabled without raising illegal access exceptions. To run the twisted tests with this, I temporarily stuck the following in twisted/__init__.py: import private private.privatizeModulesWithPrefix('twisted') Much of the output (attached at the end) is from the IMAP tests trying to access private bits in order to test them. I'm not sure exactly what to do about that -- tests _should_ sometimes be able to access private parts, but they should also be restricted, normally, because you want to test against the public interface as well. So I may have to create a way to annotate some tests as being able to access private attributes while disallowing it by default. Also, rebuild accesses private classes, and it is supposed to, so should be given special dispensation. Besides those, the other things that show up currently are: twisted.internet.abstract.isIPAddress from twisted.internet.iocpreactor.tcp (?) twisted.internet.protocol.ConsumerToProtocolAdapter from twisted.protocols.ftp (?) twisted.internet.threads._putResultInDeferred from twisted.enterprise.adbapi (_deferToThread) Some of those perhaps should be public (?), but _putResultInDeferred, probably should not. IMO these are bugs, either of omission in __all__ or access from an inappropriate place. I dunno which. Also, only 35 out of 559 modules in twisted use __all__. That's sad, and should be fixed. I'm sure if that was done, more of these kinds of things would show up. Here's the full list of accesses to private names (I took the output and sort|uniq'd it). '=> Yes' means access was granted based on the two modules living in the same package. '=> No' means it would have raised an exception if I had that enabled. <= private twisted.internet.abstract.isIPAddress from twisted.internet.iocpreactor.tcp (?): => No <= private twisted.internet.protocol.BaseProtocol from twisted.python.rebuild (latestClass): => No <= private twisted.internet.protocol.ConsumerToProtocolAdapter from twisted.protocols.ftp (?): => No <= private twisted.internet.threads._putResultInDeferred from twisted.enterprise.adbapi (_deferToThread): => No <= private twisted.mail.imap4.Command from twisted.mail.test.test_imap (append): => No <= private twisted.mail.imap4.DontQuoteMe from twisted.mail.test.test_imap (testQuoteAvoider): => No <= private twisted.mail.imap4.FileProducer from twisted.mail.test.test_imap (testFileProducer): => No <= private twisted.mail.imap4.MessageProducer from twisted.mail.test.test_imap (testMultipleMultiPart): => No <= private twisted.mail.imap4.MessageProducer from twisted.mail.test.test_imap (testSingleMultiPart): => No <= private twisted.mail.imap4.MessageProducer from twisted.mail.test.test_imap (testSinglePart): => No <= private twisted.mail.imap4.MessageSet from twisted.mail.test.test_imap (?): => No <= private twisted.mail.imap4.MessageSet from twisted.mail.test.test_imap (testSetFlags): => No <= private twisted.mail.imap4._FetchParser from twisted.mail.test.test_imap (testFetchParserBody): => No <= private twisted.mail.imap4._FetchParser from twisted.mail.test.test_imap (testFetchParserMacros): => No <= private twisted.mail.imap4._FetchParser from twisted.mail.test.test_imap (testFetchParserSimple): => No <= private twisted.mail.imap4._formatHeaders from twisted.mail.test.test_imap (testFetchHeaders): => No <= private twisted.mail.imap4._formatHeaders from twisted.mail.test.test_imap (testHeaderFormatter): => No <= private twisted.mail.imap4.collapseNestedLists from twisted.mail.test.test_imap (testFiles): => No <= private twisted.mail.imap4.collapseNestedLists from twisted.mail.test.test_imap (testQuoteAvoider): => No <= private twisted.mail.imap4.collapseStrings from twisted.mail.test.test_imap (testStringCollapser): => No <= private twisted.mail.imap4.decoder from twisted.mail.test.test_imap (testDecode): => No <= private twisted.mail.imap4.parseIdList from twisted.mail.test.test_imap (testIdListParser): => No <= private twisted.mail.imap4.parseNestedParens from twisted.mail.test.test_imap (_searchWork): => No <= private twisted.mail.imap4.parseNestedParens from twisted.mail.test.test_imap (testLiterals): => No <= private twisted.mail.imap4.parseNestedParens from twisted.mail.test.test_imap (testParenParser): => No <= private twisted.mail.imap4.splitQuoted from twisted.mail.test.test_imap (testQuotedSplitter): => No <= private twisted.mail.imap4.wildcardToRegexp from twisted.mail.test.test_imap (testWildcard): => No <= private twisted.mail.imap4.wildcardToRegexp from twisted.mail.test.test_imap (testWildcardNoDelim): => No <= private twisted.persisted.dirdbm.ALLOW_TWISTED_REBUILD from twisted.python.rebuild (rebuild): => No <= private twisted.web.xmlrpc.addIntrospection from twisted.web.test.test_xmlrpc (?): => No <= private twisted.internet.abstract.isIPAddress from twisted.internet.base (resolve): => Yes <= private twisted.internet.abstract.isIPAddress from twisted.internet.tcp (resolveAddress): => Yes <= private twisted.internet.abstract.isIPAddress from twisted.internet.udp (connect): => Yes <= private twisted.internet.abstract.isIPAddress from twisted.internet.udp (startListening): => Yes <= private twisted.internet.base.BaseConnector from twisted.internet.ssl (?): => Yes <= private twisted.internet.base.BaseConnector from twisted.internet.ssl (__init__): => Yes <= private twisted.internet.base.BaseConnector from twisted.internet.tcp (?): => Yes James

On Thu, 2004-08-05 at 03:06 -0400, James Y Knight wrote:
So, I was a bit irritated recently because there is no way to test for private name usage in a module's namespace. '__all__' is a good start, but it is somewhat poor for a few reasons:
WOW. THank you for doing this, james. This is more useful than PyChecker output, I think. It's also a little depresing. We really need a better-documented public API! I have created 2 new issues for this, one for adding __all__ to all modules and one for testing that it's obeyed properly. #658 and #659, respectively. They are tentatively assigned to you, but I expect you can carve up Twisted and motivate others to actually do most of the work :).

On Thu, 2004-08-05 at 01:02, glyph wrote:
On Thu, 2004-08-05 at 03:06 -0400, James Y Knight wrote:
So, I was a bit irritated recently because there is no way to test for private name usage in a module's namespace. '__all__' is a good start, but it is somewhat poor for a few reasons:
WOW. THank you for doing this, james. This is more useful than PyChecker output, I think. It's also a little depresing. We really need a better-documented public API!
I have created 2 new issues for this, one for adding __all__ to all modules and one for testing that it's obeyed properly. #658 and #659, respectively. They are tentatively assigned to you, but I expect you can carve up Twisted and motivate others to actually do most of the work :).
On the subject of adbapi using t.i.threads._putResultInDeferred, the latter function is useful for anyone managing their own threads. Anyone object if it is renamed to putResultInDeferred and made public? dave

On Mon, 2004-08-09 at 22:48, Dave Peticolas wrote:
On the subject of adbapi using t.i.threads._putResultInDeferred, the latter function is useful for anyone managing their own threads. Anyone object if it is renamed to putResultInDeferred and made public?
Actually, yes, I would object. It's a severe wart that adbapi uses t.p.threadpool directly; IIRC it's one of the major issues with doing a proper reactor shut-down. It should be doing reactor.callInThread, not its own thread management. I will note that the API which uses it is simply a deferToThread clone. All the thread-management logic in adbapi can easily be ported to use t.i.threads.deferToThread and use logical task queue identifiers instead of physical thread IDs. If you agree, please create an issue in the tracker for this. Chandler might have a legitimate use-case for putResultInDeferred, but I'd still like to try to talk them out of doing their own thread management first :).

On Mon, 2004-08-09 at 21:23, Glyph Lefkowitz wrote:
On Mon, 2004-08-09 at 22:48, Dave Peticolas wrote:
On the subject of adbapi using t.i.threads._putResultInDeferred, the latter function is useful for anyone managing their own threads. Anyone object if it is renamed to putResultInDeferred and made public?
Actually, yes, I would object.
It's a severe wart that adbapi uses t.p.threadpool directly; IIRC it's one of the major issues with doing a proper reactor shut-down. It should be doing reactor.callInThread, not its own thread management. I will note that the API which uses it is simply a deferToThread clone.
All the thread-management logic in adbapi can easily be ported to use t.i.threads.deferToThread and use logical task queue identifiers instead of physical thread IDs. If you agree, please create an issue in the tracker for this.
I agree, with the caveat that the reactor threadpool size adjustment api will need to be enhanced so that adbapi can request more threads if the max connections is high, without reducing threads if the max is low. suggestThreadPoolSize is too crude to do that. Min threads probably needs to be adjustable as well. Agreed? dave

On Mon, 2004-08-09 at 22:03, Dave Peticolas wrote:
On Mon, 2004-08-09 at 21:23, Glyph Lefkowitz wrote:
On Mon, 2004-08-09 at 22:48, Dave Peticolas wrote:
On the subject of adbapi using t.i.threads._putResultInDeferred, the latter function is useful for anyone managing their own threads. Anyone object if it is renamed to putResultInDeferred and made public?
Actually, yes, I would object.
It's a severe wart that adbapi uses t.p.threadpool directly; IIRC it's one of the major issues with doing a proper reactor shut-down. It should be doing reactor.callInThread, not its own thread management. I will note that the API which uses it is simply a deferToThread clone.
All the thread-management logic in adbapi can easily be ported to use t.i.threads.deferToThread and use logical task queue identifiers instead of physical thread IDs. If you agree, please create an issue in the tracker for this.
I agree, with the caveat that the reactor threadpool size adjustment api will need to be enhanced so that adbapi can request more threads if the max connections is high, without reducing threads if the max is low. suggestThreadPoolSize is too crude to do that. Min threads probably needs to be adjustable as well. Agreed?
On further thought, I'm not so sure about this. I have a real application that is managing multiple connection pools. If every connection pool is sharing the same threadpool, how can I prevent one connection pool from starving the others, as well as non-adbapi tasks? dave

On Tue, 2004-08-10 at 01:16, Dave Peticolas wrote:
On further thought, I'm not so sure about this. I have a real application that is managing multiple connection pools. If every connection pool is sharing the same threadpool, how can I prevent one connection pool from starving the others, as well as non-adbapi tasks?
You can easily limit how many outstanding operations can be pending from any given pool. Like I said in a previous email, virtual task != physical thread. I'll write some example code later today.

On Tue, 2004-08-10 at 06:32, Glyph Lefkowitz wrote:
On Tue, 2004-08-10 at 01:16, Dave Peticolas wrote:
On further thought, I'm not so sure about this. I have a real application that is managing multiple connection pools. If every connection pool is sharing the same threadpool, how can I prevent one connection pool from starving the others, as well as non-adbapi tasks?
You can easily limit how many outstanding operations can be pending from any given pool. Like I said in a previous email, virtual task != physical thread. I'll write some example code later today.
I'm not worried about limiting a connection pool to it's max threads, which is still necessary to do, of course, and I would be curious to see how your implementation differs from my own ideas. I want to ensure that a connection pool can actually make use of all of its connections simultaneously if it needs to (the performance half of why a connection pool is used in the first place). If adbapi were to ignore the reactor thread counts altogether, for example, it would be pointless to specify a max connections greater than 10 since the reactor won't let you use more than that anyway, the default max thread count being 10 atm. dave

On Tue, 2004-08-10 at 10:48, Dave Peticolas wrote:
I want to ensure that a connection pool can actually make use of all of its connections simultaneously if it needs to (the performance half of why a connection pool is used in the first place). If adbapi were to ignore the reactor thread counts altogether, for example, it would be pointless to specify a max connections greater than 10 since the reactor won't let you use more than that anyway, the default max thread count being 10 atm.
Aah. So what you're saying is you want an API which multiple thread-pools can communicate via to establish a suggested minimum value for the maximum thread count? I'm not opposed to such a thing, but shouldn't this be a deployment / tuning option particular to a configuration? The suggested value would have to be a fairly subtle hint... Maybe something like 'IReactorThreads.suggestAdditionalThreads(suggestingObject, addtlThreadCount)'? The reactor could track the suggesting object with a weakref to decrease the total thread count when it goes away. -- _ \ Glyph Lefkowitz |"Along the shore the cloud waves break, / \ \ glyph@divmod.com | The twin suns sink behind the lake, ` _o_ \-----------------+ The shadows lengthen, In Carcosa" ( ._\ \ - Cassilda's Song, "The King in Yellow", Act 1, Scene 2

On Tue, Aug 10, 2004 at 12:28:52PM -0400, Glyph Lefkowitz wrote:
On Tue, 2004-08-10 at 10:48, Dave Peticolas wrote:
I want to ensure that a connection pool can actually make use of all of its connections simultaneously if it needs to (the performance half of why a connection pool is used in the first place). If adbapi were to ignore the reactor thread counts altogether, for example, it would be pointless to specify a max connections greater than 10 since the reactor won't let you use more than that anyway, the default max thread count being 10 atm.
Aah. So what you're saying is you want an API which multiple thread-pools can communicate via to establish a suggested minimum value for the maximum thread count?
Yes, precisely.
I'm not opposed to such a thing, but shouldn't this be a deployment / tuning option particular to a configuration? The suggested value would have to be a fairly subtle hint...
Maybe something like 'IReactorThreads.suggestAdditionalThreads(suggestingObject, addtlThreadCount)'? The reactor could track the suggesting object with a weakref to decrease the total thread count when it goes away.
That sounds just fine. I'll add a tracker request for that. dave

On Tue, 2004-08-10 at 01:03, Dave Peticolas wrote:
I agree, with the caveat that the reactor threadpool size adjustment api will need to be enhanced so that adbapi can request more threads if the max connections is high, without reducing threads if the max is low. suggestThreadPoolSize is too crude to do that. Min threads probably needs to be adjustable as well. Agreed?
Mostly - I disagree in that I think this can be decently well implemented without actually delving into physical thread counts, again... but then, I remember the various insane ways that Oracle prices its client licenses, and other things like that... At any rate, I see no reason why the threadpool interface couldn't be improved as well, even if it's not perfectly necessary. Do you think the tasks could be separated?
participants (4)
-
Dave Peticolas
-
glyph
-
Glyph Lefkowitz
-
James Y Knight