[Twisted-Python] Fitting cred into my application

Hi.

I've been trying to wrap my head around the cred implementation for a while now, but either I'm missing something, or there's some piece of documentation that could be better. Probably at least a bit of both.

My application is an XMLRPC server, and an authenticated client should have rights to run some RPC methods, but not others. Some methods will give access to limited data based on authentication.

The documentation for cred is clear in the case where the server has its own protocol implementation, but in the case of XMLRPC, where the protocol isn't subclassed, how to link it in is far less clear. Also, since the design suggests that it's the RPC methods that need to talk to the avatar, not the protocol, how to implement cred seems even less obvious.

Are there some other examples of cred implementations floating around that I can look at, where lack of authentication does not block all access to the protocol? In particular an example combining XMLRPC and cred would make my day. Does any documentation or example code along those lines exist that I just haven't found yet?

Thanks in advance for any pointers!

Matt

On Sat, Sep 22, 2012 at 4:37 PM, Matthew Pounsett <matt@conundrum.com> wrote:
There are three parts to doing this:

1. When setting up the portal, in addition to the credential checker that knows about regular users, also register a twisted.cred.checkers.AllowAnonymousAccess.
2. When you have no credentials, the xml-rpc layer should login to the portal using a twisted.cred.credentials.Anonymous credential. When you do have credentials, pass them in as you normally would.
3. Now as part of a login your realm gets either a username, or a twisted.cred.checkers.ANONYMOUS as the avatar id that is being requested. Based on what it gets your realm should return a different business logic implementation (aka "avatar"); the anonymous one can do less, say.

twisted.web.guard does #2; the way it is implemented, the business logic object (avatar) that is returned is a web Resource, so the realm can return a different Resource depending on whether there's HTTP credentials or not (the latter case being avatar id of ANONYMOUS). It's possible you just want to use guard, if you're relying on HTTP authentication - just return different XMLRPC objects.

Alternatively, the XMLRPC layer could log-in to the portal, in which case it would get back different business logic objects with different capabilities.

An example of a full setup is twisted.protocols.ftp combined with twisted.tap.ftp, the ftp plugin for twistd, but the details are a bit spread out. You should be able to find all three of those parts though.

--
Itamar Turner-Trauring, Future Foundries LLC
http://futurefoundries.com/ - Twisted consulting, training and support.

On 2012/09/22, at 21:36, Itamar Turner-Trauring wrote:
I'm not following why I would want to do #1, probably because I can't find a sufficient explanation of exactly what an "avatar" is supposed to do. In some places it seems to just be treated as a user ID, and in other places documentation makes reference to "business logic" as you do, but nowhere do I see any examples of what exactly that means. It seems to me #1 is overkill; if I want to have methods that don't require authentication (e.g. methods for registering a user in the first place), why would I require all clients to authenticate as anonymous before using them? It would be a lot simpler to just have my xmlrpc methods check against the attributes of the current user object when called, and then return appropriately: return failures when there is no user, or when the user's attributes don't match those required by the method, and return data that a user's attributes give him/her access to when there is a user. But again, I think I'm missing some key details that just aren't in the documentation I've been able to find.
I'm not using HTTP authentication because it requires passing authentication info with each request. In my design a client can authenticate at any point in the connection to gain access to methods or data that require it. From your description of Guard it sounds like I would bind a different resource to the factory for each set of permissions a user might have, which means there would be a different Resource defined (and by extension different xmlrpc methods defined) for every possible combination of attributes a user might have. A user may have a dozen different attributes that go into defining what methods that user can call, or what data will be returned by a method; the combinatorics there could result in thousands of slightly-different Resources and methods I'd need to maintain. It seems more reasonable to have each xmlrpc method check the current user's attributes and make its own decisions about what data (if any) to return.
This sounds like it could be what I'm looking for, but again I can't find anything that discusses in any detail what "business logic" means in this context, or how to implement those different capabilities.
The FTP protocol doesn't implement any non-authenticated commands. Authentication is always done at the beginning of a session, even if that is to authenticate as the user 'anonymous', and there's no provision for changing credentials in mid-connection. I'll have a look anyway, though; perhaps I can piece together a few more things that I haven't been able to find in the API documentation, the twisted.cred how-to, or the cred.py and dbcred.py examples.

I've got a bit further since my initial email, and my current approach is to extend t.w.server.Site to accept a portal. I'm currently trying to separate the useful bits from the flash in the requestAvatarId and _??Authenticate methods in dbcred.py. It would be nice to have something as straightforward as cred.py that also implemented a realm and a credentials checker so that I could see how all those pieces fit together.

Thanks for the info; hopefully it leads me to piece this stuff together.

Matt
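The per-method attribute check described in this message does not have to multiply Resources: a realm can hand back one avatar class whose instances differ only in the attributes they carry, and each method can check those. The sketch below is plain Python with no Twisted dependency; Avatar, requires, PermissionDenied, Methods and the attribute names are all hypothetical, chosen only to illustrate the idea.

```python
import functools


class PermissionDenied(Exception):
    """Raised when the caller's avatar lacks a required attribute."""


class Avatar(object):
    """One avatar class; what differs per user is the attribute set."""

    def __init__(self, avatar_id=None, attributes=()):
        self.avatar_id = avatar_id  # None stands for "not logged in"
        self.attributes = frozenset(attributes)


def requires(*needed):
    """Decorator: refuse the call unless the avatar has every attribute."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, avatar, *args, **kwargs):
            missing = frozenset(needed) - avatar.attributes
            if missing:
                raise PermissionDenied(
                    "%s needs: %s" % (method.__name__,
                                      ", ".join(sorted(missing))))
            return method(self, avatar, *args, **kwargs)
        return wrapper
    return decorator


class Methods(object):
    def register(self, avatar, name):
        # No decorator: callable without authentication.
        return "registered %s" % name

    @requires("reports")
    def list_reports(self, avatar):
        # Returned data can still be narrowed by attributes inside the method.
        reports = ["public-summary"]
        if "finance" in avatar.attributes:
            reports.append("finance-detail")
        return reports
```

In an XML-RPC resource the PermissionDenied would presumably be translated into a fault before it reaches the client; that translation is left out here.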

On Sat, Sep 22, 2012 at 11:41 PM, Matthew Pounsett <matt@conundrum.com> wrote:
You don't need the clients to authenticate as anonymous; the XML RPC code can say "if there's no credentials from client, login as anonymous."
I would just add a Portal to the XML-RPC object, rather than the Site. I'll try to write some example code later today, if I have time.

--
Itamar Turner-Trauring, Future Foundries LLC
http://futurefoundries.com/ - Twisted consulting, training and support.

Attached find an example server, and a client demonstration - it's only very lightly tested, so likely wrong or buggy somewhere. I didn't bother to implement sessions, so you need to login with every command if you want extended access.

--
Itamar Turner-Trauring, Future Foundries LLC
http://futurefoundries.com/ - Twisted consulting, training and support.

On 2012/09/23, at 18:55, Itamar Turner-Trauring wrote:
> Attached find an example server, and a client demonstration - it's only very lightly tested, so likely wrong or buggy somewhere. I didn't bother to implement sessions, so you need to login with every command if you want extended access.
Thanks for the examples, I'll have a look tonight. I was going to avoid the issue of session handling by putting the portal on the protocol. That way authentication, once done, can be persistent for the length of the connection without having to do any special session handling.

On 08:17 pm, matt@conundrum.com wrote:
As soon as you have a proxy between your client and server, you'll regret this. The HTTP authentication standards specifically forbid this style of authentication, and the proxy standards explicitly allow the lifetime of connections between a client and the proxy to be different from the lifetime of connections between the proxy and the server.

In other words, there are reasons HTTP auth works the way it does.

Also, there is a list dedicated to web topics:

http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-web

Jean-Paul

I'm in the process of rewriting a web spider (originally in twisted circa 2005), and keep running into an issue with deferreds releasing too many tokens.

I've been playing around with this all day, and can't seem to shake this problem. I'm guessing that I designed the application logic wrong.

Does the following code raise any warning signs for people?

The general setup is this:

AnalyzeLink
- run-of-the-mill class that performs actual db operations and link fetching
- doesn't really rely on twisted, aside from being coded to the specs of runInteraction (accepts an adbapi txn, raises for a rollback, is generally happy for a commit)

AnalyzeLinksService
- relies on twisted
- queries the database for a batch of items to update
- each actionable item is wrapped into an '_AnalyzeLinksRequestWrapper' instance, all of which are tossed into a defer.DeferredList()

AnalyzeLinksRequestWrapper
- relies on twisted
- pushes actual work into callbacks via threads.deferToThread
- uses a defer.DeferredSemaphore provided by AnalyzeLinksService to acquire locks

Some cleaned-up code is below:

------------------

    class AnalyzeLink(object):

        def get_update_batch(self,txn):
            # returns list of ids/data/etc to process

        def action_for_data(self,txn,data):
            # processes an entry


    class _AnalyzeLinksRequestWrapper(RequestWrapper):
        dbConnectionPool = None
        semaphoreService = None
        semaphoreLock = None

        def __init__( self , semaphoreService = None , dbConnectionPool = None ):
            self.dbConnectionPool= dbConnectionPool
            self.semaphoreService= semaphoreService

        def queue_thread( self , data=None ):
            self.queued_data= data
            d = self.semaphoreService.acquire()\
                .addCallback( self._T_to_thread )
            return d

        def _T_to_thread( self , deferredSemaphore ):
            self.semaphoreLock= deferredSemaphore
            t = threads.deferToThread( self._T_thread_begin )\
                .addErrback( self._T_errors )\
                .addCallback( self._T_thread_end )

        def _T_thread_begin( self ):
            log.debug("_AnalyzeLinksRequestWrapper._T_thread_begin" )
            updater = AnalyzeLink()
            self.dbConnectionPool.runInteraction( updater.action_for_data , self.queued_data )\
                .addCallback( self._T_thread_end )\
                .addErrback( self._T_errors )

        def _T_thread_end( self , rval=None ):
            self.semaphoreLock.release()

        def _T_errors( self , x ):
            self._T_thread_end()
            raise x


    class _AnalyzeLinksService(ServiceScaffold):
        SEMAPHORE_TOKENS = 25

        def __init__( self ):
            self.semaphoreService= defer.DeferredSemaphore( tokens=self.SEMAPHORE_TOKENS )

        def action( self ):
            updater= AnalyzeLink()
            database.get_dbPool().runInteraction( updater.get_update_batch , queued_updates )\
                .addCallback( self._action_2 )\
                .addErrback( self._action_error )

        def _action_2( self , queued_updates ):
            if len( queued_updates ):
                updates= []
                for item in queued_updates:
                    requestWrapper= _AnalyzeLinksRequestWrapper(\
                        semaphoreService = self.semaphoreService ,
                        dbConnectionPool = database.get_dbPool()
                    )
                    result= requestWrapper.queue_thread( data=item )
                    updates.append(result)
                finished= defer.DeferredList( updates )\
                    .addCallback( self.deferred_list_finish )
            else:
                d= defer.Deferred()
                self.deferred_list_finish( d )

        def _action_error( self , raised ):
            log.debug("%s._action_error" % self.__class__.__name__ )
            self.set_processing_status( False )
            if isinstance( raised.value , database.DbRollback ):
                print "DB Rollback"
                raise raised
            elif isinstance( raised.value , database.DbRollbackOk ):
                print "DB Rollback ok"
            else:
                raise raised

    AnalyzeLinksService= _AnalyzeLinksService()


    class AnalyzeLinksService_Service(internet.TimerService):
        def __init__( self , dbConfigHash=None ):
            internet.TimerService.__init__( self, CHECK_PERIOD__IMPORT , AnalyzeLinksService.action )

This is a better way of using DeferredSemaphore:

    def queue_thread( self , data=None ):
        self.queued_data= data
        return self.semaphoreService.run( self._T_to_thread )

It handles acquisition and release for you. This will avoid any code path that might result in a double-release.

On Wed, Sep 26, 2012 at 1:07 PM, Jonathan Vanasco <twisted-python@2xlp.com> wrote:

Participants (5): exarkun@twistedmatrix.com, Itamar Turner-Trauring, Jonathan Vanasco, Matthew Pounsett, Stephen Thorne