[Twisted-Python] Twisted tips for designing highly concurrent twisted REST API
Hello folks,
I recently stumbled upon Twisted and was wondering if it could suit my
needs. On one hand, I want to use Python, but on the other hand there are
all these scalability concerns with the language, so I thought I would pick
the brains of the community. A Flask-based app would look something like
this:

    similar_types = ['foo', 'bar', 'baz']

    def long_computation(rec_type):
        # some long computation
        return recs

    @app.route('/fetch_similar_users/
Sorry, I had a typo in the Twisted program:

    import json
    import requests
    from twisted.internet import defer, threads

    @defer.inlineCallbacks
    def long_computation(rec_type, data):
        # some long computation
        defer.returnValue(recs)

    @defer.inlineCallbacks
    def fetch_data(user_id):
        r = yield json.loads(requests.get('url/to/fetch/%s' % user_id).text)
        defer.returnValue(r)

    @defer.inlineCallbacks
    def fetch_recs(user_id):
        data = yield fetch_data(user_id)
        recs = {}
        for stype in similar_types:
            d = threads.deferToThread(long_computation, stype, data)  # typo was here
            rec = yield d
            recs[stype] = rec
        defer.returnValue(recs)

On Tue, Jun 25, 2019 at 11:48 PM Waqar Khan wrote:
Hello folks, I recently stumbled upon Twisted and was wondering if it could suit my needs. On one hand, I want to use Python, but on the other hand there are all these scalability concerns with the language, so I thought I would pick the brains of the community. A Flask-based app would look something like this:

    similar_types = ['foo', 'bar', 'baz']

    def long_computation(rec_type):
        # some long computation
        return recs

    @app.route('/fetch_similar_users/')
    def fetch_similar_users(user_id):
        r = json.loads(requests.get('url/to/fetch/%s' % user_id).text)
        recs = {}
        for stype in similar_types:
            recs[stype] = long_computation(stype)
        return recs

Now, I tried to "twistify" it, but it failed:

    @defer.inlineCallbacks
    def long_computation(rec_type):
        # some long computation
        defer.returnValue(recs)

    @defer.inlineCallbacks
    def fetch_data(user_id):
        r = yield json.loads(requests.get('url/to/fetch/%s' % user_id).text)
        defer.returnValue(r)

    @defer.inlineCallbacks
    def fetch_recs(user_id):
        data = yield fetch_data(user_id)
        recs = {}
        for stype in similar_types:
            d = threads.deferToThread(fetch_data, *(stype))
            rec = yield d
            recs[stype] = rec
        defer.returnValue(recs)

I wrapped all of the above in a Twisted render_GET method, but then I ran a load test with the locust framework (https://docs.locust.io/en/latest/what-is-locust.html). It choked: as time progressed, the response time increased. I am guessing things are still blocking.
Can you please help me look in the right place? Why exactly am I seeing an increase in response time as time progresses? I am guessing things are still working in a "blocking" fashion, but I thought the above should run things asynchronously. Thanks

Hi,

There are likely a few things wrong here.

1. You are using requests.get() to make an HTTP request. This is blocking. You might consider using Twisted's Agent https://twistedmatrix.com/documents/current/api/twisted.web.client.Agent.htm... API instead (or treq https://github.com/twisted/treq, which puts a requests-like API atop Agent).

2. As you add load, your long computations will be queued. deferToThread https://twistedmatrix.com/documents/current/api/twisted.internet.threads.htm... dispatches the long_computation to the reactor's default thread pool https://twistedmatrix.com/documents/current/api/twisted.internet.interfaces..... This pool has a maximum size and will queue work once it has spun up that many threads. Rather than using deferToThread (which we should really deprecate as it doesn't accept a reactor parameter...) I'd recommend instantiating your own ThreadPool https://twistedmatrix.com/documents/current/api/twisted.python.threadpool.Th... and using deferToThreadPool https://twistedmatrix.com/documents/current/api/twisted.internet.threads.htm.... The reactor's own thread pool is really for DNS resolution. You risk deadlocks in a system that [...]

3. The specifics of what long_computation does are also important. If it doesn't release the GIL you won't get real parallelism (this is a Python thing, not a Twisted thing). See this recent thread on the topic https://twistedmatrix.com/pipermail/twisted-python/2019-June/032371.html.

Though the mechanisms differ, any of the above would cause the response time to increase as you add load.

Good luck,
Tom

On Tue, Jun 25, 2019, at 11:51 PM, Waqar Khan wrote:
Sorry I had a typo in twisted program
_______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com https://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
On Tuesday, 9 July 2019 22:04:11 BST Tom Most wrote:
...snip...
> The reactor's own thread pool is really for DNS resolution.

Is that still true in the default case? We use the Twisted code that talks to DNS servers, as the threaded resolver adds too much latency.

> You risk deadlocks in a system that [...]
> 3. The specifics of what long_computation does are also important. If it doesn't release the GIL you won't get real parallelism (this is a Python thing, not a Twisted thing). See this recent thread on the topic https://twistedmatrix.com/pipermail/twisted-python/2019-June/032371.html.

We pass out the computational work to other processes over Unix domain sockets to avoid the GIL issues.

> Though the mechanisms differ, any of the above would cause the response time to increase as you add load.
>
> Good luck, Tom

Barry
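The idea of handing CPU-bound work to other processes, so that each job runs under its own GIL, can be illustrated with the standard library alone. This is a rough sketch and not Barry's setup: it talks to the workers over pipes via subprocess rather than over Unix domain sockets, and WORKER_SOURCE, start_worker, collect and sum_squares_parallel are hypothetical names. The reads here block the caller, so a Twisted application would need a non-blocking process API instead.

```python
import json
import subprocess
import sys

# Hypothetical worker program: reads one JSON number from stdin and prints
# the sum of squares below it as JSON.
WORKER_SOURCE = (
    "import json, sys\n"
    "n = json.load(sys.stdin)\n"
    "print(json.dumps(sum(i * i for i in range(n))))\n"
)


def start_worker(n):
    """Launch a separate Python process for one job; it has its own GIL."""
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdin.write(json.dumps(n))
    proc.stdin.close()  # signals end of input to the worker
    return proc


def collect(proc):
    """Wait for one worker and decode its result."""
    output = proc.stdout.read()
    proc.stdout.close()
    if proc.wait() != 0:
        raise RuntimeError("worker process failed")
    return json.loads(output)


def sum_squares_parallel(inputs):
    # Start every worker first so they run concurrently, then gather results.
    workers = [start_worker(n) for n in inputs]
    return [collect(worker) for worker in workers]
```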
Klein and Crossbar.io seem relevant as well
https://crossbario.com/blog/Going-Asynchronous-from-Flask-to-Twisted-Klein/
On 11.07.19 at 23:34, Sean DiZazzo wrote:
> Klein and Crossbar.io seem relevant as well
> https://crossbario.com/blog/Going-Asynchronous-from-Flask-to-Twisted-Klein/

Yeah, Klein is neat! FWIW, this might also be of interest, as it allows scaling up Twisted Web (and hence also Klein) across multiple cores (on Linux):
https://github.com/crossbario/crossbar-examples/tree/master/benchmark/web
Combining SO_REUSEPORT with Klein results in a concurrent, async (threadless) server parallelized via processes.
-- Tobias Oberstein, Crossbar.io GmbH, https://crossbar.io
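The SO_REUSEPORT mechanism mentioned above lets several sockets, typically one per worker process, listen on the same port while the kernel spreads incoming connections across them. The following is a minimal standard-library sketch of just that socket setup, assuming a platform that defines socket.SO_REUSEPORT (such as Linux); listen_reuseport is a hypothetical helper and is not taken from the linked benchmark.

```python
import socket


def listen_reuseport(port, host="127.0.0.1"):
    """Return a listening TCP socket that shares its port with other sockets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEPORT has to be set before bind(); every socket sharing the
    # port must set it as well.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock
```

In a multi-process deployment, each process would create its own socket this way and hand it to its own reactor, so no threads are needed for the parallelism.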
On Thu, Jul 11, 2019, at 1:46 AM, Scott, Barry wrote:
> On Tuesday, 9 July 2019 22:04:11 BST Tom Most wrote:
> ...snip...
> > The reactor's own thread pool is really for DNS resolution.
>
> Is that still true in the default case? We use the Twisted code that talks to DNS servers, as the threaded resolver adds too much latency.

As far as I know, yes. The higher-level APIs use getaddrinfo() at least.
https://twistedmatrix.com/documents/current/api/twisted.internet._resolver.G...
https://github.com/twisted/twisted/blob/c0776850e756adfcdc179a7fd9e4c8f5cbc4...
TCP6ClientEndpoint also invokes getaddrinfo() directly.

twisted.names is certainly more performant, but it's missing some system integration features that make it unsuitable as a default:

* No support for the domain or search resolv.conf directives
* No NSS lookups (e.g., systemd integration)

This is all on Linux; YMMV on other platforms.

---Tom

Hi,
Thank you all for your kind responses.
So, I am trying to use the treq library:

    import treq
    from twisted.internet import defer, threads

    @defer.inlineCallbacks
    def long_computation(rec_type, data):
        # some long computation
        defer.returnValue(recs)

    @defer.inlineCallbacks
    def fetch_data(user_id):
        r = yield treq.get('url/to/fetch/%s' % user_id)
        text = yield r.text()
        defer.returnValue(text)

    @defer.inlineCallbacks
    def fetch_recs(user_id):
        data = yield fetch_data(user_id)
        recs = {}
        for stype in similar_types:
            d = threads.deferToThread(long_computation, stype, data)

Now, I do believe that the call is happening asynchronously. So.. yay..
But then, I feel like I have a misconception about how yield works.

    data = yield fetch_data(user_id)

I was hoping data here would be the actual data, but it is a Deferred, which
makes sense. And then this Deferred is passed on instead of the actual data.
My couple of questions are:
1) What is the difference between data = yield fetch_data(user_id) and
data = fetch_data(user_id) (without yield)? How does Twisted handle these two?
2) How do I actually send the data to the long computation rather than a
Deferred?
Appreciate all the help.
Thanks
On Sat, Jul 13, 2019 at 1:57 AM Tom Most wrote:
...snip...
participants (5)
- Scott, Barry
- Sean DiZazzo
- Tobias Oberstein
- Tom Most
- Waqar Khan