Fast lookup of bulky "table"

Dino dino at no.spam.ar
Sun Jan 15 08:20:01 EST 2023


Thank you for your answer, Lars. Just a clarification: I am already 
doing some rough measurement of my queries.

A fresh query without any caching: < 4 s.

Cached full query: < 5 µs (i.e. 6 orders of magnitude faster).

Desired speed for my POC: < 10 ms.
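
For what it's worth, the measurement is little more than the sketch 
below; run_query() is just a stand-in for my actual query code, and 
lru_cache stands in for whatever caching layer I end up using:

import time
from functools import lru_cache

@lru_cache(maxsize=None)          # stand-in for the eventual caching layer
def run_query(params):
    time.sleep(4)                 # placeholder for the real ~4 s query
    return {"params": params}

def timed(func, *args):
    start = time.perf_counter()
    result = func(*args)
    print(f"{func.__name__}: {time.perf_counter() - start:.6f} s")
    return result

timed(run_query, "some-params")   # cold call: ~4 s
timed(run_query, "some-params")   # cached call: microseconds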

Also, I didn't want to ask a question with way too many "moving parts", 
but when I talked about the "table", it's actually a 100k-long list of 
IDs. I can then use each ID to invoke an API that will return those 40 
attributes. The API is fast, but I am still bound to loop through the 
whole thing to respond to the query, unless I pre-load the data into 
something that allows faster access.
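
What I have in mind for the pre-loading is essentially a plain dict 
keyed by ID, built once at startup; fetch_attributes() below is only a 
stand-in for the real API call that returns the 40 attributes:

def fetch_attributes(id_):
    return {"id": id_}          # placeholder for the real API payload

ids = range(100_000)            # stands in for my 100k-long list of IDs

# Built once at startup (or refreshed periodically), so each request
# becomes a single O(1) dictionary lookup instead of a loop over the list.
table = {id_: fetch_attributes(id_) for id_ in ids}

def handle_request(id_):
    return table.get(id_)       # should be well under the 10 ms budget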

Also, as you correctly observed, "looking good with my colleagues" is a 
nice-to-have feature at this point, not really an absolute requirement :)

Dino

On 1/15/2023 3:17 AM, Lars Liedtke wrote:
> Hey,
> 
> Before you start optimizing, I would suggest that you measure response 
> times, query times, data search times and so on. In order to save 
> time, you have to know where you "lose" time.
> 
> Does your service really have to load the whole table at once? Yes, that 
> might lead to quicker response times on requests, but databases are 
> often very good at caching themselves, so the first request might be 
> slower than subsequent requests with similar parameters. Do you use a 
> database, or are you reading from a file? Are you maybe looping through 
> your whole dataset on every request, instead of asking for the specific 
> data?
> 
> Before you start introducing a cache and its added complexity, do you 
> really need that cache?
> 
> You are talking about saving microseconds; that sounds a bit as if you 
> might be “overdoing” it. How many requests will you have in the future, 
> at least in which order of magnitude, and how quick do they have to be? 
> You write about 1-4 seconds on your laptop, but that does not really 
> tell you much, because the service will most probably run on a server. 
> I am not saying that you should get a server or a cloud instance to 
> test against, but rather talk with your architect about that.
> 
> I totally understand your impulse to appear as good as can be, but you 
> have to know where you really need to debug and optimize. It will not be 
> advantageous for you if you start to optimize for optimizing's sake. 
> Additionally, if your service is a PoC, optimizing now might not be the 
> first thing to worry about; rather, make everything as simple and 
> readable as possible and do not spend too much time just showing how it 
> could work.
> 
> But of course, I do not know the tasks given to you and the expectations 
> you have to fulfil. All I am trying to say is to reconsider where you 
> really could improve and how far you have to improve.
> 
> 

