[Python-Dev] Yet another "A better story for multi-core Python" comment

Joao S. O. Bueno jsbueno at python.org.br
Tue Sep 8 22:56:39 CEST 2015


Maybe you just have a job for Cap'n Proto?
https://capnproto.org/
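
For what it's worth, here is a rough sketch of what that might look like with
the pycapnp bindings (untested; it assumes a hypothetical schema file
point.capnp like the one shown in the comment, plus an installed capnp
package):

    # point.capnp (assumed to sit next to this script):
    #
    #   @0xbf5147cbbecf40c1;
    #   struct PointCloud {
    #     points @0 :List(Point);
    #     struct Point {
    #       x @0 :Float64;
    #       y @1 :Float64;
    #     }
    #   }

    import capnp

    point_capnp = capnp.load('point.capnp')   # compiles the schema on load

    # Build the structure once and write it out.
    cloud = point_capnp.PointCloud.new_message()
    points = cloud.init('points', 3)
    for i in range(3):
        points[i].x = float(i)
        points[i].y = float(i) * 2.0

    with open('cloud.bin', 'w+b') as f:
        cloud.write(f)

    # Workers can open the same file and traverse it without parsing,
    # because the serialized layout is also the in-memory layout.
    with open('cloud.bin', 'rb') as f:
        cloud2 = point_capnp.PointCloud.read(f)
        print(sum(p.x for p in cloud2.points))

Since nothing is unpacked into Python objects until a field is actually
touched, several processes can share one big serialized structure without
each of them paying for a full copy.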

On 8 September 2015 at 11:12, Gary Robinson <garyrob at me.com> wrote:
> Folks,
>
> If it’s out of line in some way for me to make this comment on this list, let me know and I’ll stop! But I do feel strongly about one issue and think it’s worth mentioning, so here goes.
>
> I read the “A better story for multi-core Python” discussion with great interest because the GIL has actually been a major hindrance to me. I know that for many uses, it’s a non-issue. But it was for me.
>
> My situation was that I had a huge (technically mutable, but unchanging) data structure which needed a lot of analysis. CPU time was a major factor — things took days to run. But even so, my time as a programmer was much more important than CPU time. I needed to prototype different algorithms very quickly. Even Cython would have slowed me down too much. Also, I had good reason to want to use the many great statistical functions in SciPy, so Python was an excellent choice for me in that way.
>
> So, even though pure Python might not be the right choice for this program in a production environment, it was the right choice for me at the time. And, if I could have accessed as many cores as I wanted, it might have been good enough in production too. But my work was hampered by one thing:
>
> There was a huge data structure that all the analysis needed to access. Using a database would have slowed things down too much. Ideally, I needed to access this same structure from many cores at once. On a Power8 system, for example, with its larger number of cores, performance may well have been good enough for production. In any case, my experimentation and prototyping would have gone more quickly with more cores.
>
> But this data structure was simply too big. Replicating it in different processes used memory far too quickly and was the limiting factor on the number of cores I could use. (I could fork with the big data structure already in memory, but because merely accessing an object updates its reference count, the copy-on-write pages got dirtied and copied, so multiple copies ended up existing anyway.)
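>
> To make the mechanism concrete, here is a tiny illustration using nothing beyond the standard library (the page-copy consequence described in the comments is the point, not the exact numbers):
>
>     import sys
>
>     x = object()
>     print(sys.getrefcount(x))   # e.g. 2
>     y = x                       # a "read-only" use of x...
>     print(sys.getrefcount(x))   # ...has written to x's header: now 3
>
>     # After a fork, that same write forces the kernel to copy the whole
>     # memory page holding the object.  Traversing a big structure touches
>     # every object in it, so the "shared" copy-on-write data is gradually
>     # duplicated in every worker process.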
>
> So, one thing I am hoping comes out of any effort in the “A better story” direction would be a way to share large data structures between processes. Two possible solutions:
>
> 1) Move the reference counts away from the data structures, so copy-on-write isn’t an issue. That sounds like a lot of work — I have no idea whether it’s practical. It has been mentioned in the “A better story” discussion, but I wanted to bring it up again in the context of my specific use-case. Also, it seems worth reiterating that even though copy-on-write forking is a Unix thing, the midipix project appears to bring it to Windows as well. (http://midipix.org)
>
> 2) Have a mode where a particular data structure is not reference counted or garbage collected. The programmer would be entirely responsible for manually calling del on the structure when he wants to free that memory. I would imagine this would be controversial because Python is currently designed in a very different way. However, I see no actual risk if one were to use an @manual_memory_management decorator or some technique like that to make it very clear that the programmer is taking responsibility. I.e., in general, information sharing between subinterpreters would occur through message passing. But there would be the option of the programmer taking responsibility for memory management of a particular structure. In my case, the amount of work required for this would have been approximately zero — once the structure was created, it was needed for the lifetime of the process.
>
> Under this second solution, there would be little need to actually remove the reference counts from the data structures — they just wouldn’t be accessed. Maybe it’s not a practical solution, if only because of the overhead of Python needing to check whether a given structure is manually managed or not. In that case, the first solution makes more sense.
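>
> For flat, numeric data there is already a partial workaround in this spirit: keep the bytes outside the Python object heap, for instance in an anonymous shared mmap created before forking, so that neither reference counting nor the garbage collector ever writes to those pages. A rough sketch (Unix only):
>
>     import mmap, os, struct
>
>     N = 1000000
>     buf = mmap.mmap(-1, N * 8)   # anonymous mapping, shared across fork
>
>     # The parent fills the buffer once, before forking.
>     for i in range(N):
>         struct.pack_into('<d', buf, i * 8, float(i))
>
>     pid = os.fork()
>     if pid == 0:
>         # The child sees the same physical pages: no pickling, no copies,
>         # and no refcount writes, because the doubles are not PyObjects.
>         first = struct.unpack_from('<d', buf, 0)[0]
>         last = struct.unpack_from('<d', buf, (N - 1) * 8)[0]
>         print("child sees", first, last)
>         os._exit(0)
>     os.waitpid(pid, 0)
>
> Of course that only works when the data can be flattened into a buffer; for a rich object graph it is no help, which is why something along the lines of the two options above would matter.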
>
> In any case, I thought this was worth mentioning, because it has been a real problem for me, and I assume it has been a real problem for other people as well. If a solution is both possible and practical, that would be great.
>
> Thank you for listening,
> Gary
>
>
> --
>
> Gary Robinson
> garyrob at me.com
> http://www.garyrobinson.net
>

