Python script seems to stop running when handling very large dataset

Dan Stromberg drsalists at gmail.com
Fri Oct 29 19:23:28 EDT 2021


On Fri, Oct 29, 2021 at 4:04 PM dn via Python-list <python-list at python.org>
wrote:

> On 30/10/2021 11.42, Shaozhong SHI wrote:
> > Python script works well, but seems to stop running at a certain
> > point when handling a very large dataset.
> >
> > Can anyone shed light on this?
>
> Storage space?
> Taking time to load/format/process data-set?
>

It could be many things.

What operating system are you on?

If you're on Linux, you can use strace to attach to a running process to
see what it's up to.  Check out the -p option.  See
https://stromberg.dnsalias.org/~strombrg/debugging-with-syscall-tracers.html
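
For example, assuming the stuck script is running as PID 12345 (substitute
the real PID, e.g. from ps or top):

    strace -p 12345 -f -tt

-f follows any child processes/threads, and -tt timestamps each system call,
which makes it easier to tell whether the process is blocked on a single
call (say, a read) or still chugging along.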

macOS has dtruss, which is similar to strace, though it's a little harder to
get working (recent macOS versions require relaxing System Integrity
Protection before dtruss will run).
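
If you do get it enabled, the invocation is much the same, e.g.:

    sudo dtruss -p 12345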

Both of these tools are most useful for processes that are making system
calls (kernel interactions); they help far less with CPU-bound processes.

It could also be that you're running out of memory, and the system's
virtual memory subsystem is thrashing - spending most of its time swapping
pages in and out rather than doing useful work.
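
One cheap way to rule that in or out from inside the script is to log the
process's peak memory use as it runs, using the standard library's resource
module (Unix-only). A minimal sketch, with the label and reporting point
purely illustrative:

    import resource

    def log_peak_memory(label):
        # ru_maxrss is the peak resident set size:
        # kilobytes on Linux, bytes on macOS.
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print(f"{label}: peak RSS so far = {peak}", flush=True)

If that number climbs toward the machine's physical RAM just before the
script appears to stop, thrashing is a likely culprit.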

Does the load average on the system go up significantly when the process
seems to get stuck?
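
You can watch that from the shell with uptime, or from Python itself:

    import os

    # 1-, 5- and 15-minute load averages (Unix-only)
    one, five, fifteen = os.getloadavg()
    print(one, five, fifteen)

On Linux, a load average well above the CPU count while the CPUs look idle
often points at processes stuck in I/O wait or swapping rather than
computation.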

You could also try attaching to the process with a debugger, e.g. with pudb:
https://github.com/inducer/pudb/issues/31
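
pudb can't attach to an already-running process the way strace can, but if
rerunning the job is an option, a planted breakpoint gets you the same view.
A minimal sketch, where the loop, the counter threshold, and process() are
all made up for illustration:

    import pudb

    for i, record in enumerate(records):  # hypothetical processing loop
        if i == 1_000_000:  # roughly where it seems to stall
            pudb.set_trace()  # drops into the pudb TUI at this point
        process(record)  # hypothetical per-record work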

Barring those, you could sprinkle some print statements through your code to
see where it's getting stuck. This tends to be an iterative process: add
some prints, run, observe the result, and repeat.
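
When you do, print a timestamp and a progress counter, and pass flush=True -
block-buffered output can make a live process look stuck. A minimal sketch
along those lines (the loop and the interval are illustrative):

    import time

    for i, record in enumerate(records):  # hypothetical processing loop
        if i % 100_000 == 0:
            print(f"{time.strftime('%H:%M:%S')} processed {i:,} records",
                  flush=True)
        process(record)  # hypothetical per-record work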

HTH.

