[scikit-learn] 2 million samples dataset caused python and OS crash

Liu James icefrog1950 at gmail.com
Fri Jan 8 00:33:34 EST 2021


Thanks for the reply. I tested different data sizes on different distros
and found that whenever the data has more than 500 thousand rows (with 50
columns), the crash happens with the same error message: kernel page error.
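As a rough sanity check on these sizes, here is a back-of-the-envelope sketch of how much memory the raw data needs. It assumes the data is held as a dense float64 NumPy array (8 bytes per value); the dtype is an assumption, not something stated in the thread. The pairwise figure is purely hypothetical. It only illustrates how quickly memory grows if some step were to build an n_samples x n_samples matrix.

```python
# Back-of-the-envelope memory estimates.
# Assumption: dense float64 storage, 8 bytes per value.
FLOAT64_BYTES = 8
GIB = 1024 ** 3


def dense_nbytes(n_rows: int, n_cols: int, itemsize: int = FLOAT64_BYTES) -> int:
    """Bytes for one dense (n_rows, n_cols) array; each extra copy costs the same again."""
    return n_rows * n_cols * itemsize


def pairwise_nbytes(n_rows: int, itemsize: int = FLOAT64_BYTES) -> int:
    """Bytes for a hypothetical dense (n_rows, n_rows) pairwise matrix."""
    return n_rows * n_rows * itemsize


for n in (500_000, 2_000_000):
    print(f"{n:,} x 50 data: {dense_nbytes(n, 50) / GIB:.2f} GiB")
    print(f"{n:,} x {n:,} pairwise: {pairwise_nbytes(n) / GIB:,.0f} GiB")
```

At 500,000 x 50 the data itself is 200,000,000 bytes (about 0.19 GiB), and even 2 million rows is only about 0.75 GiB. A handful of plain copies of the raw data would therefore not come close to 128 GB. A 500,000 x 500,000 float64 matrix, by contrast, would need 2 x 10^12 bytes (about 1,863 GiB).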

Guillaume Lemaître <g.lemaitre58 at gmail.com> wrote on Wed, 6 Jan 2021 at 10:33 PM:

> And it seems that the traceback snippet refers to NumPy.
>
> On Wed, 6 Jan 2021 at 12:48, Andrew Howe <ahowe42 at gmail.com> wrote:
>
>> A core dump generally happens when a process tries to access memory
>> outside its allocated address space. You've not specified which estimator
>> you were using, but I'd guess it attempted to do something with the dataset
>> that resulted in it being duplicated or otherwise expanded beyond the
>> memory capacity. Perhaps the full stack trace would be helpful.
>>
>> Andrew
>>
>>
>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
>> J. Andrew Howe, PhD
>> LinkedIn Profile <http://www.linkedin.com/in/ahowe42>
>> ResearchGate Profile <http://www.researchgate.net/profile/John_Howe12/>
>> Open Researcher and Contributor ID (ORCID)
>> <http://orcid.org/0000-0002-3553-1990>
>> Github Profile <http://github.com/ahowe42>
>> Personal Website <http://www.andrewhowe.com>
>> I live to learn, so I can learn to live. - me
>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
>>
>>
>> On Wed, Jan 6, 2021 at 11:02 AM Liu James <icefrog1950 at gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I'm using a medium-sized dataset, KDD99 IDS
>>> (https://www.ll.mit.edu/r-d/datasets/1999-darpa-intrusion-detection-evaluation-dataset),
>>> for model training, and the dataset has 2 million samples. When calling
>>> fit_transform(), the OS crashed with the log "Process 13851(python) of user xxx
>>> dumped core. Stack trace
>>> .../numpy/core/_multiarray_umath_cpython_36m_x86_64... ".
>>>
>>> The environment: CentOS 8, Intel i9, 128GB RAM, stack size set to
>>> unlimited. The crash is reproducible.
>>>
>>> Thanks.
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>>
>
>
> --
> Guillaume Lemaitre
> Scikit-learn @ Inria Foundation
> https://glemaitre.github.io/
>
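To get the Python-level stack trace requested earlier in the thread, and to scan data sizes without killing the interactive session, one option is to run each attempt in a child interpreter started with `-X faulthandler`. This is a minimal sketch. `run_isolated`, `scan_sizes` and `REPRO` are hypothetical helpers, and `StandardScaler` is only a placeholder because the thread never says which estimator's `fit_transform()` crashed. It sticks to Python 3.6-compatible calls (no `capture_output`), matching the cpython-36m build in the log. If the whole OS goes down, as reported, a child process will not protect the machine; it only helps when the crash stays inside the Python process.

```python
import subprocess
import sys

# Hypothetical reproducer: random data with the reported width of 50 columns.
# StandardScaler is a placeholder; substitute the estimator that actually crashes.
# The row count arrives as sys.argv[1].
REPRO = """
import sys
import numpy as np
from sklearn.preprocessing import StandardScaler
X = np.random.rand(int(sys.argv[1]), 50)
StandardScaler().fit_transform(X)
"""


def run_isolated(code: str, n_rows: int) -> subprocess.CompletedProcess:
    """Run `code` in a fresh interpreter with n_rows passed as sys.argv[1].

    `-X faulthandler` makes the child dump its Python traceback to stderr if it
    dies on SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code, str(n_rows)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str (text=True needs 3.7+)
    )


def scan_sizes(code: str, sizes) -> None:
    """Try each size in order and stop at the first failing run."""
    for n_rows in sizes:
        proc = run_isolated(code, n_rows)
        # On POSIX, a negative return code -N means the child was killed by signal N.
        print(f"{n_rows:,} rows -> return code {proc.returncode}")
        if proc.returncode != 0:
            print(proc.stderr)  # includes the faulthandler traceback, if any
            break
```

A possible call is `scan_sizes(REPRO, [100_000, 250_000, 500_000, 1_000_000, 2_000_000])`. Running each size in a separate interpreter also means memory from a previous attempt is fully released before the next one starts.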

