[scikit-learn] DBScan freezes my computer !!!

Sebastian Raschka mail at sebastianraschka.com
Sun May 13 20:16:16 EDT 2018


> So I suggest that there is a test version that shows a proper message when an error occurs.

I think the freezing in your case is operating-system specific, and it would take some awkward workarounds to detect at which RAM usage a given combination of machine and operating system might freeze. (For example, I have never seen my system freeze when it runs out of RAM, since it has a pretty fast SSD, although the sklearn process may then take a very long time to finish.) On top of that, scikit-learn would need to know in advance how much memory a computation will use and constantly check how much is currently available (which changes with other apps and the OS kernel), and that isn't feasible.
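
A rough back-of-the-envelope estimate can still show why 44,000 records might behave so differently from 10,000 or 20,000. The sketch below assumes the worst case, where eps is large enough that every point is a neighbor of every other point (which could plausibly happen if eps is left unchanged after scaling the features to the range 0 to 1), and assumes that each neighbor index takes 8 bytes on a 64-bit system and that all neighborhoods are held in memory at once. The function name is hypothetical and the numbers are an upper bound, not a measurement.

```python
def worst_case_neighbor_bytes(n_samples, bytes_per_index=8):
    """Upper bound on memory for neighbor indices.

    Assumes every point lies within eps of every other point, so each
    of the n_samples neighborhoods holds n_samples indices.
    """
    return n_samples * n_samples * bytes_per_index


for n in (10_000, 20_000, 44_000):
    gib = worst_case_neighbor_bytes(n) / 2**30
    print(f"{n:>6} samples: up to {gib:.1f} GiB of neighbor indices")
```

Under these assumptions the bound grows quadratically: roughly 0.7 GiB for 10,000 records, 3.0 GiB for 20,000, and 14.4 GiB for 44,000, which is more than many desktop machines have available.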

I am not sure if this helps (it depends on where the memory-usage bottleneck is), but it might help to pass a sparse (CSR) array instead of a dense one to the .fit() method. Another thing to try would be to precompute the distances and pass them to the .fit() method after initializing the DBSCAN object with metric='precomputed'.
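
One way to combine the two suggestions is to precompute only the distances that are at most eps and hand them to DBSCAN as a sparse matrix. The following is a minimal sketch, not a tuned recipe: the data is a synthetic stand-in generated with make_blobs, and the eps and min_samples values are purely illustrative. It relies on NearestNeighbors.radius_neighbors_graph to build the sparse CSR distance graph.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# Synthetic stand-in data (an assumption; not the poster's dataset).
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 10]],
                  cluster_std=0.5, random_state=0)

eps = 0.8          # illustrative values, not recommendations
min_samples = 5

# Sparse CSR matrix holding only the pairwise distances that are <= eps.
nn = NearestNeighbors(radius=eps).fit(X)
D = nn.radius_neighbors_graph(X, mode="distance")

# Cluster on the precomputed sparse distances.
labels_sparse = DBSCAN(eps=eps, min_samples=min_samples,
                       metric="precomputed").fit(D).labels_

# Reference run on the raw features, for comparison only.
labels_direct = DBSCAN(eps=eps, min_samples=min_samples).fit(X).labels_

print("stored distances:", D.nnz, "of", X.shape[0] ** 2)
print("same labels:", np.array_equal(labels_sparse, labels_direct))
```

This only saves memory when eps is small enough that most pairs of points are not neighbors. If nearly every pair falls within eps, the sparse graph ends up almost as large as a dense matrix, and reducing eps or rescaling it along with the data is likely the more relevant fix.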

Best,
Sebastian

> On May 13, 2018, at 7:23 PM, Mauricio Reis <reismc at gmail.com> wrote:
> 
> I think the problem is due to the size of my database, which has 44,000 records. When I tested with reduced sizes (the first 10,000 and the first 20,000 records), the routine ran normally.
> 
> You ask me to check the memory while running the DBScan routine, but I do not know how to do that (if I did, I would have done that already).
> 
> I think the routine is not ready to work with too much data. The problem is that my computer freezes and I cannot analyze the case. I've tried to figure out whether any changes help (like changing routine parameters), but every alternative with lots of data (about 40,000 records) generates an error.
> 
> I believe that package routines have no exception handling to improve performance. So I suggest that there is a test version that shows a proper message when an error occurs.
> 
> To summarize: 1) How to check the memory of the computer during the execution of the routine? 2) I suggest developing test versions of routines that may have a memory error.
> 
> Att.,
> Mauricio Reis
> 
> 2018-05-13 5:34 GMT-03:00 Roman Yurchak <rth.yurchak at gmail.com>:
> Could you please check memory usage while running DBSCAN to make sure freezing is due to running out of memory and not to something else?
> Which parameters do you run DBSCAN with? Changing algorithm, leaf_size parameters and ensuring n_jobs=1 could help.
> 
> Assuming eps is reasonable, I think it shouldn't be an issue to run DBSCAN on L2-normalized data: using the default euclidean metric, this should produce somewhat similar results to clustering unnormalized data with metric='cosine'.
> 
> On 13/05/18 00:20, Andrew Nystrom wrote:
> If you’re L2-normalizing your data, you’re making it live on the surface of a hypersphere. That surface will have a high density of points and may not have areas of low density, in which case the entire surface could be recognized as a single cluster if epsilon is high enough and min neighbors is low enough. I’d suggest not using the L2 norm with DBSCAN.
> On Sat, May 12, 2018 at 7:27 AM Mauricio Reis <reismc at gmail.com> wrote:
> 
>     The DBScan "fit" method (in scikit-learn v0.19.1) is freezing my
>     computer without any warning message!
> 
>     I am using WinPython 3.6.5 64 bit.
> 
>     The method works normally with the original data, but freezes when I
>     use the normalized data (between 0 and 1).
> 
>     What should I do?
> 
>     Att.,
>     Mauricio Reis
>     _______________________________________________
>     scikit-learn mailing list
>     scikit-learn at python.org
>     https://mail.python.org/mailman/listinfo/scikit-learn


