[scikit-learn] Is SciKit-Learn the right choice for my project

Brown J.B. jbbrown at kuhp.kyoto-u.ac.jp
Sat Oct 8 07:35:30 EDT 2022


Dear Mike,

Just my two cents about your inquiry, speaking strictly as a user of
scikit-learn for many years.

- From your description of the application context, I would say that
scikit-learn is perfectly fine. However, be aware that a monolithic model
incorporating all of the data (the image that TV wrongly projects) is not a
valid strategy. Stratifying the data into contextually correct subgroups
and then running scikit-learn on each, for example to estimate the extent
of predictability during development, will be helpful.
- Duplicate checking should be easy using standard Python objects (set or
list counting), once the context determines how the objects are
vectorized/featurized. I don't see a need to force scikit-learn into that
context.
- Missing data could be handled by context-specific object classes that
you design, which could contain something like a __bool__() method that
tells you whether the object has all of the required data populated and
configured.
- Detection of configuration errors could either be driven explicitly by
logic (from the context; again, something that returns a bool indicating
whether an object is configured correctly), or potentially be derived
statistically as outliers from the given background data distribution, in
which case scikit-learn could be of help. If there are too many variates
(thousands or tens of thousands) in your data for explicit logic to be
practical, then scikit-learn's Random Forest algorithms might be perfectly
fine and provide verification through visualization of Decision Tree rules.
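
To make the duplicate-checking point concrete, here is a minimal sketch
using only the standard library. The record fields (type, site, ip) and the
featurize() helper are assumptions made up for illustration; which fields
define "the same object" depends entirely on your context.

```python
from collections import Counter

# Hypothetical device records; the field names are assumptions for illustration.
devices = [
    {"type": "router", "site": "HQ", "ip": "10.0.0.1"},
    {"type": "wifi_ap", "site": "HQ", "ip": "10.0.0.7"},
    {"type": "router", "site": "HQ", "ip": "10.0.0.1"},
]


def featurize(device):
    """Reduce a record to a hashable tuple of the fields that define identity."""
    return (device["type"], device["site"], device["ip"])


def find_duplicates(records):
    """Return each feature tuple that occurs more than once, with its count."""
    counts = Counter(featurize(r) for r in records)
    return {key: n for key, n in counts.items() if n > 1}


duplicates = find_duplicates(devices)
print(duplicates)
```

Because the tuples are hashable, the same featurize() output could also go
into a set if you only need to know whether an object was seen before.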
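
For the missing-data point, a context-specific class with a __bool__()
method might look roughly like the following. The WifiDevice class and its
fields are hypothetical, and treating None or an empty string as "missing"
is an assumption you would adapt to your own data.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class WifiDevice:
    """Hypothetical object class; the required fields are assumptions."""

    name: Optional[str] = None
    location: Optional[str] = None
    ssid: Optional[str] = None

    def missing_fields(self):
        """List the names of fields that are unset (None or empty string)."""
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, "")]

    def __bool__(self):
        """An object is truthy only when every required field is populated."""
        return not self.missing_fields()


complete = WifiDevice(name="ap-01", location="HQ floor 2", ssid="corp")
partial = WifiDevice(name="ap-02")

for device in (complete, partial):
    if not device:
        print(device.name, "is missing:", device.missing_fields())
```

Keeping missing_fields() separate from __bool__() means the same check can
both gate processing and report what needs to be filled in.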
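
For the statistical route to misconfiguration, one possible sketch is shown
below. It uses scikit-learn's IsolationForest, which is my choice for
illustration rather than the only option, and the two numeric features and
the synthetic data are assumptions standing in for one stratified subgroup
of real devices.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic "background" configurations for one subgroup of devices.
# The two columns are hypothetical numeric settings, e.g. a timeout and a
# connection limit; real features depend on how you featurize your objects.
normal = rng.normal(loc=[50.0, 10.0], scale=[2.0, 1.0], size=(200, 2))

# One deliberately extreme configuration appended as the last row.
suspect = np.array([[500.0, 100.0]])
X = np.vstack([normal, suspect])

# Fit on the subgroup and label each row: 1 = inlier, -1 = outlier.
detector = IsolationForest(n_estimators=100, random_state=0)
labels = detector.fit_predict(X)

print("suspect row flagged as outlier:", labels[-1] == -1)
print("rows flagged in total:", int((labels == -1).sum()))
```

Some ordinary rows may be flagged as well, so the output is best treated as
a shortlist for review rather than a verdict.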

Hope this helps,
J.B. Brown

On Sat, Oct 8, 2022 at 10:59, Mike Oliver <mo at globalsaassol.com> wrote:

> Dear Sirs,
>
> I am evaluating SciKit-Learn for a new project.  I am hoping to find an AI
> Machine Learning package that can take a large dataset of objects that have
> various object types and attributes.  These objects are typically related
> to other objects, such as a server to a Wifi device, or two network routers
> to each other, etc.  When these objects are set up, data is gathered about
> where they are located, what their settings are, the device type, etc.
>
> With large organizations there can be thousands of these objects and tens
> of thousands of relationships, descriptions, settings, etc.  My hope is
> that with machine learning we can detect when an object is missing,
> configured in error, or duplicated.
>
> The question is, will SciKit-Learn help with this problem? I understand
> that we will have to train it to identify what to look for and then act on
> what was found and predicted to be the solution algorithm. Or instructions.
>
> Thanks for your help,
>
> Great-looking product; I already have the tutorial up and running and
> have installed it in my Django platform.
>
> Mike
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>