[scikit-learn] Using Scikit-Learn to predict magnetism in chemical systems

Henrique C. S. Junior henriquecsj at gmail.com
Tue Mar 28 12:48:14 EDT 2017


@Tommaso, this is something like Internal Coordinates[1], right?
@Bill, thanks for the hint, I'll definitely take a look at this.

[1] - https://en.wikipedia.org/wiki/Z-matrix_(chemistry)
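
As a rough illustration of what internal coordinates look like in practice, here is a minimal NumPy sketch that derives a bond length, a bond angle and a dihedral angle from Cartesian coordinates. The helper names (`bond_length`, `bond_angle`, `dihedral`) and the four made-up points are assumptions for illustration only; they do not come from any real molecule or from a particular library.

```python
import numpy as np

def bond_length(a, b):
    """Distance between two points."""
    return float(np.linalg.norm(a - b))

def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    u, v = a - b, c - b
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees (sign conventions vary)."""
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)
    b2 = p3 - p2
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

# Four made-up points (hypothetical geometry, arbitrary units).
p0 = np.array([1.0, 0.0, 0.0])
p1 = np.array([0.0, 0.0, 0.0])
p2 = np.array([0.0, 0.0, 1.0])
p3 = np.array([0.0, 1.0, 1.0])

r = bond_length(p0, p1)
theta = bond_angle(p0, p1, p2)
phi = dihedral(p0, p1, p2, p3)
print(r, theta, phi)
```

Unlike raw XYZ values, these quantities do not change when the whole molecule is moved or rotated.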

On Tue, Mar 28, 2017 at 2:12 AM, Bill Ross <ross at cgl.ucsf.edu> wrote:

> Image processing deals with xy coordinates by (as I understand) training
> with multiple permutations of the raw data, in the form of translations and
> rotations in the 2d space. If training with 3d data, there would be that
> much more translating and rotating to do, in order to divorce the learning
> from the incidentals.
>
> Bill
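
A minimal NumPy sketch of the kind of augmentation Bill describes, carried over to 3D: each call produces a randomly rotated and translated copy of the same coordinates. The function name `random_rigid_copy`, the QR-based way of drawing a rotation and the three-atom geometry are assumptions made for illustration.

```python
import numpy as np

def random_rigid_copy(coords, rng, max_shift=5.0):
    """Return a randomly rotated and translated copy of an (N, 3) array."""
    # Orthogonal matrix from the QR decomposition of a random Gaussian matrix.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # fix the sign ambiguity of the factorization
    if np.linalg.det(q) < 0:         # make it a proper rotation (det = +1)
        q[:, 0] *= -1.0
    shift = rng.uniform(-max_shift, max_shift, size=3)
    return coords @ q.T + shift

def pairwise_distances(coords):
    """All interatomic distances as an (N, N) matrix."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

rng = np.random.default_rng(0)

# Made-up three-atom geometry (arbitrary units).
coords = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0]])

augmented = [random_rigid_copy(coords, rng) for _ in range(10)]

# The raw coordinates differ in every copy, but the distances do not.
same = all(np.allclose(pairwise_distances(c), pairwise_distances(coords))
           for c in augmented)
print(same)
```

The check at the end also shows why features such as distances need no augmentation: they are already independent of where the molecule sits and how it is oriented.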
>
> On 3/27/17 4:35 PM, Tommaso Costanzo wrote:
>
> Dear Henrique,
> I am sorry for the poor email I wrote before. What I was saying is simply
> that if you use the coordinates from an .xyz file as "features", then the
> model will learn at which coordinates certain atoms occur, so you can only
> make predictions about the coordinates.
> However, if I understood correctly, the "features" that determine the
> coupling J are distance, angle, and electron number. These properties can
> definitely be derived from the XYZ file format with simple geometric
> calculations, and the number of electrons will depend on the type of atom.
> So, instead of using the XYZ file as input for scikit-learn, I was
> suggesting calculating the angles, distances, and numbers of electrons in
> advance (with other software or directly in Python) and using the newly
> calculated matrix as input for scikit-learn. In this case the machine will
> learn how J(AB) varies as a function of angle, distance, and number of
> electrons.
> For example
>
> distance   angle   n el.
> 1          90      1
> 1          90      1
> 2          90      1
> ...        ...     ...
>
> If you are using supervised learning you will have to add a 4th column
> (in reality a separate column vector) with your J(AB), on which you can
> train your model and then predict the unknown samples.
>
> For example
> distance   angle   n el.   J(AB)
> 1          90      1       1
> 1          90      1       1
> 2          90      1       0.5
> ...        ...     ...     ...
>
> Now if you train the model on the second matrix and then try to
> predict the first one, you should expect a result like:
>
> 1
> 1
> 0.5
>
> Of course in this case the "features" are identical to the training data,
> hence the example is completely unrealistic. However, I hope that it will
> help to understand what I was explaining in the previous email.
> If you want, you can contact me directly at this email, and I hope that you
> got additional hints from Robert, who seems to be even more
> knowledgeable than me.
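
The toy tables above can be tried directly in scikit-learn. This is only a sketch: the choice of `DecisionTreeRegressor` is an assumption (Tommaso does not name an estimator), and with three identical-looking rows it merely memorizes the training data, so it should give back the 1, 1, 0.5 from the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Second table: features plus the known coupling J(AB).
X_train = np.array([[1, 90, 1],
                    [1, 90, 1],
                    [2, 90, 1]], dtype=float)   # distance, angle, n el.
y_train = np.array([1.0, 1.0, 0.5])             # J(AB), kept as a separate vector

# First table: the same features, this time without J(AB).
X_new = np.array([[1, 90, 1],
                  [1, 90, 1],
                  [2, 90, 1]], dtype=float)

# Estimator chosen only for illustration; any scikit-learn regressor
# exposes the same fit/predict interface.
model = DecisionTreeRegressor(random_state=0)
model.fit(X_train, y_train)

pred = model.predict(X_new)
print(pred)
```

With real data the rows to predict would not be copies of the training rows, and the quality of the prediction would have to be checked on held-out systems.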
>
> Sincerely
> Tommaso
>
>
>
> 2017-03-27 18:44 GMT-04:00 Henrique C. S. Junior <henriquecsj at gmail.com>:
>
>> Dear Tommaso, thank you for your kind reply.
>> I know I have a lot to study before actually starting any code and that's
>> why any suggestion is so valuable.
>> So, you're suggesting that a simplification of the system using only the
>> paramagnetic centers can be a good approach? (I'm not sure if I understood
>> it correctly).
>> My main idea was, at first, to try to represent the systems as
>> realistically as possible (using coordinates). I know that the software
>> will not know what a bond is or what an intermolecular interaction is but,
>> let's say, after including 1000s of examples in the training, I was
>> expecting that (as an example) finding a C at 0.000 and an H at 1.000
>> should start to "make sense" because it leads to an experimental trend.
>> And I totally agree that my way of representing the system is not the best.
>>
>> Thank you so much for all the help.
>>
>> On Mon, Mar 27, 2017 at 4:15 PM, Tommaso Costanzo <
>> tommaso.costanzo01 at gmail.com> wrote:
>>
>>> Dear Henrique,
>>>
>>>
>>> I agree with Robert on the use of a supervised algorithm and I would
>>> also suggest that you try a semi-supervised one if you have trouble
>>> labeling your data.
>>>
>>>
>>> Moreover, as a chemist I think that the input you are planning to use is
>>> not in the best form for machine learning, because you are trying to
>>> predict the coupling J(AB) but in the feature space you have only
>>> coordinates (XYZ). What I suggest is to generate the pairs of atoms
>>> externally and then use a matrix of the form (Mx3), where M is the number
>>> of atom pairs for which you want to predict J and 3 is the number of
>>> features for each pair (distance, angle, unpaired electrons). For a
>>> supervised approach you will need a training set where J is known, so
>>> your training data will be of the form Mx4, and the fourth column will be
>>> the J you know.
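
A sketch of how such an Mx3 matrix could be assembled in Python. Everything concrete here is an assumption: the three made-up paramagnetic centers, the single shared bridging atom used to define the angle, and the choice to sum the unpaired electrons of the two centers. The thread does not specify how the angle or the electron count should be defined for a pair.

```python
from itertools import combinations

import numpy as np

# Hypothetical system: three paramagnetic centers and one bridging atom.
centers = np.array([[0.0, 0.0, 0.0],
                    [3.0, 0.0, 0.0],
                    [0.0, 4.0, 0.0]])
unpaired = np.array([1, 1, 2])        # unpaired electrons per center (made up)
bridge = np.array([1.0, 1.0, 0.0])    # position of the bridging atom (made up)

def pair_features(i, j):
    """Return [distance, angle at the bridge in degrees, total unpaired electrons]."""
    d = np.linalg.norm(centers[i] - centers[j])
    u = centers[i] - bridge
    v = centers[j] - bridge
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    angle = np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))
    return [d, angle, unpaired[i] + unpaired[j]]

# Every unordered pair of centers becomes one row of the feature matrix.
pairs = list(combinations(range(len(centers)), 2))
X = np.array([pair_features(i, j) for i, j in pairs])   # shape (M, 3)

print(pairs)
print(X.shape)
```

For training, the known J of each pair would be kept in a separate vector of length M, in the same row order as `pairs`.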
>>>
>>> I hope that this is clear; if not, I will be happy to help more.
>>>
>>>
>>> Sincerely
>>>
>>> Tommaso
>>>
>>> 2017-03-27 13:46 GMT-04:00 Henrique C. S. Junior <henriquecsj at gmail.com>
>>> :
>>>
>>>> Dear Robert, thank you. Yes, I'd like to talk about some specifics on
>>>> the project.
>>>> Thank you again.
>>>>
>>>> On Mon, Mar 27, 2017 at 2:25 PM, Robert Slater <rdslater at gmail.com>
>>>> wrote:
>>>>
>>>>> You definitely can use some of the tools in scikit-learn for
>>>>> supervised machine learning.  The real trick will be how representative
>>>>> your training data is of the systems you later want to predict.  All of
>>>>> the various regression algorithms would be of some value and you may even
>>>>> consider an ensemble to help generalize.  There will be some important
>>>>> questions to answer--what kind of loss function do you want to look at?
>>>>> I assumed regression (continuous response) but it could also be
>>>>> classification--paramagnetic, diamagnetic, ferromagnetic, etc...
>>>>>
>>>>> Another task to think about might be dimension reduction.
>>>>> There is no guarantee you will get fantastic results--every problem is
>>>>> unique and much will depend on exactly what you want out of the
>>>>> solution--it may be that we get '10%' accuracy at best--for some systems
>>>>> that is quite good, for others it is horrible.
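
One way to act on Robert's points is to compare a simple regressor with an ensemble under cross-validation, using an explicit error metric. The sketch below assumes synthetic data: the feature ranges and the target formula are invented and have no physical meaning, and `Ridge` and `RandomForestRegressor` are just two of the regressors scikit-learn offers.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in data: columns are distance, angle, unpaired electrons.
n = 200
X = np.column_stack([
    rng.uniform(2.0, 6.0, n),
    rng.uniform(80.0, 180.0, n),
    rng.integers(1, 6, n),
])
# Invented target with a little noise; NOT a physical model of J(AB).
y = (np.exp(-X[:, 0]) * np.cos(np.radians(X[:, 1])) * X[:, 2]
     + rng.normal(0.0, 0.01, n))

models = {
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# 5-fold cross-validated mean absolute error for each model.
scores = {}
for name, model in models.items():
    cv = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    scores[name] = -cv.mean()
    print(f"{name}: MAE = {scores[name]:.4f}")
```

The `scoring` argument is where the choice of loss enters; for a classification version of the problem the estimators and the metric would both change.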
>>>>>
>>>>> If you'd like to talk specifics, feel free to contact me at this
>>>>> email.  I have a background in magnetism (PhD in magnetic multilayers--I
>>>>> was in physics, but as you are probably aware chemistry and physics blend
>>>>> in this area) and have a fairly good knowledge of scikit-learn and machine
>>>>> learning.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Mar 27, 2017 at 10:50 AM, Henrique C. S. Junior <
>>>>> henriquecsj at gmail.com> wrote:
>>>>>
>>>>>> I'm a chemist with some rudimentary programming skills (getting
>>>>>> started with Python) and in the middle of the year I'll be starting a Ph.D.
>>>>>> project that uses computers to describe magnetism in molecular systems.
>>>>>>
>>>>>> Most of the time I get my results after several simulations and
>>>>>> experiments, so I know that one of the hardest tasks in molecular
>>>>>> magnetism is to predict the nature of magnetic interactions. That's why
>>>>>> I'll try to tackle this problem with Machine Learning (because such
>>>>>> interactions depend, basically, on distances, angles and the number of
>>>>>> unpaired electrons). The idea is to feed the computer with a large training
>>>>>> set (with the number of unpaired electrons, XYZ coordinates of each
>>>>>> molecule and experimental magnetic couplings) and see if it can predict
>>>>>> the magnetic couplings (J(AB)) of new systems:
>>>>>> (see example in the attached image)
>>>>>>
>>>>>> Can Scikit-Learn handle the task, knowing that the matrix used to
>>>>>> represent atomic coordinates will probably have a different number of atoms
>>>>>> (because some molecules have more atoms than others)? Or is this a job
>>>>>> better suited for another software/approach?
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Henrique C. S. Junior*
>>>>>> Industrial Chemist - UFRRJ
>>>>>> M. Sc. Inorganic Chemistry - UFRRJ
>>>>>> Data Processing Center - PMP
>>>>>> Visite o Mundo Químico <http://mundoquimico.com.br>
>>>>>>
>>>>>> _______________________________________________
>>>>>> scikit-learn mailing list
>>>>>> scikit-learn at python.org
>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>>
>>>>>>
>>> --
>>> Please do NOT send Microsoft Office Attachments:
>>> http://www.gnu.org/philosophy/no-word-attachments.html


-- 
*Henrique C. S. Junior*
Industrial Chemist - UFRRJ
M. Sc. Inorganic Chemistry - UFRRJ
Data Processing Center - PMP
Visite o Mundo Químico <http://mundoquimico.com.br>