[scikit-learn] Using Scikit-Learn to predict magnetism in chemical systems
Bill Ross
ross at cgl.ucsf.edu
Tue Mar 28 13:07:57 EDT 2017
I think I saw it in the Deep Learning book: http://www.deeplearningbook.org/
Bill
On 3/28/17 9:48 AM, Henrique C. S. Junior wrote:
> @Tommaso, this is something like Internal Coordinates[1], right?
> @Bill, thanks for the hint, I'll definitely take a look at this.
>
> [1] - https://en.wikipedia.org/wiki/Z-matrix_(chemistry)
>
> On Tue, Mar 28, 2017 at 2:12 AM, Bill Ross <ross at cgl.ucsf.edu> wrote:
>
> As I understand it, image processing deals with xy coordinates by
> training on many transformed copies of the raw data, in the form
> of translations and rotations in the 2D space. If training with 3D
> data, there would be that much more translating and rotating to
> do, in order to divorce the learning from incidentals such as
> position and orientation.
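
A minimal sketch of that kind of augmentation for 3D coordinates, assuming plain Python and a made-up three-atom geometry. The helper names (`rotate_z`, `translate`, `pairwise_distances`) are hypothetical, and only rotation about the z axis is shown. The sketch also checks that interatomic distances are unchanged by these transformations, which suggests why distance-based features would not need the augmentation.

```python
import math

def rotate_z(points, theta):
    """Rotate (x, y, z) points by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def translate(points, shift):
    """Shift every point by the same (dx, dy, dz) vector."""
    return [tuple(p + d for p, d in zip(point, shift)) for point in points]

def pairwise_distances(points):
    """Distances between all unordered pairs of points, in a fixed order."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

# Made-up coordinates, for illustration only.
molecule = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]

# Four rotated and shifted copies of the same geometry.
augmented = [
    translate(rotate_z(molecule, math.radians(angle)), (0.1 * k, -0.2 * k, 0.3 * k))
    for k, angle in enumerate((0, 90, 180, 270))
]

reference = pairwise_distances(molecule)
for variant in augmented:
    # Raw coordinates differ between copies, but the distances do not.
    assert all(math.isclose(a, b, abs_tol=1e-9)
               for a, b in zip(reference, pairwise_distances(variant)))
```
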
>
> Bill
>
>
> On 3/27/17 4:35 PM, Tommaso Costanzo wrote:
>> Dear Henrique,
>> I am sorry for the poor email I wrote before. What I was saying
>> is simply that if you use the coordinates from an .xyz file as
>> "features", then the machine learning model will learn at which
>> coordinates certain atoms occur, so you can only make predictions
>> about the coordinates. However, if I understood correctly, the
>> "features" representing the coupling J are distance, angle, and
>> number of electrons. These properties can definitely be derived
>> from the XYZ file format with simple geometric calculations, and
>> the number of electrons depends on the type of atom. So, instead
>> of using the XYZ file as input for scikit-learn, I was suggesting
>> that you calculate the angles, distances, and numbers of electrons
>> in advance (with other software or directly in Python) and use the
>> newly calculated matrix as input for scikit-learn. In this case
>> the machine will learn how J(AB) varies as a function of angle,
>> distance, and number of electrons.
>> For example
>>
>> distance  angle  n el.
>> 1         90     1
>> 1         90     1
>> 2         90     1
>> ...       ...    ...
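
A rough sketch of how one such row could be computed directly in Python from Cartesian coordinates. The thread does not say which angle is meant, so this assumes the angle A-bridge-B measured at a bridging atom. The coordinates, the electron count, and the helper name `angle_deg` are made up for illustration.

```python
import math

def angle_deg(a, center, b):
    """Angle a-center-b in degrees, from three (x, y, z) positions."""
    v1 = [p - c for p, c in zip(a, center)]
    v2 = [p - c for p, c in zip(b, center)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_theta = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp against rounding error before calling acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

# Hypothetical positions of two magnetic centers and one bridging atom.
center_a = (1.0, 0.0, 0.0)
bridge = (0.0, 0.0, 0.0)
center_b = (0.0, 1.0, 0.0)
n_unpaired = 1  # assumed value; in practice it depends on the atom type

# One feature row: distance, angle, number of electrons.
row = [math.dist(center_a, center_b),
       angle_deg(center_a, bridge, center_b),
       n_unpaired]
print(row)
```

Repeating this for every pair of interest would give the M x 3 matrix, one row per pair.
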
>>
>> If you are using supervised learning, you will have to add a 4th
>> column (in reality a separate column vector) with your J(AB), on
>> which you can train your model and then predict the unknown samples.
>>
>> For example
>> distance  angle  n el.  J(AB)
>> 1         90     1      1
>> 1         90     1      1
>> 2         90     1      0.5
>> ...       ...    ...    ...
>>
>> Now if you train the model on the second matrix and then try
>> to predict the first one, you should expect a result like:
>>
>> 1
>> 1
>> 0.5
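
A sketch of that toy example in scikit-learn. The choice of `DecisionTreeRegressor` is an assumption (nothing in the thread fixes the estimator), and the numbers are the illustrative values from the tables, not real data.

```python
from sklearn.tree import DecisionTreeRegressor

# Columns: distance, angle, number of electrons (toy values from the tables).
X_train = [[1, 90, 1],
           [1, 90, 1],
           [2, 90, 1]]
# Known couplings J(AB), kept as a separate target vector.
y_train = [1.0, 1.0, 0.5]

# The estimator is an assumption; scikit-learn regressors share
# the same fit/predict interface, so another one could be swapped in.
model = DecisionTreeRegressor(random_state=0)
model.fit(X_train, y_train)

# "Unknown" samples: here identical to the training features.
X_new = [[1, 90, 1],
         [1, 90, 1],
         [2, 90, 1]]
predicted = model.predict(X_new)
print(predicted)
```

Because the new samples repeat the training rows exactly, the model simply reproduces the known J values; with real data the features would differ and a held-out test set would be needed to judge the predictions.
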
>>
>> Of course, in this case the "features" are exactly equal, hence
>> the example is completely unrealistic. However, I hope that it
>> helps to explain what I meant in the previous email.
>> If you want, you can contact me directly at this email, and I hope
>> that you got additional hints from Robert, who seems to be
>> even more knowledgeable than I am.
>>
>> Sincerely
>> Tommaso
>>
>>
>>
>> 2017-03-27 18:44 GMT-04:00 Henrique C. S. Junior <henriquecsj at gmail.com>:
>>
>> Dear Tommaso, thank you for your kind reply.
>> I know I have a lot to study before actually starting any
>> code, and that's why any suggestion is so valuable.
>> So, you're suggesting that simplifying the system by
>> using only the paramagnetic centers can be a good approach?
>> (I'm not sure if I understood it correctly.)
>> My main idea was, at first, to try to represent the systems as
>> realistically as possible (using coordinates). I know that
>> the software will not know what a bond is or what an
>> intermolecular interaction is but, let's say, after including
>> 1000s of examples in the training, I was expecting that (as
>> an example) finding a C at 0.000 and an H at 1.000 should start
>> to "make sense" because it leads to an experimental trend.
>> And I totally agree that my way of representing the system is
>> not the best.
>>
>> Thank you so much for all the help.
>>
>> On Mon, Mar 27, 2017 at 4:15 PM, Tommaso Costanzo <tommaso.costanzo01 at gmail.com> wrote:
>>
>> Dear Henrique,
>>
>>
>> I agree with Robert on the use of a supervised algorithm,
>> and I would also suggest that you try a semi-supervised one
>> if you have trouble labeling your data.
>>
>>
>> Moreover, as a chemist I think that the input you are
>> thinking of using is not in the best form for machine
>> learning, because you are trying to predict the coupling J(AB)
>> but in the feature space you have only coordinates (XYZ).
>> What I suggest is to generate the pairs of atoms
>> externally and then use a matrix of the form (Mx3), where
>> M is the number of atom pairs for which you want to predict J
>> and 3 is the number of features of each pair (distance, angle,
>> unpaired electrons). For a supervised approach you will
>> need a training set where J is known, so your training
>> data will be of the form Mx4, and the fourth column will
>> be the J you know.
>>
>> I hope that this is clear; if not, I will be happy to help more.
>>
>>
>> Sincerely
>>
>> Tommaso
>>
>>
>> 2017-03-27 13:46 GMT-04:00 Henrique C. S. Junior <henriquecsj at gmail.com>:
>>
>> Dear Robert, thank you. Yes, I'd like to talk about
>> some specifics on the project.
>> Thank you again.
>>
>> On Mon, Mar 27, 2017 at 2:25 PM, Robert Slater <rdslater at gmail.com> wrote:
>>
>> You definitely can use some of the tools in
>> scikit-learn for supervised machine learning.
>> The real trick will be how well your training
>> set represents the systems you will later
>> predict on. All of the various regression
>> algorithms would be of some value, and you may
>> even consider an ensemble to help generalize.
>> There will be some important questions to
>> answer--what kind of loss function do you want to
>> look at? I assumed regression (continuous
>> response), but it could also be
>> classification--paramagnetic, diamagnetic,
>> ferromagnetic, etc.
>>
>> Another task to think about might be dimension
>> reduction.
>> There is no guarantee you will get fantastic
>> results--every problem is unique and much will
>> depend on exactly what you want out of the
>> solution--it may be that we get '10%' accuracy at
>> best--for some systems that is quite good, others
>> it is horrible.
>>
>> If you'd like to talk specifics, feel free to
>> contact me at this email. I have a background in
>> magnetism (PhD in magnetic multilayers--I was in
>> physics, but as you are probably aware, chemistry
>> and physics blend in this area) and have a fairly
>> good knowledge of scikit-learn and machine
>> learning.
>>
>>
>>
>> On Mon, Mar 27, 2017 at 10:50 AM, Henrique C. S. Junior <henriquecsj at gmail.com> wrote:
>>
>> I'm a chemist with some rudimentary
>> programming skills (getting started with
>> python) and in the middle of the year I'll be
>> starting a Ph.D. project that uses computers
>> to describe magnetism in molecular systems.
>>
>> Most of the time I get my results after
>> several simulations and experiments, so, I
>> know that one of the hardest tasks in
>> molecular magnetism is to predict the nature
>> of magnetic interactions. That's why I'll try
>> to tackle this problem with Machine Learning
>> (because such interactions depend,
>> basically, on distances, angles and number of
>> unpaired electrons). The idea is to feed the
>> computer with a large training set (with
>> number of unpaired electrons, XYZ coordinates
>> of each molecule and experimental magnetic
>> couplings) and see if it can predict the
>> magnetic couplings (J(AB)) of new systems:
>>
>> (see example in the attached image)
>>
>> Can Scikit-Learn handle the task, knowing
>> that the matrix used to represent atomic
>> coordinates will probably have a different
>> number of atoms (because some molecules have
>> more atoms than others)? Or is this a job
>> better suited for another software/approach?
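
One possible way around the varying number of atoms, sketched under assumptions not discussed in the thread: turn each molecule into a fixed-length vector by sorting its interatomic distances and zero-padding up to a chosen length. The function name `distance_descriptor`, the padding scheme, and the coordinates are hypothetical. scikit-learn estimators expect every row of the input matrix to have the same number of columns, and that is the only requirement being illustrated here.

```python
import math
from itertools import combinations

def distance_descriptor(coords, length):
    """Sorted interatomic distances, zero-padded to a fixed length."""
    distances = sorted(math.dist(p, q) for p, q in combinations(coords, 2))
    if len(distances) > length:
        raise ValueError("molecule has more atom pairs than the descriptor can hold")
    return distances + [0.0] * (length - len(distances))

# Two made-up molecules with different numbers of atoms.
two_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
three_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# Three atoms give three pairs, so length 3 fits both molecules.
X = [distance_descriptor(m, length=3) for m in (two_atoms, three_atoms)]
for row in X:
    print(row)
```

This throws away element types and angles, so it is only a starting point; the pair-based features suggested elsewhere in the thread avoid the problem in a different way, by using one row per atom pair instead of one row per molecule.
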
>>
>>
>> --
>> *Henrique C. S. Junior*
>> Industrial Chemist - UFRRJ
>> M. Sc. Inorganic Chemistry - UFRRJ
>> Data Processing Center - PMP
>> Visit Mundo Químico <http://mundoquimico.com.br>
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>> --
>> Please do NOT send Microsoft Office Attachments:
>> http://www.gnu.org/philosophy/no-word-attachments.html