<div dir="ltr"><div class="gmail_default" style="font-family:monospace,monospace">@Tommaso, this is something like Internal Coordinates[1], right?</div><div class="gmail_default" style="font-family:monospace,monospace">@Bill, thanks for the hint, I'll definitely take a look at this.</div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">[1] - <a href="https://en.wikipedia.org/wiki/Z-matrix_(chemistry)">https://en.wikipedia.org/wiki/Z-matrix_(chemistry)</a></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Mar 28, 2017 at 2:12 AM, Bill Ross <span dir="ltr"><<a href="mailto:ross@cgl.ucsf.edu" target="_blank">ross@cgl.ucsf.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<p>Image processing deals with xy coordinates by (as I understand it)
training on multiple transformed copies of the raw data, in the form
of translations and rotations in 2D space. If training with 3D
data, there would be that much more translating and rotating to
do, in order to divorce the learning from the incidentals.</p><span class="HOEnZb"><font color="#888888">
<p>Bill<br>
</p></font></span><div><div class="h5">
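<div>The augmentation described above could be sketched roughly as follows for 3D coordinates. This is only an illustration assuming NumPy; the helper names (<code>random_rotation</code>, <code>augment</code>, <code>pairwise_distances</code>) and the toy coordinates are made up for the example. It also shows that pairwise distances are unchanged by rotation and translation, which is why distance-based features need no such augmentation.</div>

```python
import numpy as np


def random_rotation(rng):
    """Return a random 3x3 proper rotation matrix (determinant +1)."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))           # make the QR factorization unique
    q[:, 0] *= np.sign(np.linalg.det(q))  # flip one axis if it is a reflection
    return q


def augment(coords, n_copies, seed=0):
    """Return n_copies randomly rotated and translated copies of coords."""
    rng = np.random.default_rng(seed)
    copies = []
    for _ in range(n_copies):
        rot = random_rotation(rng)
        shift = rng.uniform(-5.0, 5.0, size=3)
        copies.append(coords @ rot.T + shift)
    return np.stack(copies)


def pairwise_distances(coords):
    """Matrix of Euclidean distances between all pairs of atoms."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


# Toy three-atom geometry (arbitrary numbers, not a real molecule).
toy = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])

augmented = augment(toy, n_copies=4)
print(augmented.shape)
```

<div>Each augmented copy has different raw coordinates but the same distance matrix as the original.</div>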
<br>
<div class="m_-7569063688226978064moz-cite-prefix">On 3/27/17 4:35 PM, Tommaso Costanzo
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>Dear Henrique,<br>
</div>
I am sorry for the unclear email I wrote before.
What I meant is simply that if you use the raw
coordinates from an .xyz file as "features", the
model will learn at which coordinates certain atoms
occur, so it can only make predictions about
coordinates. However, if I understood correctly,
the "features" that determine the coupling J are
distance, angle, and number of electrons. These
properties can certainly be derived from the XYZ
file with simple geometric calculations, and
the number of electrons depends on the
type of atom. So, instead of using the XYZ file
as input for scikit-learn, I suggest calculating
the angles, distances, and electron counts
in advance (with other software or
directly in Python) and using the resulting
matrix as input for scikit-learn. In this case
the machine will learn how J(AB) varies as a
function of angle, distance, and number of
electrons. <br>
</div>
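<div>As a rough sketch of the "simple geometric calculations" mentioned above, assuming NumPy: the function names, the toy coordinates, and the choice of measuring the angle at a bridging atom between the two centres are all assumptions made for illustration, since the message does not say which angle is meant.</div>

```python
import numpy as np


def distance(a, b):
    """Euclidean distance between two points given as (x, y, z)."""
    return float(np.linalg.norm(np.asarray(b, float) - np.asarray(a, float)))


def angle_deg(a, bridge, b):
    """Angle a-bridge-b in degrees, measured at the bridging atom."""
    v1 = np.asarray(a, float) - np.asarray(bridge, float)
    v2 = np.asarray(b, float) - np.asarray(bridge, float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


# Hypothetical toy geometry: centres A and B bridged by atom X.
A = [1.0, 0.0, 0.0]
X = [0.0, 0.0, 0.0]
B = [0.0, 1.0, 0.0]
n_unpaired = 1  # assumed value, would come from the atom type

# One feature row: (distance, angle, number of unpaired electrons).
features = [distance(A, B), angle_deg(A, X, B), n_unpaired]
print(features)
```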
For example <br>
</div>
<br>
distance angle n el.<br>
1 90 1<br>
1 90 1<br>
2 90 1<br>
.... ... ...<br>
<br>
</div>
If you are using supervised learning, you will have
to add a 4th column (in reality a separate column
vector) with your J(AB), on which you can train your
model and then predict the unknown samples.<br>
<br>
</div>
For example <br>
distance angle n el. J(AB)<br>
1 90 1 1<br>
1 90 1 1<br>
2 90 1 0.5<br>
.... ... ... ...<br>
<br>
</div>
<div>Now if you train the model on the second matrix and
then try to predict the first one, you should expect
a result like:<br>
<br>
1<br>
1<br>
0.5<br>
<br>
</div>
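<div>The toy example above can be sketched with scikit-learn as follows. The message does not name an estimator; <code>DecisionTreeRegressor</code> is an assumption, chosen here only because a fully grown tree returns the training targets when asked to predict the rows it was trained on.</div>

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Second matrix from the example: features plus known J(AB).
X_train = np.array([[1, 90, 1],
                    [1, 90, 1],
                    [2, 90, 1]], dtype=float)  # distance, angle, n el.
y_train = np.array([1.0, 1.0, 0.5])            # J(AB)

model = DecisionTreeRegressor(random_state=0)
model.fit(X_train, y_train)

# First matrix from the example: the same features, without J(AB).
X_new = X_train.copy()
predictions = model.predict(X_new)
print(predictions)
```

<div>Because the rows to predict are identical to the training rows, the predictions match the J(AB) column (1, 1, 0.5). Any real evaluation would need samples that were held out from training.</div>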
Of course, in this case the "features" are identical to the training data,
hence the example is completely unrealistic. However, I
hope it helps to clarify what I was explaining
in the previous email.<br>
If you want, you can contact me directly at this email. I
hope you also got additional hints from Robert, who
seems to be even more knowledgeable than I am.<br>
</div>
<br>
</div>
Sincerely <br>
</div>
Tommaso<br>
<div>
<div><br>
<div><br>
</div>
</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">2017-03-27 18:44 GMT-04:00 Henrique C.
S. Junior <span dir="ltr"><<a href="mailto:henriquecsj@gmail.com" target="_blank">henriquecsj@gmail.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div class="gmail_default" style="font-family:monospace,monospace">Dear Tommaso,
thank you for your kind reply.</div>
<div class="gmail_default" style="font-family:monospace,monospace">I know I have a
lot to study before actually starting any code and
that's why any suggestion is so valuable.</div>
<div class="gmail_default" style="font-family:monospace,monospace">So, you're
suggesting that a simplification of the system using
only the paramagnetic centers can be a good approach?
(I'm not sure if I understood it correctly).</div>
<div class="gmail_default" style="font-family:monospace,monospace">My main idea
was, at first, to represent the systems as
realistically as possible (using coordinates). I know
that the software will not know what a bond or
an intermolecular interaction is but, let's say, after
including thousands of examples in the training, I was
expecting that (as an example) finding a C at 0.000 and an
H at 1.000 would start to "make sense" because it leads
to an experimental trend. And I totally agree that my
way of representing the system is not the best.</div>
<div class="gmail_default" style="font-family:monospace,monospace"><br>
</div>
<div class="gmail_default" style="font-family:monospace,monospace">Thank you so
much for all the help.</div>
</div>
<div class="m_-7569063688226978064HOEnZb">
<div class="m_-7569063688226978064h5">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Mar 27, 2017 at 4:15
PM, Tommaso Costanzo <span dir="ltr"><<a href="mailto:tommaso.costanzo01@gmail.com" target="_blank">tommaso.costanzo01@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<p style="margin:0px;text-indent:0px"><span style="font-family:"ubuntu";font-size:12pt">Dear
Henrique,</span></p>
<p style="margin:0px;text-indent:0px;font-family:"ubuntu";font-size:12pt"><br>
</p>
<p style="margin:0px;text-indent:0px"><span style="font-family:"ubuntu";font-size:12pt">I
agree with Robert on the use of a supervised
algorithm, and I would also suggest
trying a semi-supervised one if you have trouble
labeling your data. </span></p>
<p style="margin:0px;text-indent:0px;font-family:"ubuntu";font-size:12pt"><br>
</p>
<p style="margin:0px;text-indent:0px"><span style="font-family:"ubuntu";font-size:12pt">Moreover,
as a chemist I think that the input you are
planning to use is not in the best form
for machine learning, because you are trying
to predict the coupling J(AB) but in the feature
space you have only coordinates (XYZ). What
I suggest is to generate the pairs of atoms
externally and then use a matrix of the form
(Mx3), where M is the number of atom pairs for
which you want to predict J and 3 is the number of
features for each pair (distance, angle,
unpaired electrons). For a supervised
approach you will need a training set where
J is known, so your training data will be
of the form Mx4, and the fourth column will
be the known J.</span></p>
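<p style="margin:0px;text-indent:0px">A small sketch of the Mx4 layout described above, assuming NumPy. All numbers are made up purely to show the shapes, and the variable names are illustrative. Note that molecules of different sizes simply contribute a different number of rows, so the feature matrix keeps a fixed width.</p>

```python
import numpy as np

# Hypothetical pre-computed pairs, one tuple per A-B pair:
# (distance, angle, unpaired electrons, known J). Values are invented.
pairs_mol_a = [(3.1, 95.0, 1, -12.0)]   # small molecule, one pair
pairs_mol_b = [(3.0, 98.0, 5, -4.0),    # larger molecule, three pairs
               (5.2, 120.0, 5, -0.5),
               (3.3, 101.0, 5, -6.0)]

training = np.array(pairs_mol_a + pairs_mol_b, dtype=float)  # shape (M, 4)

X = training[:, :3]  # Mx3 feature matrix passed to an estimator's fit()
y = training[:, 3]   # known J values, the supervised target

print(training.shape, X.shape, y.shape)
```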
<p style="margin:0px;text-indent:0px"><span style="font-family:"ubuntu";font-size:12pt">I hope
this is clear; if not, I will be happy
to help further.</span></p>
<p style="margin:0px;text-indent:0px;font-family:"ubuntu";font-size:12pt"><br>
</p>
<p style="margin:0px;text-indent:0px"><span style="font-family:"ubuntu";font-size:12pt">Sincerely</span></p>
<p style="margin:0px;text-indent:0px"><span style="font-family:"ubuntu";font-size:12pt">Tommaso</span></p>
</div>
<div class="gmail_extra">
<div>
<div class="m_-7569063688226978064m_-419284271361902240h5"><br>
<div class="gmail_quote">2017-03-27 13:46
GMT-04:00 Henrique C. S. Junior <span dir="ltr"><<a href="mailto:henriquecsj@gmail.com" target="_blank">henriquecsj@gmail.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div class="gmail_default" style="font-family:monospace,monospace">Dear
Robert, thank you. Yes, I'd like to
talk about some specifics on the
project.</div>
<div class="gmail_default" style="font-family:monospace,monospace">Thank
you again.</div>
</div>
<div class="m_-7569063688226978064m_-419284271361902240m_-8383123951498439579HOEnZb">
<div class="m_-7569063688226978064m_-419284271361902240m_-8383123951498439579h5">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon,
Mar 27, 2017 at 2:25 PM, Robert
Slater <span dir="ltr"><<a href="mailto:rdslater@gmail.com" target="_blank">rdslater@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">You can definitely
use some of the tools in
scikit-learn for supervised
machine learning. The real
trick will be how well your
training data represent
the systems you want to
predict. All of
the various regression
algorithms would be of some
value, and you may even
consider an ensemble to help
generalize. There will be
some important questions to
answer: what kind of loss
function do you want to look
at? I assumed regression
(continuous response), but it
could also be
classification: paramagnetic,
diamagnetic, ferromagnetic,
etc.
<div><br>
</div>
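<div>As one possible illustration of the regression, ensemble, and loss-function choices mentioned above, the sketch below cross-validates a random forest using mean absolute error. The data are synthetic and the target formula is invented only so there is something to fit; it is not a physical model of J. The estimator and scoring choices are assumptions, not recommendations from this thread.</div>

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200

# Synthetic features: distance, angle, number of unpaired electrons.
dist = rng.uniform(2.5, 6.0, n)
ang = rng.uniform(80.0, 180.0, n)
nel = rng.integers(1, 6, n)
X = np.column_stack([dist, ang, nel])

# Invented target with a little noise (NOT a physical model of J).
y = np.exp(-dist) * np.cos(np.radians(ang)) * nel + rng.normal(0, 0.001, n)

# An ensemble of trees, scored by 5-fold cross-validated absolute error.
model = RandomForestRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=5,
                         scoring="neg_mean_absolute_error")
mae = -scores.mean()
print(mae)
```

<div>Swapping the <code>scoring</code> string (for example to <code>"neg_mean_squared_error"</code>) changes which kind of error the evaluation emphasizes.</div>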
<div>Another task to think
about might be dimension
reduction.</div>
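<div>A minimal sketch of dimension reduction, assuming PCA is the technique in question (the message does not name one). The six descriptors are synthetic, built from two underlying factors plus noise, so two components should capture almost all of the variance.</div>

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Six synthetic descriptors driven by two hidden factors plus small noise.
base = rng.normal(size=(100, 2))
mixed = base @ rng.normal(size=(2, 4))
X = np.hstack([base, mixed]) + rng.normal(scale=0.01, size=(100, 6))

# Standardize first so no descriptor dominates because of its units.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)
print(pca.explained_variance_ratio_.sum())
```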
<div>There is no guarantee
you will get fantastic
results. Every problem is
unique, and much will
depend on exactly what you
want out of the
solution. It may be that
we get '10%' accuracy at
best; for some systems
that is quite good, for others
it is horrible.<br>
</div>
<div><br>
</div>
<div>If you'd like to talk
specifics, feel free to
contact me at this email.
I have a background in
magnetism (PhD in magnetic
multilayers; I was in
physics, but as you are
probably aware, chemistry
and physics blend in this
area) and have a fairly
good knowledge of scikit-learn
and machine
learning. </div>
<div><br>
</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">
<div>
<div class="m_-7569063688226978064m_-419284271361902240m_-8383123951498439579m_6033336047822367828h5">On
Mon, Mar 27, 2017 at
10:50 AM, Henrique C.
S. Junior <span dir="ltr"><<a href="mailto:henriquecsj@gmail.com" target="_blank">henriquecsj@gmail.com</a>></span>
wrote:<br>
</div>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div class="m_-7569063688226978064m_-419284271361902240m_-8383123951498439579m_6033336047822367828h5">
<div dir="ltr">
<div class="gmail_default" style="font-family:monospace,monospace">
<p style="margin:0cm 0cm 12pt;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><span style="color:rgb(36,39,41)" lang="EN-US">I'm a chemist with some
rudimentary
programming
skills
(getting
started with
python) and in
the middle of
the year I'll
be starting a
Ph.D. project
that uses
computers to
describe
magnetism in
molecular
systems.<span></span></span></p>
<p style="margin:0cm 0cm 12pt;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><span style="color:rgb(36,39,41)" lang="EN-US">Most of the time I get my
results after
several
simulations
and
experiments,
so, I know
that one of
the hardest
tasks in
molecular
magnetism is
to predict the
nature of
magnetic
interactions.
That's why
I'll try to
tackle this
problem with
Machine
Learning
(because such
interactions
depend,
basically, on
distances,
angles and the
number of
unpaired
electrons).
The idea is to
feed the
computer with
a large
training set
(with number
of unpaired
electrons, XYZ
coordinates of
each molecule
and
experimental
magnetic
couplings) and
see if it can
predict the
magnetic
couplings
(J(AB)) of new
systems:<span></span></span></p>
</div>
<div>
<div class="gmail_default" style="font-family:monospace,monospace">(see example in the attached
image)</div>
<div class="gmail_default" style="font-family:monospace,monospace"><br>
</div>
<div class="gmail_default" style="font-family:monospace,monospace">Can Scikit-Learn handle the
task, knowing
that the
matrix used to
represent
atomic
coordinates
will probably
have a
different
number of
atoms (because
some molecules
have more
atoms than
others)? Or is
this a job
better suited
for another
software/approach?
</div>
<span class="m_-7569063688226978064m_-419284271361902240m_-8383123951498439579m_6033336047822367828m_-1717598575983325084HOEnZb"><font color="#888888"><br>
</font></span></div>
<span class="m_-7569063688226978064m_-419284271361902240m_-8383123951498439579m_6033336047822367828m_-1717598575983325084HOEnZb"><font color="#888888">
<div><br>
</div>
-- <br>
<div class="m_-7569063688226978064m_-419284271361902240m_-8383123951498439579m_6033336047822367828m_-1717598575983325084m_-4201444065020757644gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr"><span style="color:rgb(139,139,139)"><font face="monospace, monospace"><b><font color="#808080">Henrique C. S. Junior</font></b><br>
Industrial
Chemist -
UFRRJ</font></span></div>
<div dir="ltr"><span style="color:rgb(139,139,139)"><font face="monospace, monospace">M. Sc.
Inorganic
Chemistry -
UFRRJ<br>
Data
Processing
Center - PMP</font><br>
</span></div>
</div>
<div><span style="color:rgb(139,139,139)"><font face="monospace, monospace">Visite o <a href="http://mundoquimico.com.br" target="_blank">Mundo Químico</a></font></span></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</font></span></div>
<br>
</div>
</div>
<span>______________________________<wbr>_________________<br>
scikit-learn mailing
list<br>
<a href="mailto:scikit-learn@python.org" target="_blank">scikit-learn@python.org</a><br>
<a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/mailma<wbr>n/listinfo/scikit-learn</a><br>
<br>
</span></blockquote>
</div>
<br>
</div>
<br>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
</div>
</div>
</div>
<br>
</blockquote>
</div>
<br>
<br clear="all">
<br>
-- <br>
</div>
</div>
<div class="m_-7569063688226978064m_-419284271361902240m_-8383123951498439579gmail_signature" data-smartmail="gmail_signature">
<div dir="ltr"><span></span><span>Please
do NOT send Microsoft Office Attachments:</span><br>
<div>
<a href="http://www.gnu.org/philosophy/no-word-attachments.html" target="_blank">http://www.gnu.org/philosophy/<wbr>no-word-attachments.html</a></div>
</div>
</div>
</div>
<br>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
</div>
</div>
</div>
<br>
</blockquote>
</div>
<br>
<br clear="all">
<br>
</div>
<br>
</blockquote>
<br>
</div></div></div>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><span style="color:rgb(139,139,139)"><font face="monospace, monospace"><b><font color="#808080">Henrique C. S. Junior</font></b><br>Industrial Chemist - UFRRJ</font></span></div><div dir="ltr"><span style="color:rgb(139,139,139)"><font face="monospace, monospace">M. Sc. Inorganic Chemistry - UFRRJ<br>Data Processing Center - PMP</font><br></span></div></div><div><span style="color:rgb(139,139,139)"><font face="monospace, monospace">Visite o <a href="http://mundoquimico.com.br" target="_blank">Mundo Químico</a></font></span></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div>
</div>