From stefan_ml at behnel.de  Tue Jan  2 10:23:31 2018
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 2 Jan 2018 16:23:31 +0100
Subject: [Cython] Type Inference: Inter Procedural Analysis
In-Reply-To: <CACF6G9jQyMJ5V0ojrtyd_SnkBy_GbOcF0hVus0ncWoY5DYbALg@mail.gmail.com>
References: <CACF6G9jQyMJ5V0ojrtyd_SnkBy_GbOcF0hVus0ncWoY5DYbALg@mail.gmail.com>
Message-ID: <d423e804-99bb-0af7-edbf-a97eba277a98@behnel.de>

Hi!

Thanks for working on this!

usama hameed schrieb am 29.12.2017 um 19:31:
> I recently suggested implementing Inter-Procedural Analysis to infer
> function types and made the following Github issue
> <https://github.com/cython/cython/issues/1893>, and I was advised to
> communicate on this channel.
> 
> I went through the code base, and have implemented a rudimentary type
> inference system with inter procedural analysis of function types and
> arguments, and have handled recursive cases. However, the code base needs
> to be cleaned up a lot and is quite buggy right now. Furthermore, I am
> pretty sure a lot of edge cases need to be handled, e.g. closures.

I guess you are referring to this repository:

https://github.com/usama54321/Cython/commits/master


> The reason I am sending out this email is to get some suggestions. Right
> now, the code I have written is pretty hacky, since the current code base
> of the project does not accommodate much flexibility to perform inter
> procedural analysis.

Interesting. Could you elaborate on what you found missing or badly
designed? Would be interesting to know for us.

Here are a couple of comments on your changes:

1) The functionality looks really nice. Since you weren't accustomed to
the code base before, it's understandable that things aren't perfectly
integrated with the existing architecture. That can be cleaned up.

2) I was surprised to see that you didn't git-clone the existing repository
but created a new one from a source copy instead. But that's probably ok
for getting started because (I think) you wrote the code experimentally and
didn't focus that much on ready-to-merge commits anyway. Also, you
accidentally added .pyc and .so files. Those shouldn't be under version
control. It would probably be best to start over from a fresh clone and
apply your changes as a patch.

3) The commits are a bit difficult to follow because the commit comments
are essentially free of information. It would help if you had used them to
explain the steps you took and what your intentions were.

4) Is there a reason why you didn't merge the Graph building with the
control flow analysis in FlowControl.py?

5) I can't see any test code, but since you implemented this in multiple
iterations, I'm sure that you had test code on your side that you tried to
compile. Could you add some examples that show how this change improves
things? There are hints on writing tests in the hacker guide:

https://github.com/cython/cython/wiki/HackerGuide#getting-started

Specifically, look at the "*infer*.pyx" file tests in tests/run/. I think
it would be best to add a new one.


> I found an enhancement suggestion
> <https://github.com/cython/cython/wiki/enhancements-typeinference> on the
> GitHub project, and was wondering whether this should be done first in
> order to make a more flexible type inference system before trying to
> properly implement inter procedural analysis into the project.

Type inference was implemented long ago and has been improved a couple of
times since then. It's not perfect, but it's actually quite good and can
further be improved in gradual steps. Inter-procedural analysis seems like
one such improvement.


> I just started on this as part of a university course project, but I want
> to continue working on this. I am not really familiar with the project's
> development ecosystem, and it would be really helpful if I'm given some
> guidance.

It would certainly be great to have this feature added. Could you explain
some of your design decisions? That would help me understand what you did
and why, so that I can start giving advice on where to go from here.

Generally speaking, I think it would be good if we could make it reuse more
of the existing infrastructure for type inference and control flow
analysis. I would only want to diverge from those if you could convince me
that this feature is fundamentally independent from what's there, but that
would surprise me. Correct me if I'm wrong, but what I would expect is
basically an incremental type inference that (partially) re-infers
functions when their dependencies (i.e. the functions that they call)
change their return type. Was the incremental part here something that you
found difficult?
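
To make "incremental" concrete, here is a minimal Python sketch of the
idea, not of Cython's actual inference code. The names
infer_return_types, infer_one and the toy call graph are hypothetical,
and it assumes that repeated inference eventually stabilises (for
example because types are only ever widened).

```python
from collections import deque


def infer_return_types(calls, infer_one):
    """Re-infer functions whenever a function they call changes its type.

    calls:     dict mapping a function name to the set of names it calls
               (hypothetical representation of a call graph).
    infer_one: callable (name, known) -> inferred return type, where
               `known` holds the return types inferred so far.
    """
    # Reverse edges: which callers depend on a given callee?
    callers = {name: set() for name in calls}
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].add(caller)

    known = {name: None for name in calls}  # None means "not inferred yet"
    worklist = deque(calls)
    queued = set(calls)

    while worklist:
        name = worklist.popleft()
        queued.discard(name)
        new_type = infer_one(name, known)
        if new_type != known[name]:
            known[name] = new_type
            # The return type changed, so every caller is stale.
            for caller in callers[name]:
                if caller not in queued:
                    worklist.append(caller)
                    queued.add(caller)
    return known


# Toy example: top() returns mid(), mid() returns leaf(), leaf() returns a double.
toy_calls = {"top": {"mid"}, "mid": {"leaf"}, "leaf": set()}
toy_rules = {
    "top": lambda known: known["mid"],
    "mid": lambda known: known["leaf"],
    "leaf": lambda known: "double",
}
result = infer_return_types(toy_calls, lambda name, known: toy_rules[name](known))
print(result)
```

Only the callers of a function whose type actually changed are queued
again, which is what keeps the re-inference partial.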

Stefan

From drsalists at gmail.com  Sun Jan  7 01:57:17 2018
From: drsalists at gmail.com (Dan Stromberg)
Date: Sat, 6 Jan 2018 22:57:17 -0800
Subject: [Cython] Segfault with large cdef'd list
Message-ID: <CAGGBd_oud58FEd1c=L+LwZfM+qMQuLAcMXZJxHYP1MnmmnoBkg@mail.gmail.com>

I'm getting a weird segfault from a tiny function (SSCCE) using cython
with python 2.7.  I'm seeing something similar with cython and python
3.5, though I did not create an SSCCE for 3.5.

This same code used to work with slightly older cythons and pythons,
and a slightly older version of Linux Mint.

The code is at http://stromberg.dnsalias.org/svn/why-is-python-slow/trunk
(more specifically at
http://stromberg.dnsalias.org/svn/why-is-python-slow/trunk/tst.pyx )

In short, cdef'ing a list of doubles with about a million elements,
and using only the 0th element once, segfaults - but cdef'ing a
slightly smaller array does not segfault under otherwise identical
conditions.

Any suggestions?  Does Cython have a limit on the max size of a stack frame?

Thanks!  I quite like cython.

From robertwb at math.washington.edu  Sun Jan  7 03:48:43 2018
From: robertwb at math.washington.edu (Robert Bradshaw)
Date: Sun, 7 Jan 2018 00:48:43 -0800
Subject: [Cython] Segfault with large cdef'd list
In-Reply-To: <CAGGBd_oud58FEd1c=L+LwZfM+qMQuLAcMXZJxHYP1MnmmnoBkg@mail.gmail.com>
References: <CAGGBd_oud58FEd1c=L+LwZfM+qMQuLAcMXZJxHYP1MnmmnoBkg@mail.gmail.com>
Message-ID: <CADiQ+QDbEeYCSHG6_JyhVRpLkQcBVaiBCwKMPMQTx4gcWmT6QA@mail.gmail.com>

Cython itself doesn't impose any limits, but it does inherit whatever
limit exists in the C compiler and runtime. The variance may be due to
whatever else happens to be placed on the stack.
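
A rough back-of-the-envelope check in Python shows why the size matters.
The 8 MiB figure below is an assumption (a common default soft stack
limit on Linux, not something stated in the report); the resource
lookup is only there to show the real limit on Unix-like systems.

```python
import ctypes

n_elements = 1000 * 1000  # "about a million" doubles, as in the report
array_bytes = n_elements * ctypes.sizeof(ctypes.c_double)

# ASSUMPTION: 8 MiB soft stack limit, a common Linux default.
assumed_stack_limit = 8 * 1024 * 1024
headroom = assumed_stack_limit - array_bytes

try:
    import resource  # Unix only
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_STACK)
except ImportError:
    soft_limit = None  # not available on this platform

print("array size in bytes:", array_bytes)
print("headroom under the assumed limit:", headroom)
print("actual soft stack limit:", soft_limit)
```

Under that assumption the array alone takes all but a few hundred
kilobytes of the stack, so whatever else is on the stack decides
whether the limit is crossed.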

On Sat, Jan 6, 2018 at 10:57 PM, Dan Stromberg <drsalists at gmail.com> wrote:
> I'm getting a weird segfault from a tiny function (SSCCE) using cython
> with python 2.7.  I'm seeing something similar with cython and python
> 3.5, though I did not create an SSCCE for 3.5.
>
> This same code used to work with slightly older cythons and pythons,
> and a slightly older version of Linux Mint.
>
> The code is at http://stromberg.dnsalias.org/svn/why-is-python-slow/trunk
> (more specifically at
> http://stromberg.dnsalias.org/svn/why-is-python-slow/trunk/tst.pyx )
>
> In short, cdef'ing a list of doubles with about a million elements,
> and using only the 0th element once, segfaults - but cdef'ing a
> slightly smaller array does not segfault under otherwise identical
> conditions.
>
> Any suggestions?  Does Cython have a limit on the max size of a stack frame?
>
> Thanks!  I quite like cython.
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> https://mail.python.org/mailman/listinfo/cython-devel

From stefan_ml at behnel.de  Sun Jan  7 05:18:09 2018
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 7 Jan 2018 11:18:09 +0100
Subject: [Cython] Segfault with large cdef'd list
In-Reply-To: <CADiQ+QDbEeYCSHG6_JyhVRpLkQcBVaiBCwKMPMQTx4gcWmT6QA@mail.gmail.com>
References: <CAGGBd_oud58FEd1c=L+LwZfM+qMQuLAcMXZJxHYP1MnmmnoBkg@mail.gmail.com>
 <CADiQ+QDbEeYCSHG6_JyhVRpLkQcBVaiBCwKMPMQTx4gcWmT6QA@mail.gmail.com>
Message-ID: <463c1b55-7264-e680-4bb5-4b509aa60b3b@behnel.de>

Robert Bradshaw schrieb am 07.01.2018 um 09:48:
> On Sat, Jan 6, 2018 at 10:57 PM, Dan Stromberg wrote:
>> I'm getting a weird segfault from a tiny function (SSCCE) using cython
>> with python 2.7.  I'm seeing something similar with cython and python
>> 3.5, though I did not create an SSCCE for 3.5.
>>
>> This same code used to work with slightly older cythons and pythons,
>> and a slightly older version of Linux Mint.
>>
>> The code is at http://stromberg.dnsalias.org/svn/why-is-python-slow/trunk
>> (more specifically at
>> http://stromberg.dnsalias.org/svn/why-is-python-slow/trunk/tst.pyx )
>>
>> In short, cdef'ing a list of doubles with about a million elements,
>> and using only the 0th element once, segfaults - but cdef'ing a
>> slightly smaller array does not segfault under otherwise identical
>> conditions.
>>
>> Any suggestions?  Does Cython have a limit on the max size of a stack frame?
>>
> Cython itself doesn't impose any limits, but it does inherit whatever
> limit exists in the C compiler and runtime. The variance may be due to
> whatever else happens to be placed on the stack.

Let me add that I wouldn't consider it a good idea to allocate large chunks
of memory on the stack. If it's meant to hold substantial amounts of data
(which also suggests that there is a substantial amount of processing
and/or copying involved), it's probably also worth a [PyMem_]malloc() call.
Heap allocation allows you to respond to allocation failures with a
MemoryError rather than a crash, as you get now. How much stack space you
have left is user controlled through call depth and recursion, which makes
it a somewhat easy target.
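
As an illustration of the pattern, here is a plain-Python sketch that
calls PyMem_Malloc()/PyMem_Free() through ctypes so that it runs
without a compile step. In a .pyx file the same calls would be cimported
rather than looked up via ctypes; the helper name
first_element_of_big_buffer is made up for this example.

```python
import ctypes

# Look up the CPython allocator functions and declare their signatures.
_malloc = ctypes.pythonapi.PyMem_Malloc
_malloc.restype = ctypes.c_void_p
_malloc.argtypes = [ctypes.c_size_t]

_free = ctypes.pythonapi.PyMem_Free
_free.restype = None
_free.argtypes = [ctypes.c_void_p]


def first_element_of_big_buffer(n):
    """Allocate n doubles on the heap, use element 0, then release them."""
    nbytes = n * ctypes.sizeof(ctypes.c_double)
    raw = _malloc(nbytes)
    if not raw:
        # NULL result: report the failure instead of crashing.
        raise MemoryError("could not allocate %d bytes" % nbytes)
    try:
        data = ctypes.cast(raw, ctypes.POINTER(ctypes.c_double))
        data[0] = 1.5
        return data[0]
    finally:
        # Always give the memory back, even if the body raises.
        _free(raw)


result = first_element_of_big_buffer(1000 * 1000)
print(result)
```

The point is the NULL check and the try/finally: the failure path is an
ordinary exception, and the buffer is released on every exit path.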

Stefan

From usama54321 at gmail.com  Mon Jan  8 01:21:34 2018
From: usama54321 at gmail.com (usama hameed)
Date: Mon, 8 Jan 2018 11:21:34 +0500
Subject: [Cython] Type Inference: Inter Procedural Analysis
In-Reply-To: <d423e804-99bb-0af7-edbf-a97eba277a98@behnel.de>
References: <CACF6G9jQyMJ5V0ojrtyd_SnkBy_GbOcF0hVus0ncWoY5DYbALg@mail.gmail.com>
 <d423e804-99bb-0af7-edbf-a97eba277a98@behnel.de>
Message-ID: <CACF6G9gADCf2QGmLPSuDdEnJqgdhZCZEYhZ14=Y47BasM8qAPQ@mail.gmail.com>

Hey!

Thank you for the reply.

> Interesting. Could you elaborate on what you found missing or badly
> designed? Would be interesting to know for us.

I'll elaborate a bit on what I had in mind while implementing inter
procedural analysis, and what I found to be a bit inflexible. Also, I am
attaching a progress report I wrote as part of the coursework, which
elaborates my overall strategy in a bit more depth.

Right now, I'm storing incoming and outgoing callsites of a function at the
function scope level. I think it makes more sense to store this information
at the AST node, however, that results in some limitations later on during
the Type Inference stage, as only information about the scope is passed to
the Inference System. Furthermore, I think I needed to add a transform
between the infer_types and the analyse_expressions stage, which are done
in a single Transform, but I think that's because my way of doing things
was hackish, and it could be done in a much better way.

After storing the callsites information at the scope level, I made a
separate Inter Procedural Inferer, that handles recursive functions
separately from non recursive functions. The non recursive case is handled
by traversing the call graph to the first function with no incoming nodes
(I have still not handled cycles in the graph, except recursion), and
starting type inference from there, and traversing down the graph until all
the descendants of this ancestor function have been inferred. The recursive
case is handled a bit separately, where I take all the return statements in
the function before the first recursive call, infer their type if it's
consistent, and then re-run type inference on the whole function. If the
result is consistent, i.e. all the return nodes have the same type, then
the type is inferred. Otherwise, the code falls back to PyObjects.
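
The recursive case described above can be sketched roughly as follows in
plain Python. This is a simplified model, not the code in my
repository: the representation of return statements as
(uses_recursion, type_fn) pairs and the name infer_recursive are
assumptions made for the example.

```python
PY_OBJECT = "object"  # fallback type when inference is not consistent


def infer_recursive(returns):
    """Infer the return type of a directly recursive function.

    returns: list of (uses_recursion, type_fn) pairs in source order.
             type_fn(self_type) gives the type of that return expression,
             given an assumed return type for the function itself.
    """
    # Step 1: look only at the returns before the first recursive call.
    seed_types = set()
    for uses_recursion, type_fn in returns:
        if uses_recursion:
            break
        seed_types.add(type_fn(None))
    if len(seed_types) != 1:
        return PY_OBJECT  # no seed, or the seeds disagree
    seed = seed_types.pop()

    # Step 2: re-run over every return, assuming the seed type.
    all_types = {type_fn(seed) for _uses_recursion, type_fn in returns}
    return seed if all_types == {seed} else PY_OBJECT


# factorial-like: "return 1" and "return fact(n - 1)"
consistent = infer_recursive([
    (False, lambda self_type: "long"),
    (True, lambda self_type: self_type),
])

# Base case returns a long, recursive branch returns a double.
inconsistent = infer_recursive([
    (False, lambda self_type: "long"),
    (True, lambda self_type: "double"),
])

# No return before the first recursive call, so there is nothing to seed from.
no_seed = infer_recursive([(True, lambda self_type: self_type)])

print(consistent, inconsistent, no_seed)
```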

My current code breaks the compiler in certain cases, and I'm working on
fixing that. Furthermore, I think I'll need to mark return statements in
the FlowControl stage too in addition to the assignment statements, as they
can contain expressions too which I'll need to infer.

My overall plan had been to change as little of the core code as possible
to get a solution up and running, so as not to break anything. I think I
have only commented out about 4 or 5 lines of the original code.

> 1) The functionality looks really nice. Since you weren't accustomed to
> the code base before, it's understandable that things aren't perfectly
> integrated with the existing architecture. That can be cleaned up.
>
> 2) I was surprised to see that you didn't git-clone the existing repository
> but created a new one from a source copy instead. But that's probably ok
> for getting started because (I think) you wrote the code experimentally and
> didn't focus that much on ready-to-merge commits anyway. Also, you
> accidentally added .pyc and .so files. Those shouldn't be under version
> control. It would probably be best to start over from a fresh clone and
> apply your changes as a patch.
>
> 3) The commits are a bit difficult to follow because the commit comments
> are essentially free of information. It would help if you had used them to
> explain the steps you took and what your intentions were.

That's the repository I was working on. However, the code base is pretty
hacky now, and the commits aren't really consistent with their
descriptions. I was just developing experimentally, as I was not really
familiar with the code base. I'll fork the repo, and make some commits with
comments, and clean up the code. Once I've done that, and my code is a bit
more understandable, I'll

> 4) Is there a reason why you didn't merge the Graph building with the
> control flow analysis in FlowControl.py?

My overall strategy was to change/edit as little of the core code as
possible. I'll merge it into FlowControl in the updated commits.

> 5) I can't see any test code, but since you implemented this in multiple
> iterations, I'm sure that you had test code on your side that you tried to
> compile. Could you add some examples that show how this change improves
> things? There are hints on writing tests in the hacker guide:

I'll add some of the test files I used locally in the tests in the new
commits.

Lastly, the Type Inference System I implemented is pretty simple, and might
have limitations that I'm not aware of. However, I would love to work on
this further, to fix/improve on the whole system. I do not know about
Incremental Type Inference, but if that's the way to go and my above
strategy is overly simplistic, I'll be happy to work on that too.

Usama

-------------- next part --------------
A non-text attachment was scrubbed...
Name: report2.odt
Type: application/vnd.oasis.opendocument.text
Size: 42030 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/cython-devel/attachments/20180108/faa9c80d/attachment-0001.odt>

From drsalists at gmail.com  Mon Jan  8 15:36:24 2018
From: drsalists at gmail.com (Dan Stromberg)
Date: Mon, 8 Jan 2018 12:36:24 -0800
Subject: [Cython] Segfault with large cdef'd list
In-Reply-To: <463c1b55-7264-e680-4bb5-4b509aa60b3b@behnel.de>
References: <CAGGBd_oud58FEd1c=L+LwZfM+qMQuLAcMXZJxHYP1MnmmnoBkg@mail.gmail.com>
 <CADiQ+QDbEeYCSHG6_JyhVRpLkQcBVaiBCwKMPMQTx4gcWmT6QA@mail.gmail.com>
 <463c1b55-7264-e680-4bb5-4b509aa60b3b@behnel.de>
Message-ID: <CAGGBd_r8L5oQU-4YT6rv6gFNbXRjAGU1GSJVPD+ACx5Cho20xg@mail.gmail.com>

On Sun, Jan 7, 2018 at 2:18 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Robert Bradshaw schrieb am 07.01.2018 um 09:48:
>> Cython itself doesn't impose any limits, but it does inherit whatever
>> limit exists in the C compiler and runtime. The variance may be due to
>> whatever else happens to be placed on the stack.
>
> Let me add that I wouldn't consider it a good idea to allocate large chunks
> of memory on the stack. If it's meant to hold substantial amounts of data
> (which also suggests that there is a substantial amount of processing
> and/or copying involved), it's probably also worth a [PyMem_]malloc() call.
> Heap allocation allows you to respond to allocation failures with a
> MemoryError rather than a crash, as you get now. How much stack space you
> have left is user controlled through call depth and recursion, which makes
> it a somewhat easy target.

Thanks - it's working now with malloc() and free().

Code at:
http://stromberg.dnsalias.org/svn/why-is-python-slow/trunk/cython3_types_t.pyx

It turns out the 2.x and 3.x versions are identical.  :)

From J.Demeyer at UGent.be  Thu Jan 25 06:12:14 2018
From: J.Demeyer at UGent.be (Jeroen Demeyer)
Date: Thu, 25 Jan 2018 12:12:14 +0100
Subject: [Cython] Multiple inheritance with old-style classes in Python 2?
Message-ID: <5A69BB8E.50601@UGent.be>

Do we want to support Python 2 old-style classes for multiple 
inheritance? Personally, I don't think that we should, but it's 
something that has to be decided.

The reason I ask is that my code from 
https://github.com/cython/cython/pull/2033 is actually broken when given 
an old-style class.

From elizabeth.fischer at columbia.edu  Thu Jan 25 12:40:41 2018
From: elizabeth.fischer at columbia.edu (Elizabeth A. Fischer)
Date: Thu, 25 Jan 2018 17:40:41 +0000
Subject: [Cython] Multiple inheritance with old-style classes in Python
 2?
In-Reply-To: <5A69BB8E.50601@UGent.be>
References: <5A69BB8E.50601@UGent.be>
Message-ID: <CAC_jL3yxBvkF7dcKT9ZayH9=wgGHPE-nRy_4=wOVyXADCx-PVQ@mail.gmail.com>

No.  Python2 is obsolete, old style classes even more so.

On Thu, Jan 25, 2018 at 12:22 PM Jeroen Demeyer <J.Demeyer at ugent.be> wrote:

> Do we want to support Python 2 old-style classes for multiple
> inheritance? Personally, I don't think that we should, but it's
> something that has to be decided.
>
> The reason I ask is that my code from
> https://github.com/cython/cython/pull/2033 is actually broken when given
> an old-style class.

From robertwb at gmail.com  Thu Jan 25 12:54:21 2018
From: robertwb at gmail.com (Robert Bradshaw)
Date: Thu, 25 Jan 2018 09:54:21 -0800
Subject: [Cython] Multiple inheritance with old-style classes in Python
 2?
In-Reply-To: <CAC_jL3yxBvkF7dcKT9ZayH9=wgGHPE-nRy_4=wOVyXADCx-PVQ@mail.gmail.com>
References: <5A69BB8E.50601@UGent.be>
 <CAC_jL3yxBvkF7dcKT9ZayH9=wgGHPE-nRy_4=wOVyXADCx-PVQ@mail.gmail.com>
Message-ID: <CADiQ+QA9rFTJdmDCxU4cVSB1Hf=ag3oDZtrYm3mhCwDJj7RdVw@mail.gmail.com>

No, we don't care about supporting this, but we should detect and
reject it informatively when possible.

On Thu, Jan 25, 2018 at 9:40 AM, Elizabeth A. Fischer
<elizabeth.fischer at columbia.edu> wrote:
> No.  Python2 is obsolete, old style classes even more so.
>
> On Thu, Jan 25, 2018 at 12:22 PM Jeroen Demeyer <J.Demeyer at ugent.be> wrote:
>>
>> Do we want to support Python 2 old-style classes for multiple
>> inheritance? Personally, I don't think that we should, but it's
>> something that has to be decided.
>>
>> The reason I ask is that my code from
>> https://github.com/cython/cython/pull/2033 is actually broken when given
>> an old-style class.

From J.Demeyer at UGent.be  Thu Jan 25 15:25:48 2018
From: J.Demeyer at UGent.be (Jeroen Demeyer)
Date: Thu, 25 Jan 2018 21:25:48 +0100
Subject: [Cython] Multiple inheritance with old-style classes in Python
 2?
In-Reply-To: <6271c8c05e154bf087c39050d85e2ef8@xmail201.UGent.be>
References: <5A69BB8E.50601@UGent.be>
 <CAC_jL3yxBvkF7dcKT9ZayH9=wgGHPE-nRy_4=wOVyXADCx-PVQ@mail.gmail.com>
 <6271c8c05e154bf087c39050d85e2ef8@xmail201.UGent.be>
Message-ID: <5A6A3D4C.4000605@UGent.be>

On 2018-01-25 18:54, Robert Bradshaw wrote:
> No, we don't care about supporting this, but we should detect and
> reject it informatively when possible.

Good. I'll create a PR.