[Cython] CF based type inference

Wed May 9 10:28:31 CEST 2012

On 9 May 2012 09:02, Vitja Makarov <vitja.makarov at gmail.com> wrote:
> 2012/5/9 Stefan Behnel <stefan_ml at behnel.de>:
>> Robert Bradshaw, 09.05.2012 00:12:
>>> On Tue, May 8, 2012 at 6:47 AM, Vitja Makarov wrote:
>>>> 2012/5/8 Stefan Behnel:
>>>>> Vitja has rebased the type inference on the control flow, so I wonder if
>>>>> this will enable us to properly infer this:
>>>>>
>>>>>  def partial_validity():
>>>>>    """
>>>>>    >>> partial_validity()
>>>>>    ('Python object', 'double', 'str object')
>>>>>    """
>>>>>    a = 1.0
>>>>>    b = a + 2   # definitely double
>>>>>    a = 'test'
>>>>>    c = a + 'toast'  # definitely str
>>>>>    return typeof(a), typeof(b), typeof(c)
>>>>>
>>>>> I think, what is mainly needed for this is that a NameNode with an
>>>>> undeclared type should not report its own entry as dependency but that of
>>>>> its own cf_assignments. Would this work?
>>>>>
>>>>> (Haven't got the time to try it out right now, so I'm dumping it here.)
>>>>>
>>>>
>>>> Yeah, that might work. The other way to go is to split entries:
>>>>
>>>>  def partial_validity():
>>>>   """
>>>>   >>> partial_validity()
>>>>   ('str object', 'double', 'str object')
>>>>   """
>>>>   a_1 = 1.0
>>>>   b = a_1 + 2   # definitely double
>>>>   a_2 = 'test'
>>>>   c = a_2 + 'toast'  # definitely str
>>>>   return typeof(a_2), typeof(b), typeof(c)
>>>>
>>>> And this should work better because it allows to infer a_1 as a double
>>>> and a_2 as a string.
>>>
>>> This already works, right?
>>
>> It would work if it was implemented. *wink*
>>
>>
>>> I agree it's nicer in general to split
>>> things up, but not being able to optimize a loop variable because it
>>> was used earlier or later in a different context is a disadvantage of
>>> the current system.
>>
>> Absolutely. I was considering entry splitting more of a "soon, maybe not
>> now" type of thing because it isn't entire clear to me what needs to be
>> done. It may not even be all that hard to implement, but I think it's more
>> than just a local change in the scope implementation because the current
>> lookup_here() doesn't know what node is asking.
>>
>
> That could be done the following way:
>  - Before running type inference find independent assignment groups
> and split entries
>  - Run type infrerence
>  - Join entries of the same type or of PyObject base type
>  - Then change names to private ones "{old_name}.{index}"

Sounds like a good approach. Do you think it would be useful if a
variable can be type inferred at some point, but at no other point in
the function, to specialize for both the first type you find and
object? i.e.

i = 0
while something:
    use i
    i = something_not_inferred()

and specialize on 'i' being an int? Bonus points maybe :)

If these entries are different depending on control flow, it's
basically a form of ssa, which is cool. Then optimizations like
none-checking, boundschecking, wraparound etc can, for each new
variable insert a single check (for bounds checking it depends on the
entire expression, but...). The only thing I'm not entirely sure about
is this when the user eliminates your check through try/finally or
try/except, e.g.

try:
    buf[i]
except IndexError:
    print "no worries"

buf[i]

Here you basically want a new (virtual) reference of "i". Maybe that
could just be handled in the optimization transform though, where it
invalidates the previous check (especially since there is no
assignment here).

> --
> vitja.
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel