[Cython] New function (pointer) syntax.

Sat Nov 8 01:23:29 CET 2014

On Fri, Nov 7, 2014 at 11:29 AM, C Blake <cblake at pdos.csail.mit.edu> wrote:
> Robert Bradshaw robertwb at gmail.com wrote:
>>Quick: is that a pointer to an array or 10 pointers to ints? Yes, I
>>know what it is, but the thing is without knowing C (well) it's not
>>immediately obvious what the precedence should be.
>
> If you're gonna pick something e.g like that, it should not be something
> people see all the time like int main(int argc, char *argv[]).  ;-)
> *That* I recognized correctly in far less time than it took me to read
> your question text.

Yeah, it (coincidentally) parses nicely as (char *) argv[], or an
array of char_ptrs :-P.
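
A minimal sketch of the two readings, assuming only the standard ctypes module: when the type is built as an ordinary expression, "array of pointers" and "pointer to array" can't be confused. The names IntPtr, ArrayOfPtrs and PtrToArray are made up for illustration.

```python
import ctypes

IntPtr = ctypes.POINTER(ctypes.c_int)

# C: int *a[10]    -> an array of 10 pointers to int
ArrayOfPtrs = IntPtr * 10

# C: int (*a)[10]  -> one pointer to an array of 10 ints
PtrToArray = ctypes.POINTER(ctypes.c_int * 10)

ptr_size = ctypes.sizeof(ctypes.c_void_p)
array_of_ptrs_size = ctypes.sizeof(ArrayOfPtrs)  # storage for ten pointers
ptr_to_array_size = ctypes.sizeof(PtrToArray)    # storage for one pointer
```

The sizes differ by a factor of ten, which is a quick check on which type was actually built.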

> Here's a counter: printf("%s", *a[i]) - what does it do?  I submit that
> if *a[i] is hard or not "real quick" then, eh, what the real problem
> here may be that you've got enough syntaxes floating in your brain that
> you haven't internalized all the operator rules of this one.  In that
> case, nothing "operator-oriented" is going to make you truly happy in
> the long-run.
>
> I think this whole anti-declarator campaign is misguided.  I agree with
> Greg that experienced C programmers, such as C library writers know it.
> In trying to "fix it", you are just squeezing complexity jello maybe to
> oil the squeakiest wheel of casual users.  You can eek out a level or
> maybe two of simplicity, but you double up syntax.  You can make ()s
> better and make []s harder..middle-scale complexity better/full scale
> a little worse.  That kind of thing.  The net result of these attempts
> doesn't strike me as better or worse except that different is worse.

I strongly disagree that C declarators are even close to an
optimal point in a relatively flat parameter space. Googling "function
pointer syntax" or "c declarator" reinforces the point that this is
one of the most confusing and obtuse aspects of C(++) syntax. But I
admit it's hard to come up with an objective measure for how good a
syntax is...if it's natural to you then that's great.

> The closer to exactly C types you can get the more coherent the overall
> system is unless/until Python land has a commonly used type setup that
> is adequate.

Yes, being like C is an advantage.

> The ctypes module has a way to do this and Numba can use
> that, though I have little experience doing so.  The context is different
> in that in Cython you have more opportunity to create new syntax, but
> should you do so?
>
> That brings up a totally other note, Could you maybe compile-time eval()
> ctypes stuff if you really hate C decls and want to be more pythonic?
> If the answer to ctypes is "Err..  not terse/compact/part of the syntax
> enough", well, declarators are a fine answer to terseness, that's for
> sure.  Mostly operator text. ;-)

We've considered that, primarily to support a single codebase that
runs un-compiled, but if you cythonize it everything is faster (and,
possibly, more typesafe).

>>Cython's target audience includes lots of people who don't know C well.
>
> One leans differently based on what you think Cython "mostly" is..Bridge
> to libs, Its own thing, Compiled python, etc. (I know it's all the above).
>
>>If they were good, they would be easy to learn, no expert teaching required.
>
> All syntax requires learning..e.g, Stefan expressed a harder time than you
> visually unpacking some of the "->" exprs.  All teaching can be messed up.

Teacher: "So to declare a variable of type int, write 'int x;'"
Student: "int x;"
Teacher: "And to declare a variable of type char*, write 'char *s;'"
Student: "Oh, I think I see the pattern."
Teacher: "Good, let's move on..."

The fact that there's such an obvious, but erroneous, pattern ("type,
then name") is (one) flaw that leads to such confusion.

> Teaching methods can fall into bad ruts.  In this case some teaching/practice
> actively blocks assimilation..possibly in a long-term sense like mishearing
> the lyrics of song or a person's name and then the actual case never sounding
> quite right for a long time.  Or people with strong accents of your spoken
> past dragging you back into accented speech yourself.  (Mis-)reinforced
> language is *tough* and can transcend good teaching.

Very true, but I think it's deeper than that. The problem is that we
*talk* about a variable having type "char*", and the compiler
*reasons* about these more complex types, but the syntax doesn't let
us actually say this directly.

What the syntax gives us instead is base types vs. complex types, and
empty declarators that hold a place in the syntax tree but are not
seen in the text.

> Part of this thread
> was you & Stefan both having an implied question of "why do I read it one
> way when I darn well know it's the other".  I was trying to help answer
> that question.  Maybe I'm wrong about your individual case(s).  In my
> experience with people's trouble is that it's not just "tokens being on
> both sides".  Most people are used to [] and () being to the right.  It's
> active mis-reinforcement stuff like spacing/thinking const char instead of
> char const,.. that makes it hard.  Unless you can avoid declarator style
> 100%, it's better to make people confront it sooner than do half-measures.

I'm hoping we can avoid it 100% :-) for anyone who doesn't have to
actually interact with C.

>> syntax != semantics => baddness
>
> It's only "misperceived/taught/assimilated semantics"-syntax divergence.
> I do consider the misperception unfortunate and unnecessary at the outset.
> The "pairing semantics" that "feel" like they diverge from the syntax for
> you are 'weak/vague' and *should* have lower priority in your head.
> It's only really 1-type, 1-var pairs in function sigs.  Even you like
> "type varlist" in var or struct decls.  I mean, c'mon: "int i, j" rocks. :)

Note that the "i, j" is not an expression here; the comma is a
delimiter, not an operator. Consider

    float *a = NULL

which is valid, but a subsequent

    *a = NULL

is not. Yep, the assignment in a declaration ignores the declarators,
pretending they were attached elsewhere.
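
A rough ctypes rendering of the same point, as a sketch rather than anything proposed in this thread: the pointer type is named directly, so initializing the pointer and assigning through it are visibly different operations. FloatPtr, a and x are illustrative names.

```python
import ctypes

FloatPtr = ctypes.POINTER(ctypes.c_float)

# C: float *a = NULL;  -- the initializer applies to a, not to *a.
a = FloatPtr()  # a NULL pointer to float
a_is_null = not bool(a)  # NULL ctypes pointers are falsy

# Dereferencing is a separate, explicit step; ctypes refuses it for NULL.
try:
    a.contents
    deref_failed = False
except ValueError:
    deref_failed = True

# Point a at real storage, then assign through it (C: *a = 1.5f;).
x = ctypes.c_float(0.0)
a = ctypes.pointer(x)
a.contents.value = 1.5
```

After the last line, x holds 1.5, because a.contents refers to the same storage as x.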

> So, it's not always 1-1.  Sometimes it's 1-1, sometimes 1-many aka 1 to
> a non-trival expr.  If you go with a full expression instead of just a
> list, you get stuff in return as a bonus..

Yes, you can now declare variables of different types but the same base
type on a single line... To me, "I want a set of variables of the same
type" is much more common (and easier to explain) than "I want a set of
variables of the same base type."

> You get stuff and lose stuff.
> Declarators are not some hateful, hateful thing to always avoid - there
> are pros and cons like anything, not "zero pros except for history" as
> you seem to say.

There may be some pros, but excluding history I think they're greatly
outweighed by the cons...

If, hypothetically, C used "(int, int) -> int foo" syntax to declare
function references, do you think anyone would have proposed
"int(*foo)(int, int)" as a better alternative that we should consider?

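For comparison, here is a sketch of how ctypes spells that same function pointer type as a plain call, with the result type first and the argument types after it. BinaryIntFunc and add are illustrative names, and the callback is an ordinary Python function standing in for C code.

```python
import ctypes

# C: int (*foo)(int, int)
# ctypes: result type first, then the argument types.
BinaryIntFunc = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)


def add(x, y):
    """Plain Python function used as the callback body."""
    return x + y


foo = BinaryIntFunc(add)  # foo now behaves like a C function pointer
result = foo(2, 3)
```

No operator precedence is involved in reading the type, at the cost of a more verbose spelling.
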
> Anyway, "close but different" is the order of the day and is subjective.
> Keeping in mind how all three function type cases look - the def/cdef,
> call site, type cast/type spec - should be ever present in your syntax
> evaluations of all you guys, and so should less trivial cases like
> returning a function pointer from a lookup.  *Dropping/swapping* parts
> is arguably closer than changing how operators work.  *New* operators
> are better than making old ones multi-personality.  I don't like that
> cdef char *(..) approach on those grounds.  -1 is my $.02.
>
> The reason the lambda approach felt interesting is because the function
> value cases and type cases are close in the same sort of way and Python
> core syntax has func values if not pointer types.  This was also partly
> why you liked your (type1,type2)->type notation - it was just dropping py3
> annotation parts.  I think the biggest issue with that is py3 -> notations
> aren't in a whole lot of code and feel new in context rather than like
> just token dropping.  The lambda approach at least uses a sort of new
> operator (new in that context).
>
> Another whole wrinkle on this/way of thinking about close but diff, if
> this helps anyone, is error message interpretability.  If you define a
> method like cdef int foo(int bar, float baz) and you get back an expecting
> int (*)(int, float), you can compare the message and definition pretty
> easily.  If it was (int, float) -> int, it's harder..*unless* you were
> mandating 'def foo(a: int, b: float)->int:'.  Just another sort of
> angle to think at it from.  The lambda spelling of that might seem
> kinda confusing.

Yes, placing the return type on the rhs is a downside of the
arrow/lambda approach given what we've grown accustomed to in C (and
adopted in Cython).
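
To make the comparison concrete on the Python side, here is a sketch using only the standard typing module. With py3 annotations the return type already sits on the right in the definition, so an arrow-style type can be read off it by dropping the parameter names. The function foo mirrors the example quoted above, and arrow_type is an illustrative name.

```python
from typing import Callable, get_type_hints


def foo(bar: int, baz: float) -> int:
    return bar + int(baz)


# Collect the annotations, then split off the return type.
hints = get_type_hints(foo)
ret = hints.pop("return")

# Roughly "(int, float) -> int", spelled with typing.Callable.
arrow_type = Callable[list(hints.values()), ret]
```

Here arrow_type compares equal to Callable[[int, float], int], which lines up with the annotated def but not with a C-style "int foo(int bar, float baz)".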

- Robert