[capi-sig] Syntax for parsing tuples into C arrays

Yingjie Lan lanyjie at yahoo.com
Fri May 15 17:49:41 CEST 2009


Parsing tuples into C arrays is quite painful 
with current C API. Below is is an example
when you don't know the tuple size before hand:

....
//ref: http://docs.python.org/c-api/arg.html
PyObject* pyObj;
if(!PyArg_ParseTuple(args,"O",&pyObj) 
        || !pyTuple_Check(pyObj))
   return NULL;

tupleSize = PyTuple_Size(pyObj);
array = (double*) malloc(sizeof(double)*tupleSize);
for(i=0; i < tupleSize; i++ ) {
   tupleItem = PyTuple_GetItem(pyObj, i);
   if( PyFloat_Check(tupleItem) ) {
      array[i] = PyFloat_AsDouble(tupleItem);
   }else{
      printf("Error: tuple contains a non-float value");
      exit(1);
   }
}
....

I'd like to propose a syntax to facilitate that kind 
of tasks. There is already syntax for parsing strings:

{
const char * s; int i;
PyArg_ParseTyple(args, "s#", &s, &i);
}

Consider the string as a sequence of char's.
Then note that in python, a tuple of int's,
is also a sequence of int's. This 
perspective provides a natural way to 
extend the the use of "#" symbol. 
First, to denote a tuple as a sequence 
of a given type of objects, let's use
"(<type>:)". For example, to denote a tuple
as a sequence of integers: "(i:)". 
The ":" is chosen to denote possible 
repetition of the whole pattern inside
the parentheses. As for why choose ":", 
here is a bad story, but a good example:

Once upone a time there was a temple on
a hill, and there lived an old and a
young monk. One day the old monk told the
the young a story:
Once upone a time there was a temple on
a hill, and there lived an old and a
young monk. One day the old monk told the
the young a story:
...

OK, enough...but you got the idea :)

Now consider 3 different scenarios of parsing
tuples into arrays.

(1) Unkown size. Here are some examples:

{
//parsing a tuple and its size
int i, *t; 
PyArg_ParseTuple(args, "(i:)#", &t, &i);
}

{
//parsing a matrix
int i,j,**t;
PyArg_ParseTuple(args, "((i:):)##", &t, &i, &j);
for(a=0; a<i; a++){
   for(b=0; b<j; b++)
      printf("%d, ", t[a][b]);
   printf("\n");
}

}

Here is an informal explanation of "((i:):)##":
the '#' can be seen as a peeling operator, which
peels away the outer most ':'-parentheses pair "(.:)" 
that denote a "free sequence" (i.e. sequence of 
unknown size) with some pattern ".". 

For types like 's' or its similar types, pretend 
as if there is an invisible pair of parentheses 
around it. For example, "(s:)##" matches a tuple
of strings of some fixed length. Here the first
'#' peels away the parentheses, leaving only "s#",
which is equivalent to "(<char>:)#" in concept, 
where "<char>" denote the character type. 
As the second '#' is outside the tuple,
it is shared by all strings inside, so they
must have the same length.

If there is nothing to peel for a '#', it is 
considered a dangling '#', an error.

Now consider some more interesting examples:
{
int i, *t, *v; 
PyArg_ParseTuple(args, "(ii:)#", &t, &v, &i);
//args=(1,2,3,4,5,6)
//t={1,3,5}
//v={2,4,6}
//i=3
}

In that example, a pattern is provided inside the sequence "(ii:)",
which can repeat itself multiple times. The repeated times is 
extracted by the '#' symbol. Below is another example:

{
int i,j, *t, **v; 
PyArg_ParseTuple(args, "(i(i:):)##", &t, &v, &i, &j);
//args=(1,(3,),5,(7,))
//t={1,5}
//v={{3},{7}}
//i=2, j=1
}

And another one:

{
int i,j, **t, **v; 
PyArg_ParseTuple(args, "((i:)(i:):)##", &t, &v, &i, &j);
//args=((1,),(3,),(5,),(7,))
//t={{1},{5}}
//v={{3},{7}}
//i=2, j=1
}

Note the second '#' is applied to both the free sequences inside the
outer tuple. This indicates that all the inside tuples must be of 
the same length. What if they have their own common lengths? We
provide a separate matching '#' for each inside tuple:

{
int i,j,k, **t, **v; 
PyArg_ParseTuple(args, "((i:)(i:):)#(##)", &t, &v, &i, &j,&k);
//args=((1,),(3,4),(5,),(7,8))
//t={{1},{5}}
//v={{3,4},{7,8}}
//i=2, j=1, k=2
}

The syntax can also deal with more complicated situations, 
such as an irregular 2-dimensional array, where each row
has a different row length:

{
int i, *j, **t;
PyArg_ParseTuple(args, "((i:)#:)#", &t, &j, &i);
for(a=0; a<i; a++){
   for(b=0; b<j[i]; b++)
      printf("%d, ", t[a][b]);
   printf("\n");
}
//args=((1,2), (1,2,3))
//t={{1,2},{1,2,3}}
//i=2, j={2,3}
}

Here is a mixed example: three dimensional 
array 't' has fixed length on the third 
dimention 'k', but has variable length on the 
second dimension 'j':

{
int i, *j, k, ***t;
PyArg_ParseTuple(args, "(((i:):)#:)##", &t, &j, &i, &k);
//args=(((1,2),(1,2)),((1,2),))
//t={{{1,2},{1,2}},{{1,2}}}
//i=2, k=2
//j={2,1}
}

This syntax can be used for parsing a tuple of strings:

{
int i, *j; 
const char ** s;
PyArg_ParseTuple(args, "(s#:)#", &s, &j, &i);
//args=("ab", "cde")
//s={"ab", "cde"}
//j={2, 3}
//i=2
}

(2) If you know the size of a tuple before hand, 
use this syntax: "(<type>integer)", meaning there
are exactly 'integer' number of <type> objects in
the tuple.

{
int *t;
//
PyArg_ParseTuple(args, "(i5)", &t);
//args = (1,2,3,4,5)
//t = {1,2,3,4,5}
}

A complicated case:
{
int **t;
PyArg_ParseTuple(args, "((i5)2)", &t);
//args=((1,2,3,4,5),(5,6,7,8,9))
//t={{1,2,3,4,5},{5,6,7,8,9}}
}

Another complicated case:

{
int **t, **v, i;
PyArg_ParseTuple(args, "(i3i2:)#", &t,&v,&i);
//args=(1,2,3,4,5,6,7,8,9,10)
//t={{1,2,3},{6,7,8}}
//v={{4,5},{9,10}}
//i=2
}

One more thing: the difference between 
"(ii)" and "(i2)" is that the former
yields two single integers, while the 
latter gives an array of length two.

(3) Concatenating arrays while parsing. 
Sometimes you would like to have a matrix 
parsed into a single array. Let's use a "(+" 
to start a tuple, such that the arrays 
parsed from the elements in that tuple 
should be concatenated into a single array.

{
int i,j, *t;
PyArg_ParseTuple(args, "(+(i))##", &t, &i, &j);
//args=((1,2,3),(4,5,6))
//t={1,2,3,4,5,6}
//i=2, j=3
}

Another example:
{
int i, *t;
PyArg_ParseTuple(args, "(+(i2))#", &t, &i);
}

Still another example, with strings:
{
int i, *j;
const char * t; //string
PyArg_ParseTuple(args, "(+(+s#2))#", &t, &j, &i);
//args=(("ab","c"),("d","efg"))
//t="abcdefg"
//i=2
////notice the size arrays are contatenated similarly
//j={2,1,1,3} 

const char ** sa;
PyArg_ParseTuple(args, "(+(s2))", &sa);
//sa = {"ab","c","d","efg"}

PyArg_ParseTuple(args, "((+s2))", &sa);
//sa = { "abc", "defg" }
}

And finally, an example with multi-item pattern:

{
int *t, *v, i;
PyArg_ParseTuple(args, "(+i1i2:)#", &t, &v, &i);
//args=(1,2,3,4,5,6)
//t={1,4}
//v={2,3,5,6}
//i=2
}

Epilogue:

This syntax can be applied to lists: 
just use "[.:]" instead of "(.:)".
It can also be used for building tuples
or lists out of arrays. 

That's it. Comments?

-Yingjie



      


More information about the capi-sig mailing list