The correct way to incrementally build list in RPython
Hi, On an attempt to fix the hash function of numpy.dtype, I came up with the patch in the attachment translated from hashdescr.c in the original numpy source. The patch passed the test when running in pyinteractive.py. However, when started to translate pypy, I get an ListChangeUnallowed error. I read in the Coding Guide that list.append should be allowed (which also said there's a lot of special rules) and I have seen usage of append in other RPython modules. So what is the rule of using list.append and what is the best way of doing this? (I guess I can always using application level lists but is that the only/best way?) Other comments on this patch (style/naming etc) are also welcome. Yichao Yu
Hi Yichao, On 27 June 2014 04:09, Yichao Yu <yyc1992@gmail.com> wrote:
On an attempt to fix the hash function of numpy.dtype, I came up with the patch in the attachment translated from hashdescr.c in the original numpy source. The patch passed the test when running in pyinteractive.py. However, when started to translate pypy, I get an ListChangeUnallowed error.
That errors comes from space.newtuple(lst): it must be passed a list "lst" that is "not resized", that is, a list on which we never do an append() or pop() or similar. The standard fix is to do "lst = [None] * length" and then fill "lst[index] = ...". This only works if you know the final length in advance. A more heavy way to fix it is simply to call space.newtuple(lst[:]), making a fresh copy of the list. But in this case, a better fix would be to avoid the list and the space.newtuple() and space.hash() completely, and instead compute the hash value incrementally, by copying the logic in pypy/objspace/std/tupleobject.py:descr_hash(). Similarly, I'm not sure you need to wrap self.kind, 0, self.elsize and self.alignment and put them inside a tuple of length 4, when all you could do is some random computation with these three numbers. A bientôt, Armin.
On Fri, Jun 27, 2014 at 3:07 PM, Armin Rigo <arigo@tunes.org> wrote:
Hi Yichao,
On 27 June 2014 04:09, Yichao Yu <yyc1992@gmail.com> wrote:
On an attempt to fix the hash function of numpy.dtype, I came up with the patch in the attachment translated from hashdescr.c in the original numpy source. The patch passed the test when running in pyinteractive.py. However, when started to translate pypy, I get an ListChangeUnallowed error.
That errors comes from space.newtuple(lst): it must be passed a list "lst" that is "not resized", that is, a list on which we never do an append() or pop() or similar.
Hmmm. So the error actually happens when the compiler traces back function calls? I was trying to figure out what's wrong from inside my function T_T ...
The standard fix is to do "lst = [None] * length" and then fill "lst[index] = ...". This only works if you know the final length in advance. A more heavy way to fix it is simply to call space.newtuple(lst[:]), making a fresh copy of the list. But in this case, a better fix would be to avoid the list and the space.newtuple() and space.hash() completely, and instead compute the hash value incrementally, by copying the logic in pypy/objspace/std/tupleobject.py:descr_hash().
Thank you for your advices. I would definitely try that.
Similarly, I'm not sure you need to wrap self.kind, 0, self.elsize and self.alignment and put them inside a tuple of length 4, when all you could do is some random computation with these three numbers.
I agree. As I said, this is basically copied from the c implementation and this is what they do over there..... (btw, it is actually a tuple of length 5 with endian being the second element ;P ).
A bientôt,
Armin.
Yichao Yu
On Fri, Jun 27, 2014 at 3:27 PM, Yichao Yu <yyc1992@gmail.com> wrote:
On Fri, Jun 27, 2014 at 3:07 PM, Armin Rigo <arigo@tunes.org> wrote:
Hi Yichao,
On 27 June 2014 04:09, Yichao Yu <yyc1992@gmail.com> wrote:
On an attempt to fix the hash function of numpy.dtype, I came up with the patch in the attachment translated from hashdescr.c in the original numpy source. The patch passed the test when running in pyinteractive.py. However, when started to translate pypy, I get an ListChangeUnallowed error.
That errors comes from space.newtuple(lst): it must be passed a list "lst" that is "not resized", that is, a list on which we never do an append() or pop() or similar.
Hmmm. So the error actually happens when the compiler traces back function calls? I was trying to figure out what's wrong from inside my function T_T ...
The standard fix is to do "lst = [None] * length" and then fill "lst[index] = ...". This only works if you know the final length in advance. A more heavy way to fix it is simply to call space.newtuple(lst[:]), making a fresh copy of the list. But in this case, a better fix would be to avoid the list and the space.newtuple() and space.hash() completely, and instead compute the hash value incrementally, by copying the logic in pypy/objspace/std/tupleobject.py:descr_hash().
Thank you for your advices. I would definitely try that.
I've made a new version[1] and it seems to work. I am using the hash function from `compute_hash` instead of the improved one in tupleobject.py:descr_hash since the length is not known in advance. [1] https://bitbucket.org/pypy/pypy/pull-request/242/fix-hash-function-for-compl...
Similarly, I'm not sure you need to wrap self.kind, 0, self.elsize and self.alignment and put them inside a tuple of length 4, when all you could do is some random computation with these three numbers.
I am still creating a tuple out of these 5 numbers/strings since I don't want to duplicate the hash functions for all of them. Can the compiler deal with this case efficiently?
I agree. As I said, this is basically copied from the c implementation and this is what they do over there..... (btw, it is actually a tuple of length 5 with endian being the second element ;P ).
A bientôt,
Armin.
Yichao Yu
Hi Yichao, On 27 June 2014 12:17, Yichao Yu <yyc1992@gmail.com> wrote:
I am still creating a tuple out of these 5 numbers/strings since I don't want to duplicate the hash functions for all of them. Can the compiler deal with this case efficiently?
If by "deal with this case efficiently" you mean, will the compiler remove the tuple allocation and inline the 5-elements hash, then it might just be possible depending on a number of details, but I doubt it. Better to be explicit. I had in mind something like: my_hash = num1 * 1000003 my_hash = (my_hash ^ num2) * 1000003 my_hash = (my_hash ^ compute_hash(string3)) my_hash = intmask(my_hash) # at the end A bientôt, Armin.
On Fri, Jun 27, 2014 at 6:37 PM, Armin Rigo <arigo@tunes.org> wrote:
Hi Yichao,
On 27 June 2014 12:17, Yichao Yu <yyc1992@gmail.com> wrote:
I am still creating a tuple out of these 5 numbers/strings since I don't want to duplicate the hash functions for all of them. Can the compiler deal with this case efficiently?
If by "deal with this case efficiently" you mean, will the compiler remove the tuple allocation and inline the 5-elements hash, then it might just be possible depending on a number of details, but I doubt it. Better to be explicit. I had in mind something like:
my_hash = num1 * 1000003 my_hash = (my_hash ^ num2) * 1000003 my_hash = (my_hash ^ compute_hash(string3)) my_hash = intmask(my_hash) # at the end
Done
A bientôt,
Armin.
participants (2)
-
Armin Rigo
-
Yichao Yu