Hi,

First, I wanted to thank everybody who helped me clarify many points concerning the memory layout of numpy arrays; I think I have a much clearer idea of the way numpy arrays behave at the C level. I've used all that information to correct my initial implementation and improve the clip function for common cases: it speeds things up only for native endianness and scalar min and max (both contiguous and non-contiguous cases). I've attached the new version of the code (only for one type, to avoid too big an email; you have to download the archive to actually compile the implementation); the whole package with tests and profiling script is there:

http://www.ar.media.kyoto-u.ac.jp/members/david/archives/fastclip.tgz

If this looks OK, I will prepare a patch against current numpy, with the C sources being generated by numpy.distutils instead of the tool I am using now (autogen).

Now, to improve other cases (mainly implementing an in-place clip function + non-scalar min/max), some clarifications are needed, mainly related to broadcast rules and to the current clip implementation, which seems to break numpy conventions:

1: the old implementation returns an array which has the same endianness as the input array. This is a bit odd, because when the input is byte swapped, the returned array is still byte swapped, which seems to be against numpy convention.
Here is some code which seems odd to me (the code assumes a little-endian machine):

    import numpy

    a = numpy.random.randn(3, 2)
    b = a.astype(a.dtype.newbyteorder('>'))
    c = b.copy()

    assert a.dtype.isnative
    assert not b.dtype.isnative
    assert not c.dtype.isnative

    # Endianness behaviour of basic operations with numpy arrays
    print((a + b).dtype.isnative)  # one arg is non native -> returns native
    print((b + c).dtype.isnative)  # both args not native -> returns native

    # Now, what's happening endian-wise with clip:
    print(numpy.clip(a, -0.5, 0.5).dtype.isnative)  # everything native -> returns native
    print(numpy.clip(b, -0.5, 0.5).dtype.isnative)  # input array non native -> returns non native
    print(numpy.clip(b, a, 0.5).dtype.isnative)     # input array non native, native array min -> returns native

The fact that the output's endianness depends on whether the min/max arguments are arrays or not does not seem really coherent?

2: the old implementation does not upcast the input array. If the input is int32 and min/max are float32, the function fails; if the input is float32 and min/max are float64, the output is still float32. Again, this seems to go against the expected numpy behaviour?

3: the old implementation supports clipping with complex arrays. I don't see any obvious meaningful implementation of clipping in those cases (using the modulus to compare them?).

If breaking those oddities is allowed, this would make the improvements much simpler to code.

cheers,

David
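P.S. To make point 2 concrete, here is a rough sketch of the upcasting I would expect, following the same promotion rules as ordinary arithmetic. This is an illustration only: `promoted_clip` is a hypothetical helper (not part of numpy and not the attached C implementation), it relies on `numpy.result_type` from current numpy releases, and it uses 1-d arrays for min/max so that the promotion does not depend on scalar-specific rules.

```python
import numpy as np

def promoted_clip(a, a_min, a_max):
    """Hypothetical helper: clip after upcasting to the common dtype."""
    a, a_min, a_max = np.asarray(a), np.asarray(a_min), np.asarray(a_max)
    # Common dtype of all three operands, as for a + a_min + a_max
    out_dtype = np.result_type(a, a_min, a_max)
    # Clip element-wise with broadcasting; the result keeps out_dtype
    return np.minimum(np.maximum(a.astype(out_dtype), a_min), a_max)

# int32 input with float32 bounds: expected to upcast rather than fail
ints = np.array([-3, 0, 3], dtype=np.int32)
lo32 = np.array([-0.5], dtype=np.float32)
hi32 = np.array([0.5], dtype=np.float32)
clipped_ints = promoted_clip(ints, lo32, hi32)

# float32 input with float64 bounds: expected to give float64
floats = np.array([-1.0, 0.25, 1.0], dtype=np.float32)
lo64 = np.array([-0.5], dtype=np.float64)
hi64 = np.array([0.5], dtype=np.float64)
clipped_floats = promoted_clip(floats, lo64, hi64)

print(clipped_ints.dtype, clipped_floats.dtype)
```

With this convention the int32/float32 case returns a floating-point array instead of failing, and the float32/float64 case returns float64, which is what `a + a_min` would give for the same operands.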