[Numpy-discussion] array slicing questions

Mon Jul 30 17:59:12 EDT 2012

Hi,

A partial answer to your questions:

On Mon, Jul 30, 2012 at 10:33 PM, Vlastimil Brom
<vlastimil.brom at gmail.com> wrote:

> Hi all,
> I'd like to ask for some hints or advice regarding the usage of
> numpy.array and especially slicing.
>
> I only recently tried numpy and was impressed by the speedup in some
> parts of the code, hence I suspect that I might be missing some other
> opportunities in this area.
>
> I currently use the following code for a simple visualisation of the
> search matches within the text. The arrays are generally much larger
> than the sample: the text size is generally hundreds of kilobytes up
> to a few MB, with an index position for each character.
> First there is a list of spans (obtained from the regex match objects);
> the respective character indices within these spans should be set
> to 1:
>
> >>> import numpy
> >>> characters_matches = numpy.zeros(10)
> >>> matches_spans = numpy.array([[2,4], [5,9]])
> >>> for start, stop in matches_spans:
> ...     characters_matches[start:stop] = 1
> ...
> >>> characters_matches
> array([ 0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.])
>
> Is there maybe a way to achieve this in a numpy-only way, without the
> python loop?
> (I got the impression the powerful slicing capabilities could make it
> possible, but haven't found this kind of solution.)
>
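For the spans, one possible loop-free approach is to mark where spans
open and close and then take a running sum. This is only a minimal
sketch, assuming 0 <= start <= stop <= length for every span; mark_spans
is just an illustrative name, not a numpy function:

```python
import numpy as np

def mark_spans(spans, length):
    """Return a float array of `length` with 1.0 inside any [start, stop) span."""
    spans = np.asarray(spans, dtype=np.intp).reshape(-1, 2)
    # +1 where a span opens, -1 where it closes
    # (one extra slot so that stop == length is allowed)
    delta = (np.bincount(spans[:, 0], minlength=length + 1)
             - np.bincount(spans[:, 1], minlength=length + 1))
    # the running sum counts how many spans cover each position
    covered = np.cumsum(delta)[:length] > 0
    return covered.astype(float)

characters_matches = mark_spans([[2, 4], [5, 9]], 10)
print(characters_matches)
```

Overlapping spans should be handled as well, since a position counts as
matched whenever at least one span covers it.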
>
> In the next piece of code all the character positions are evaluated
> with their "neighbourhood" and a kind of running proportion of the
> matched text parts is computed (the check_distance could generally be
> up to the order of half the text length, usually less):
>
> >>>
> >>> check_distance = 1
> >>> floating_checks_proportions = []
> >>> for i in numpy.arange(len(characters_matches)):
> ...     lo = i - check_distance
> ...     if lo < 0:
> ...         lo = None
> ...     hi = i + check_distance + 1
> ...     checked_sublist = characters_matches[lo:hi]
> ...     proportion = (checked_sublist.sum() / (check_distance * 2 + 1.0))
> ...     floating_checks_proportions.append(proportion)
> ...
> >>> floating_checks_proportions
> [0.0, 0.33333333333333331, 0.66666666666666663, 0.66666666666666663,
> 0.66666666666666663, 0.66666666666666663, 1.0, 1.0,
> 0.66666666666666663, 0.33333333333333331]
> >>>
>
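Define a function for proportions:

from numpy import r_
from numpy.lib.stride_tricks import as_strided as ast

def proportions(matches, distance=1):
    cd, cd2p1 = distance, 2 * distance + 1
    # pad both ends with zeros so that every position has a full window
    m = r_[[0.] * cd, matches, [0.] * cd]
    # create a suitable view: one row per position, each row a window of
    # cd2p1 consecutive elements (use the strides of the padded array and
    # only as many rows as fit inside its buffer)
    s = m.strides[0]
    m = ast(m, shape=(len(matches), cd2p1), strides=(s, s))
    # average over each window
    return m.sum(1) / cd2p1

and use it like: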
In []: matches
Out[]: array([ 0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.])

In []: proportions(matches).round(2)
Out[]: array([ 0.  ,  0.33,  0.67,  0.67,  0.67,  0.67,  1.  ,  1.  ,
 0.67,  0.33])
In []: proportions(matches, 5).round(2)
Out[]: array([ 0.27,  0.36,  0.45,  0.55,  0.55,  0.55,  0.55,  0.55,
 0.45,  0.36])
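
If check_distance gets large (you mention up to half the text length),
summing every window separately does a lot of repeated work. A possible
alternative, sketched here under the same zero-padding assumption as
proportions() above, is to take differences of a running sum;
proportions_cumsum is only an illustrative name:

```python
import numpy as np

def proportions_cumsum(matches, distance=1):
    """Zero-padded moving average over a window of 2 * distance + 1 elements."""
    matches = np.asarray(matches, dtype=float)
    window = 2 * distance + 1
    # zero-pad both ends, then prepend a 0 to the running sum
    padded = np.concatenate([np.zeros(distance), matches, np.zeros(distance)])
    csum = np.concatenate([[0.0], np.cumsum(padded)])
    # the sum of each window is the difference of two running-sum entries
    return (csum[window:] - csum[:-window]) / window

matches = np.array([0., 0., 1., 1., 0., 1., 1., 1., 1., 0.])
print(proportions_cumsum(matches).round(2))
print(proportions_cumsum(matches, 5).round(2))
```

For 0/1 data like yours the running sum is exact; with arbitrary floats
some rounding error may accumulate over very long arrays.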

>
> I'd like to ask about the possible better approaches, as it doesn't
> look very elegant to me, and I obviously don't know the implications
> or possible drawbacks of numpy arrays in some scenarios.
>
> the pattern
> for i in range(len(...)): is usually considered inadequate in python,
> but what should be used in this case as the indices are primarily
> needed?
> is something to be gained or lost using (x)range or np.arange as the
> python loop is (probably?) inevitable anyway?
>
Here np.arange(.) will create a new array, potentially wasting memory if
it's not otherwise used. IMO there is nothing wrong with looping over
xrange(.) (if you really need to loop ;).
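
If you do keep the loop, a sketch like the following might be a bit
tidier: the lower bound is clamped with max() rather than replaced by
None, and the results go into a preallocated array (an assumption on my
part that you want an array at the end; a list would work too):

```python
import numpy as np

characters_matches = np.array([0., 0., 1., 1., 0., 1., 1., 1., 1., 0.])
check_distance = 1
window = 2 * check_distance + 1

# preallocate the result instead of appending to a list
floating_checks_proportions = np.empty(len(characters_matches))
for i in range(len(characters_matches)):  # xrange on Python 2
    lo = max(i - check_distance, 0)       # clamp instead of switching to None
    hi = i + check_distance + 1           # slicing past the end is harmless
    floating_checks_proportions[i] = characters_matches[lo:hi].sum() / window

print(floating_checks_proportions)
```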

> Is there some more elegant way to check for the "underflowing" lower
> bound "lo" to replace with None?
>
> Is it significant, which container is used to collect the results of
> the computation in the python loop - i.e. python list or a numpy
> array?
> (Could possibly matplotlib cooperate better with either container?)
>
> And of course, are there maybe other things, which should be made
> better/differently?
>
> (using Numpy 1.6.2, python 2.7.3, win XP)
>

My 2 cents,
-eat

> Thanks in advance for any hints or suggestions,
>    regards,
>   Vlastimil Brom
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>