LaTeX version of boolean indexing

Hello,

I am documenting some code, translating the core of the algorithm to LaTeX. The style I currently have is very similar to the einsum syntax (which is awesome, btw). Here <https://gist.github.com/mattharrigan/68b292e64381bba6b78a06a6f1762fa2> is an example of some of the basic operations in NumPy.

One part I do not know how to capture well is boolean indexing, i.e.:

    mask = np.array([1, 0, 1])
    x = np.array([1, 2, 3])
    y = x[mask]

Any suggestions on how to show that operation clearly, formally, and concisely? Also, are there any guides on translating NumPy to LaTeX? They might be helpful for documenting algorithms and also for people learning NumPy.

Thank you,
Matt
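Running the snippet from the question shows which kind of indexing it actually performs. The following is a minimal check, assuming NumPy is available; `int_mask` and `bool_mask` are illustrative names, not from the original code. It contrasts the integer array used above with a boolean array of the same length:

```python
import numpy as np

x = np.array([1, 2, 3])

# Integer array: each entry is used as an index into x ("fancy" indexing),
# so the result has one element per entry of int_mask: x[1], x[0], x[1].
int_mask = np.array([1, 0, 1])
y_fancy = x[int_mask]

# Boolean array of the same length as x: keeps x[j] wherever the mask is True.
bool_mask = int_mask.astype(bool)   # array([ True, False,  True])
y_bool = x[bool_mask]

print(y_fancy)  # [2 1 2]
print(y_bool)   # [1 3]
```

Only the dtype of the mask differs between the two cases, which is easy to miss when the mask is written with 0/1 literals.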

On Thu, Oct 11, 2018 at 6:54 PM Matthew Harrigan <harrigan.matthew@gmail.com> wrote:
That is fancy indexing with an index array rather than boolean indexing, which is why the result is [2, 1, 2] rather than [1, 3]. If this is really what you need, your indices simply come from another sequence: `y_i = x_{m_i}`, where `m_i` is your indexing sequence. For proper boolean indexing you lose the one-to-one correspondence between input and output (since the size almost always changes), so you might not be able to write it this neatly, with the same index appearing on both sides. But something with an indicator might work... András
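One possible way to make an indicator-based formulation concrete (a sketch, not necessarily what was meant above) uses the Iverson bracket `[P]`, which is 1 when `P` holds and 0 otherwise. It writes the mask as 0/1 values `m_j` and introduces a helper count `c_j` (a hypothetical name) of the True entries strictly before position `j`:

```latex
% m_j \in \{0, 1\}: boolean mask as 0/1 values; [P] = 1 if P holds, else 0
c_j = \sum_{k=0}^{j-1} m_k, \qquad
y_i = \sum_{j} \, [\, c_j = i \,] \, m_j \, x_j, \qquad
i = 0, \dots, \Big( \sum_j m_j \Big) - 1
```

With `m = (1, 0, 1)` this gives `c = (0, 1, 1)`, so `y_0 = x_0` and `y_1 = x_2`. The bracket acts like a 0/1 matrix contracted against `x`, which keeps the formula close to an einsum-style contraction, at the cost of the auxiliary count `c_j`.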

On Thu, Oct 11, 2018 at 7:45 PM Matthew Harrigan <harrigan.matthew@gmail.com> wrote:
What do you mean by indicator?
I mostly meant what Wikipedia seems to call "set-builder notation" (https://en.wikipedia.org/wiki/Set-builder_notation#Sets_defined_by_a_predica...). Since your "input" is `{x_i | i in [0,1,2]}` but your output is `y_j` for `j in [0,1]`, the straightforward thing I could think of was defining the set of valid `y_j` values (with an implicit assumption that the order is preserved, I guess). This would mean you can say something like `y_i \in \{x_j | m_j\}` (omitting the \left/\right/\vert fluff for simplicity here), where `m_j` are the elements of the boolean mask (say, `m = [True, False, True]`).

In this context I'd read `m_j` as the predicate and `x_j` as the corresponding values. However, the notation isn't entirely unambiguous (see also a remark on the above Wikipedia page), so you can't really get away without further explanation to resolve the ambiguity. Though I guess calling `m_j` elements of a mask would do the same thing.

The other option that comes to mind is to define auxiliary indices `n_i`, i.e. the positions `j` for which `m_j` is True; then you can denote the result with integer indices: `y_i = x_{n_i}`, where `i` goes from 0 up to (but not including) the number of `True`s in `m`. But then you have the same difficulty defining `n_i`. All in all, I'm not sure there's an elegant and concise notation for boolean masking. András
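Both readings can be checked numerically. A minimal sketch, assuming NumPy and the example mask `m = [True, False, True]`; `n_explicit` is a hypothetical name for one way of defining `n_i` without calling `np.flatnonzero`, namely as the first position at which the running count of True entries reaches `i + 1`:

```python
import numpy as np

x = np.array([1, 2, 3])
m = np.array([True, False, True])  # boolean mask, elements m_j

# Set-builder reading {x_j | m_j}, keeping the original order of j
y_set_builder = [int(xj) for xj, mj in zip(x, m) if mj]

# Auxiliary indices n_i: the positions j where m_j is True, in increasing order
n = np.flatnonzero(m)   # array([0, 2])
y_aux = x[n]            # y_i = x_{n_i}

# One explicit definition of n_i: the smallest j whose running count of
# True entries reaches i + 1 (counts is nondecreasing, so searchsorted works)
counts = np.cumsum(m)              # running count of True entries: [1, 1, 2]
k = np.count_nonzero(m)            # number of True entries, so i = 0 .. k-1
n_explicit = np.searchsorted(counts, np.arange(1, k + 1))  # side='left' by default

print(y_set_builder)  # [1, 3]
print(n, n_explicit)  # [0 2] [0 2]
print(y_aux, x[m])    # [1 3] [1 3]
```

Written out, the running-count definition reads `n_i = \min\{ j : \sum_{k \le j} m_k = i + 1 \}`, which is one way to pin down `n_i` without appealing to NumPy.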

participants (2)
-
Andras Deak
-
Matthew Harrigan