[Numpy-discussion] relational join

Wes McKinney wesmckinn at gmail.com
Wed Feb 2 17:24:03 EST 2011


On Wed, Feb 2, 2011 at 4:46 PM, Robert Kern <robert.kern at gmail.com> wrote:
> On Wed, Feb 2, 2011 at 21:42, Ilya Shlyakhter <ilya_shl at alum.mit.edu> wrote:
>> Does numpy have a relational join operation for joining recordarrays?
>
> [~]
> |1> from numpy.lib import recfunctions
>
> [~]
> |2> recfunctions.join_by?
> Type:           function
> Base Class:     <type 'function'>
> String Form:    <function join_by at 0x17bba30>
> Namespace:      Interactive
> File:
> /Library/Frameworks/Python.framework/Versions/6.3/lib/python2.6/site-packages/numpy/lib/recfunctions.py
> Definition:     recfunctions.join_by(key, r1, r2, jointype='inner',
> r1postfix='1', r2postfix='2', defaults=None, usemask=True,
> asrecarray=False)
> Docstring:
>    Join arrays `r1` and `r2` on key `key`.
>
>    The key should be either a string or a sequence of strings corresponding
>    to the fields used to join the array.
>    An exception is raised if the `key` field cannot be found in the two input
>    arrays.
>    Neither `r1` nor `r2` should have any duplicates along `key`: the presence
>    of duplicates will make the output quite unreliable. Note that duplicates
>    are not looked for by the algorithm.
>
>    Parameters
>    ----------
>    key : {string, sequence}
>        A string or a sequence of strings corresponding to the fields used
>        for comparison.
>    r1, r2 : arrays
>        Structured arrays.
>    jointype : {'inner', 'outer', 'leftouter'}, optional
>        If 'inner', returns the elements common to both r1 and r2.
>        If 'outer', returns the common elements as well as the elements of r1
>        not in r2 and the elements of r2 not in r1.
>        If 'leftouter', returns the common elements and the elements of r1 not
>        in r2.
>    r1postfix : string, optional
>        String appended to the names of the fields of r1 that are present in r2
>        but absent from the key.
>    r2postfix : string, optional
>        String appended to the names of the fields of r2 that are present in r1
>        but absent from the key.
>    defaults : {dictionary}, optional
>        Dictionary mapping field names to the corresponding default values.
>    usemask : {True, False}, optional
>        Whether to return a MaskedArray (or MaskedRecords if `asrecarray==True`)
>        or an ndarray.
>    asrecarray : {False, True}, optional
>        Whether to return a recarray (or MaskedRecords if `usemask==True`) or
>        just a flexible-type ndarray.
>
>    Notes
>    -----
>    * The output is sorted along the key.
>    * A temporary array is formed by dropping the fields not in the key for the
>      two arrays and concatenating the result. This array is then sorted, and
>      the common entries selected. The output is constructed by filling the
>      fields with the selected entries. Matching is not preserved if there are
>      some duplicates...
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>   -- Umberto Eco
>
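
For what it's worth, here is a minimal sketch of how the join_by call
quoted above might be used. The field names ("id", "x", "y") and the data
are made up for illustration; only the join_by signature comes from the
docstring. Keys are kept unique in each input, since the docstring warns
that duplicates make the output unreliable.

```python
import numpy as np
from numpy.lib import recfunctions

# Two small structured arrays sharing an "id" field (hypothetical data).
left = np.array([(1, 10.0), (2, 20.0), (3, 30.0)],
                dtype=[("id", int), ("x", float)])
right = np.array([(2, 200.0), (3, 300.0), (4, 400.0)],
                 dtype=[("id", int), ("y", float)])

# Inner join: keep only the rows whose "id" appears in both arrays.
# usemask=False asks for a plain structured ndarray instead of a MaskedArray.
inner = recfunctions.join_by("id", left, right,
                             jointype="inner", usemask=False)

# Outer join: keep every "id" from either array. With the default
# usemask=True, fields with no counterpart in the other array are masked.
outer = recfunctions.join_by("id", left, right, jointype="outer")

print(inner)
print(outer)
```

Since neither array has a non-key field name in common with the other,
r1postfix and r2postfix play no role in this example.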

It might also be worth your while to check out Keith Goodman's la
(larry) library or my pandas library, which are both designed with
relational data in mind.

- Wes
