Does numpy have a relational join operation for joining recordarrays?

thanks,
ilya
On Wed, Feb 2, 2011 at 21:42, Ilya Shlyakhter <ilya_shl@alum.mit.edu> wrote:
Does numpy have a relational join operation for joining recordarrays?
[~] |1> from numpy.lib import recfunctions

[~] |2> recfunctions.join_by?
Type:           function
Base Class:     <type 'function'>
String Form:    <function join_by at 0x17bba30>
Namespace:      Interactive
File:           /Library/Frameworks/Python.framework/Versions/6.3/lib/python2.6/site-packages/numpy/lib/recfunctions.py
Definition:     recfunctions.join_by(key, r1, r2, jointype='inner', r1postfix='1', r2postfix='2', defaults=None, usemask=True, asrecarray=False)
Docstring:
    Join arrays `r1` and `r2` on key `key`.

    The key should be either a string or a sequence of strings corresponding
    to the fields used to join the array.
    An exception is raised if the `key` field cannot be found in the two input
    arrays.
    Neither `r1` nor `r2` should have any duplicates along `key`: the presence
    of duplicates will make the output quite unreliable. Note that duplicates
    are not looked for by the algorithm.

    Parameters
    ----------
    key : {string, sequence}
        A string or a sequence of strings corresponding to the fields used
        for comparison.
    r1, r2 : arrays
        Structured arrays.
    jointype : {'inner', 'outer', 'leftouter'}, optional
        If 'inner', returns the elements common to both r1 and r2.
        If 'outer', returns the common elements as well as the elements of
        r1 not in r2 and the elements of r2 not in r1.
        If 'leftouter', returns the common elements and the elements of r1
        not in r2.
    r1postfix : string, optional
        String appended to the names of the fields of r1 that are present
        in r2 but absent from the key.
    r2postfix : string, optional
        String appended to the names of the fields of r2 that are present
        in r1 but absent from the key.
    defaults : {dictionary}, optional
        Dictionary mapping field names to the corresponding default values.
    usemask : {True, False}, optional
        Whether to return a MaskedArray (or MaskedRecords if
        `asrecarray==True`) or an ndarray.
    asrecarray : {False, True}, optional
        Whether to return a recarray (or MaskedRecords if `usemask==True`)
        or just a flexible-type ndarray.

    Notes
    -----
    * The output is sorted along the key.
    * A temporary array is formed by dropping the fields not in the key for
      the two arrays and concatenating the result. This array is then
      sorted, and the common entries selected. The output is constructed by
      filling the fields with the selected entries. Matching is not
      preserved if there are some duplicates...

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
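A minimal sketch of calling `join_by` as documented in the docstring, assuming a current NumPy release where the function still lives in `numpy.lib.recfunctions`. The arrays and the field names `id`, `price`, and `qty` are hypothetical example data, chosen with unique keys because the docstring warns that duplicates make the output unreliable.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical example data: two structured arrays sharing an "id" key field.
# Keys are unique within each array, as join_by expects.
left = np.array([(1, 10.0), (2, 20.0), (3, 30.0)],
                dtype=[("id", "i8"), ("price", "f8")])
right = np.array([(2, 5), (3, 7), (4, 9)],
                 dtype=[("id", "i8"), ("qty", "i8")])

# Inner join: only ids present in both arrays (2 and 3).
# usemask=False returns a plain structured ndarray instead of a MaskedArray.
inner = rfn.join_by("id", left, right, jointype="inner", usemask=False)

# Left outer join: every row of `left`; with the default usemask=True,
# fields that have no match in `right` come back masked.
left_outer = rfn.join_by("id", left, right, jointype="leftouter")

print(inner)
print(left_outer)
```

In `left_outer`, the `qty` entry for id 1 is masked; the `defaults` argument described in the docstring can supply a fill value for such fields instead.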
On Wed, Feb 2, 2011 at 4:46 PM, Robert Kern <robert.kern@gmail.com> wrote:
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
It might also be worth your while to check out Keith Goodman's la (larry) library or my pandas library, which are both designed with relational data in mind.

- Wes
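For comparison, a rough equivalent of the same joins in pandas, sketched under the assumption of a recent pandas release (the API available in 2011 may have differed). It converts structured arrays to DataFrames and calls `pandas.merge`; the data and the field names `id`, `price`, and `qty` are hypothetical. Unlike `join_by`, `merge` defines the result when keys repeat: each matching pair of rows yields one output row.

```python
import numpy as np
import pandas as pd

# Hypothetical structured arrays with an "id" key; DataFrame() uses the
# field names as column names.
left = pd.DataFrame(np.array([(1, 10.0), (2, 20.0), (3, 30.0)],
                             dtype=[("id", "i8"), ("price", "f8")]))
right = pd.DataFrame(np.array([(2, 5), (3, 7), (4, 9)],
                              dtype=[("id", "i8"), ("qty", "i8")]))

# Inner join: rows whose id appears in both frames.
inner = pd.merge(left, right, on="id", how="inner")

# Left outer join: all rows of `left`; missing qty values become NaN,
# so the qty column is upcast to float.
left_outer = pd.merge(left, right, on="id", how="left")

# Repeated keys: id 2 appears twice on the right, so it yields two rows.
right_dup = pd.DataFrame({"id": [2, 2, 3], "qty": [5, 6, 7]})
dup_inner = pd.merge(left, right_dup, on="id", how="inner")

print(inner)
print(left_outer)
print(dup_inner)
```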
participants (3)
- Ilya Shlyakhter
- Robert Kern
- Wes McKinney