Here is a non-trivial example of multiple dispatch. I want to convert data between container types, i.e. given
into(a, b)
I want to return something with the information content of b in a container like a, e.g.
In [24]: into([], (1, 2, 3))
Out[24]: [1, 2, 3]
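To make the idea concrete without Blaze, here is a minimal pure-Python sketch of dispatching `into` on the types of *both* arguments. The `_registry` dict and the hand-rolled `dispatch` decorator are hypothetical illustrations, not Blaze's or the multipledispatch library's internals. Lookup walks each argument's method resolution order, so a rule registered for a base class such as `object` also covers its subclasses.

```python
# A toy multiple-dispatch registry (hypothetical, for illustration only):
# the implementation is chosen from the types of BOTH arguments.
_registry = {}

def dispatch(type_a, type_b):
    """Register the decorated function for the type pair (type_a, type_b)."""
    def register(func):
        _registry[(type_a, type_b)] = func
        return into  # keep the name `into` bound to the dispatcher
    return register

def into(a, b):
    """Return the contents of b in a container like a."""
    # Walk each argument's MRO so a rule registered for a base class
    # (here `object`) also applies to its subclasses.
    for ta in type(a).__mro__:
        for tb in type(b).__mro__:
            if (ta, tb) in _registry:
                return _registry[(ta, tb)](a, b)
    raise NotImplementedError(
        f"no into() for {type(a).__name__} <- {type(b).__name__}")

@dispatch(list, tuple)
def into(a, b):
    return list(b)

@dispatch(tuple, object)  # any iterable b
def into(a, b):
    return tuple(b)

@dispatch(set, object)  # any iterable b
def into(a, b):
    return set(b)
```

With these rules, `into([], (1, 2, 3))` returns `[1, 2, 3]` and `into((), [3, 1])` returns `(3, 1)`. A pair with no matching rule, such as `into([], [1])`, raises `NotImplementedError`. A real library such as multipledispatch also orders competing signatures and warns about ambiguities, which this toy does not attempt.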
We use this abstraction heavily in Blaze, a project that tries to map relational algebra onto a variety of projects that can be used for relational-algebra-like tasks. Projects in this scope include sqlalchemy, pandas, numpy, pyspark, pytables, etc.
In [26]: from blaze import into
A dataframe with some test data
In [25]: df = DataFrame([[1, 'Alice', 100],
[2, 'Bob', -200],
[3, 'Charlie', 300],
[4, 'Dennis', 400],
[5, 'Edith', -500]],
columns=['id', 'name', 'amount'])
migrate list <- DataFrame
In [27]: into([], df)
Out[27]:
[[1, 'Alice', 100],
[2, 'Bob', -200],
[3, 'Charlie', 300],
[4, 'Dennis', 400],
[5, 'Edith', -500]]
migrate numpy array <- DataFrame
In [28]: into(np.ndarray(0), df)
Out[28]:
rec.array([(1, 'Alice', 100), (2, 'Bob', -200), (3, 'Charlie', 300),
(4, 'Dennis', 400), (5, 'Edith', -500)],
dtype=[('id', '<i8'), ('name', 'O'), ('amount', '<i8')])
In [29]: x = into(np.ndarray(0), df) # store for later
connect to local pymongo database
In [30]: import pymongo
In [31]: db = pymongo.MongoClient().db
In [34]: into(db.my_collection, df) # migrate mongo <- pandas
Out[34]: Collection(Database(MongoClient('localhost', 27017), u'db'), u'my_collection')
In [35]: into(db.my_collection2, x) # migrate mongo <- numpy
Out[35]: Collection(Database(MongoClient('localhost', 27017), u'db'), u'my_collection2')
In [36]: list(db.my_collection2.find()) # verify that things transferred well
Out[36]:
[{u'_id': ObjectId('53ef6167fb5d1b34b9fd00e2'),
u'amount': 100,
u'id': 1,
u'name': u'Alice'},
{u'_id': ObjectId('53ef6167fb5d1b34b9fd00e3'),
u'amount': -200,
u'id': 2,
u'name': u'Bob'},
{u'_id': ObjectId('53ef6167fb5d1b34b9fd00e4'),
u'amount': 300,
u'id': 3,
u'name': u'Charlie'},
{u'_id': ObjectId('53ef6167fb5d1b34b9fd00e5'),
u'amount': 400,
u'id': 4,
u'name': u'Dennis'},
{u'_id': ObjectId('53ef6167fb5d1b34b9fd00e6'),
u'amount': -500,
u'id': 5,
u'name': u'Edith'}]
migrate bcolz <- mongo
In [37]: into(bcolz.ctable(), db.my_collection)
Out[37]:
ctable((5,), [('amount', '<i8'), ('id', '<i8'), ('name', '<U7')])
nbytes: 220; cbytes: 63.99 KB; ratio: 0.00
cparams := cparams(clevel=5, shuffle=True, cname='blosclz')
[(100, 1, u'Alice') (-200, 2, u'Bob') (300, 3, u'Charlie')
(400, 4, u'Dennis') (-500, 5, u'Edith')]
Note that in this last case the two libraries, bcolz (a compressed on-disk storage library) and pymongo, know absolutely nothing about each other.
Many of these into definitions are very simple:
@dispatch(np.ndarray, DataFrame)
def into(a, df):
    # numpy array <- DataFrame: pandas already produces a record array
    return df.to_records(index=False)
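Others rely on other into definitions, or on inheritance:
@dispatch(Collection, np.ndarray)
def into(coll, x, **kwargs):
    # mongo Collection <- numpy array: convert to a DataFrame first,
    # then reuse the Collection <- DataFrame definition
    return into(coll, into(DataFrame(), x), **kwargs)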
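The chaining pattern can be sketched without Blaze, pandas or pymongo. This is a hypothetical toy: `Store` stands in for a database collection, and the small `dispatch` decorator is hand-rolled rather than taken from any library. `Store <- tuple` has no direct rule. It converts to a list first and then reuses `Store <- list`, just as `Collection <- ndarray` goes through a DataFrame. The subclass `AuditedStore` picks up both rules through inheritance.

```python
# Toy pair-of-types dispatcher (hypothetical; same idea as multiple dispatch).
_registry = {}

def dispatch(type_a, type_b):
    def register(func):
        _registry[(type_a, type_b)] = func
        return into  # keep the name `into` bound to the dispatcher
    return register

def into(a, b):
    # Try (a, b) type pairs along each argument's MRO, most specific first.
    for ta in type(a).__mro__:
        for tb in type(b).__mro__:
            if (ta, tb) in _registry:
                return _registry[(ta, tb)](a, b)
    raise NotImplementedError(f"{type(a).__name__} <- {type(b).__name__}")

class Store:
    """Hypothetical stand-in for a database collection."""
    def __init__(self):
        self.rows = []

@dispatch(list, tuple)
def into(a, b):
    return list(b)

@dispatch(Store, list)
def into(store, rows):
    store.rows.extend(rows)
    return store

@dispatch(Store, tuple)
def into(store, rows):
    # No direct route: go through a list, then reuse the Store <- list rule.
    return into(store, into([], rows))

class AuditedStore(Store):
    """A subclass: it inherits every rule registered for Store."""
```

For example, `into(Store(), (("Alice", 100), ("Bob", -200))).rows` is `[("Alice", 100), ("Bob", -200)]`, and `into(AuditedStore(), ((1,), (2,)))` resolves to the `Store <- tuple` rule through the MRO walk. Each new container only needs a route to some type that already has conversions. It does not need one for every other container.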
But remembering all of the appropriate .to_foo and .from_bar methods can be a real pain. Collecting them all into a single abstraction cuts down significantly on the administrative burden of data migrations.