[Chicago] dedupe approaching alpha release!

Derek Eder derek.eder at gmail.com
Sat Sep 15 02:21:40 CEST 2012


ChiPy-ers,

Forest and I have made significant progress over the past few months on our
open source dedupe library.

You can see the latest code here: https://github.com/open-city/dedupe

We have a great working example
<https://github.com/open-city/dedupe/blob/master/examples/csv_example.py>
that takes in a CSV file and outputs a CSV of clustered groups, so if you
are interested in using dedupe now for your own data, this is a great
pattern to follow.

Over the next few weeks, we will develop an example that uses a SQLite
database for larger datasets (10,000+ rows). Finishing that example, along
with some additional documentation, will mark the alpha release.

Join our Google Group for more frequent updates:
https://groups.google.com/forum/?fromgroups=#!forum/open-source-deduplication

Stay tuned!

Derek

-- 
Derek Eder
@derek_eder <https://twitter.com/#!/derek_eder>
derekeder.com
derek.eder at gmail.com