[IPython-dev] Pandawash: extension to conveniently & transparently clean up data
Thomas Kluyver
takowl at gmail.com
Mon Apr 21 12:30:45 EDT 2014
The result of a quick bit of hacking yesterday, pandawash is an IPython
extension to help clean up messy data in pandas dataframes.
The key feature is that it generates plain Python code which you modify to
do the data cleanup. For instance, you can use it to check that the values
in a numeric column are within a specified range. If any values fall outside
that range, it will create a new cell with the code needed to replace them;
you just set the replacement values and run the cell. This is more
convenient than finding those values and writing the code yourself, but it
leaves you with full control and a clear record of the changes, unlike more
automatic data cleaning.
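
To make the idea concrete, here is a minimal sketch of the "generate code, then edit it" approach for the range check. It is not pandawash's actual API or implementation: the function name range_check_code, its parameters, and the dataframe name "df" are all hypothetical, chosen only for illustration. It takes a mapping of row label to value (for a pandas Series, something like series.to_dict()) and returns one plain assignment line per out-of-range value.

```python
def range_check_code(values, column, low, high, df_name="df"):
    """Return Python source with one editable assignment per out-of-range value.

    Hypothetical helper for illustration only (not part of pandawash).
    `values` maps row label -> value for a single numeric column.
    """
    lines = []
    for label, value in values.items():
        # Anything not inside [low, high] is flagged (this includes NaN,
        # because comparisons with NaN are always False).
        if not (low <= value <= high):
            # The current value is written on the right-hand side as a
            # placeholder; the user edits it to the intended replacement.
            lines.append(
                f"{df_name}.loc[{label!r}, {column!r}] = {value!r}"
                f"  # outside [{low}, {high}]; set the replacement"
            )
    return "\n".join(lines)


# Example: two of the four ages fall outside 0..120.
ages = {0: 34, 1: -2, 2: 51, 3: 430}
generated = range_check_code(ages, "age", 0, 120)
print(generated)
```

Inside IPython, text like this could be placed into a new input cell with get_ipython().set_next_input(generated), so that the replacements are edited and run by hand and stay in the notebook as a record. Whether pandawash does exactly this is an assumption on my part.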
Demo:
http://nbviewer.ipython.org/github/takluyver/pandawash/blob/master/Pandawash%20Demo.ipynb
Source code:
https://github.com/takluyver/pandawash
Thanks,
Thomas