[Baypiggies] Data Mining, Predictive modeling packages

Stephen McInerney spmcinerney at hotmail.com
Sat Jul 7 07:53:19 CEST 2012


Hi Meenal,

The best and most expressive language for Data Mining/Predictive modeling is and continues to be R,
by a long shot. (I'm a longtime Python user who subsequently learned R so I'm unbiased.)

Python + pandas + scikit-learn covers only a small subset of the corresponding R functionality. Wes McKinney (pandas lead)
is actively migrating R functionality into Python, but the project has a ways to go, and needs contributors.
pandas basically gives you the R constructs data.frame + timeseries (timeseries is mainly for financial people)
plus slicing, indexing and subsetting.
pandas is aiming for performance and scalability.
Also, the excellent ggplot2 visualization library is being ported from R to Python, expected this fall (or so I was told).
(Hadley Wickham, the creator of the outstanding packages plyr & ggplot2, gave a great talk in SF
last week btw.)
Wes McKinney presents some solid "Why not R?" arguments on pandas.pydata.org
(performance, scalability, no copyleft licensing, Python is by far a better general-purpose language for production systems)
pandas 0.8.0, a major release, came out just last week, so I'd be curious to hear from anyone who has used it yet.
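
For anyone who hasn't tried pandas, here is a minimal sketch of the data.frame-like table, time series index, slicing, indexing and subsetting mentioned above. The toy data and the column names ("price", "volume") are made up for illustration, and the calls are from a recent pandas, so details may differ in 0.8.0.

```python
import pandas as pd

# Hypothetical toy data; the column names are assumptions for illustration.
df = pd.DataFrame(
    {
        "price": [10.0, 10.5, 11.0, 10.8, 11.2, 11.5],
        "volume": [100, 150, 120, 130, 170, 160],
    },
    index=pd.date_range("2012-07-02", periods=6, freq="D"),
)

# Label-based slicing on the date index (both endpoints are included).
first_three = df.loc["2012-07-02":"2012-07-04"]

# Boolean subsetting, similar in spirit to df[df$volume > 125, ] in R.
busy = df[df["volume"] > 125]

# Positional indexing: first two rows of the first column.
head_prices = df.iloc[:2, 0]

# Time series resampling: mean price over 2-day bins.
two_day_mean = df["price"].resample("2D").mean()

print(first_three)
print(busy)
print(head_prices)
print(two_day_mean)
```

The R equivalents would mostly be one-liners on a data.frame too; the point is only that the same style of work is possible from Python.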

Your question got a good response so I propose "Python for Predictive Analytics/Data Mining"
would be a good meeting topic.


(Btw next time Wes McKinney comes out west we should invite him to talk. He was at StrataConf this spring.)

PS If by any chance you're asking because you're competing on Kaggle.com, drop me a line privately.

Best regards,
Stephen