[Pythonmac-SIG] Crossfold Validation

Robert Kern robert.kern at gmail.com
Sat Feb 20 02:12:41 CET 2010


On 2010-02-19 18:29 PM, Aahz wrote:
> On Fri, Feb 19, 2010, Mark Livingstone wrote:
>>
>> I am looking for suggestions! I am doing some experimentation and want
>> to know if there are any utilities available that will take a file as
>> input, get the num folds and num times, and do the slice and dice file
>> operation ready then for training / testing?
>
> You will need to expand your jargon if you want anyone unfamiliar with
> this specific operation to provide assistance.  (I.e. I have no clue what
> you're talking about.)

It's really off-topic for this list, but K-fold cross-validation is a way of 
testing how well some prediction method will perform. Roughly, you split up the 
data into K chunks. You use K-1 chunks to train your method and test on the 
remaining chunk. You then repeat this K times with each chunk playing the role 
of the test chunk exactly once. Finally, you average the performance of your 
prediction method over the K tests.
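
A minimal sketch of that procedure in plain Python, in case it helps make the 
description concrete. The names kfold_splits, cross_validate and score are 
made up for illustration and do not come from any particular library; score 
stands for whatever function trains on one list and returns a performance 
number on the other.

```python
import random


def kfold_splits(items, k, seed=None):
    """Yield (train, test) lists for K-fold cross-validation.

    Hypothetical helper for illustration, not a library API.
    """
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and len(items)")
    shuffled = list(items)
    # Shuffle a copy so the caller's data is untouched; a fixed seed
    # makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    # Chunk i takes every k-th item starting at offset i, so chunk
    # sizes differ by at most one.
    chunks = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = chunks[i]
        # Train on the other K-1 chunks.
        train = [x for j, chunk in enumerate(chunks) if j != i for x in chunk]
        yield train, test


def cross_validate(items, k, score, seed=None):
    """Average score(train, test) over the K folds."""
    scores = [score(train, test) for train, test in kfold_splits(items, k, seed)]
    return sum(scores) / len(scores)
```

To work from a file, one could read its lines into a list and pass that as 
items. Repeating the whole thing several times would then amount to calling 
cross_validate with a different seed each time.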

Mark, I recommend that you join the scipy-users mailing list. We'll be happy to 
field your data analysis questions over there. These kinds of questions really 
are unrelated to the Apple platform even if you intend to do the analysis on an 
Apple machine.

   http://www.scipy.org/Mailing_Lists

You may also want to check out the SpamBayes project. Their validation 
framework might be applicable to your problem set.

   http://spambayes.sourceforge.net/

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco


