[Tutor] decomposing a problem

Avi Gross avigross at verizon.net
Sat Dec 29 07:39:24 EST 2018


Steven,

As I head out the door, I will sketch it.

Given a data.frame populated with N rows and some number of columns, you
want to break it into training and test data sets.

In a data.frame, you can refer to a row by using an index like 5 or 2019,
and you can ask for the number of rows currently in existence. You can also
create an array/vector of length N that tells which randomly chosen rows of
the N you want and which you don't. For the purposes of this task, you
choose random numbers in the range of N and either keep the numbers as
indices or use them to mark True/False in the vector. You then ask for a new
data.frame made by indexing the existing one using the vector. You can then
negate the vector and ask for a second new data.frame by indexing the
original with the negated vector.

Something close to that.
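
A rough sketch of that in plain Python, assuming the "data.frame" is just a
list of row dicts rather than a real data.frame object. The names split_rows,
train_fraction and seed are made up for illustration, and the 70/30 split is
only an example:

```python
import random

def split_rows(rows, train_fraction=0.8, seed=None):
    """Split rows into (train, test) using a random True/False mask.

    Hypothetical helper: rows is assumed to be a list, one item per row.
    """
    rng = random.Random(seed)
    n = len(rows)
    n_train = round(n * train_fraction)

    # Pick n_train distinct row indices at random.
    chosen = set(rng.sample(range(n), n_train))

    # Turn the indices into a vector of length N: True means "training row".
    mask = [i in chosen for i in range(n)]

    # Index with the mask, then with its negation.
    train = [row for row, keep in zip(rows, mask) if keep]
    test = [row for row, keep in zip(rows, mask) if not keep]
    return train, test, mask

rows = [{"id": i, "value": i * i} for i in range(10)]
train, test, mask = split_rows(rows, train_fraction=0.7, seed=42)
```

Every row lands in exactly one of the two lists, and the mask is returned so
the same split can be reused later.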

Or, you can simply add the vector as a new column in the data.frame in some
form. It would then mark which rows are to be used for which purpose. Later,
when using the data, you include a CONDITION that the mark for row X is
true, or whatever.
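
Again as a sketch only, assuming rows are plain dicts: the marker can be
stored on each row and tested later. The function mark_rows and the column
name "is_train" are invented for this example:

```python
import random

def mark_rows(rows, train_fraction=0.8, seed=None):
    """Add a boolean 'is_train' entry to every row dict, in place.

    Hypothetical helper; 'is_train' is an assumed column name.
    """
    rng = random.Random(seed)
    n_train = round(len(rows) * train_fraction)
    chosen = set(rng.sample(range(len(rows)), n_train))
    for i, row in enumerate(rows):
        row["is_train"] = i in chosen
    return rows

rows = mark_rows([{"id": i} for i in range(10)], train_fraction=0.7, seed=1)

# Later, filter on the condition when the data is actually used.
train_ids = [row["id"] for row in rows if row["is_train"]]
test_ids = [row["id"] for row in rows if not row["is_train"]]
```

This keeps all the data in one place, at the cost of having to remember the
condition every time the rows are read.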



-----Original Message-----
From: Tutor <tutor-bounces+avigross=verizon.net at python.org> On Behalf Of
Steven D'Aprano
Sent: Friday, December 28, 2018 11:12 PM
To: tutor at python.org
Subject: Re: [Tutor] decomposing a problem

On Fri, Dec 28, 2018 at 10:39:53PM -0500, Avi Gross wrote:
> I will answer this question then head off on vacation.

You wrote about 140 or more lines, but didn't come close to answering the
question: how to randomly split data from a dictionary into training data
and reserved data.



--
Steve


