[Tutor] decomposing a problem

Avi Gross avigross at verizon.net
Thu Dec 27 18:32:51 EST 2018


[Mark Lawrence please press DELETE now in case the rest of this message is
all about you.]
[[If that is not working, if on Windows, try Control-ALT-DELETE as that will
really get rid of my message.]]

Back to replying to Steven,

Of course I want to be corrected when wrong.

I think everyone here knows I tend to be quite expansive in my thoughts,
sometimes to the point where people suggest I am free-associating. I am
trying to get to the point faster and stay there.

So if what I write is not wrong as a general point and you want to bring up
every exception, fine. I reserve the right not to follow you there,
especially not on the forum. I may continue a discussion with you in
private, of course.

I often have a problem in real life (not talking about you, let alone
whoever Mark is) where I think I said something clearly by using phrases
like "if" and find the other person simply acts as if I had left that out.
You know: we can go to the park IF it is not raining tomorrow. The reply
is to tell me the weather report says it will rain, so why am I suggesting
we go to the park? Duh. I was not aware of the weather report directly BUT
clearly suggested it was a consideration we should look at before deciding.

Now a more obvious error should be pointed out. EXAMPLE, I am driving to
Pennsylvania this weekend not far from a National Park and will have some
hours to kill. I suggested we might visit Valley Forge National Historic
Park and did not say only if it was open. Well, in the U.S. we happen to
have the very real possibility the Park will be closed due to it being
deemed optional during a so-called Government Shutdown so such a reply IS
reasonable. I did not consider that and stand corrected.

But Chris, you point out I reacted similarly to what you said. Indeed, you
said that sometimes we don't need to focus on efficiency, as opposed to
saying we should always ignore it. I think we are actually in relative
agreement on how we might approach a problem like this. We might try to
solve it in a reasonable way first and not worry about efficiency at the
start, especially now that some equipment runs so fast and with so much
memory that results appear faster than we can get to them. But, with
experience, and need, we may fine-tune code that is causing issues. As I
have mentioned, I have applications that regularly need huge samples taken
at random, so creating a list of millions of keys, thousands of times,
adds up. Many cheaper methods might then be considered including,
especially, just switching to a better data structure ONCE.
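
As a sketch of that last point (the dict contents and sizes here are made
up for illustration), converting the dict's keys to a list ONCE makes
repeated random draws cheap:

```python
import random

# Hypothetical data set: a dict with many keys (names made up).
d = {f"key{i}": i for i in range(100_000)}

# Costly inside a loop: list(d) rebuilds the full key list on every draw.
# sample = random.choice(list(d))

# Cheaper: switch data structures ONCE, then draw as often as needed.
keys = list(d)  # one O(n) conversion
samples = [random.choice(keys) for _ in range(1000)]
```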

I will stop this message here as I suspect Mark is still reading and fuming.
Note, I do not intend to mention Mark again in future messages. I do not
actually want to annoy him and wish he would live and let live.

-----Original Message-----
From: Tutor <tutor-bounces+avigross=verizon.net at python.org> On Behalf Of
Steven D'Aprano
Sent: Thursday, December 27, 2018 5:38 PM
To: tutor at python.org
Subject: Re: [Tutor] decomposing a problem

On Wed, Dec 26, 2018 at 11:02:07AM -0500, Avi Gross wrote:

> I often find that I try to make a main point and people then focus on 
> something else, like an example.

I can't speak for others, but for me, that could be because of a number of
reasons:

- I agree with what you say, but don't feel like adding "I agree!!!!" 
after each paragraph of yours;

- I disagree, but can't be bothered arguing;

- I don't understand the point you intend to make, so just move on.

But when you make an obvious error, I tend to respond. This is supposed to
be a list for teaching people to use Python better, after all.


> So, do we agree on the main point that choosing a specific data
> structure or algorithm (or even computer language) too soon can lead to
> problems that can be avoided if we first map out the problem and
> understand it better?

Sure, why not? That's vague and generic enough that it has to be true.

But if it's meant as advice, you don't really offer anything concrete. 
How does one decide what is "too soon"? How does one avoid design 
paralysis?


> I do not concede that efficiency can be ignored because computers are
> fast.

That's good, but I'm not sure why you think it is relevant as I never 
suggested that efficiency can be ignored. Only that what people *guess* 
is "lots of data" and what actually *is* lots of data may not be the 
same thing.


> I do concede that it is often not worth the effort or that you can
> inadvertently make things worse and there are tradeoffs.

Okay.


> Let me be specific. The side topic was asking how to get a random key
> from an existing dictionary. If you do this ONCE, it may be no big deal
> to make a list of all keys, index it by a random number, and move on. I
> did supply a solution that might (or might not) run faster by using a
> generator to get one item at a time and stopping when found. Less space
> but not sure if less time.

Why don't you try it and find out?
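
Trying it might look something like this (a sketch; the function names and
the timing harness are mine, not from either post):

```python
import random
import timeit
from itertools import islice

d = {i: i * i for i in range(10_000)}

def via_list(d):
    # Materialize all keys, then index at a random position.
    return list(d)[random.randrange(len(d))]

def via_generator(d):
    # Iterate only up to a random position; less memory, but
    # not necessarily less time.
    return next(islice(iter(d), random.randrange(len(d)), None))

t_list = timeit.timeit(lambda: via_list(d), number=100)
t_gen = timeit.timeit(lambda: via_generator(d), number=100)
print(f"list: {t_list:.4f}s  generator: {t_gen:.4f}s")
```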


> But what I often need to do is to segment lots of data into two piles.
> One is for training purposes using some machine learning algorithm and
> the remainder is to be used for verifications. The choice must be
> random or the entire project may become meaningless. So if your data
> structure was a dictionary with key names promptly abandoned, you
> cannot just call pop() umpteen times to get supposedly random results
> as they may come in a very specific order.

Fortunately I never suggested doing that.


> If you want to have 75% of the data in the training section,
> and 25% reserved, and you have millions of records, what is a good way to
> go? 

The obvious solution:

keys = list(mydict.keys())
random.shuffle(keys)
index = len(keys)*3//4
training_data = keys[:index]
reserved = keys[index:]

Now you have the keys split into training data and reserved data. To 
extract the value, you can just call mydict[some_key]. If you prefer, 
you can generate two distinct dicts:

training_data = {key: mydict[key] for key in training_data}

and similarly for the reserved data, and then mydict becomes redundant 
and you are free to delete it (or just ignore it).
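
Put together, the two comprehensions might look like this (a sketch with a
small made-up dict standing in for the real data):

```python
import random

mydict = {f"k{i}": i for i in range(100)}  # made-up sample data

keys = list(mydict.keys())
random.shuffle(keys)
index = len(keys) * 3 // 4   # 75% / 25% split point

training_data = {key: mydict[key] for key in keys[:index]}
reserved = {key: mydict[key] for key in keys[index:]}
```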

Anything more complex than this solution should not even be attempted 
until you have tried the simple, obvious solution and discovered that it 
isn't satisfactory.

Keep it simple. Try the simplest thing that works first, and don't add 
complexity until you know that you need it.

By the way, your comments would be more credible if you had actual 
working code that demonstrates your point, rather than making vague 
comments that something "may" be faster. Sure, anything "may" be faster. 
We can say that about literally anything. Walking to Alaska from the 
southernmost tip of Chile while dragging a grand piano behind you "may" 
be faster than flying, but probably isn't. Unless you have actual code 
backing up your assertions, they're pretty meaningless.

And the advantage of working code is that people might actually learn 
some Python too.



-- 
Steve
_______________________________________________
Tutor maillist  -  Tutor at python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


