[Tutor] if statement issue

DL Neil PyTutor at danceswithmice.info
Sat Apr 25 17:44:05 EDT 2020


On 25/04/20 7:25 PM, shubham sinha wrote:
> question:
> The City class has the following attributes: name, country (where the city
> is located), elevation (measured in meters), and population (approximate,
> according to recent statistics). Fill in the blanks of the
> max_elevation_city function to return the name of the city and its country
> (separated by a comma), when comparing the 3 defined instances for a
> specified minimal population. For example, calling the function for a
> minimum population of 1 million: max_elevation_city(1000000) should return
> "Sofia, Bulgaria".

However, it is not apparent whether this is your first attempt at code,
or whether it has been provided for you:

 > # Evaluate the 1st instance to meet the requirements:
 > # does city #1 have at least min_population and
 > # is its elevation the highest evaluated so far?
 > if city1.population >= min_population and city1.elevation > return_city.elevation:
 >     return_city = city1
etc

Imagine if there were hundreds of cities. @Alan has already mentioned 
the need for a class definition. Is this code sustainable? The code 
would require the above set of lines (plus its definition) for every 
single city. "I don't think so, Tim!"

At the very least, "generalise" the above, and put it into a function, 
in the same way that you will "generalise" the data with a class 
definition and __init__(). (This idea might also be helpful a little 
later in this response...)
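
As a minimal sketch of that "generalising" (the City class body below is
an assumption, because the exercise's own definition was not quoted, and
the three cities are made-up placeholders rather than the real
instances), the repeated if-test becomes one loop that copes with any
number of cities:

```python
class City:
    # Assumed definition: the exercise's own class was not quoted in full.
    def __init__(self, name="", country="", elevation=0, population=0):
        self.name = name
        self.country = country
        self.elevation = elevation
        self.population = population


def highest_qualifying(cities, min_population):
    """Apply the quoted if-test to every city, however many there are."""
    return_city = City()  # same 'empty' starting point as the exercise
    for city in cities:
        if (city.population >= min_population
                and city.elevation > return_city.elevation):
            return_city = city
    return return_city


# Made-up placeholder data, not the exercise's three instances.
cities = [
    City("Hightown", "Exampleland", 3000, 300_000),
    City("Midville", "Exampleland", 2000, 1_200_000),
    City("Lowport", "Exampleland", 50, 9_000_000),
]
print(highest_qualifying(cities, 1_000_000).name)
```

One thing to watch: starting from an 'empty' City with elevation 0 means
a city at or below sea-level could never be chosen.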


What are we doing here?
This is a classic "Database" (now becoming "Data Science") problem. 
First of all, it is a good idea to visualise the problem - and the data. 
Whilst easy-enough with 'toy problems', it's not a good idea to dive 
straight into coding!

(I tend to start with 'the data', because that is my background - not 
just because I like to tease @Alan - mercilessly)

Visualising the data:
Because there are (apparently) only a few cities with three "dependent" 
data-items, it is easy to write the whole problem onto a sheet of paper, 
eg a table with one line/row for each city, and columns for cityNM, 
countryNM, elevation, and population. (one assumes the elevations all 
use the same metric, eg feet or meters). Now we can 'see' that a 
'solution' should work for our sub-set (or "test-data") AND for a "live" 
situation involving many more data-items (and possibly, other 
complications).

Now that you are no longer looking at a 'wall of text' in the 
code-description given by your book/trainer, do you have a 'picture' in 
your mind?

Visualising the problem:
If you would like to keep working on-paper, at this time it would be a 
good idea to either rip your table cross-wise, so that each line/row is 
an independent 'strip', or transfer each city's data to a 'sticky-note', 
file-card, or similar - one for each line/row of your pretty table. This 
is why many dev.team offices have white-boards for many of the walls!

The problem becomes one of sequencing the data, filtering-out those with 
populations below the specified minimum, then re-sequencing according to 
elevation. NB other terms for "sequencing" are "ORDER BY" (in the 
DB-world) and "sort" (in Python).

In the DB-world, we start by reducing the amount of data to be 
considered by discarding the irrelevant - we call it "selection" (in 
SQL, the WHERE clause). @Alan's advice called it "filtering". If you had a 
data-set of thousands of cities, this would (hopefully) reduce the 
volume of (applicable) data considerably. (and speed-up the next stage 
of processing!)

So back to our "model", re-assemble your 'strips' as a new 'table', ie 
in a 'sequence' with the least-populous city at the top, and the 
most-crowded, at the bottom. Now, starting 'at the top', remove every 
strip where the city's population is smaller than the specification and 
smile at the first one that is 'big enough'. Hint: once a single city 
fulfills the criterion, all the others 'below' will too (so you can 
keep them en-masse - that's why we sequenced them!)

Thus, we can now move to the second criterion: elevation. Re-sequence the 
'strips', this time so that the 'highest' city is at the 'top' and the 
city with lowest-elevation is at the bottom.
(apologies for potentially confusing the sequence of our table's 
top/bottom, with city-elevation or height!)

The required answer is now self-evident, or as my mother used to say 
when offering me a plate of treats and observing my greedy eyes: "take one"!


Coding the solution:
So, how do we do this in Python? As @Alan has observed, some of the code 
as listed is not good style/technique. However, it is not clear what has 
been provided (which we are not allowed to (even appear to) criticise, 
are we?) and what is yours; so I shall +1 his comment and ignore from there.

Take the data-classes (cities) and assemble them into a list (you can 
imagine the list as the lines/rows of our illustration (above) and each 
class's attributes as the columns!)
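
A small sketch of that list, printed as the on-paper table. The City
class body is assumed (it was not quoted in the post) and the three
cities are invented placeholders:

```python
class City:
    # Assumed definition: the exercise's own class was not quoted in full.
    def __init__(self, name="", country="", elevation=0, population=0):
        self.name = name
        self.country = country
        self.elevation = elevation
        self.population = population


# One list element per line/row of the paper table (placeholder data).
cities = [
    City("Hightown", "Exampleland", 3000, 300_000),
    City("Midville", "Exampleland", 2000, 1_200_000),
    City("Lowport", "Exampleland", 50, 9_000_000),
]

# Each attribute becomes a column.
for city in cities:
    print(f"{city.name:<10}{city.country:<13}"
          f"{city.elevation:>6}{city.population:>11}")
```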

Now, if you haven't already, you should study Python's ability to sort 
data-structures, and sort the list ("in-place") according to each 
class's population attribute.
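
For instance (again with an assumed City class and placeholder data),
list.sort() accepts a key function, and operator.attrgetter() is one
way of naming the attribute to sort on:

```python
from operator import attrgetter


class City:
    # Assumed definition: the exercise's own class was not quoted in full.
    def __init__(self, name="", country="", elevation=0, population=0):
        self.name = name
        self.country = country
        self.elevation = elevation
        self.population = population


# Placeholder data, deliberately out of order.
cities = [
    City("Midville", "Exampleland", 2000, 1_200_000),
    City("Lowport", "Exampleland", 50, 9_000_000),
    City("Hightown", "Exampleland", 3000, 300_000),
]

# Sort the list itself ("in-place"), least-populous city first.
cities.sort(key=attrgetter("population"))
print([city.name for city in cities])
```

A lambda (key=lambda city: city.population) would do the same job;
sorted() is the alternative if the original order must be kept.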

Once sorted by population, we don't want to continue to consider any 
cities which are too 'small'. So, create another list, and considering 
each element in-turn, copy those cities/classes which fulfill the 
minimum-population criterion. Hint: if the list is sorted with the 
most-populous city first, a "break" in this loop on the first occasion 
when the criterion fails will save you/the computer from considering 
any other list-members.
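
A sketch of that copy-loop. The exercise as quoted asks for a *minimum*
population, so this version keeps cities at or above the threshold and
sorts most-populous first, which is what lets the "break" work. The
City class body and the data are assumptions:

```python
class City:
    # Assumed definition: the exercise's own class was not quoted in full.
    def __init__(self, name="", country="", elevation=0, population=0):
        self.name = name
        self.country = country
        self.elevation = elevation
        self.population = population


# Placeholder data, not the exercise's instances.
cities = [
    City("Midville", "Exampleland", 2000, 1_200_000),
    City("Lowport", "Exampleland", 50, 9_000_000),
    City("Hightown", "Exampleland", 3000, 300_000),
]
min_population = 1_000_000

cities.sort(key=lambda city: city.population, reverse=True)

qualifying = []
for city in cities:
    if city.population < min_population:
        break  # sorted, so every remaining city is smaller still
    qualifying.append(city)

print([city.name for city in qualifying])
```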

Be careful at this point! What would happen if NO city fulfills the 
specification? Is there any point in executing further 'calculations'?
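
To see why, here is a tiny illustration (the variable names are
made-up for this sketch): taking the first element of an empty list
raises IndexError, so it pays to check before going any further.

```python
qualifying = []  # suppose no city met the population requirement

try:
    top = qualifying[0]  # "take one" from an empty list...
except IndexError as error:
    print("nothing to take:", error)

# Checking first avoids the exception; "" mirrors the exercise's fallback.
if qualifying:
    answer = f"{qualifying[0].name}, {qualifying[0].country}"
else:
    answer = ""
```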

Moving-on, sort the remaining-list, this time by elevation.
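
Sketched in code (assumed City class, placeholder cities which are
taken to have passed the population test already), sorting highest
first leaves the answer at the front of the list, ready for the next
step:

```python
class City:
    # Assumed definition: the exercise's own class was not quoted in full.
    def __init__(self, name="", country="", elevation=0, population=0):
        self.name = name
        self.country = country
        self.elevation = elevation
        self.population = population


# Placeholder survivors of the population filter.
qualifying = [
    City("Lowport", "Exampleland", 50, 9_000_000),
    City("Midville", "Exampleland", 2000, 1_200_000),
]

# Highest elevation at the 'top' of the list.
qualifying.sort(key=lambda city: city.elevation, reverse=True)

highest = qualifying[0]
print(f"{highest.name}, {highest.country}")
```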

"Take one" (learn from me, don't be greedy!)


If the 'visualisation' of the data and/or the problem was your 
difficulty, or if you are new to Python and specifically sort-ing, 
please ignore the next paragraph!

Python lists offer a neat method: .pop(). Normally, (with no parameter) 
this will remove and return the last ("right-most") element of a list. 
However, that doesn't suit the visualisation described earlier. We could 
use list.pop( 0 ) to grab the first/'top' city-class though!
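
A quick illustration with plain strings standing in for the
city-objects (the names are placeholders):

```python
# Highest city first, as in the re-sequenced 'strips'.
names_by_elevation = ["Hightown", "Midville", "Lowport"]

last = names_by_elevation.pop()    # removes and returns the right-most item
first = names_by_elevation.pop(0)  # removes and returns the 'top' item

print(last, first, names_by_elevation)
```

Note that both calls shorten the list. If the list is still needed
afterwards, plain indexing (names_by_elevation[0]) reads the 'top'
item without removing it.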


> def max_elevation_city(min_population):
>     # Initialize the variable that will hold
>     # the information of the city with
>     # the highest elevation
>     return_city = City()
...
 >     #Format the return string
 >     if return_city.name:
 >         return ("{}, {}".format(return_city.name, return_city.country))
 >     else:
 >         return ""

I assume this is a 'device' by your teacher/trainer to ensure that 
assignments can be machine-graded.

Using the above, we have decided which city to return as return_city.


By the way, you/Python (and many of our fellow list-members!) could code 
a solution without any sorting operations. I (only) recommended sort() 
(above), in order to maintain the 'visualisation' developed on-paper!
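
One such sort-free possibility, as a sketch only: filter with a
generator expression and let max() pick the highest. Unlike the
exercise's function, this version takes the list of cities as a
parameter, which is my assumption, as are the City class body and the
data:

```python
class City:
    # Assumed definition: the exercise's own class was not quoted in full.
    def __init__(self, name="", country="", elevation=0, population=0):
        self.name = name
        self.country = country
        self.elevation = elevation
        self.population = population


def max_elevation_city(cities, min_population):
    """Return 'name, country' of the highest qualifying city, or ''."""
    qualifying = (c for c in cities if c.population >= min_population)
    # default=None covers the case where no city qualifies.
    best = max(qualifying, key=lambda c: c.elevation, default=None)
    if best is None:
        return ""
    return "{}, {}".format(best.name, best.country)


# Placeholder data, not the exercise's instances.
cities = [
    City("Hightown", "Exampleland", 3000, 300_000),
    City("Midville", "Exampleland", 2000, 1_200_000),
    City("Lowport", "Exampleland", 50, 9_000_000),
]
print(max_elevation_city(cities, 1_000_000))
```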


To complete the (DB) picture: SELECTion was mentioned earlier - in our 
'paper-computer' this was working with rows of data at a time. The other 
basic analysis is "projection". This is analysing the table by 
column(s), eg not bothering with what the city is called, but only being 
'interested in it' if its *population* meets a certain criterion.
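
In Python terms, a sketch of 'pulling' just some columns out of every
row might look like this (assumed City class, placeholder data;
operator.attrgetter() with several names returns a tuple):

```python
from operator import attrgetter


class City:
    # Assumed definition: the exercise's own class was not quoted in full.
    def __init__(self, name="", country="", elevation=0, population=0):
        self.name = name
        self.country = country
        self.elevation = elevation
        self.population = population


# Placeholder data, not the exercise's instances.
cities = [
    City("Hightown", "Exampleland", 3000, 300_000),
    City("Midville", "Exampleland", 2000, 1_200_000),
    City("Lowport", "Exampleland", 50, 9_000_000),
]

# Keep only two 'columns' of every row.
name_and_elevation = attrgetter("name", "elevation")
projected = [name_and_elevation(city) for city in cities]
print(projected)
```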

Using Python classes is a good idea. You will gain practice 'pulling' a 
single "attribute" from the class and analysing/sorting on that, ie 
"projection".

The 'bad news' is that we DB-people could have accomplished the whole 
assignment (assuming the table) in a single line of SQL - but where's 
the fun in using SQL and hard-drives, when we could be using Python and 
working at RAM-speed!?
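
For the curious, a sketch of that single SQL statement, run from Python
with the standard-library sqlite3 module against an in-memory
database. The table name, its columns, and the rows are all
assumptions made up for this illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE city "
    "(name TEXT, country TEXT, elevation INTEGER, population INTEGER)"
)
# Placeholder rows, not the exercise's instances.
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?, ?)",
    [
        ("Hightown", "Exampleland", 3000, 300_000),
        ("Midville", "Exampleland", 2000, 1_200_000),
        ("Lowport", "Exampleland", 50, 9_000_000),
    ],
)

# Filter on population, order by elevation, "take one".
row = conn.execute(
    "SELECT name || ', ' || country FROM city "
    "WHERE population >= ? ORDER BY elevation DESC LIMIT 1",
    (1_000_000,),
).fetchone()

answer = row[0] if row else ""  # fetchone() gives None when nothing matches
print(answer)
conn.close()
```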
-- 
Regards =dn

