Optimization for real-world data.

Morse code is a good one; which leads to the entropy of real-world English letters: how probable are the letters A, E, and Z? How probable is the 3-gram of letters 'AEZ'? A well-balanced tree could take this knowledge into account and sparsely add 'nodes' without needing to rebalance as often.

Pairtree with hashes is an interesting real-world problem. Filesystems have a max number of file entries per directory. A good cryptographic hash has equal probabilities of e.g. hexadecimal character occurrence for each digit: each digit character has an equal probability of following any other digit.

Shannon used these probabilities for something; I can't remember what it was?

One (TED-Ed) YouTube video I found mentioned sorting library books; what's the fastest way?

Sorting is not strictly necessary for top-k N-dimensional optimization: tracking high and low points doesn't require sorting the whole (then ordered or partially ordered) set.

Here's an example from https://wrdrd.github.io/docs/consulting/data-science#optimization :

Optimization⬅
https://en.wikipedia.org/wiki/Mathematical_optimization

Find local and global optima (maxima and minima) within an n-dimensional field which may be limited by resource constraints.

# Global optima of a 1-dimensional list
points = [10, 20, 100, 20, 10]
global_max, global_min = max(points), min(points)
assert global_max == 100
assert global_min == 10

# Local optima of a 1-dimensional list
sample = points[:1]
local_max, local_min = max(sample), min(sample)
assert local_max == 20
assert local_min == 10

# A 2-dimensional list ...
points = [(-0.5, 0),
          (0,  0.5),
          (0.5,  0),
          (0, -0.5)]

https://en.wikipedia.org/wiki/Optimization_(disambiguation)
https://en.wikipedia.org/wiki/Metaheuristic
https://en.wikipedia.org/wiki/Receiver_operating_characteristic
http://rayli.net/blog/data/top-10-data-mining-algorithms-in-plain-english/
http://scikit-learn.org/stable/tutorial/machine_learning_map/
https://en.wikipedia.org/wiki/Firefly_algorithm

On Thursday, August 16, 2018, Tobias Kohn <kohnt@tobiaskohn.ch> wrote:

I couldn't agree more!  Such interactive programmes are often so much more valuable than hours of talking (even though doing interactive programmes all the time will wear the students out -- as always, there is golden middle here ^_^).

Another idea in that direction is to make two or three groups, and then have a little competition of who can sort a pile of post-its (or whatever) fastest.  And I would recommend to surprise the students.  Once they have learned how to sort numbers, say, give them post-its with picture of animals, say, or melodies, or whatever you fancy.  That forces them to think outside of the box, and suddenly computer science class becomes so much more meaningful and memorable!

Yet another idea: how about sorting Morse code?  Usually, we get it sorted from 'A' to 'Z'.  But there are other ways, and before you know it, you are discussing trees...

Cheers,
Tobias


Quoting kirby urner <kirby.urner@gmail.com>:

 
I'm glad Tobias took the bull by the horns and didn't eschew  a deeper look into the sorting algorithms.
 
As a big fan of animations, my reflex is to scour Youtube for graphical renderings of the different strategies, but then another thought crops up: lets get out of our seats and do choreography, Math is an Outdoor Sport! (PR poster). I should explain.
 
The relationships between programming and scripting theater "programmes" (old spelling) are deep. Give each student a postit with a number and have them *enact* the sorting algorithm.  E.g. starting in a row, turn to person on your left (if there is one) and swap places if your postit number is higher...  have directors  and make a video.  Now choreograph (enact, dance) in a different way.
 
Symphonies, plays, musicals, are so multi-track, so parallel, and yet we fail to exploit those intuitions sometimes.
 
Let the Theater Department handle it, in consultation with CS.  Could go under the heading of "Unplugged".  Likely the Hobbits are already doing this in New Zealand (NZ has pioneered unplugged more than most, plus has Hobbits).
 
Seriously, having lived in the Philippines where people routinely learn group dancing, I'm worried about only acting as teams in three capacities (a) cheerleader (b) athlete on the field (c) band.  Theater is being eliminated in favor of competitive sports.  Perhaps CS could come to the rescue and say "wait, we need Theater for our simulations".
 
More cerebral team-based activities might go a long way towards fighting the stereotype that computer programmers  only live in artificially lit basements eating pizza.  That's a physically damaging lifestyle, nothing to do with writing code or even doing math.
 
===
 
Regarding last night's tele-class (real time, zoom.us), I worked through "cloning a github repo" as the core exercise, a repeat from last week, then went through the process of updating a notebook live, and pushing the changes back to the repo.
 
The repo was of course the one with yesterday's Jupyter Notebook about Ordering Polyhedrons by volume.
 
 
I tell them "cloning a github repo" is going to be important for when they want like a Jake Vanderplas tutorial, i.e. when they want to study a topic in more depth and the workshop is (A) on Youtube or similar service and (B) the workshop materials are free via github (a common enough pattern). 
 
I also reminded them how Google shares TensorFlow in the form of Colab exercises (Jupyter Notebooks).
 
One students asked "R or Python, which is winning in Data Science"? 
 
My answer:  pandas is a relatively recent development and there's already lots of excellent curriculum based around R, which is likewise free / open source. The business world is rushing into Python because of DL / ML and so gains an R-like tool in the process, which enables better town-gown relations i.e. Harvard can talk to Facebook about Pytorch thanks to already teaching R in the stats department. 
 
In other words, it's not either/or, more like two communities forming a bridge via the common language of data science, which R and Python both reflect (as do some other languages / ecosystems, not aiming for comprehensivity here).
 
Kirby
 
 

On Wed, Aug 15, 2018 at 10:04 AM, Tobias Kohn <kohnt@tobiaskohn.ch> wrote:

Hi Jurgis,