I've been researching Python's sorting algorithm, Timsort, and know that
it's a hybrid between insertion sort (best case time complexity O(n)) and
merge sort (O(n log(n))), and has an overall time complexity of O(n
log(n)). What I'm trying to figure out is, when the list is very short,
let's say 2 or 3 units in length, does the time complexity of Timsort
become O(n)? I was thinking that the insertion-sort portion of Timsort
would be able to sort a list of length=3 (such as [8,7,9] or [9,8,7]) in
O(3) time, therefore O(n). Thoughts?

On 7/2/19 1:48 AM, Jordan Baltes wrote:
> I've been researching Python's sorting algorithm, Timsort, and know that
> it's a hybrid between insertion sort (best case time complexity O(n)) and
> merge sort (O(n log(n))), and has an overall time complexity of O(n
> log(n)). What I'm trying to figure out is, when the list is very short,
> let's say 2 or 3 units in length, does the time complexity of Timsort
> become O(n)? I was thinking that the insertion-sort portion of Timsort
> would be able to sort a list of length=3 (such as [8,7,9] or [9,8,7]) in
> O(3) time, therefore O(n). Thoughts?

Try timing it?
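
One way to "try it", sketched under a few assumptions: Big-O describes how the work grows as n grows, so for a list whose length is fixed at 3 the work is simply bounded by a constant. The Counted wrapper and count_comparisons helper below are hypothetical names invented for this illustration; they count how many comparisons sorted() makes, and timeit gives a rough wall-clock figure. Exact counts and timings depend on the Python version and the machine.

```python
import timeit


class Counted:
    """Wrap a value and count how often the sort compares it."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        # list.sort() and sorted() only ever use the < operator.
        Counted.comparisons += 1
        return self.value < other.value


def count_comparisons(values):
    """Return the number of < comparisons made while sorting `values`."""
    Counted.comparisons = 0
    sorted(Counted(v) for v in values)
    return Counted.comparisons


samples = ([8, 7, 9], [9, 8, 7], list(range(1000)), list(range(1000, 0, -1)))
for sample in samples:
    print(len(sample), "items:", count_comparisons(sample), "comparisons")

# Rough wall-clock timing for a three-element list (numbers vary by machine).
print(timeit.timeit("sorted([8, 7, 9])", number=100_000))
```

Comparing the counts for the 3-item and 1000-item lists shows how the work scales with n, which is what the O() notation is about.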

Hi All,
My Python version is 3.6.7.
I need help to understand this piece of code.

rainbow = {"Green": "G", "Red": "R", "Blue": "B"}
# enumerate with index, and key value
for i, (key, value) in enumerate(rainbow.items()):
    print(i, key, value)

This always gives the output:

0 Green G
1 Red R
2 Blue B

Does this mean that the dict is ordered, or is it implementation dependent?

Thanks & Regards,

On 04/07/2019 18:02, Animesh Bhadra wrote:
> Hi All,
> My python version is 3.6.7
> I need help to understand this piece of code?
> rainbow ={"Green": "G", "Red": "R", "Blue": "B"}
> # ennumerate with index, and key value
> for i, (key, value) in enumerate(rainbow.items()):

Let's unpick it from the inside out.
What does items() return?

>>> rainbow.items()
dict_items([('Green', 'G'), ('Red', 'R'), ('Blue', 'B')])

For our purposes dict_items is just a fancy kind of list.
In this case it is a list of tuples. Each tuple being
a key,value pair.

What does enumerate do to a list?
It returns the index and the value of each item in the list.
So if we try

>>> for index, val in enumerate(rainbow.items()):
...     print(index, val)
...
0 ('Green', 'G')
1 ('Red', 'R')
2 ('Blue', 'B')

We get the 0,1,2 as the index of the 'list' and the
corresponding tuples as the values.

Now when we replace val with (key, value), Python performs
tuple unpacking to populate key and value for us.

> print(i, key, value)

So now we get all three items printed without the tuple parens.
Which is what you got.

> This gives a output as always:-
> 0 Green G
> 1 Red R
> 2 Blue B

> Does this means that the Dict is ordered? or it is implementation dependent?

Neither, it means the items in a list always have indexes
starting at zero.

By pure coincidence dictionaries in recent Python versions (since 3.6
or 3.7???) retain their insertion order. That was not always the
case, but the result would have been the same so far as the 0,1,2 bit goes.
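
The steps above can be run in one go. This is only a sketch of the explanation; the variable names pairs and rows are made up for the illustration.

```python
rainbow = {"Green": "G", "Red": "R", "Blue": "B"}

# items() yields (key, value) tuples; enumerate() pairs each one with a counter.
pairs = list(enumerate(rainbow.items()))
print(pairs)

# Writing (key, value) in the for target unpacks the inner tuple.
rows = [(i, key, value) for i, (key, value) in enumerate(rainbow.items())]
for row in rows:
    print(*row)

# The counter belongs to enumerate(), not to the dict: it can start anywhere.
print(list(enumerate(rainbow.items(), start=1)))
```

The last line shows that the 0, 1, 2 values say nothing about the dictionary itself; they are just the positions at which enumerate() handed the items out.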

I am a student at a university. Currently I am working on an algorithm
using Python. It is based on scheduling the teachers to their nearest
venues. At the venues there can be at most 400 teachers, and these are to
be divided into batches of 40, i.e. ten batches. All the batches will
have teachers having the same group number assigned to them, and all the
batches should get only two days from the working days in the month.
This is only the last part of the complete algorithm. I am sending the
files associated with it and the code that I have made till now. Please help
me in making it as I need it urgently.

Thanks in advance.

import math
import csv
import pdb
import pandas as pd 
import numpy as np 
from math import radians, sin, cos, acos

def distanceCalculator(latitude1,longitude1,latitude2,longitude2):
        slat = radians(latitude1)
        slon = radians(longitude1)
        elat = radians(latitude2)
        elon = radians(longitude2)
        dist = 6371.01 * acos(sin(slat)*sin(elat) + cos(slat)*cos(elat)*cos(slon - elon))
        return dist
#df = pd.read_csv("venueData.csv",header=None)

df2 = pd.read_csv("mtdata.csv",header=None)

df2.columns = ['name','location','latitude','longitude','subject']

df = pd.read_csv("venueData.csv",header=None)


#df2 = pd.read_csv("mtdata.csv",header=None)

#df2.columns = ['name','location','latitude','longitude','subject']

teacher = pd.read_csv("teachers.csv")

teacher['Latitude'] = teacher['Latitude'].apply(lambda x: x.rstrip(",") if type(x)  == str else x )

teacher['Longitude'] = teacher['Longitude'].apply(lambda x: x.rstrip(",") if type(x)  == str else x )

listEmpty = []

dictionaryTeacher = {}

for i,ex in teacher.iterrows():
    lat1 = ex['Latitude']
    lon1 = ex['Longitude']
    Id = i 
    for b,c in df.iterrows():
        lat2 = c['latitude']
        lon2 = c['longitude']
        nameVen = c['name']
    demian = []    
    demian = listEmpty[0]
    dictionaryTeacher[ex['Name']] = demian
    listEmpty = []
    demian = []
DataTeacher = pd.DataFrame(columns=['Teacher','Distance','Venue','Eng','Hindi','Maths'])

number = 3 

for ex in dictionaryTeacher:
    DataTeacher= DataTeacher.append({'Teacher':ex,'Distance':dictionaryTeacher[ex][0],'Venue':df.loc[dictionaryTeacher[ex][2]]['name'],'Eng':teacher.loc[dictionaryTeacher[ex][1]]['Eng'],'Hindi':teacher.loc[dictionaryTeacher[ex][1]]['Hindi'],'Maths':teacher.loc[dictionaryTeacher[ex][1]]['Maths']},ignore_index=True)
days = pd.read_csv("days.csv")

days.columns = ['January', 'January:Days', 'February', 'Feburary:Days', 'March',
       'March:Days', 'April', 'April:Days', 'May', 'May:Days', 'June',
       'June:Days', 'July', 'July:Days', 'August', 'August:Days',
       'September', 'September:Days', 'October', 'October:Days', 'November',
       'November:Days', 'December', 'December:Days']


df['name'] = df['name'].apply(lambda x: x.rstrip())

df2['name'] =df2['name'].apply(lambda x: x.rstrip())

venue = {}

for i,k in enumerate(df['name']):
    if (k not in venue):
        venue[k] = {'January': 0 ,'February':0 , 'March': 0 , 'April':0, 'May':0,'June ':0 ,'July':0 , 'August':0,'September':0,'October':0,'November':0,'December':0}
teacher = {}

for i,k in enumerate(df2['name']):
    if (k not in teacher):
        teacher[k] = {'January': 0 ,'February':0 , 'March': 0 , 'April':0, 'May':0,'June ':0 ,'July':0 , 'August':0,'September':0,'October':0,'November':0,'December':0}
dictionary = {}

liste = []

for i,ex in df2.iterrows():
    nameT = ex['name']
    lat1  = ex['latitude']
    lon1  = ex['longitude']
    sub   = ex['subject'] 
    Id = i 
    for b , x in df.iterrows():
        nameM = x['name']
        lat2  = x['latitude']
        lon2  = x['longitude']
        id2 = b

    dictionary[ex['name']] = liste[0:3]
    liste = []
Data = pd.DataFrame(columns=['Trainer','Venue','Distance','Subjects','Location'])

for ex in dictionary:
    for i , k in enumerate(dictionary[ex]):
        Data = Data.append({'Trainer':ex ,'Venue': df.loc[dictionary[ex][i][2]]['name'],'Distance': dictionary[ex][i][0],'Subjects':df2.loc[dictionary[ex][i][1]]['subject'],'Location':df2.loc[dictionary[ex][i][1]]['location']},ignore_index=True)
Data['Month'] = -1

for i,ex in Data.iterrows():
    Train = ex['Trainer']
    Venue = ex['Venue']
    for ex in teacher[Train]:
            if(venue[Venue][ex] == 0):
                teacher[Train][ex] = 1
                venue[Venue][ex] = 1
                Data.loc[i,"Month"] = ex
strings = " "

for ex in days[days["January"][:].str.contains("Tue")]['January'].index:
    strings += ","+ str(ex)
strings = strings.rstrip(",")

strings = strings.lstrip()

Data['Month'] = Data['Month'].apply(lambda x : x.rstrip())

listeDays = []

Cal = 0 

num = 0 

strings = ""

for i,ex in Data.iterrows():
    ay = ex['Month']
    indexOfDays = days[days[ay][:].str.contains("Mon") |  days[ay][:].str.contains("Tue") | days[ay][:].str.contains("Wed") | days[ay][:].str.contains("Thu") | days[ay][:].str.contains("Fri")][ay].index
    for ex in days[days[ay][:].str.contains("Mon") |  days[ay][:].str.contains("Tue") | days[ay][:].str.contains("Wed") | days[ay][:].str.contains("Thu") | days[ay][:].str.contains("Fri")][ay]:
        num = indexOfDays[Cal]
        num +=1
        Cal +=1
        strings += str(num) + "," + ex + ","
    Data.loc[i,"Days"] = strings
    num = 0 
    Cal = 0 
    strings = ""
Data['Month'] = Data['Month'].apply(lambda x: x.rstrip())

Data['Days'] = Data['Days'].apply(lambda x : x.replace("\n",","))

DfSub = pd.read_csv("backupMasterTrainers.csv")

copyDfSub = DfSub

ab = DfSub.sample(n=1)

ab.columns = ['Name',"Location","Latitude","Longitude","Subject"]

Id = ""

VenID = 0 

newName = ""

dist = ""

VenLat =""

VenLot =""

newLat = ""

newLon = ""

newSub = ""

newLoc = ""

for i,ex in Data.iterrows():
    name = ex['Trainer']
    month = ex['Month']
    lecture = ex['Subjects']    
    print(name,"Do you want to give ",lecture,"in this month :",month)
    answer = input("For Yes : Y For No N") 
    if answer == 'Y' or answer == 'y':
    elif answer == 'N' or answer =='n':
        ab = copyDfSub.sample(n=1)
        ab.columns = ['Name',"Location","Latitude","Longitude","Subject"]
        Id = ab.index[0]
        newName = ab.loc[Id,"Name"]
        newLat = ab.loc[Id,"Latitude"]
        newLon = ab.loc[Id,"Longitude"]
        newSub = ab.loc[Id,"Subject"]
        newLoc = ab.loc[Id,"Location"]
        VenueName = ex['Venue']
        for z,b in df.iterrows():
            if(VenueName in b['name']):
                VenID = z 
        VenLat = df['latitude'][z]
        VenLon = df['longitude'][z]
        dist = distanceCalculator(newLat,newLon,VenLat,VenLon)
        Data.loc[i,"Trainer"] = newName
        Data.loc[i,"Distance"] = dist
        Data.loc[i,"Subjects"] = newSub
        Data.loc[i,"Location"] = newLoc

df5 = pd.merge(Data,DataTeacher,on="Venue",how='inner')

On 05/07/2019 07:25, Suhit Kumar wrote:

> I am a student at an university. Currently I was working on an algorithm
> using python. It is based on scheduling the teachers to their nearest
> venues. And at the venues there can be atmost 400 teachers and these are to
> be divided into the Batches of 40 i.e. ten batches. All the batches will
> have teachers having same group number assigned to them. 

Who assigns the group numbers? Is that the program or part of the data?

> should get only the two days from the working days in the month.

I have no idea what that means. How does time enter into
the calculation? You allocate teachers to their *nearest* venue.
Where does time enter into that? There must be some other
criteria? You schedule them to the nearest venue that
they have not already visited within the last week/month
or something?

> This is only the last part of the complete algorithm. I am sending the
> files associated to it and the code that I have made till now. 

The code got through but the data didn't. The server only permits
text based attachments. However your code is over 300 lines long
and poorly structured (hint: create some well-named functions!)

Meanwhile you have not given us any clues about what kind of help you
need. Does it work? Does it fail with an error - what error?
Does it fail to load the data correctly - in what way?

You cannot seriously expect us to wade through 300+ lines of
code that, by your own admission, only partially describes
the problem, with nothing more than a loosely defined problem
statement.

> Please help me in making it as I need it urgently.

We won't do your homework for you. Tell us what you are
having difficulties with and we will try to help.
But first, for all our sakes, go back and restructure your
code into functions with reasonable names. Each one
performing one clearly defined part of your solution.

Then perhaps it will be easier to see where the issue(s)
lie and certainly easier to describe them.
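
As a rough illustration of what "functions with reasonable names" might look like for this problem, here is a minimal sketch. It reuses the distance formula from the posted code; everything else (the function names, the dictionary keys, the sample teacher and venues) is hypothetical, since the real data files did not reach the list.

```python
from math import acos, cos, radians, sin

EARTH_RADIUS_KM = 6371.01


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance using the same formula as the posted code."""
    slat, slon, elat, elon = map(radians, (lat1, lon1, lat2, lon2))
    cosine = sin(slat) * sin(elat) + cos(slat) * cos(elat) * cos(slon - elon)
    # Rounding can push the value just outside [-1, 1], which makes acos() raise.
    return EARTH_RADIUS_KM * acos(max(-1.0, min(1.0, cosine)))


def nearest_venue(teacher, venues):
    """Return (distance, venue name) for the venue closest to the teacher."""
    return min(
        (distance_km(teacher["latitude"], teacher["longitude"],
                     v["latitude"], v["longitude"]), v["name"])
        for v in venues
    )


def split_into_batches(teachers, batch_size=40):
    """Cut a list of teachers into consecutive batches of batch_size."""
    return [teachers[i:i + batch_size]
            for i in range(0, len(teachers), batch_size)]


# Made-up sample data, purely for illustration.
teacher = {"name": "Teacher 1", "latitude": 30.70, "longitude": 76.70}
venues = [
    {"name": "Venue A", "latitude": 30.71, "longitude": 76.72},
    {"name": "Venue B", "latitude": 31.50, "longitude": 75.00},
]
print(nearest_venue(teacher, venues))
print([len(batch) for batch in split_into_batches(list(range(400)))])
```

Each function does one job and can be tested on its own, which also makes it much easier to say which part is misbehaving.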

On 7/4/19 3:53 PM, Alan Gauld via Tutor wrote:

>> Does this means that the Dict is ordered? or it is implementation dependent?
> Neither, it means the items in a list always have indexes
> starting at zero.
> By pure coincidence dictionaries in recent Python versions (since 3.6
> or 3.7???) retain their insertion order. But that was not always the
> case, but the result would have been the same so far as the 0,1,2 bit goes.

To be a little more precise, in 3.6 CPython insertion order was
preserved as an artefact of the new implementation of dicts, but not
promised to be that way. Since 3.7 it is guaranteed (it is actually in
the language specification, so other Pythons have to do this too now).

It's still not the same as a collections.OrderedDict, which has some
useful additional features in case you care a lot about ordering.
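
A short sketch of a few of those differences, assuming Python 3.7 or later. The variable names are invented for the example.

```python
from collections import OrderedDict

plain_a = {"Green": "G", "Red": "R"}
plain_b = {"Red": "R", "Green": "G"}

# Plain dicts keep insertion order but ignore it when comparing.
plain_equal = plain_a == plain_b

# Two OrderedDicts compare equal only if the order matches too.
ordered_equal = OrderedDict(plain_a) == OrderedDict(plain_b)

# OrderedDict can also reorder entries and pop from either end.
rainbow = OrderedDict(Green="G", Red="R", Blue="B")
rainbow.move_to_end("Green")
order_after_move = list(rainbow)
oldest = rainbow.popitem(last=False)

print(plain_equal, ordered_equal, order_after_move, oldest)
```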

From: Animesh Bhadra
Date: Fri, 5 Jul 2019 22:02:59 +0530
Subject: [Tutor] enumerate over Dictionaries

Thanks Alan and Mats for the explanation.

On 05/07/19 19:57, Mats Wichmann wrote:
> On 7/4/19 3:53 PM, Alan Gauld via Tutor wrote:
>>> Does this means that the Dict is ordered? or it is implementation dependent?
>> Neither, it means the items in a list always have indexes
>> starting at zero.
>> By pure coincidence dictionaries in recent Python versions (since 3.6
>> or 3.7???) retain their insertion order. But that was not always the
>> case, but the result would have been the same so far as the 0,1,2 bit goes.
> To be a little more precise, in 3.6 CPython insertion order was
> preserved as an artefact of the new implementation of dicts, but not
> promised to be that way. Since 3.7 it is guaranteed (it is actually in
> the language specification, so other Pythons have to do this too now).
> It's still not the same as a collections.OrderedDict, which has some
> useful additional features in case you care a lot about ordering.

I have made the complete program but when I am compiling the program it is
showing errors. Can you please help to resolve this?
The code is in the file attached with this mail.

import math
import csv
import pdb
import pandas as pd 
import numpy as np 
from math import radians, sin, cos, acos

def distanceCalculator(latitude1,longitude1,latitude2,longitude2):
        slat = radians(latitude1)
        slon = radians(longitude1)
        elat = radians(latitude2)
        elon = radians(longitude2)
        dist = 6371.01 * acos(sin(slat)*sin(elat) + cos(slat)*cos(elat)*cos(slon - elon))
        return dist

df = pd.read_csv("venueData.csv",header=None)

df2 = pd.read_csv("mtdata.csv",header=None)
df2.columns = ['name','location','latitude','longitude','subject']

teacher = pd.read_csv("teachers.csv")



teacher['Latitude'] = teacher['Latitude'].apply(lambda x: x.rstrip(",") if type(x)  == str else x )
teacher['Longitude'] = teacher['Longitude'].apply(lambda x: x.rstrip(",") if type(x)  == str else x )

listEmpty = []

dictionaryTeacher = {}

for i,ex in teacher.iterrows():
    lat1 = ex['Latitude']
    lon1 = ex['Longitude']
    Id = i 
    for b,c in df.iterrows():
        lat2 = c['latitude']
        lon2 = c['longitude']
        nameVen = c['name']
    demian = []    
    demian = listEmpty[0]
    dictionaryTeacher[ex['Name']] = demian
    listEmpty = []
    demian = []

DataTeacher = pd.DataFrame(columns=['Teacher','Distance','Venue','Eng','Hindi','Maths','TeacherId'])

number = 3 

for ex in dictionaryTeacher:
    DataTeacher= DataTeacher.append({'Teacher':ex,'Distance':dictionaryTeacher[ex][0],'Venue':df.loc[dictionaryTeacher[ex][2]]['name'],'Eng':teacher.loc[dictionaryTeacher[ex][1]]['Eng'],'Hindi':teacher.loc[dictionaryTeacher[ex][1]]['Hindi'],'Maths':teacher.loc[dictionaryTeacher[ex][1]]['Maths'],'TeacherId':dictionaryTeacher[ex][1]},ignore_index=True)

days = pd.read_csv("days.csv")

days.columns = ['January', 'January:Days', 'February', 'Feburary:Days', 'March',
       'March:Days', 'April', 'April:Days', 'May', 'May:Days', 'June',
       'June:Days', 'July', 'July:Days', 'August', 'August:Days',
       'September', 'September:Days', 'October', 'October:Days', 'November',
       'November:Days', 'December', 'December:Days']


df['name'] = df['name'].apply(lambda x: x.rstrip())

df2['name'] =df2['name'].apply(lambda x: x.rstrip())

venue = {}

for i,k in enumerate(df['name']):
    if (k not in venue):
        venue[k] = {'January': 0 ,'February':0 , 'March': 0 , 'April':0, 'May':0,'June ':0 ,'July':0 , 'August':0,'September':0,'October':0,'November':0,'December':0}

teacher = {}

for i,k in enumerate(df2['name']):
    if (k not in teacher):
        teacher[k] = {'January': 0 ,'February':0 , 'March': 0 , 'April':0, 'May':0,'June ':0 ,'July':0 , 'August':0,'September':0,'October':0,'November':0,'December':0}


dictionary = {}
liste = []

for i,ex in df2.iterrows():
    nameT = ex['name']
    lat1  = ex['latitude']
    lon1  = ex['longitude']
    sub   = ex['subject'] 
    Id = i 
    for b , x in df.iterrows():
        nameM = x['name']
        lat2  = x['latitude']
        lon2  = x['longitude']
        id2 = b

    dictionary[ex['name']] = liste[0:3]
    liste = []
Data = pd.DataFrame(columns=['Trainer','Venue','Distance','Subjects','Location','VenDistrict','VenBlocks'])

for ex in dictionary:
    for i , k in enumerate(dictionary[ex]):
        Data = Data.append({'Trainer':ex ,'Venue': df.loc[dictionary[ex][i][2]]['name'],'Distance': dictionary[ex][i][0],'Subjects':df2.loc[dictionary[ex][i][1]]['subject'],'Location':df2.loc[dictionary[ex][i][1]]['location'],'VenDistrict':df.loc[dictionary[ex][i][2]]['district'],'VenBlocks':df.loc[dictionary[ex][i][2]]['block']},ignore_index=True)


Data['Month'] = -1

for i,ex in Data.iterrows():
    Train = ex['Trainer']
    Venue = ex['Venue']
    for ex in teacher[Train]:
            if(venue[Venue][ex] == 0):
                teacher[Train][ex] = 1
                venue[Venue][ex] = 1
                Data.loc[i,"Month"] = ex


strings = " "

for ex in days[days["January"][:].str.contains("Tue")]['January'].index:
    strings += ","+ str(ex)

strings = strings.rstrip(",")

strings = strings.lstrip()

#days = days.apply(lambda x : x.rstrip())

Data['Month'] = Data['Month'].apply(lambda x : x.rstrip())

listeDays = []
Cal = 0 

num = 0 
strings = ""

for i,ex in Data.iterrows():
    ay = ex['Month']
    indexOfDays = days[days[ay][:].str.contains("Mon") |  days[ay][:].str.contains("Tue") | days[ay][:].str.contains("Wed") | days[ay][:].str.contains("Thu") | days[ay][:].str.contains("Fri")][ay].index
    for ex in days[days[ay][:].str.contains("Mon") |  days[ay][:].str.contains("Tue") | days[ay][:].str.contains("Wed") | days[ay][:].str.contains("Thu") | days[ay][:].str.contains("Fri")][ay]:
        num = indexOfDays[Cal]
        num +=1
        Cal +=1
        strings += str(num) + "," + ex + ","

    Data.loc[i,"Days"] = strings
    num = 0 
    Cal = 0 
    strings = ""

Data['Month'] = Data['Month'].apply(lambda x: x.rstrip())

Data['Days'] = Data['Days'].apply(lambda x : x.replace("\n",","))


DfSub = pd.read_csv("backupMasterTrainers.csv")

copyDfSub = DfSub

ab = DfSub.sample(n=1)

ab.columns = ['Name',"Location","Latitude","Longitude","Subject"]


Id = ""
VenID = 0 
newName = ""

dist = ""

VenLat =""
VenLot =""

newLat = ""
newLon = ""
newSub = ""
newLoc = ""

for i,ex in Data.iterrows():
    name = ex['Trainer']
    month = ex['Month']
    lecture = ex['Subjects']
    print(name,"Do you want to give ",lecture,"in this month :",month)
    answer = input("For Yes : Y For No N") 
    if answer == 'Y' or answer == 'y':
    elif answer == 'N' or answer =='n':
        ab = copyDfSub.sample(n=1)
        ab.columns = ['Name',"Location","Latitude","Longitude","Subject"]
        Id = ab.index[0]
        newName = ab.loc[Id,"Name"]
        newLat = ab.loc[Id,"Latitude"]
        newLon = ab.loc[Id,"Longitude"]
        newSub = ab.loc[Id,"Subject"]
        newLoc = ab.loc[Id,"Location"]
        VenueName = ex['Venue']
        for z,b in df.iterrows():
            if(VenueName in b['name']):
                VenID = z 
        VenLat = ven['Latitude'][z]
        VenLon = ven['Longitude'][z]
        dist = distanceCalculator(newLat,newLon,VenLat,VenLon)

        Data.loc[i,"Trainer"] = newName
        Data.loc[i,"Distance"] = dist
        Data.loc[i,"Subjects"] = newSub
        Data.loc[i,"Location"] = newLoc

df5 = pd.merge(Data,DataTeacher,on="Venue",how='inner')


teachers = pd.read_csv("teachers.csv")

teachers.groupby(['Name','Location'],as_index=False).agg({'Distance_x': 'count'})

TeachDictMon = {}

for i,ex in teachers.iterrows():
    NameTeach = ex['Name']
    if NameTeach not in TeachDictMon:
        TeachDictMon[NameTeach] = {'January':0,'February':0,'March':0,'April':0,'May':0,'June':0,'July':0,'August':0,'September':0,'October':0,'November':0,'December':0}


df5= pd.merge(Data,DataTeacher,on='Venue',how='inner')

df5['Subjects'] = df5['Subjects'].apply(lambda x : x.title())

df5.columns = ['Trainer', 'Venue', 'Distance_x', 'Subjects', 'Location', 'VenDistrict',
       'VenBlocks', 'Month', 'Days', 'Teacher', 'Distance_y', 'Eng', 'Hindi',
       'Math', 'TeacherId']


dfNew = pd.DataFrame(columns=["Venue","Batches","Month","Block","District","Subjects","Date"])

dictBat = {}
BatchPoint = 0 

for i,ex in df5.iterrows():
    teacherId = ex['TeacherId']   
    train = ex['Trainer']
    ven = ex['Venue']
    sub = ex['Subjects']
    month = ex['Month']
    a = df5[(df5['Trainer']==train) & (df5['Venue'] == ven) & (df5['Subjects'] == sub)]
    con = a.count()
    BatchPoint = len(con)
    for c,b in a.iterrows():
        teacherId = b['TeacherId']   
        train = b['Trainer']
        ven = b['Venue']
        suber = b['Subjects']
        sub = a.loc[c][suber]
        if(sub not in dictBat):
            dictBat[sub] = []
    dfNew = dfNew.append({'Venue':ven,'Batches':dictBat,'Month':month,"District":ex['VenDistrict'],'Subjects':ex['Subjects'],'Blocks':ex['VenBlocks'],'Date':ex['Days']},ignore_index=True)
    dictBat = {}

a = dfNew[dfNew['Batches'] == {}].index







Output = pd.DataFrame(columns=['S', 'Date', 'Subject', 'Group', 'Trained', 'District', 'Block',
       'Venue', 'Month'])

GroupName = ""
string = ""
num = 0 

count = 0 

for i,ex in dfNew.iterrows():
    Ven = ex['Venue']
    Batch = ex['Batches']
    Month = ex['Month']
    Block = ex['Blocks']
    Dist = ex['District']
    Sub = ex['Subjects']
    Date = ex['Date']
    num = len(Batch)
    for x in range(num):
        if x == 0:
            trained = len(Batch[list(dfNew.loc[i,"Batches"].keys())[x]])
            string = Date.split(",")
            s = str(list(dfNew.loc[i,"Batches"].keys())[x])
            s= s[1]
            trained = len(Batch[list(dfNew.loc[i,"Batches"].keys())[x]])
            s = str(list(dfNew.loc[i,"Batches"].keys())[x])
            s = s[1]



Output = pd.DataFrame(columns = ["S","Date","Subject","Group","Trained","District","Block","Venue","Month"])


df5['Subjects'] = df5['Subjects'].apply(lambda x : x.title())

a=df5[(df5['Subjects'] == 'Eng') & (df5['Venue'] == 'Gobind pura')]

numOfMonth = 0 

for i,ex in df5.iterrows():
    TrainerName = ex['Trainer']
    Venue = ex['Venue']
    VenBlock = ex['VenBlocks']
    Month = ex['Month']

df5.columns = ['Trainer', 'Venue', 'Distance_x', 'Subjects', 'Location', 'VenDistrict',
       'VenBlocks', 'Month', 'Days', 'Teacher', 'Distance_y', 'Eng', 'Hindi',
       'Math', 'TeacherId']

On 06/07/2019 05:28, Suhit Kumar wrote:

> I have made the complete program but when I am compiling the program it is
> showing errors. Can you please help to resolve this?
> The code is in the file attached with this mail.

And where are the errors?
Do not expect us to run unknown code received over the internet.
Only a fool would do such a thing!

Show us the error messages, with full traceback text.

A quick scan of the code reveals some stylistic things
that you could do but they are not the cause of your errors,
whatever they may be:

for i,ex in teacher.iterrows():
    lat1 = ex['Latitude']
    lon1 = ex['Longitude']
    Id = i

    for b,c in df.iterrows():
        lat2 = c['latitude']
        lon2 = c['longitude']
        nameVen = c['name']

    demian = []
    demian = listEmpty[0]
    dictionaryTeacher[ex['Name']] = demian
    listEmpty = []
    demian = []

listEmpty is a terrible name choice. You should never name variables
after their data type; name them for the purpose they serve. And don't
call it empty since that only applies at one particular point. Name it
after what you intend to store in it...

Secondly, you initialise demian to a list. Then you throw that list
away and assign a different value to it. The initial assignment is
redundant.
You initialise listEmpty outside the loop and then again at the end.
If you move the initialisation into the loop body at the top you
won't need to reset it at the end. It only saves a line of code
but if you ever need to change the initial value it means you only
need to do it in one place.
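
The naming and initialisation points above can be sketched like this. The data and all the names (teachers, candidates, nearest) are made up for the illustration and are not taken from the posted program.

```python
# Made-up distances (in km) from each teacher to venues 0, 1, 2, ...
teachers = {"Asha": [12.5, 3.2, 8.9], "Ravi": [4.4, 6.1]}

nearest = {}
for name, distances in teachers.items():
    # Initialised at the top of the loop body, so no reset is needed at the end,
    # and named after what it holds rather than after its type.
    candidates = []
    for venue_id, km in enumerate(distances):
        candidates.append((km, venue_id))
    candidates.sort()
    nearest[name] = candidates[0]

print(nearest)
```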

I don't have time to read through the rest. You really should
refactor your code into functions. It will make it easier
to modify, easier to debug, and much easier to read and discuss.

Hi all.


In C, you can use pointers to reference variables, arrays, etc. In Python, I
do not recall anything specifically that refers to such a capability. What I
want to do is:



I want to create different data structures such as dictionaries which
contain specific  list elements based upon specific keys. The original data
structure could look like:


Data = [
  ['2019-01-19','Fred Flintstone',23],
  ['2019-02-01','Scooby doo', 99]
]

The above structure does have 100's of elements. I want to reference
specific lists within the above structure. Using the only method I know how:


Category = {'under-50':[Data[0]], 'over-50':[Data[1]]}


If I understand Python correctly, the above will copy the value
into the list within the key, not the memory address of the nested list I am
referencing. I am using a list within the key to permit multiple references
to different nested lists from the original data structure. The end result
of the structure for the dict could look like this (using example, not real
output):



Category['under-50'] = [ List1 pointer, List22 pointer, List52 pointer]



I hope the above makes sense. How can this be done? 

I hope this email finds you well. I am writing to get help with the AI
program I am currently working on.

I have a Python program for the project, which is about developing an AI
program to identify persons. The program I have identifies the first
person. It is not able to identify the 2nd person. When we run it, it
does not even show an error. I need someone to look at it and let me know
the issue. I can share the source code.

Thank you!

Best regards,

On 07/07/2019 03:39, mhysnm1964 at wrote:

> In C, you can use pointers to reference variables, arrays, ETC. In python, I
> do not recall anything specifically that refers to such a capability.

In Python a variable is a name that refers to an object.
Many names can refer to the same object. So in that respect
Python variables are more like pointers than regular C
variables which are a named location in memory.

> Data = [
>   ['2019-01-19','Fred Flintstone',23],
>   ['2019-02-01','Scooby doo', 99]
> ]
> Category = {'under-50':[data[0]], 'over-50':[data[1]]}
> If I understand things correctly with Python. The above will copy the value
> into the list within the key.

No, that is not correct.
It will create a reference to the same data object.

So Category['under-50'][0] and Data[0] will both reference
the same list object. Modifying the data through either
variable will affect both because it will be the same
list being modified.

>  Not the memory address of the nested list I am
> referencing. 

It is best to forget all about memory addresses when thinking
about Python. They are irrelevant for the most part.

> Category['under-50'] = [ List1 pointer, List22 pointer, List52 pointer]

That is exactly what happens in Python, as standard.

The usual issue that people have with this is that they modify
the data in one place and are surprised to discover it has
been modified elsewhere too. If that is a problem then you must
explicitly create a copy. But the behaviour that you apparently
want is the default.
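
A quick check of the above, as a sketch. Lower-case names are used here for the same two structures from the question, and copy.deepcopy is just one way of making the explicit copy.

```python
import copy

data = [
    ["2019-01-19", "Fred Flintstone", 23],
    ["2019-02-01", "Scooby doo", 99],
]
category = {"under-50": [data[0]], "over-50": [data[1]]}

# Both names refer to the very same list object.
same_object = category["under-50"][0] is data[0]

# So a change made through one name is visible through the other.
category["under-50"][0][2] = 24
print(same_object, data[0])

# An explicit copy is independent of the original.
snapshot = copy.deepcopy(data[1])
snapshot[2] = 100
print(data[1], snapshot)
```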

On 07/07/2019 03:49, Rashmi Vimalendran wrote:

> I have a python program for the project which is about developing an AI
> program to identify persons. The program I have identities the first
> person. It is not able to identify the 2nd person. When we run it, it is
> not even showing an error. I need someone to look at it and let me know me
> the issue. I can share the source code.

We will need to see code. If it is less than say 100 lines post it in
the body of your mail (not an attachment!) and if it is longer put it in
a pastebin web site and send a link.

Send us some sample data too so we can see the structures.

A little bit more detail on what exactly the output looks
like and how you identified the problem would help.

Finally, tell us the OS, Python version and any third party
libraries you are using - SciPy, Rpy, etc.

Perhaps something needs to be cleared?
What do you mean by identify, as in recognition or detection?

What kind of network?
What NN framework (Keras, TensorFlow, Caffe etc.)?

Are both people in the same frame?

> Den 7. jul. 2019 kl. 04.49 skrev Rashmi Vimalendran <rashmi.vimalendran at>:
> Hi,
> I hope this email finds you well. I am writing to get help with the AI
> program I am currently working on.
> I have a python program for the project which is about developing an AI
> program to identify persons. The program I have identities the first
> person. It is not able to identify the 2nd person. When we run it, it is
> not even showing an error. I need someone to look at it and let me know me
> the issue. I can share the source code.
> Thank you!
> Best regards,
> Rashmi

First-off, it has to be said that "100's of elements" suggests using an 
RDBMS - particularly if 'age' (eg 23 and 99) is not the only likely 
selection mechanism.

On 7/07/19 2:39 PM, mhysnm1964 at wrote:
> Hi all.
> In C, you can use pointers to reference variables, arrays, ETC. In python, I
> do not recall anything specifically that refers to such a capability. What I
> want to do is:

Just because C has a construct does not imply that it does, nor even 
should, exist in another language! You're using Python because it is 
'better', right?

You are correct, Python does not use "pointers", and (a personal 
comment) I for one don't miss them and their many 'gotchas', eg 
out-by-one errors, preferring Python's constructs, eg for-each.

That said, Python's sequences (data structures, eg strings and lists) do 
offer indices, slicing, and striding. So, it is quite possible to 
(relatively) address the first item in a list as list_item[ 0 ]. You can 
read about these (and many other delights) in the docs...

> I want to create different data structures such as dictionaries which
> contain specific  list elements based upon specific keys. The original data
> structure could look like:
> Data = [
>    ['2019-01-19','Fred Flintstone',23],
> ['2019-02-01','Scooby doo', 99]
> ]

Warning1: seem to be missing any identification of the "key"
Warning2: the intro text talked about "dictionaries" (the correct word) 
but the code-snippet is describing nested lists

> The above structure does have 100's of elements. I want to reference
> specific lists within the above structure. Using the only method I know how:
> Category = {'under-50':[data[0]], 'over-50':[data[1]]}
> If I understand things correctly with Python. The above will copy the value
> into the list within the key. Not the memory address of the nested list I am
> referencing. I am using a list within the key to permit multiple references
> to different nested lists from the original data structure. The end result
> of the structure for the dict could look like this (using example, not real
> output)
> Category['under-50'] = [ List1 pointer, List22 pointer, List52 pointer]
> I hope the above makes sense. How can this be done?

I hope I've understood the description! One option would be to follow 
your line of thinking by turning the first data-structure into a 
dictionary (key-value) pairs, where the key is the character's age and 
the value is the inner list structure, previously outlined:

   23: ['2019-01-19','Fred Flintstone',23],
   99: ['2019-02-01','Scooby doo', 99]

Then it would be possible to maintain the two lists, each containing 
keys for the relevant dict-elements:

under_50 = [ 23 ]
over_50 = [ 99 ]

However, this would require that only one character be listed at a given 
age (dict keys must be unique), so another key might be a better choice!

Another data structure you might consider is a "linked list".

Regards =dn

On 07/07/2019 09:19, David L Neil wrote:
> First-off, it has to be said that "100's of elements" suggests using an 
> RDBMS - particularly if 'age' (eg 23 and 99) is not the only likely 
> selection mechanism.

Multiple selection mechanisms might suggest an RDBMS but hundreds of
items is chickenfeed and an RDBMS would be overkill for such small
numbers, if volume was the only criteria. Millions of items would
certainly warrant such an approach but nowadays holding 10's of
thousands of items in memory is entirely reasonable.

On 7/6/19 8:39 PM, mhysnm1964 at wrote:
> Hi all.
> In C, you can use pointers to reference variables, arrays, ETC. In python, I
> do not recall anything specifically that refers to such a capability. What I
> want to do is:
> I want to create different data structures such as dictionaries which
> contain specific  list elements based upon specific keys. The original data
> structure could look like:
> Data = [
>   ['2019-01-19','Fred Flintstone',23],
>   ['2019-02-01','Scooby doo', 99]
> ]
> The above structure does have 100's of elements. I want to reference
> specific lists within the above structure. Using the only method I know how:
> Category = {'under-50':[data[0]], 'over-50':[data[1]]}
> If I understand things correctly with Python. The above will copy the value
> into the list within the key. Not the memory address of the nested list I am
> referencing. I am using a list within the key to permit multiple references
> to different nested lists from the original data structure. The end result
> of the structure for the dict could look like this (using example, not real
> output)
> Category['under-50'] = [ List1 pointer, List22 pointer, List52 pointer]
> I hope the above makes sense. How can this be done? 

It's easy enough to convince yourself that what Alan said is true. You
can, for example, use the id function to show this:

# identity of 0'th element of Data:
print(id(Data[0]))
# identity of the list that is the value of 'under-50' key:
print(id(Category['under-50']))
# identity of 0'th element of that list:
print(id(Category['under-50'][0]))

the first and third should be the same, showing you that's the same
object referred to by those two places.  Again, like Python's
"variables", these are just references to objects. As in:

item = Data[0]
print(id(item), id(Data[0]))

(note: id() is handy for explorations, especially interactive ones, but
isn't terribly useful for production code.  don't attach any meaning to
the value returned by id() other than "unique" - different Pythons
famously generate different id values, something that's been known to
confuse people doing experiments in, for example, PyPy)

Since you turned up here you sometimes also get free unasked-for advice:
I know this was a toy fragment just to explain the concept you're
getting at, but you'll normally want to build your design in a way that
minimizes "magic".  Using numbered indices into an array-like structure
is one of those bits of magic that raises flags. To refer to Fred's age,
you could end up with  Data[0][2].  That would be pretty ugly, and
worse, hard to remember what it meant.  Try to seek ways you can give
meaningful names to things.  We can make suggestions if that's of
interest (I don't want to belabor the point).
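
As one possible sketch of that advice, collections.namedtuple lets each field be read by name rather than by position. The field names date, name and age are assumptions made for illustration; the original post never labels the columns.

```python
from collections import namedtuple

# Field names are assumed for illustration; the original data is unlabeled.
Record = namedtuple("Record", ["date", "name", "age"])

Data = [
    Record("2019-01-19", "Fred Flintstone", 23),
    Record("2019-02-01", "Scooby doo", 99),
]

fred = Data[0]
print(fred.age)   # reads better than Data[0][2]
print(fred[2])    # positional access still works
```

Note that a namedtuple is immutable, unlike the inner lists in the original, so this only fits if the records are not edited in place.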

From PyTutor at  Sun Jul  7 15:54:51 2019
From: PyTutor at (David L Neil)
Date: Mon, 8 Jul 2019 07:54:51 +1200
Subject: [Tutor] pointers or references to variables or sub-sets of
 variables query.
In-Reply-To: <qft0oj$2tvc$>
References: <01c001d5346d$30798e90$916cabb0$>
Message-ID: <>

On 8/07/19 2:48 AM, Alan Gauld via Tutor wrote:
> On 07/07/2019 09:19, David L Neil wrote:
>> First-off, it has to be said that "100's of elements" suggests using an
>> RDBMS - particularly if 'age' (eg 23 and 99) is not the only likely
>> selection mechanism.
> Multiple selection mechanisms might suggest an RDBMS but hundreds of
> items is chickenfeed and an RDBMS would be overkill for such small
> numbers, if volume was the only criteria. Millions of items would
> certainly warrant such an approach but nowadays holding 10's of
> thousands of items in memory is entirely reasonable.

Assuming plentiful RAM: agreed.
(However, some of us grew-up at a time when RAM was expensive and even 
in our relaxed state, such 'costs' still impinge on our consciousness - 
also, in another thread (here? Python list) we had someone frustrated 
about using an MS-Vista 'powered' machine and limited to 32-bits. We 
don't know the OP's circumstances. That said, loading an RDBMS, if (s)he 
doesn't already have one, is...)

As you point-out, with memory more-commonly available, I've obtained 
significant speed improvements by moving relatively small, and 
particularly temporary, DB tables into MySQL's MEMORY storage (and with 
almost zero code-change/risk)!
(so, it IS possible to teach old dogs new tricks)

The key justification for moving to RDBMS would be "not the only 
selection mechanism". Whereas a Python dictionary (hash) offers speedy 
access to data based upon a single index, it is hard to beat the bug- 
and time-saving facility of a DB managing multiple indices/indexes.
(appreciating that I have no difficulty moving from (Python) procedural 
programming to (SQL) declarative, but many of our colleagues hate such, 
and with a passion)

So, using the OP's data-example, and assuming the 'columns' to be 
perhaps employment_date, name, and age; respectively:

['2019-01-19','Fred Flintstone',23],
['2019-02-01','Scooby doo', 99]

- which Python (and pythonic - per OP's theme) structures and methods 
offer a relatively bug-unlikely solution to holding *multiple* indices 
into a base list (or other collection)?
(alternately, maybe we should wait for the OP, and allow opportunity to 
complete the homework first?)
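
One possible answer, sketched with plain dictionaries: keep the base list as it is and build one dict per lookup key, each holding references to the same inner lists. The names by_name and by_age_group, and the 50-year cut-off, are assumptions for illustration.

```python
from collections import defaultdict

Data = [
    ['2019-01-19', 'Fred Flintstone', 23],
    ['2019-02-01', 'Scooby doo', 99],
]

by_name = {}                      # unique key: one record per name
by_age_group = defaultdict(list)  # non-unique key: many records per group

for record in Data:
    date, name, age = record
    by_name[name] = record
    group = 'under-50' if age < 50 else 'over-50'
    by_age_group[group].append(record)

# Every index refers to the same inner list, so a change made through
# one of them is visible through all of them.
by_name['Fred Flintstone'][2] = 24
print(Data[0], by_age_group['under-50'][0])
```

The cost is that the indices must be kept in step by hand: if an age changes enough to cross the cut-off, nothing moves the record to the other group automatically.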

(NB this may be veering OT, if the OP requires only the single access 
method, such as that illustrated earlier)
Regards =dn

From alan.gauld at  Sun Jul  7 18:54:39 2019
On 07/07/2019 20:54, David L Neil wrote:

> (However, some of us grew-up at a time when RAM was expensive and even 
> in our relaxed state, such 'costs' still impinge on our consciousness - 

Indeed, my first computer was at the local university and had 64KB.

My second computer was a Sinclair ZX81 (Timex in the USA?) with 16K

My third, a CP/M machine with 64K and 256K RAM disk and dual
floppies - such luxury! :-)

So I agree, it is hard to get out of that mode of thinking. But
today the minimum RAM is typically 4GB or more. My desktop
boxes all have 16GB and even my ancient Netbook has 4G.
My 20 year old iBook has 640M and even that is enough to
run Python with many thousands of data objects instantiated.

> particularly temporary, DB tables into MySQL's MEMORY storage (and with 
> almost zero code-change/risk)!

Yes, I use SQLite's MEMORY facility regularly. Not for managing
high volumes but where I need flexible search capability.
A SQL SELECT statement is much more flexible and faster
than any Python search I could cobble together.
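
A minimal sketch of that approach using the standard library's sqlite3 module with an in-memory database. The table name people and the column names hired, name and age are assumptions for illustration.

```python
import sqlite3

# ":memory:" asks SQLite for a database that lives only in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (hired TEXT, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("2019-01-19", "Fred Flintstone", 23),
        ("2019-02-01", "Scooby doo", 99),
    ],
)

# Search on any column, or any combination, without hand-written loops.
under_50 = conn.execute(
    "SELECT name FROM people WHERE age < ? ORDER BY name", (50,)
).fetchall()
hired_feb = conn.execute(
    "SELECT name, age FROM people WHERE hired >= ? AND hired < ?",
    ("2019-02-01", "2019-03-01"),
).fetchall()
conn.close()

print(under_50)
print(hired_feb)
```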

> (appreciating that I have no difficulty moving from (Python) procedural 
> programming to (SQL) declarative, but many of our colleagues hate such, 
> and with a passion)

Yes, I've never quite understood why some programmers are
reluctant to use SQL. For complex structured data it is by far the
simplest approach and usually very efficient, especially with big
volumes. But simple searches on small datasets are easier (or as easy)
in native Python.

I cannot seem to figure this potential bug out.

Please advise,

Jesse Ibarra

From: Julien Palard <julien at>
Sent: Sunday, July 7, 2019 2:04 PM
To: Ibarra, Jesse
Subject: Re: [docs] Python Embedding PyImport_ImportModule

Hi Jesse,

> Why does this code only print the result(array([ 1.,  1.,  1.,  1.,  1.])) once, when I am calling the python code twice?

You're on a mailing list about Python documentation, not embedding support. Have you tried

Julien Palard

On 08/07/2019 15:14, Ibarra, Jesse wrote:
> I cannot seem to figure this potential bug out.

Neither can we since we cannot see any code.

You need to give us some context. What are you trying to do? What
libraries are you using? Which OS and Python versions?

Did you get any errors? If so post them in their entirety.

But most of all post the code!

On 8/07/19 10:54 AM, Alan Gauld via Tutor wrote:
> On 07/07/2019 20:54, David L Neil wrote:
>> (However, some of us grew-up at a time when RAM was expensive and even
>> in our relaxed state, such 'costs' still impinge on our consciousness -
> Indeed, my first computer was at the local university and had 64KB.
> My second computer was a Sinclair ZX81 (Timex in the USA?) with 16K

Wow, go you! I could never cope with using analog tape decks as digital 
storage devices.

I've just decided to take a look at SBCs. I guess the 'splash' of news 
about the Raspberry Pi 4 contributed to that, and yet also contributes 
to my prevarication/procrastination...

My 'first' was at high school - an over-grown accounting machine. 
Paper-tape program-software, Teletype-style input and 'line-flow' 
output, and a magnetic drum for data-storage (and yes, I was computing 
physical locations in order to optimise response times - try that!)

At uni, we built Motorola D2 Kits - IIRC an 8-bit Motorola MC6800 
processor. Maybe a 6809 - or more likely, that was the one to which we 
aspired. Those who could 'afford' more hardware courses started building 
intelligent devices, eg embedding a microprocessor within a 'dumb 
terminal'/'green screen'. Intelligent devices, Internet of Things. Plus 
ça change!

> My third, a CP/M machine with 64K and 256K RAM disk and dual
> floppies - such luxury! :-)

Ah nostalgia.

One of my divertissements of that era (early-80s) was MP/M - a bus 
network of what were effectively single-board processors. It implemented 
my first 'computer lab' and we taught everything from COBOL to 
accounting software and word processing on it. Fantastic stuff in its day!
(interestingly, my SBC research last night, took me to a R.Pi device 
embodying exactly these concepts:

> So I agree, it is hard to get out of that mode of thinking. But
> today the minimum RAM is typically 4GB or more. My desktop
> boxes all have 16GB and even my ancient Netbook has 4G.
> My 20 year old iBook has 640M and even that is enough to
> run Python with many thousands of data objects instantiated.
>> particularly temporary, DB tables into MySQL's MEMORY storage (and with
>> almost zero code-change/risk)!
> Yes, I use SQLite's MEMORY facility reguilarly. Not for managing
> high volumes but where I need flexible search capability.
> A SQL SELECT statement is much more flexible and faster
> than any Python search I could cobble together.
>> (appreciating that I have no difficulty moving from (Python) procedural
>> programming to (SQL) declarative, but many of our colleagues hate such,
>> and with a passion)
> Yes, I've never quite understood why some programmers are
> reluctant to use SQL. For complex structured data it is by far the
> simplest approach and usually very efficient, especially with big
> volumes. But simple searches on small datasets are easier (or as easy)
> in native Python.

Agreed, but if we move beyond standard dict-s, into multi-keyed data 
structures - even with PSL and PyPI at our disposal, isn't it 
much-of-a-muchness to use MySQL/SQLite versus linked-lists or trees?
(or perhaps am showing too much bias from personal experience?)

The "reluctance" (good word!) is intriguing: (a) one more 
package/language to learn - yet such claimants might well have been the 
ones leaping into NoSQL a few years back; and (b) it is a different way 
of thinking - compare 'spaghetti' and monolithic code to "structured", 
procedural to OOP, OOP to 'functional'... I notice a similar likelihood 
to avoid HTML/CSS because of their 'declarative' approaches.

Hey, if you don't like the green ones, that's all the more Smarties/M&Ms 
for me!
(also, avoid the brown ones, they may be 'sheep pellets'/rabbit 
Regards =dn

From bgailer at  Mon Jul  8 19:51:35 2019
Data = [
>>    ['2019-01-19','Fred Flintstone',23],
>> ['2019-02-01','Scooby doo', 99]
>> ]
Warning 3: age is not a fundamental attribute; it is a computed value!
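
To illustrate: if a date of birth were stored instead, the age could be derived whenever it is needed. This is only a sketch; age_on is a hypothetical helper and the birth date is an invented value, since the original data holds no date of birth.

```python
from datetime import date


def age_on(birth_date, today):
    """Whole years elapsed between birth_date and today."""
    years = today.year - birth_date.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


born = date(1996, 3, 15)  # invented example value
print(age_on(born, date(2019, 7, 8)))
```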

On Tue, 9 Jul 2019 at 03:13, Alan Gauld via Tutor <tutor at> wrote:
> On 08/07/2019 15:14, Ibarra, Jesse wrote:
> >
> > I cannot seem to figure this potential bug out.
> Neither can we since we cannot see any code.

I'm guessing this might be the original post:

And while trying to find that one, I noticed that Jesse asked
another question on a similar topic:

The questions are about calling Python functions from C code.
So asking the docs mailing list isn't likely to produce useful responses.

It appears that some good advice on what would be the best place to
ask these questions is needed, can anyone here provide that

Sorry for the duplicate threads but the forwarded message did not send the original email. I apologize for any inconvenience.
The file are below.

I am running CentOS7:

[jibarra at redsky ~]$ uname -a
Linux 3.10.0-957.21.2.el7.x86_64 #1 SMP Wed Jun 5 14:26:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

I am using Python3.6:
[jibarra at redsky ~]$ python3.6
Python 3.6.8 (default, Apr 25 2019, 21:02:35)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.

I successfully ran the example:

I successfully ran the example and named it

from scipy.optimize import minimize, rosen, rosen_der
x0 = [1.3, 0.7, 0.8, 1.9, 1.2]
res = minimize(rosen, x0, method='Nelder-Mead', tol=1e-6)

I then embedded the example using C/Python API:

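#include <Python.h>

int main(){
  PyObject *pModule, *pName;

  /* start the embedded interpreter and make '.' importable */
  Py_Initialize();
  PyRun_SimpleString("import sys");
  PyRun_SimpleString("sys.path.append('.')");

  /* the same module is imported twice */
  pModule = PyImport_ImportModule("Rosenbrock_Function");
  pName = PyImport_ImportModule("Rosenbrock_Function");

  Py_XDECREF(pModule);
  Py_XDECREF(pName);

  return 0;
}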

Why does this code only print the result(array([ 1.,  1.,  1.,  1.,  1.])) once, when I am calling the python code twice?

Please advise,
Jesse Ibarra

On 09/07/2019 15:13, Ibarra, Jesse wrote:

Caveat: I'm no expert on embedding and indeed have only done
it once using the examples in the docs. However, based on
my general Python experience...

> I then embedded the example using C/Python API:
> int main(){
>   PyObject *pModule, *pName;
>   Py_Initialize();
>   PyRun_SimpleString("import sys");
>   PyRun_SimpleString("sys.path.append('.')");
>   pModule = PyImport_ImportModule("Rosenbrock_Function");
>   pName = PyImport_ImportModule("Rosenbrock_Function");

These import the same module twice. But Python usually checks if a
module is already imported so I'm guessing it does the same here.

>   Py_XDECREF(pModule);
>   Py_XDECREF(pName);

Maybe if you decrement the count after each import you will
get the result you want?

Although I'm guessing that relying on code to run via an import
is probably a bad practice. You'd normally expect to import
the module then call a function.

But that's mostly guesswork, so I could well be wrong!
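
The caching behaviour can be checked from pure Python. This is a sketch only, not the embedding API: demo_mod is a hypothetical throwaway module written to a temporary directory, and importlib.import_module stands in for the C call.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module whose body prints when it is executed.
    Path(tmp, "demo_mod.py").write_text('print("module body executed")\n')
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()

    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        first = importlib.import_module("demo_mod")   # runs the body
        second = importlib.import_module("demo_mod")  # found in sys.modules
        importlib.reload(first)                       # runs the body again
    sys.path.remove(tmp)

runs = buf.getvalue().count("module body executed")
print(runs, first is second)
```

Two imports execute the body once; only the explicit reload runs it a second time. Putting the work in a function and calling it after the import avoids depending on this.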

I am using MAC OS X 10.14.5 on a MAC iBook
I use Python 3.7.0 from Anaconda, with Spyder 3.3.3

I am a relative beginner.

My program models cell reproduction. I have written a program that models this and it works.

Now I want to model a tissue with several types of cells. I did this by simply rerunning the program with different inputs (cell characteristics). But now I want to send and receive signals between the cells in each population. This requires some sort of concurrent processing with halts at appropriate points to pass and receive signals.

I thought to use multiprocessing. I have read the documentation and reproduced the models in the docs. But I cannot figure out how to feed in the data for multiple parameters.

I have tried using Pool and it works fine, but I can only get it to accept 1 input parameter, although multiple data inputs with one parameter works nicely.

So, my questions are:

  1.  Is multiprocessing the suitable choice?
  2.  If yes, how does one write a function with multiple input parameters?

Thanks in advance.


How might I best make a linked list subscriptable? Below is skeleton code
for a linked list (my
actual is much more). I've done __iter__ and __next__ but I would like to
be able to do start:stop:stride I just can't figure out how. Suggestions or
just hints please?

# -*- coding: utf8 -*-

class Node:
    def __init__(self, val=None, nxt=None):
        self.val = val
        self.nxt = nxt

class LinkedList:
    def __init__(self, node=None):
        self.root = node
        self.length = 0

    def prepend(self, data):
        """ add data to head/root of list """
        if self.root is None:
            self.root = Node(data)
            self.length = self.length + 1
        else:
            n = Node(data)
            n.nxt = self.root
            self.root = n
            self.length = self.length + 1

    def pop(self):
        """ Remove first data node """
        t = self.root.val
        if self.root:
            self.root = self.root.nxt
            self.length = self.length - 1
        return t

    def __repr__(self):
        tmp = self.root
        s = ''
        while tmp:
            s = s + str(tmp.val) + '> '
            tmp = tmp.nxt

        return s[:-2]

ll = LinkedList()
[ll.prepend(x) for x in range(14,-1,-1)]

>>> ll
0> 1> 2> 3> 4> 5> 6> 7> 8> 9> 10> 11> 12> 13> 14

On 10Jul2019 20:30, Sarah Hembree <sarah123ed at> wrote:
>How might I best make a linked list subscriptable? Below is skeleton code
>for a linked list (my
>actual is much more). I've done __iter__ and __next__ but I would like to
>be able to do start:stop:stride I just can't figure out how. Suggestions or
>just hints please?

Well, you could write a method to find and return element `n`, counting 
from 0 (the root Node).

For start stop stride you could do the extremely simple approach: 
iterate over a range() of the start stop stride and call the 
get-element-n method. This will be very slow though (a complete scan for 
every element).

Instead, you could write a method accepting a start, stop and stride.  
Call the find-element-n method to get to start, lets call that `n`.  
While n < stop, step forward `stride` elements and also bump `n` by 
stride. That steps you along each element indicated by the 

You'll notice that this only works for a positive stride.

For a negative stride you're probably better off handing it to range(), 
getting a list of values back from that as a list, and reversing the 
list (which then has ascending values). Collect the elements indicated 
by the list (because you can traverse the linkedlist forwards). Then 
reverse the collected elements and return them.

Now, you _could_ accumulate each element in a list and return that at 
the end. _Or_ you could make the function a generator and just yield 
each element as found.

To turn all this into a subscriptable list, define the __getitem__ 
method as a function accepting an index. If that index is an int, just 
return that element. If the index is a slice (start:stop:stride), call 
the more complicated function to return multiple elements.
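
A minimal sketch of that __getitem__, under a few assumptions: the node attribute is called nxt, the list tracks its own length, and a slice returns a plain Python list rather than a new LinkedList. It leans on slice.indices() to turn start:stop:stride into a range, and handles a negative stride by walking forwards and reversing the result, as described above.

```python
class Node:
    def __init__(self, val=None, nxt=None):
        self.val = val
        self.nxt = nxt


class LinkedList:
    def __init__(self):
        self.root = None
        self.length = 0

    def prepend(self, data):
        self.root = Node(data, self.root)
        self.length += 1

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # slice.indices() clamps start/stop/stride to this length.
            wanted = range(*index.indices(self.length))
            forwards = wanted if wanted.step > 0 else wanted[::-1]
            picked = []
            node, pos = self.root, 0
            for target in forwards:  # one pass, always moving forward
                while pos < target:
                    node, pos = node.nxt, pos + 1
                picked.append(node.val)
            return picked if wanted.step > 0 else picked[::-1]
        if index < 0:
            index += self.length
        if not 0 <= index < self.length:
            raise IndexError("LinkedList index out of range")
        node = self.root
        for _ in range(index):
            node = node.nxt
        return node.val


ll = LinkedList()
for x in range(14, -1, -1):
    ll.prepend(x)

print(ll[3], ll[-1])
print(ll[2:10:3])
print(ll[10:2:-3])
```

Each slice costs one traversal of the list, rather than one traversal per element.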

Cameron Simpson <cs at>

On 7/10/19 6:30 PM, Sarah Hembree wrote:
> How might I best make a linked list subscriptable? Below is skeleton code
> for a linked list (my
> actual is much more). I've done __iter__ and __next__ but I would like to
> be able to do start:stop:stride I just can't figure out how. Suggestions or
> just hints please?

As a learning exercise this can be interesting, but as to practical
applications, one would like to ask "why"?  If index into the list is
important, then choose a regular list; the "interesting" part of a
linked list, which is "next node" is then available as index + 1.

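If you're passing parameters as a tuple, then you need a "," after a single item.  Otherwise, if you have something like a string as the only item, the parentheses are plain grouping and the value will be the string itself.  A list does not have this problem; the trailing comma is optional there.

tuple_with_one_item = ('item one',)
list_with_one_item = ['item one',]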


Quick background on what I'm trying to achieve.

I have a data set from a digital storage oscilloscope.  It includes sampled
data points for several electrical events that I'd like to break down and

The scope generates a single file with all of the events concatenated.  The
data set is a comma-delimited file with time and volts.  There's a header
between events that will allow me to recognize the beginning of a new
event.  This 'raw' file is roughly 4 GBytes of data.  Too much for any
editor to handle.

So, I've written a script that will go through and 'chunk' out each event
and save it to a .csv file.  Now, I have smaller files to work with but
they are still a bit too large for most applications.  These files are
roughly 50 MByte.

In order to extract the desired information from the files, I ended up
reading through each row in the .csv file and finding the events of
interest (rising edges) and saved off data on either side of the event into
arrays, which I saved to .csv files.  I then wrote a script that further
processed the information to generate actual bit-concatenated words..

So, here's where it gets interesting.  And, I'm presuming that someone out
there knows exactly what is going on and can help me get past this hurdle.

When I read through the .csv file and collect the events and the bits, I
get the expected result.  A 16-bit output

To improve efficiency, I then took this snippet of code and included it
into a larger script that will help skip a few manual steps and automate
the process on the original 4 GByte file.  In this larger script, I save
the data that would have normally gone to the .csv files into an array and
I work on the array within the script.  Everything should be the same.. or
so I thought.

When I use the same code, except reading an array, I get results that are
basically 10x the size of the correct output.

I've checked the contents of the array against the contents of the .csv
file (the sources in each of these cases) and they are identical to each
other.  Same size, dimensions and data.

My guess, at this point, is that the way a loop reading a .csv file and the
way a loop reads an array are somehow slightly different and my code isn't
accounting for this.

The other possibility is that I've become code-blind to a simple mistake
which my brain keeps overlooking...

Thank you in advance for your time,

Example code when reading from file:

    risingdetected = False

    for row in csvReader:
        voltage = float(row[1])
        # print("test  v ", voltage)

        if(voltage > (avg + triglevel) and not(risingdetected)):
            # we've found an edge
            risingdetected = True
            edgearray.append([float(row[0]), float(row[1])])
            # print(edgearray)

        elif(voltage < (avg + triglevel) and risingdetected):
            # print(voltage)
            # we've found the falling edge of the signal
            risingdetected = False

    print("edge array: ", edgearray)    # displays the array

    arraysize = len(edgearray)    # roughly a count <= 33
    print("edge array size: ", arraysize)    # display size

Example code when reading array inside a script:
(note that I simplified things since it didn't make sense to have a bunch
of repeated math going on for what ended up being the same result.
Specifically, triggervolts, which is the avg + the trigger level)

[I also added in the else statement to see if there were cases where it was
falling through.. which there are.  But that should be the case in both

        risingdetected = False

        for row in range (len(TrigWind)):
            # grab the voltage from this entry
            voltage = float(TrigWind[row][1])

            # if we've not already detected a rising edge
            # and we're over the trigger level
            if((voltage > triggervolts) and not(risingdetected)):
                # set the rising edge detected to help control flow
                risingdetected = True
                # write the edge entry into the edge array

            # We've detected a rising edge, now we're looking for a falling
            elif((voltage < triggervolts) and risingdetected):
                # we're done with the pulse, time to wait for the next one..
                risingdetected = False


        print("Edge array: ", edgearray)    # display the array results

        # ends up being about twice the size of the .csv version
        arraysize = len(edgearray)
        print("Size of edagearray: ", arraysize)    # show the size of the
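
One way to rule out a difference between the two loops is to put the edge detection in a single function and feed it both sources. A minimal sketch, assuming each row is a (time, volts) pair; find_rising_edges is a hypothetical helper, not part of the original scripts, and the sample values are invented.

```python
def find_rising_edges(rows, threshold):
    """Return [time, volts] for the first sample of each pulse above threshold."""
    edges = []      # fresh list on every call, so earlier results cannot pile up
    rising = False
    for row in rows:
        time_s, volts = float(row[0]), float(row[1])
        if volts > threshold and not rising:
            rising = True                 # rising edge found
            edges.append([time_s, volts])
        elif volts < threshold and rising:
            rising = False                # pulse finished, wait for the next one
    return edges


# Works the same for csv.reader rows (strings) and in-memory lists (numbers).
samples = [
    ("0.0", "0.1"), ("0.1", "3.2"), ("0.2", "3.3"),
    ("0.3", "0.2"), ("0.4", "3.1"), ("0.5", "0.0"),
]
edges = find_rising_edges(samples, threshold=1.5)
print(edges)
```

If the file version and the array version both go through the same function and still disagree, the inputs differ; if they agree, the difference was in the surrounding code, for example a result list that is not emptied between events.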

From PyTutor at  Thu Jul 11 20:51:05 2019
1 There have been many projects to look at cells, division, 
multiplication, ... It is worth researching the Python eco-system in the 
expectation of saving yourself time and effort!

2 The latest releases of Python (such as you quote) offer updated 
asyncio module(s) for multiprocessing, ie be careful if you are reading 
older articles! We haven't discussed hardware. Most modern PC CPUs offer 
multiple "cores". Assuming (say) four cores, asyncio is capable of 
running up to four processes concurrently - realising attendant 
acceleration of the entirety.
(admittedly, I tend to limit my ambitions to number_of_cores - 1)

On 7/11/19 10:55 AM, Mats Wichmann wrote:
> On 7/10/19 6:30 PM, Sarah Hembree wrote:
>> How might I best make a linked list subscriptable? Below is skeleton code
>> for a linked list (my
>> actual is much more). I've done __iter__ and __next__ but I would like to
>> be able to do start:stop:stride I just can't figure out how. Suggestions or
>> just hints please?
> As a learning exercise this can be interesting, but as to practical
> applications, one would like to ask "why"?  If index into the list is
> important, then choose a regular list; the "interesting" part of a
> linked list, which is "next node" is then available as index + 1.
To expand on the question, the primary use of something like a linked
list is that you want cheap insertions/deletions (O(1) once you hold the
node) and in exchange for that indexing becomes O(n), versus an array
based list which has O(1) indexing but O(N) insertions/deletions (since
you need to compact the array). Both can be iterated in O(1) per element.

You can add an index operator that takes O(N) time to a linked list.
obj[n] will call obj.__getitem__ (and you will also want to implement
__setitem__, __delitem__), and check if the argument is a slice to
handle slices.

Richard Damon

From alan.gauld at  Fri Jul 12 04:41:56 2019
From: alan.gauld at (Alan Gauld)
Date: Fri, 12 Jul 2019 09:41:56 +0100
Subject: [Tutor] Multiprocessing with many input input parameters
In-Reply-To: <>
References: <>
Message-ID: <qg9h4k$1u57$>

On 12/07/2019 01:51, DL Neil wrote:

> older articles! We haven't discussed hardware. Most modern PC CPUs offer 
> multiple "cores". Assuming (say) four cores, asyncio is capable of 
> running up to four processes concurrently - realising attendant 
> acceleration of the entirety.

Just to pick up on this point because I often see it being cited.
The number of concurrent processes running to achieve performance
gain is only very loosely tied to the number of cores. We ran
concurrent processes for many years before multi-core processors
were invented with significant gains. Indeed any modern computer
runs hundreds of "concurrent" processes on a small number of cores
and the OS switches between them.

What the number of cores affects is the number of processes
actually executing at the same time. If you just want to chunk
up the processing of a large amount of data and run the exact
same code multiple times then there is no point in having more
than the number of cores. But if your concurrent processes are
doing different tasks on different data then the number of cores
is basically irrelevant. And especially if they are performing
any kind of I/O operations since they are likely to be parked
by the OS for most of the time anyway.

Of course, there is a point where creating extra processes becomes
counter-effective since that is an expensive operation, and
especially if the process will be very short lived or only
execute for tiny lengths of time (such as handling a network
event by passing it on to some other process). But for most real
world uses of multiprocessing the number of cores is not a
major factor in deciding how many processes to run. I certainly
would not hesitate to run 10xN where N is the number of cores.
Beyond that you might need to think carefully.

In Sydney's scenario it sounds like the processes are different
and explicitly halt to perform I/O so the cores issue should
not be a problem.

Thanks Mike,

But I am still not clear.

do I write:

def f([x,y,z]) ?
How exactly does one write the function, and how does one ensure that each positional argument is accounted for?
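
A sketch of one common pattern: write the function with ordinary named parameters, collect each run's arguments in a tuple, and let starmap unpack them. The function simulate and its parameters are invented for illustration; Pool.starmap is the multiprocessing counterpart of itertools.starmap and takes the same list of tuples.

```python
from itertools import starmap
from multiprocessing import Pool


def simulate(cell_type, division_rate, generations):
    """Hypothetical model: relative cell count after repeated growth."""
    count = 1.0
    for _ in range(generations):
        count *= 1 + division_rate
    return cell_type, count


# One tuple per run; each tuple is unpacked into the three parameters.
param_sets = [
    ("epithelial", 0.5, 3),
    ("fibroblast", 0.25, 2),
]

# Serial version: starmap calls simulate(*args) for each tuple.
serial_results = list(starmap(simulate, param_sets))
print(serial_results)


def run_parallel(param_sets, workers=2):
    """Same unpacking, but spread across worker processes."""
    with Pool(workers) as pool:
        return pool.starmap(simulate, param_sets)
```

In a script, run_parallel(param_sets) would normally be called under an if __name__ == "__main__": guard so that worker processes do not re-run it when they import the main module.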

Can someone please explain me the reason for below output.

Program:
def fun(n,li = []):
    a = list(range(5))
    li.append(a)
    print(li)

fun(4)
fun(5,[7,8,9])
fun(4,[7,8,9])
fun(5) # reason for output (why am I getting two values in this output.)

Output:
[[0, 1, 2, 3, 4]]
[7, 8, 9, [0, 1, 2, 3, 4]]
[7, 8, 9, [0, 1, 2, 3, 4]]
[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]

Thank you,

On 12/07/2019 15:24, Gursimran Maken wrote:

> Can someone please explain me the reason for below output.

You've been bitten by one of the most common gotchas in Python :-)

> def fun(n,li = []):
>     a = list(range(5))
>     li.append(a)
>     print(li)
> fun(4)
> fun(5,[7,8,9])
> fun(4,[7,8,9])
> fun(5) # reason for output (why am I getting to values in this output.)

When you define a default value in Python it creates the default value
at the time you define the function. It then uses that value each time a
default is needed. In the case of a list that means Python creates an
empty list and stores it for use as the default.

When you first call the function with the default Python adds values to
the default list.

Second time you call the function using the default Python adds (more)
values to (the same) default list.

Sometimes that is useful, usually it's not. The normal pattern to get
round this is to use a None default and modify the function like so

def fun(n,li = None):
    if li is None: li = []   # create new list
    a = list(range(5))
    li.append(a)
    return li  # bad practice to mix logic and display...


If I remember how that works right, there is a single empty list that is created and used for all the calls that use the default argument, and then your function modifies that empty list so it is no longer empty, and that modified list is used on future calls. (Not good to use a mutable as a default parameter).

A better solution would be to make the default something like None, and test if at the beginning of the function li is None, and if so set it to an empty list, and that empty list will be in function scope so it goes away and a new one is created on a new call.

> On Jul 12, 2019, at 10:24 AM, Gursimran Maken <gursimran.maken at> wrote:
> Hi,
> Can someone please explain me the reason for below output.
> Program:
> def fun(n,li = []):
>    a = list(range(5))
>    li.append(a)
>    print(li)
> fun(4)
> fun(5,[7,8,9])
> fun(4,[7,8,9])
> fun(5) # reason for output (why am I getting to values in this output.)
> Output:
> [[0, 1, 2, 3, 4]]
> [7, 8, 9, [0, 1, 2, 3, 4]]
> [7, 8, 9, [0, 1, 2, 3, 4]]
> [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
> Thank you,
> Gursimran
On 7/12/19 11:39 AM, Alan Gauld via Tutor wrote:
> On 12/07/2019 15:24, Gursimran Maken wrote:
>> Can someone please explain me the reason for below output.
> You've been bitten by one of the most common gotchas in Python :-)
>> def fun(n,li = []):
>>     a = list(range(5))
>>     li.append(a)
>>     print(li)
>> fun(4)
>> fun(5,[7,8,9])
>> fun(4,[7,8,9])
>> fun(5) # reason for output (why am I getting to values in this output.)
> When you define a default value in Python it creates the default value
> at the time you define the function. It then uses that value each time a
> default is needed. in the case of a list that means Python creates an
> empty list and stores it for use as the default.

It may help in seeing why this happens to be aware that a def: statement
is an executable statement like any other, which is executed at the time
it is reached in the file.  Running it generates a function object, a
reference to it being attached to the name of the function. Conceptually,

def foo():

is like

foo = FunctionConstructor()

foo ends up referring to an object that is marked as callable so you can
then later call it as foo().  So it makes some sense that some things
have to happen at object construction time, like setting up the things
that are to be used as defaults.
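
You can see this for yourself, since the function object keeps its defaults in the __defaults__ attribute. A small sketch (foo and its body are invented for the illustration):

```python
def foo(a, li=[]):
    li.append(a)
    return li


# The defaults were evaluated when the def statement ran and are
# stored on the function object itself.
print(foo.__defaults__)   # ([],)

foo(1)

# The call mutated the stored default, so the next call starts from it.
print(foo.__defaults__)   # ([1],)
```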

> When you first call the function with the default Python adds values to
> the default list.
> Second time you call the function using the default Python adds (more)
> values to (the same) default list.

FWIW, Python does the same no matter the type of the default argument,
but it only causes the trap we poor programmers fall into if the type is
one that can be modified ("mutable").  If you have fun(n, a=5) or fun(n,
s="stringystuff") those are unchangeable and we don't get this little
surprise.

By the way, checker tools (and IDEs/editors with embedded checking
capabilities) will warn about this, which is a hint on using good tools.
 pylint would tell you this:

W0102: Dangerous default value [] as argument (dangerous-default-value)

From cs at  Fri Jul 12 19:59:16 2019
On 11Jul2019 15:40, Mike Barnett <mike_barnett at> wrote:
>If you're passing parameters as a list, then you need a "," at the end of the items.  Otherwise if you have something like a string as the only item, the list will be the string.
>list_with_one_item = ['item one',]

Actually, this isn't true.

This is a one element list, no trailing comma required:

  list_with_one_item = ['item one']

Mike has probably confused this with tuples. Because tuples are 
delineated with parentheses, there is ambiguity between a tuple's 
parentheses and normal "group these terms together" parentheses.


  x = 5 + 4 * (9 + 7)

Here we just have parentheses causing the addition "9 + 7" to occur 
before the multiplication by 4. And this is also legal:

  x = 5 + 4 * (9)

where the parentheses don't add anything special in terms of behaviour.

Here is a 2 element tuple:

  (9, 7)

How does one write a one element tuple? Like this:

  (9,)

Here the trailing comma is _required_ to syntactically indicate that we 
intend a 1 element tuple instead of a plain "9 in parentheses" as in 
the earlier assignment statement.

I'm not sure any of this is relevant to Sydney's question though.

Cameron Simpson <cs at>

On Sat, Jul 13, 2019 at 09:59:16AM +1000, Cameron Simpson wrote:

> Mike has probably confused this with tuples. Because tuples are 
> delineated with parentheses, there is ambiguity between a tuple's 
> parentheses and normal "group these terms together" parentheses.

There are no "tuple parentheses". Tuples are created by the *comma*, 
not the parens. The only exception is the empty tuple, since you can't 
have a comma on its own.

    x = ()    # Zero item tuple.
    x = 1,    # Single item tuple.
    x = 1, 2  # Two item tuple.

Any time you have a tuple, you only need to put parens around it to 
disambiguate it from the surrounding syntax:

    x = 1, 2, (3, 4, 5), 6     # Tuple containing a tuple.
    function(0, 1, (2, 3), 4)  # Tuple as argument to a function.

or just to make it more clear to the human reader.

> Here is a 2 element tuple:
>  (9, 7)
> How does one write a one element tuple? Like this:
>  (9,)

To be clear, in both cases you could drop the parentheses and still get 
a tuple:

    9, 7
    9,

provided that wasn't in a context where the comma was interpreted as 
something with higher syntactic precedence, such as a function call:

    func(9, 7)    # Two integer arguments, not one tuple argument.
    func((9, 7))  # One tuple argument.
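
A quick way to check all of this is to ask Python for the types. The variable names here are only for illustration:

```python
a = (9)     # grouping parentheses only: this is the int 9
b = (9,)    # the comma makes it a one-item tuple
c = 9,      # same tuple, no parentheses needed
d = ()      # the empty tuple is the one case with no comma

for value in (a, b, c, d):
    print(type(value).__name__, value)
```

This prints int for the first value and tuple for the other three.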


From oscar.j.benjamin at  Fri Jul 12 22:14:24 2019
On Wed, 10 Jul 2019 at 16:45, Shall, Sydney via Tutor <tutor at> wrote:
> I am a relative beginner.
> My program models cell reproduction. I have written a program that models this and it works.
> Now I want to model a tissue with several types of cells. I did this by simply rerunning the program with different inputs (cell characteristics). But now I want to send and receive signals between the cells in each population. This requires some sort of concurrent processing with halts at appropriate points to pass and receive signals.

You say that this "requires some sort of concurrent processing" but I
think that you are mistaken there. I have a lot of experience in
mathematical modelling and dynamic simulation including some
multi-cell models and I have never personally come across a situation
where any significant benefit could be obtained from using concurrent
processing for different parts of a single simulation. Those
situations do exist but you haven't said anything to make me think
that yours is an exceptional case.

A simple way to do this (not the only way!) is something like:

# Some data structure that stores which cells are sending messages
messages_from_cells = {}

for t in range(num_timesteps):
    # Calculate new state of all cells based only on the old states of
    # all cells and messages.
    new_cells = {}
    for c in range(len(cells)):
        new_cells[c] = update_cell(cells[c], messages_from_cells)
    # Update all cells synchronously:
    cells = new_cells
    # Update messages based on new cell states:
    for c in range(len(cells)):
        messages_from_cells = update_messages(cells[c], messages_from_cells)

You just need to figure out a data structure (I've suggested a dict
above) that would somehow store what messages are being passed between
which cells. You can update each cell based on the current messages
and then update the messages ready for the next timestep.

Concurrent execution is not needed: I have simulated concurrency by
using two separate loops over the cells. The result is as if each cell
was updated concurrently. Another approach is that at each timestep
you can choose a cell randomly and update that one keeping all the
others constant. It really depends what kind of model you are using.
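
For something you can actually run, here is a toy version of the two-loop scheme. The update rules are invented purely for illustration (each cell adds the mean of last step's signals, then posts a tenth of its new state); a real model would substitute its own update_cell and update_messages.

```python
def update_cell(state, messages):
    # Hypothetical rule: add the mean of all signals from the last step.
    if not messages:
        return state
    return state + sum(messages.values()) / len(messages)


def update_messages(index, state, messages):
    # Hypothetical rule: each cell posts a signal based on its new state.
    messages[index] = 0.1 * state
    return messages


cells = {0: 1.0, 1: 2.0, 2: 3.0}
messages_from_cells = {}
num_timesteps = 3

for t in range(num_timesteps):
    # Loop 1: every new state is computed from the OLD messages only,
    # so the order in which cells are visited does not matter.
    new_cells = {}
    for c, state in cells.items():
        new_cells[c] = update_cell(state, messages_from_cells)
    cells = new_cells
    # Loop 2: refresh the messages from the new states for the next step.
    for c, state in cells.items():
        messages_from_cells = update_messages(c, state, messages_from_cells)

print(cells)
```

Because no cell sees another cell's new state until the next timestep, the result is the same as if all cells had been updated at once.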

In a simulation context like this there are two different reasons why
you might conceivably want to use concurrent execution:

1. Your simulations are CPU-bound and are slow and you need to make
them run faster by using more cores.
2. Your simulation needs more memory than an individual computer has
and you need to run it over a cluster of many computers.

Python's multiprocessing module can help with the first problem: it
can theoretically make your simulations run faster. However it is hard
to actually achieve any speedup that way. Most likely there are other
ways to make your code run faster that are easier than using
concurrent execution and can deliver bigger gains. Multiprocessing
used well might make your code 10x faster but I will bet that there
are easier ways to make your code 100x faster.

Multiprocessing makes the second problem worse: it actually uses more
memory on each computer. To solve problem 2 is very hard but can be
done. I don't think either problem applies to you though.

There is a situation where I have used multiprocessing to make
simulations faster. In practice I rarely want to do just one
simulation; I want to do thousands with different parameters or
because they are stochastic and I want to average them. Running these
thousands of simulations can be made faster easily with
multiprocessing because it is an "embarrassingly parallel" problem.
You need to get your simulations working without multiprocessing first
though. This is a much easier way to solve problem 1 (in so far as
using more cores can help).
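
A minimal sketch of that "many independent runs" pattern, assuming each run can be wrapped in one function that takes a seed. run_simulation here is a made-up stand-in, not a real model:

```python
import random
from multiprocessing import Pool


def run_simulation(seed):
    # Stand-in for one complete stochastic simulation; it only
    # averages some random numbers so the sketch stays short.
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(1000)) / 1000


if __name__ == "__main__":
    seeds = range(8)
    # Each run is independent, so the pool can hand them out to
    # however many cores are available.
    with Pool() as pool:
        results = pool.map(run_simulation, seeds)
    print(sum(results) / len(results))
```

The simulation function itself knows nothing about multiprocessing, which is why it pays to get it working on its own first.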

On Thu, 11 Jul 2019 at 18:52, Chip Wachob <wachobc at> wrote:
> Hello,

Hi Chip,

> So, here's where it gets interesting.  And, I'm presuming that someone out
> there knows exactly what is going on and can help me get past this hurdle.

I don't think anyone knows exactly what's going on...

> My guess, at this point, is that the way a loop reading a .csv file and the
> way a loop reads an array are somehow slightly different and my code isn't
> accounting for this.

There shouldn't be any difference. When you say "array" it looks to me
like a list. Is it a list?

I think it should be as simple as changing:

for row in csvReader:

to

for row in myArray:

(without making any other changes)

> The other possibility is that I've become code-blind to a simple mistake
> which my brain keeps overlooking...

The only thing I can see is that you've replaced avg+triglevel with
triggervolts. Are you sure they're the same?


From mats at  Sat Jul 13 01:01:10 2019
On 7/12/19 5:53 AM, Shall, Sydney via Tutor wrote:
> Thanks Mike,
> But I am still not clear.
> do I write:
> def f([x,y,z]) ?
> How exactly do one write the function and how does one ensure that each positional argument is accounted for.

The concept of packing will be useful, you can use the * operator to
pack and unpack.  A trivial example to get you started:

>>> a = [1, 2, 3, 4]
>>> print(a)
[1, 2, 3, 4]
>>> print(*a)
1 2 3 4

In the first print we print the list, in the second we print the result
of unpacking the list - you see it's now four numbers rather than one
list of four numbers.

In a function definition you can pack with the * operator:

>>> def f(*args):
...     print(type(args))
...     print(len(args))
...     print(args)
>>> f(1, 2, 3, 4)
<class 'tuple'>
4
(1, 2, 3, 4)

Here we called the function with four arguments, but it received those
packed into the one argument args, which is a tuple of length 4.

Python folk conventionally name the argument which packs the positional
args that way - *args - but the name "args" has no magic, its
familiarity just aids in recognition.  By packing your positional args
you don't error out if you're not called with the exact number you
expect (or if you want to accept differing numbers of args), and then
you can do what you need to with what you get.

The same packing concept works for dictionaries as well, here the
operator is **.

>>> def fun(a, b, c):
...     print(a, b, c)
>>> d = {'a':2, 'b':4, 'c':10}
>>> fun(**d)
2 4 10

What happened here is in unpacking, the keys in the dict are matched up
with the names of the function parameters, and the values for those keys
are passed as the parameters.  If your dict doesn't match, it fails:

>>> d = {'a':2, 'b':4, 'd':10}
>>> fun(**d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: fun() got an unexpected keyword argument 'd'

Packing keyword arguments into a dictionary looks like:

>>> def fun(**kwargs):
...     print(f"{kwargs}")
>>> fun(a=1, b=2, c=3)
{'a': 1, 'b': 2, 'c': 3}

again the name 'kwargs' is just convention.

There are rules for how to mix regular positional args, packed
positional args (or varargs), and keyword args but I don't want to go on
forever here...
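
As a short taste of how the pieces combine, here is a sketch with one regular positional parameter, packed positionals, a keyword-only parameter and packed keywords. The function name show and its parameter names are made up:

```python
def show(first, *args, flag=False, **kwargs):
    # first  : ordinary positional parameter
    # args   : tuple of any extra positional arguments
    # flag   : can only be passed by keyword, since it follows *args
    # kwargs : dict of any other keyword arguments
    return first, args, flag, kwargs


print(show(1, 2, 3, flag=True, colour="red"))
# (1, (2, 3), True, {'colour': 'red'})

# Unpacking at the call site feeds the same machinery:
values = [1, 2, 3]
options = {"flag": True, "colour": "red"}
print(show(*values, **options))
```

Both calls print the same result.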

From asad.hasan2004 at  Sat Jul 13 02:40:35 2019
Hi All ,

I have written a Python script which opens a logfile and searches for
errors, and based on the error found it prints a solution to the screen. Instead I
want to write the output of the script to an XML file so that I can add some
tags. How can it be done?

Right, I meant tuple, not list.

a = ('A string')
b = ('A List Member',)


The output for this is:
A List Member


On 13/07/2019 07:40, Asad wrote:

> want to print the output of the script in a xml file so that I can add some
> tags . How can it be done ?

XML is just text so you can just use the normal Python string
operations to format an XML string with your data.

However there are XML libraries aplenty that can aid you in
constructing a valid XML tree structure. The standard library
contains an xml package which includes the dom module
which might be a good starting place if you want to
understand the concepts.

The etree module is probably easier to use in the long
term though. Look at the "Building XML documents" section
in the etree documentation.
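
As a rough sketch of the etree approach: the tag names (report, entry, error, solution) and the sample findings are invented here, since the real script's output format was not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical results of the log search: (error text, suggested fix).
findings = [
    ("disk full", "free some space"),
    ("timeout", "check the network"),
]

root = ET.Element("report")
for error, solution in findings:
    entry = ET.SubElement(root, "entry")
    ET.SubElement(entry, "error").text = error
    ET.SubElement(entry, "solution").text = solution

# encoding="unicode" returns a str rather than bytes.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# To write a file instead:
# ET.ElementTree(root).write("report.xml", encoding="utf-8", xml_declaration=True)
```

The library takes care of escaping characters such as < and & in the text, which is easy to get wrong with plain string formatting.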

Alan G
On 7/11/19 8:15 AM, Chip Wachob wrote:

kinda restating what Oscar said, he came to the same conclusions, I'm
just being a lot more wordy:

> So, here's where it gets interesting.  And, I'm presuming that someone out
> there knows exactly what is going on and can help me get past this hurdle.

Well, each snippet has some "magic" variables (from our point of view,
since we don't see where they are set up):

1: if(voltage > (avg + triglevel)

2: if((voltage > triggervolts)

since the value you're comparing voltage to gates when you decide
there's a transition, and thus what gets added to the transition list
you're building, and the list size comes out different, and you claim
the data are the same, then guess where a process of elimination
suggests the difference is coming from?


Stylistic comment, I know this wasn't your question.

>         for row in range (len(TrigWind)):

Don't do this.  It's not a coding error giving you wrong results, but
it's not efficient and makes for harder to read code.  You already have
an iterable in TrigWind.  You then find the size of the iterable and use
that size to generate a range object, which you then iterate over,
producing index values which you use to index into the original
iterable.  Why not skip all that?  Just do

for row in TrigWind:

now row is actually a row, as the variable name suggests, rather than an
index you use to go retrieve the row.

Further, the "row" entries in TrigWind are lists (or tuples, or some
other indexable iterable, we can't tell), which means you end up
indexing into two things - into the "array" to get the row, then into
the row to get the individual values. It's nicer if you unpack the rows
into variables so they can have meaningful names - indeed you already do
that with one of them. Lets you avoid code snips like  "x[7][1]"

Conceptually then, you can take this:

for row in range(len(TrigWind)):
    voltage = float(TrigWind[row][1])
    ...
        edgearray.append([float(TrigWind[row][0]), float(TrigWind[row][1])])
    ...

and change to this:

for row in TrigWind:
    time, voltage = row  # unpack
    ...
        edgearray.append([float(time), float(voltage)])

or even more compactly you can unpack directly at the top:

for time, voltage in TrigWind:
    ...
        edgearray.append([float(time), float(voltage)])
    ...

Now I left an issue to resolve with conversion - voltage is not
converted before its use in the not-shown comparisons. Does it need to
be? Every usage of the values from the individual rows here uses them
immediately after converting them to float.  It's usually better not to
convert all over the place, and since the creation of TrigWind is under
your own control, you should do that at the point the data enters the
program - that is as TrigWind is created; then you just consume data
from it in its intended form.  But if not, just convert voltage before
using, as your original code does. You don't then need to convert
voltage a second time in the list append statements.

for time, voltage in TrigWind:
    voltage = float(voltage)
    ...
        edgearray.append([float(time), voltage])
    ...
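
Here is a small runnable sketch of the "convert once, where the data enters" idea. The sample rows and the threshold value are made up, and the comparison is only a guess at the kind of test the not-shown code performs:

```python
# Hypothetical raw rows as they might arrive from a csv reader: all strings.
raw_rows = [["-0.0056", "1.63"], ["-0.0055", "1.65"], ["-0.0054", "1.60"]]

# Convert once, at the point the data enters the program.
TrigWind = [(float(t), float(v)) for t, v in raw_rows]

triggervolts = 1.64   # assumed threshold, for illustration only
edgearray = []
for time, voltage in TrigWind:
    # No float() calls needed here: the values are already numbers.
    if voltage > triggervolts:
        edgearray.append([time, voltage])

print(edgearray)   # [[-0.0055, 1.65]]
```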

From steve at  Sun Jul 14 04:06:10 2019
On Fri, Jul 12, 2019 at 11:53:16AM +0000, Shall, Sydney via Tutor wrote:
> Thanks Mike,
> But I am still not clear.

Neither is your question.

> do I write:
> def f([x,y,z]) ?
> How exactly do one write the function and how does one ensure that each positional argument is accounted for.

Is your function taking three separate arguments, or a single argument 
that must be a list of exactly three items?

(1) Three separate arguments:

def function(a, b, c):
    # Write the body of the function.

That's all you need do, as the interpreter will ensure it is only called 
with three arguments.

(2) One list argument with three items:

def function(alist):
    if isinstance(alist, list):
        if len(alist) < 3:
            raise ValueError("too few arguments in alist")
        elif len(alist) > 3:
            raise ValueError("too many arguments in alist")
        else:
            a, b, c = alist  # Unpack the three items from the list.
            # Write the body of the function here.
    else:
        raise TypeError('expected a list')

Python will ensure your function is only called with a single argument. 
The rest is up to you.
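
To see the interpreter doing the checking in case (1), here is a short sketch. The function body (adding the three values) is just a placeholder:

```python
def function(a, b, c):
    return a + b + c   # placeholder body


# Too few arguments: Python raises TypeError before the body runs.
try:
    function(1, 2)
except TypeError as err:
    print("rejected:", err)

# If the three values happen to live in a list, unpack it at the call:
items = [1, 2, 3]
print(function(*items))   # 6
```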


From dcwebmakers at  Sun Jul 14 11:54:07 2019
Hi Everyone,

I am looking for resources for learning Blockchain development using the Python
SDK. I found tutorials like the one below, but they are too advanced:

Also, I am not familiar with OOP, will that be an issue?



I am currently not sure where to start with my query. 


I have a SQLite3 database which currently is being accessed by python code.
I am seeking a simple python module which would support a local web app in
order to update and  insert rows, and run reports  . This web app I am
creating is never going to ever see the public  internet. In fact, it will
be using the localhost address. So I do not require a full blown web server
like apache. If there is a simple python module that can run a web server,
this would be great. If there is a python module that can assist in building
the HTML, this would be great as well. I am not sure how to link python and
the web HTML page together and haven't found anything that is clear on
addressing my needs.


Note: PPySimpleWeb does not fulfil my requirements, as there are
accessibility issues with this module preventing me from using it for the web. E.g. my
screen reader complains about most of the elements which are
generated. I thought of this and tested it, which is why I have ruled it out.


The web pages are going to have:

*	Tables containing report outputs
*	Forms with edit and select tags and buttons/links.
*	Some Headers and basic HTML structure elements.


The web page is not going to have any fancy structure or visual effects at
all. I am more than happy to hand-code the HTML/JavaScript. I have had a
quick search and found a range of modules. Most of them indicate they are
for CMS web sites and look far too complex for my needs. 


If someone could point me to a module and possibly a tutorial for the
module, I would be grateful. Hopefully this is not too open a question.


From mhysnm1964 at  Mon Jul 15 03:00:10 2019
From: mhysnm1964 at (mhysnm1964 at
Date: Mon, 15 Jul 2019 17:00:10 +1000
Subject: [Tutor] Blockchain Dev with Python SDK
In-Reply-To: <>
References: <>
On 15/07/2019 07:56, mhysnm1964 at wrote:

> like apache. If there is a simple python module that can run a web server,
> this would be great. If there is a python module that can assist in building
> the HTMl, this would be great as well. I am not sure how to link python and
> the web HTML page together and haven't found anything that is clear on
> addressing my needs.

I have 4 topics in my tutorial that address this area.
All you need is in the standard library.

For a basic web server app look at the topic:

Writing Web Server Applications

However if you want more than the very basics then something
like Flask would make life easier (also in my tutorial! :-)

See the topic:
Using web Application Frameworks.

This even covers accessing SQLite data...
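
To give a flavour of the standard-library route, here is a minimal sketch using sqlite3 and wsgiref. The table name, columns and sample rows are invented, and the server is deliberately not started here; call serve() yourself to try it on localhost.

```python
import sqlite3
from html import escape
from wsgiref.simple_server import make_server


def render_table(conn):
    # Build a plain HTML table from a query; escape() guards text values.
    rows = conn.execute("SELECT name, qty FROM items ORDER BY name").fetchall()
    cells = "".join(
        f"<tr><td>{escape(str(name))}</td><td>{qty}</td></tr>" for name, qty in rows
    )
    return f"<table><tr><th>Name</th><th>Qty</th></tr>{cells}</table>"


def make_app(conn):
    def app(environ, start_response):
        body = render_table(conn).encode("utf-8")
        headers = [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ]
        start_response("200 OK", headers)
        return [body]
    return app


def serve(app, port=8000):
    # Binds to the loopback address only, so nothing is exposed publicly.
    make_server("127.0.0.1", port, app).serve_forever()


# Hypothetical in-memory database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, qty INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [("apples", 3), ("pears", 5)])
app = make_app(conn)
print(render_table(conn))
```

Since the HTML is written by hand, the markup stays fully under your control, which may help when checking it against a screen reader.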

Forwarding to list....

Sorry for the late response. I have moved my program to SQLite, as I
found I was going down the rabbit hole of wasting time. As Alan said,
the select statement and other SQL statements are far easier to work
with than writing the code yourself. Still, I did learn a lot and
started down the road of OOP.

The first program I programmed on was an Apple IIE using Applesoft
basic. Tried to learn assembly 6802 myself. But never got there. In the
90's I learnt C Clipper 68 and Turbo Pascal. Never became very strong
with these languages as my roles were never programming focused. Late
90's, early 2000 forget, I learnt some basic Visual Basic 6. Again,
didn't do much with it. I learnt the basic of PERL or enough to get by.

I haven't really touched programming for years and there is a lot I have
forgotten. Never had the need to do tree's, link-lists or any complex
data structures. This time, I am trying to learn Python beyond what I
used to know.

I have used just about every type of PC since the early 80's. Even used
DEC VAX, IBM 3270's as well. Yes, been around for a while now.

On 7/15/19 12:35 PM, Chip Wachob wrote:
> Oscar and Mats,
> Thank you for your comments and taking time to look at the snips.
> Yes, I think I had commented that the avg+trigger was = triggervolts in
> my original post.
> I did find that there was an intermediary process which I had forgotten
> to comment out that was adversely affecting the data in one instance and
> not the other.  So it WAS a case of becoming code blind.  But I didn't
> give y'all all of the code so you would not have known that.  My apologies.
> Mats, I'd like to get a better handle on your suggestions about
> improving the code.  Turns out, I've got another couple of 4GByte files
> to sift through, and they are less 'friendly' when it comes to
> determining the start and stop points.  So, I have to basically redo
> about half of my code and I'd like to improve on my Python coding skills.
> Unfortunately, I have gaps in my coding time, and I end up forgetting
> the details of a particular language, especially a new language to me,
> Python.
> I'll admit that my 'C' background keeps me thinking as these data sets
> as arrays.. in fact they are lists, eg:
> [
> [t0, v0],
> [t1, v1],
> [t2, v2],
> .
> .
> .
> [tn, vn]
> ]
> Time and volts are floats and need to be converted from the csv file
> entries.
> I'm not sure that follow the "unpack" assignment in your example of:
> for row in TrigWind:
>     time, voltage = row  # unpack
> I think I 'see' what is happening, but when I read up on unpacking, I
> see that referring to using the * and ** when passing arguments to a
> function...

That's a different aspect of unpacking.  This one is sequence unpacking,
sometimes called tuple (or sequence) assignment.  In the official
Python docs it is described in the latter part of this section:

> I tried it anyhow, with this being an example of my source data:
> "Record Length",2000002,"Points",-0.005640001706,1.6363
> "Sample Interval",5e-09,s,-0.005639996706,1.65291
> "Trigger Point",1128000,"Samples",-0.005639991706,1.65291
> "Trigger Time",0.341197,s,-0.005639986706,1.60309
> ,,,-0.005639981706,1.60309
> "Horizontal Offset",-0.00564,s,-0.005639976706,1.6363
> ,,,-0.005639971706,1.65291
> ,,,-0.005639966706,1.65291
> ,,,-0.005639961706,1.6363
> .
> .
> .
> Note that I want the items in the third and fourth column of the csv
> file for my time and voltage.
> When I tried to use the unpack, they all came over as strings.  I can't
> seem to convert them selectively..

That's what the csv module does, unless you tell it not to. Maybe this
will help:

There's an option (csv.QUOTE_NONNUMERIC) to convert unquoted values to floats, and leave quoted
values alone as strings, which would seem to match your data above quite well.
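
A small sketch of that option in action, reading two sample rows from a string rather than a file so it can be run as-is. The column names in the loop are just illustrative:

```python
import csv
import io

# Text fields are quoted, numeric fields are not.
sample = io.StringIO(
    '"Record Length",2000002,"Points",-0.005640001706,1.6363\n'
    '"Trigger Point",1128000,"Samples",-0.005639991706,1.65291\n'
)

rows = list(csv.reader(sample, quoting=csv.QUOTE_NONNUMERIC))
for desc1, val1, desc2, time_val, volt_val in rows:
    # Unquoted fields arrive as floats, quoted ones stay strings.
    print(type(desc1).__name__, type(time_val).__name__, time_val, volt_val)

# An unquoted field that is not a number cannot be converted:
bad = io.StringIO('"Trigger Time",0.341197,s\n')
try:
    list(csv.reader(bad, quoting=csv.QUOTE_NONNUMERIC))
    failed = False
except ValueError as err:
    failed = True
    print("conversion failed:", err)
```

So the option only fits files where every non-numeric field is quoted; otherwise it is simpler to read strings and call float() on just the columns you need.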

> Desc1, Val1, Desc2, TimeVal, VoltVal = row
> TimeVal and VoltVal return type of str, which makes sense.
> Must I go through yet another iteration of scanning TimeVal and VoltVal
> and converting them using float() by saving them to another array?
> Thanks for your patience.
> Chip
From wachobc at  Mon Jul 15 14:35:38 2019
From: wachobc at (Chip Wachob)
Date: Mon, 15 Jul 2019 14:35:38 -0400
Subject: [Tutor] Reading .csv data vs. reading an array
In-Reply-To: <>
References: <>
Message-ID: <>

Oscar and Mats,

Thank you for your comments and taking time to look at the snips.

Yes, I think I had commented that the avg+trigger was = triggervolts in my
original post.

I did find that there was an intermediary process which I had forgotten to
comment out that was adversely affecting the data in one instance and not
the other.  So it WAS a case of becoming code blind.  But I didn't give
y'all all of the code so you would not have known that.  My apologies.

Mats, I'd like to get a better handle on your suggestions about improving
the code.  Turns out, I've got another couple of 4GByte files to sift
through, and they are less 'friendly' when it comes to determining the
start and stop points.  So, I have to basically redo about half of my code
and I'd like to improve on my Python coding skills.

Unfortunately, I have gaps in my coding time, and I end up forgetting the
details of a particular language, especially a new language to me, Python.

I'll admit that my 'C' background keeps me thinking as these data sets as
arrays.. in fact they are lists, eg:

[
[t0, v0],
[t1, v1],
[t2, v2],
.
.
.
[tn, vn]
]

Time and volts are floats and need to be converted from the csv file
entries.

I'm not sure that follow the "unpack" assignment in your example of:

for row in TrigWind:
    time, voltage = row  # unpack

I think I 'see' what is happening, but when I read up on unpacking, I see
that referring to using the * and ** when passing arguments to a function...

I tried it anyhow, with this being an example of my source data:

"Record Length",2000002,"Points",-0.005640001706,1.6363
"Sample Interval",5e-09,s,-0.005639996706,1.65291
"Trigger Point",1128000,"Samples",-0.005639991706,1.65291
"Trigger Time",0.341197,s,-0.005639986706,1.60309
"Horizontal Offset",-0.00564,s,-0.005639976706,1.6363

Note that I want the items in the third and fourth column of the csv file
for my time and voltage.

When I tried to use the unpack, they all came over as strings.  I can't
seem to convert them selectively..

Desc1, Val1, Desc2, TimeVal, VoltVal = row

TimeVal and VoltVal return type of str, which makes sense.

Must I go through yet another iteration of scanning TimeVal and VoltVal and
converting them using float() by saving them to another array?

Thanks for your patience.


On Sat, Jul 13, 2019 at 9:36 AM Mats Wichmann <mats at> wrote:

> On 7/11/19 8:15 AM, Chip Wachob wrote:
> kinda restating what Oscar said, he came to the same conclusions, I'm
> just being a lot more wordy:
> > So, here's where it gets interesting.  And, I'm presuming that someone
> out
> > there knows exactly what is going on and can help me get past this
> hurdle.
> Well, each snippet has some "magic" variables (from our point of view,
> since we don't see where they are set up):
> 1: if(voltage > (avg + triglevel)
> 2: if((voltage > triggervolts)
> since the value you're comparing voltage to gates when you decide
> there's a transition, and thus what gets added to the transition list
> you're building, and the list size comes out different, and you claim
> the data are the same, then guess where a process of elimination
> suggests the difference is coming from?
> ===
> Stylistic comment, I know this wasn't your question.
> >         for row in range (len(TrigWind)):
> Don't do this.  It's not a coding error giving you wrong results, but
> it's not efficient and makes for harder to read code.  You already have
> an iterable in TrigWind.  You then find the size of the iterable and use
> that size to generate a range object, which you then iterate over,
> producing index values which you use to index into the original
> iterable.  Why not skip all that?  Just do
> for row in TrigWind:
> now row is actually a row, as the variable name suggests, rather than an
> index you use to go retrieve the row.
> Further, the "row" entries in TrigWind are lists (or tuples, or some
> other indexable iterable, we can't tell), which means you end up
> indexing into two things - into the "array" to get the row, then into
> the row to get the individual values. It's nicer if you unpack the rows
> into variables so they can have meaningful names - indeed you already do
> that with one of them. Lets you avoid code snips like  "x[7][1]"
> Conceptually then, you can take this:
> for row in range(len(TrigWind)):
>     voltage = float(TrigWind[row][1])
>     ...
>         edgearray.append([float(TrigWind[row][0]),
> float(TrigWind[row][1])])
>     ...
> and change to this:
> for row in TrigWind:
>     time, voltage = row  # unpack
>     ....
>         edgearray.append([float(time), float(voltage)])
> or even more compactly you can unpack directly at the top:
> for time, voltage in TrigWind:
>     ...
>         edgearray.append([float(time), float(voltage)])
>     ...
> Now I left an issue to resolve with conversion - voltage is not
> converted before its use in the not-shown comparisons. Does it need to
> be? every usage of the values from the individual rows here uses them
> immediately after converting them to float.  It's usually better not to
> convert all over the place, and since the creation of TrigWind is under
> your own control, you should do that at the point the data enters the
> program - that is as TrigWind is created; then you just consume data
> from it in its intended form.  But if not, just convert voltage before
> using, as your original code does. You don't then need to convert
> voltage a second time in the list append statements.
> for time, voltage in TrigWind:
>     voltage = float(voltage)
>     ...
>         edgearray.append([float(time), voltage])
>     ...
From mats at  Mon Jul 15 16:28:16 2019
From: mats at (Mats Wichmann)
Date: Mon, 15 Jul 2019 14:28:16 -0600
Subject: [Tutor] Reading .csv data vs. reading an array
In-Reply-To: <>
References: <>
Message-ID: <>

On 7/15/19 1:59 PM, Chip Wachob wrote:
> Mats,
> Thank you!
> So I included the QUOTE_NONNUMERIC to my csv.reader() call and it almost
> worked.
> Now, how wonderful that the scope's csv file simply wrote an s for
> seconds and didn't include quotes. Now Python tells me it can't create
> a float of s. Of course I can't edit a 4G file in any editor that I
> have installed, so I have to work with the fact that there is a bit of
> text in there that isn't quoted.

yeah, the chips don't always fall right...

not sure what you're running, there are "stream editors" that can just
work on data as it passes through, even on Windows you can install
environments (cygwin, mingw) where the old "sed" command exists. Of
course Python can do that too, by working line-at-a-time, explicitly by
calling readlines() or implicitly by looping over the file handle. The
latter looks something like this;

with open("/path/to/datafile", "r") as f:
    for line in f:
        if REDFLAGTEXT in line:  # skip these
            continue
        do-something-with line

I don't want to be leading you down further ratholes, just trying to
answer the questions as they come up....

> Which leads me to another question related to working with these csv
> files.
> Is there a way for me to tell the reader to skip the first 'n' rows?
> Or, for that matter, skip rows in the middle of the file? 

it's usually easier to skip based on a pattern, if you can identify a
pattern, but you can certainly also add a counter used to skip.  If 'n'
is always the same!
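
Both ideas can be combined in a few lines. This is only a sketch under assumptions: the two-line header, the sample values, and the name N_HEADER_ROWS are made up for illustration and are not taken from the real scope file. itertools.islice skips a fixed number of leading rows, and a try/except around float() skips any later row that still contains text.

```python
import csv
import io
from itertools import islice

# Hypothetical stand-in for the real file: two header lines, then data,
# with one stray text row in the middle.
raw = io.StringIO(
    "header line one\n"
    "header line two\n"
    "-0.005640001706,1.6363\n"
    "note,s\n"
    "-0.005639996706,1.65291\n"
)

N_HEADER_ROWS = 2  # assumed to be the same for every file

rows = []
# islice(reader, n, None) drops the first n rows without reading the
# whole file into memory.
for time_text, volt_text in islice(csv.reader(raw), N_HEADER_ROWS, None):
    try:
        rows.append((float(time_text), float(volt_text)))
    except ValueError:
        # A row in the middle that is not numeric: skip it.
        continue

print(rows)
```

If the rows to skip are identified by position rather than content, wrapping the reader in enumerate() gives a row counter to test against.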

> At this point, I think it may be less painful for me to just skip those
> few lines that have text. I don't believe there will be any loss of
> accuracy.
> But, since row is not really an index, how does one conditionally skip a
> given set of row entries?

From jjhartley at  Mon Jul 15 17:25:26 2019
From: jjhartley at (James Hartley)
Date: Mon, 15 Jul 2019 16:25:26 -0500
Subject: [Tutor] Lengthy copyright notices?
Message-ID: <>

help(module_name) will place any text in the *first* module-level docstring
into the description section of the help page in Python 3.4.5.  Subsequent
docstrings found at module level are ignored.

I have been using this factoid for placement of a copyright & licensing
notice.  By placing a rather lengthy copyright & license in the code in
the second module-level docstring, it is prominent within the code, but not
cluttering up help() output.

Two questions.  Is there a more standardized way of including long license
descriptions in code, & is it documented that any other module-level
docstring will be ignored in help() output?



From mats at  Mon Jul 15 18:34:19 2019
From: mats at (Mats Wichmann)
Date: Mon, 15 Jul 2019 16:34:19 -0600
Subject: [Tutor] Lengthy copyright notices?
In-Reply-To: <>
References: <>
Message-ID: <>

On 7/15/19 3:25 PM, James Hartley wrote:
> help(module_name) will place any text in the *first* module-level docstring
> into the description section of the help page in Python 3.4.5.  Subsequent
> docstrings found at module level are ignored.
> I have been using this factoid for placement of a copyright & licensing
> notice.  By placing a rather lengthy copyright & license in the code in a
> the second module-level docstring, it is prominent within the code, but not
> cluttering up help() output.
> Two questions.  Is there a more standardized way of including long license
> descriptions in code, & is it documented that any other module-level
> docstring will be ignored in help() output?

Rule #1: it's all opinion in the end...

The common practice is that licence/copyright text is included as a
comment in the code, not in a docstring.

It's only a docstring if it's the first thing in its block, and that is
assigned to the object's __doc__ attribute, and that's what help fishes
out, so yes, that behavior is documented.

Your second docstring isn't technically a docstring, it's just a string,
which isn't assigned to anything, so it just ends up getting lost as a
runtime thing (no references). Is that what you want?
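
A small sketch of that distinction, using a function instead of a module so it fits in one snippet (the same first-statement rule applies at module level). The function name and the notice text are invented for the example.

```python
def documented():
    """Summary line: this string becomes documented.__doc__."""
    """Copyright notice: this second string is just an expression
    statement; nothing refers to it, so help() never sees it."""
    return None


# Only the first string literal in the block is stored as the docstring.
doc = documented.__doc__
print(doc)
```

A comment block would survive in the source just the same, without creating a throwaway string object.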

You're on the right track in your second paragraph - "not cluttering up
help output".  There is _some_ support for this opinion, namely PEP 257

The 4th, 5th and 6th paragraphs of that section all suggest what should
be in the docstrings, none of them mentions copyright/license, all are
oriented to making the help text usable by users.

I'm sure someone else will have an opinion :)

From PyTutor at  Mon Jul 15 19:16:42 2019
From: PyTutor at (David L Neil)
Date: Tue, 16 Jul 2019 11:16:42 +1200
Subject: [Tutor] Lengthy copyright notices?
In-Reply-To: <>
References: <>
Message-ID: <>

On 16/07/19 10:34 AM, Mats Wichmann wrote:
> On 7/15/19 3:25 PM, James Hartley wrote:
>> help(module_name) will place any text in the *first* module-level docstring
>> into the description section of the help page in Python 3.4.5.  Subsequent
>> docstrings found at module level are ignored.
>> I have been using this factoid for placement of a copyright & licensing
>> notice.  By placing a rather lengthy copyright & license in the code in a
>> the second module-level docstring, it is prominent within the code, but not
>> cluttering up help() output.
>> Two questions.  Is there a more standardized way of including long license
>> descriptions in code, & is it documented that any other module-level
>> docstring will be ignored in help() output?
> Rule #1: it's all opinion in the end...
> The common practice is that licence/copyright text is included as a
> comment in the code, not in a docstring.

Packaging and project templating offer/recommend a separate file for 
'legal-stuff', eg licensing.

That said, I also include a one-liner at the top of every module, adding 
__license__ to several other similar labels/definitions.

One thing is for-sure: when calling for help or reminding myself of 
method signatures, I'd be greatly irritated by having to wade-through a 
flood of irrelevance.

On the other hand, if 'you' use my work...

Whether either/both of these pass for pythonic, I can't say.

Regards =dn

From alan.gauld at  Mon Jul 15 20:42:54 2019
From: alan.gauld at (Alan Gauld)
Date: Tue, 16 Jul 2019 01:42:54 +0100
Subject: [Tutor] Reading .csv data vs. reading an array
In-Reply-To: <>
References: <>
Message-ID: <qgj6ie$7mtb$>

On 15/07/2019 21:28, Mats Wichmann wrote:

>> a float of s. Of course I can't edit a 4G file in any editor that I
>> have installed, so I have to work with the fact that there is a bit of
>> text in there that isn't quoted.

Try sed, it's on most Unix like OS.
It doesn't read the entire file into memory so file size is not usually
an issue. I've never tried 4G but I have gone over 1GB before with no

If you have never used sed before it's batch oriented so you need to
practice your commands in advance on something like vim or ex then
translate them to a file. But it sounds like it would be a worthwhile
automation step in your workflow. Write once, use often...

> course Python can do that too, by working line-at-a-time, explicitly by
> calling readlines() or implicitly by looping over the file handle. The
> latter looks something like this;
> with open("/path/to/datafile", "r") as f:
>     for line in f:
>         if REDFLAGTEXT in line:  # skip these
>             continue
>         do-something-with line

All true, but sed - once you get used to it! - is easier IMHO
and usually faster than Python - it's written in C...

Alan G
Author of the Learn to Program web site
Follow my photo-blog on Flickr at:

From wachobc at  Mon Jul 15 15:59:01 2019
From: wachobc at (Chip Wachob)
Date: Mon, 15 Jul 2019 15:59:01 -0400
Subject: [Tutor] Reading .csv data vs. reading an array
In-Reply-To: <>
References: <>
Message-ID: <>


Thank you!

So I included the QUOTE_NONNUMERIC to my csv.reader() call and it almost
worked.

Now, how wonderful that the scope's csv file simply wrote an s for seconds
and didn't include quotes.  Now Python tells me it can't create a float of
s.  Of course I can't edit a 4G file in any editor that I have installed,
so I have to work with the fact that there is a bit of text in there that
isn't quoted.
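
For reference, the behaviour described above can be reproduced with two rows shaped like the scope data. This is a sketch, not the real file handling: the rows are fed from an in-memory io.StringIO, and the variable names are made up. With quoting=csv.QUOTE_NONNUMERIC the reader converts every unquoted field to float, so a bare s raises ValueError.

```python
import csv
import io

# Row where all text fields are quoted: unquoted fields become floats.
quoted_ok = io.StringIO(
    '"Record Length",2000002,"Points",-0.005640001706,1.6363\n'
)
row = next(csv.reader(quoted_ok, quoting=csv.QUOTE_NONNUMERIC))
print(row)

# Row with an unquoted "s": the reader tries float("s") and fails.
unquoted_text = io.StringIO(
    '"Sample Interval",5e-09,s,-0.005639996706,1.65291\n'
)
try:
    next(csv.reader(unquoted_text, quoting=csv.QUOTE_NONNUMERIC))
    failed = False
except ValueError as err:
    failed = True
    print("conversion failed:", err)
```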

Which leads me to another question related to working with these csv
files.

Is there a way for me to tell the reader to skip the first 'n' rows?  Or,
for that matter, skip rows in the middle of the file?

At this point, I think it may be less painful for me to just skip those few
lines that have text.  I don't believe there will be any loss of accuracy.

But, since row is not really an index, how does one conditionally skip a
given set of row entries?

I started following the link to iterables but quickly got lost in the


On Mon, Jul 15, 2019 at 3:03 PM Mats Wichmann <mats at> wrote:

> On 7/15/19 12:35 PM, Chip Wachob wrote:
> > Oscar and Mats,
> >
> > Thank you for your comments and taking time to look at the snips.
> >
> > Yes, I think I had commented that the avg+trigger was = triggervolts in
> > my original post.
> >
> > I did find that there was an intermediary process which I had forgotten
> > to comment out that was adversely affecting the data in one instance and
> > not the other.  So it WAS a case of becoming code blind.  But I didn't
> > give y'all all of the code so you would not have known that.  My
> apologies.
> >
> > Mats, I'd like to get a better handle on your suggestions about
> > improving the code.  Turns out, I've got another couple of 4GByte files
> > to sift through, and they are less 'friendly' when it comes to
> > determining the start and stop points.  So, I have to basically redo
> > about half of my code and I'd like to improve on my Python coding skills.
> >
> > Unfortunately, I have gaps in my coding time, and I end up forgetting
> > the details of a particular language, especially a new language to me,
> > Python.
> >
> > I'll admit that my 'C' background keeps me thinking of these data sets
> > as arrays.. in fact they are lists, eg:
> >
> > [
> > [t0, v0],
> > [t1, v1],
> > [t2, v2],
> > .
> > .
> > .
> > [tn, vn]
> > ]
> >
> > Time and volts are floats and need to be converted from the csv file
> > entries.
> >
> > I'm not sure that I follow the "unpack" assignment in your example of:
> >
> > for row in TrigWind:
> >     time, voltage = row  # unpack
> >
> > I think I 'see' what is happening, but when I read up on unpacking, I
> > see that referring to using the * and ** when passing arguments to a
> > function...
> That's a different aspect of unpacking.  This one is sequence unpacking,
> sometimes called tuple (or sequence) assignment.  In the official
> Python docs it is described in the latter part of this section:
> > I tried it anyhow, with this being an example of my source data:
> >
> > "Record Length",2000002,"Points",-0.005640001706,1.6363
> > "Sample Interval",5e-09,s,-0.005639996706,1.65291
> > "Trigger Point",1128000,"Samples",-0.005639991706,1.65291
> > "Trigger Time",0.341197,s,-0.005639986706,1.60309
> > ,,,-0.005639981706,1.60309
> > "Horizontal Offset",-0.00564,s,-0.005639976706,1.6363
> > ,,,-0.005639971706,1.65291
> > ,,,-0.005639966706,1.65291
> > ,,,-0.005639961706,1.6363
> > .
> > .
> > .
> >
> > Note that I want the items in the third and fourth column of the csv
> > file for my time and voltage.
> >
> > When I tried to use the unpack, they all came over as strings.  I can't
> > seem to convert them selectively..
> That's what the csv module does, unless you tell it not to. Maybe this
> will help:
> There's an option to convert unquoted values to floats, and leave quoted
> values alone as strings, which would seem to match your data above quite
> well.
> > Desc1, Val1, Desc2, TimeVal, VoltVal = row
> >
> > TimeVal and VoltVal return type of str, which makes sense.
> >
> > Must I go through yet another iteration of scanning TimeVal and VoltVal
> > and converting them using float() by saving them to another array?
> >
> >
> > Thanks for your patience.
> >
> > Chip
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Sat, Jul 13, 2019 at 9:36 AM Mats Wichmann <mats at
> > <mailto:mats at>> wrote:
> >     1: if(voltage > (avg + triglevel)
> >
> >     2: if((voltage > triggervolts)
> >
> >     since the value you're comparing voltage to gates when you decide
> >     there's a transition, and thus what gets added to the transition list
> >     you're building, and the list size comes out different, and you claim
> >     the data are the same, then guess where a process of elimination
> >     suggests the difference is coming from?
> >
> >
> >     for row in TrigWind:
> >
> >     now row is actually a row, as the variable name suggests, rather
> than an
> >     index you use to go retrieve the row.
> >
> >     Further, the "row" entries in TrigWind are lists (or tuples, or some
> >     other indexable iterable, we can't tell), which means you end up
> >     indexing into two things - into the "array" to get the row, then into
> >     the row to get the individual values. It's nicer if you unpack the
> rows
> >     into variables so they can have meaningful names - indeed you
> already do
> >     that with one of them. Lets you avoid code snips like  "x[7][1]"
> >
> >     Conceptually then, you can take this:
> >
> >     for row in range(len(TrigWind)):
> >         voltage = float(TrigWind[row][1])
> >         ...
> >             edgearray.append([float(TrigWind[row][0]),
> >     float(TrigWind[row][1])])
> >         ...
> >
> >     and change to this:
> >
> >     for row in TrigWind:
> >         time, voltage = row  # unpack
> >         ....
> >             edgearray.append([float(time), float(voltage)])
> >
> >     or even more compactly you can unpack directly at the top:
> >
> >     for time, voltage in TrigWind:
> >         ...
> >             edgearray.append([float(time), float(voltage)])
> >         ...
> >
> >     Now I left an issue to resolve with conversion - voltage is not
> >     converted before its use in the not-shown comparisons. Does it need
> to
> >     be? every usage of the values from the individual rows here uses them
> >     immediately after converting them to float.  It's usually better not
> to
> >     convert all over the place, and since the creation of TrigWind is
> under
> >     your own control, you should do that at the point the data enters the
> >     program - that is as TrigWind is created; then you just consume data
> >     from it in its intended form.  But if not, just convert voltage
> before
> >     using, as your original code does. You don't then need to convert
> >     voltage a second time in the list append statements.
> >
> >     for time, voltage in TrigWind:
> >         voltage = float(voltage)
> >         ...
> >             edgearray.append([float(time), voltage])
> >         ...
> >
> >
On 15/07/2019 23:34, Mats Wichmann wrote:

> Rule #1: it's all opinion in the end...


> The common practice is that licence/copyright text is included as a
> comment in the code, not in a docstring.

I'd second that opinion. I don't like losing the copyright stuff
to a separate file - too easy to get lost. But I certainly don't
want it in my help() output either.

A comment solves both for the downside of some initial scrolling
when reading or editing the file

Alan G
Author of the Learn to Program web site
Follow my photo-blog on Flickr at:

Message-ID: <>


Hopefully you recognize my email address as someone you have given 
advice concerning Python.

Over the last month or so I have received at least 3 emails supposedly 
coming from you that I am sure you did not send.

The from line is:  Mats Wichmann <drandall at>

The body is: On Monday, July 15, 2019 10:36 AM, Mats Wichmann wrote:
Hope you are well. Just wanted to share something with you

I just wanted you to know that it seems someone is trying to impersonate 
you.

Regards,  Jim

On Tue, 16 Jul 2019 at 01:44, Alan Gauld via Tutor <tutor at> wrote:
> On 15/07/2019 21:28, Mats Wichmann wrote:
> > course Python can do that too, by working line-at-a-time, explicitly by
> > calling readlines() or implicitly by looping over the file handle. The
> > latter looks something like this;
> >
> > with open("/path/to/datafile", "r") as f:
> >     for line in f:
> >         if REDFLAGTEXT in line:  # skip these
> >             continue
> >         do-something-with line
> All true, but sed - once you get used to it! - is easier IMHO
> and usually faster than Python - it's written in C...

I always think I'll like sed but whenever I actually try to use it I
just can't get the syntax together. I do use vim and can do
find/replace there. It seems like every different utility grep, egrep,
sed, vim etc has subtly different escaping rules or maybe I just
haven't got my head around it.

When writing this pull request:
I spent something like 15 minutes trying to get sed to work before
giving up. It took me 2 minutes to write and run the Python script
that I ended up using.


On 16/07/2019 09:59, Oscar Benjamin wrote:

>> All true, but sed - once you get used to it! - is easier IMHO
>> and usually faster than Python - it's written in C...
> I always think I'll like sed but whenever I actually try to use it I
> just can't get the syntax together. 

I'd been using awk and sed for 15 years before I discovered
Python (and perl) so they are kind of second nature.

If you can't settle with sed try awk, it's much simpler to
use and almost as powerful but not as fast. I think awk is
one of the most neglected of the *nix tools now that
scripting languages like perl/python/ruby exist. But for
line by line file processing it is superb.

Alan G
Author of the Learn to Program web site
Follow my photo-blog on Flickr at:

Please what are references in Python development?

> On Jul 16, 2019, at 04:31, Alan Gauld via Tutor <tutor at> wrote:
> On 16/07/2019 09:59, Oscar Benjamin wrote:
>>> All true, but sed - once you get used to it! - is easier IMHO
>>> and usually faster than Python - it's written in C...
>> I always think I'll like sed but whenever I actually try to use it I
>> just can't get the syntax together. 
> Id been using awk and sed for 15 years before I discovered
> Python (and perl) so they are kind of second nature.
> If you can't settle with sed try awk, it's much simpler to
> use and almost as powerful but not as fast. I think awk is
> one of the most neglected of the *nix tools now that
> scripting languages like perl/python/ruby exist. But for
> line by line file processing it is superb.

The first O'Reilly book I ever purchased was "Sed & Awk", and has been one of the most invaluable over the last 20 years.  While they are not the simplest tools to master, they are worth the effort; especially when you are trying to do inline "one-liners" to massage data or process large files.  That doesn't mean it's a requirement to know them, but it does get easier with practice.  That said, if making a little python one-off filter to do what you need is faster (to write) and works (well enough), it comes down to what your time is worth.

David Rock
david at

On 7/15/19 9:36 PM, Jim wrote:
> Mats,
> Hopefully you recognize my email address as someone you have given 
> advice concerning Python.
> Over the last month or so I have received at least 3 emails supposedly 
> coming from you that I am sure you did not send.
> The from line is:  Mats Wichmann <drandall at>
> The body is: On Monday, July 15, 2019 10:36 AM, Mats Wichmann wrote:
> Hope you are well. Just wanted to share something with you 
> I just wanted you to know that it seems someone is trying to impersonate 
> you.
> Regards,  Jim

My apologies. I intended that this message go only to Mats.

Regards, Jim

On Tue, Jul 16, 2019 at 12:08:10PM +0000, AHIA Samuel wrote:

> Please what are references in Python development?

x = 19
y = x

The name "x" is a reference to the int 19; the name y is a reference to 
the same int.

x = "Hello world"

Now the name "x" is a reference to the string "Hello world". y remains a 
reference to the int 19.

x = None

Now the name "x" is a reference to the None object.

x = [100, 200, 300]

Now the name "x" is a reference to a list with three items:

- The first item of x, x[0], is a reference to the int 100.
- The second item of x, x[1], is a reference to the int 200.
- The third item of x, x[2], is a reference to the int 300.

y = x

Now y is no longer a reference to the int 19 as before, but is a 
reference to the same list that x is a reference to. There are now two 
references to the list object: x and y.

(If you are a C programmer, you can think of x and y both being pointers 
to the same list. This is not completely correct, but it isn't far 

Since x and y are references to the same list, we can equally say:

- The first item of y, y[0], is a reference to the int 100.
- The second item of y, y[1], is a reference to the int 200.
- The third item of y, y[2], is a reference to the int 300.


x.append(None)

Now the name x is still a reference to the same list as before, except 
that we have added a new item to the end of the list:

- The fourth item of x, x[3], is a reference to the None object.
- The fourth item of y, y[3], is a reference to the None object.

Since both x and y are references to the same list, any change to the x 
list is a change to the y list (since they are the same).
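
The shared-reference behaviour is easy to check in code. A minimal sketch: the names same_object and independent are introduced here for illustration, and list(x) is shown only as one common way to get a separate list when sharing is not wanted.

```python
x = [100, 200, 300]
y = x                 # y refers to the same list object as x

x.append(None)        # change the list through the name x
print(y)              # the change is visible through y as well

same_object = x is y  # identity test: one object, two names

independent = list(x)        # a new list with the same items
independent.append("extra")  # does not affect x or y
print(len(x), len(independent))
```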

class Parrot:
    def __init__(self, colour="blue"):
        self.colour = colour
    def speak(self):
         print("Polly wants a cracker!")

Now the name "Parrot" is a reference to the class "Parrot".

x = Parrot()

Now x is a reference to a Parrot instance. y remains a reference to the 
list.

x.colour is a reference to the string "blue" (by default).

x.speak is a reference to the "speak" method of Parrot objects.

Does this help?


On 7/16/19 2:33 PM, Steven D'Aprano wrote:

> x = Parrot()
> Now x is a reference to a Parrot instance. y remains a reference to the 
> list.
> x.colour is a reference to the string "blue" (by default).
> x.speak is a reference to the "speak" method of Parrot objects.
> Does this help?

Let's add one more little cute one for good measure:

>>> def foo():
...     print("This function does Foo")
>>> foo()
This function does Foo
>>> # we created a function object, and foo is a reference to it
>>> x = foo
>>> # x should be a reference to the same object
>>> x()
This function does Foo
>>> x is foo
True
>>> def foo():
...     print("This function no longer Foos")
>>> # we created a new function object, and foo is now a reference to it
>>> foo()
This function no longer Foos
>>> x()
This function does Foo
>>> # x is still a reference to the original function object
>>> x is foo
False

Chip Wachob wrote:

> I tried it anyhow, with this being an example of my source data:
> "Record Length",2000002,"Points",-0.005640001706,1.6363
> "Sample Interval",5e-09,s,-0.005639996706,1.65291
> "Trigger Point",1128000,"Samples",-0.005639991706,1.65291
> "Trigger Time",0.341197,s,-0.005639986706,1.60309
> ,,,-0.005639981706,1.60309
> "Horizontal Offset",-0.00564,s,-0.005639976706,1.6363
> ,,,-0.005639971706,1.65291
> ,,,-0.005639966706,1.65291
> ,,,-0.005639961706,1.6363
> .
> .
> .
> Note that I want the items in the third and fourth column of the csv file
> for my time and voltage.
> When I tried to use the unpack, they all came over as strings.  I can't
> seem to convert them selectively..

Try wrapping the reader like this:

 $ cat
import csv
import io

data = """\
"Record Length",2000002,"Points",-0.005640001706,1.6363
"Sample Interval",5e-09,s,-0.005639996706,1.65291
"Trigger Point",1128000,"Samples",-0.005639991706,1.65291
"Trigger Time",0.341197,s,-0.005639986706,1.60309
,,,-0.005639981706,1.60309
"Horizontal Offset",-0.00564,s,-0.005639976706,1.6363
,,,-0.005639971706,1.65291
,,,-0.005639966706,1.65291
,,,-0.005639961706,1.6363
"""

def maybe_float(s):
    try:
        return float(s)
    except ValueError:
        return s

def myreader(*args, **kw):
    reader = csv.reader(*args, **kw)
    for row in reader:
        yield [maybe_float(field) for field in row]

for row in myreader(io.StringIO(data)):
    print(row)

$ python3 
['Record Length', 2000002.0, 'Points', -0.005640001706, 1.6363]
['Sample Interval', 5e-09, 's', -0.005639996706, 1.65291]
['Trigger Point', 1128000.0, 'Samples', -0.005639991706, 1.65291]
['Trigger Time', 0.341197, 's', -0.005639986706, 1.60309]
['', '', '', -0.005639981706, 1.60309]
['Horizontal Offset', -0.00564, 's', -0.005639976706, 1.6363]
['', '', '', -0.005639971706, 1.65291]
['', '', '', -0.005639966706, 1.65291]
['', '', '', -0.005639961706, 1.6363]

If you find that performance suffers more than you are willing to accept 
here's an alternative implementation of maybe_float() that may be faster for 
some inputs:

def maybe_float(s):
    if s and s[:1] in " 0123456789+-":
        try:
            return float(s)
        except ValueError:
            return s
    return s
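
Once the fields are converted, the time and voltage can be pulled out of each row with extended unpacking. This is a sketch that assumes, as in the sample data, that time and voltage are always the last two fields of a row; the names time_val, volt_val and samples are illustrative only.

```python
import csv
import io

sample = (
    '"Record Length",2000002,"Points",-0.005640001706,1.6363\n'
    ",,,-0.005639981706,1.60309\n"
)

def maybe_float(s):
    # Same idea as above: convert when possible, else keep the text.
    try:
        return float(s)
    except ValueError:
        return s

samples = []
for row in csv.reader(io.StringIO(sample)):
    # Everything before the last two fields is collected and ignored.
    *_ignored, time_val, volt_val = (maybe_float(field) for field in row)
    samples.append((time_val, volt_val))

print(samples)
```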

I ask this having more C++ knowledge than sense.

There is an adage in the halls of everything Stroustrup that one needs to
think about how resource allocation will be unwound if an exception is
thrown.  This gets watered down to the mantra "Don't throw exceptions from
within constructors."  Does this carry over to Python?  I'm trying to
develop a Pythonistic mindset as opposed to carrying over old baggage...



On 7/16/19 3:29 PM, James Hartley wrote:
> I ask this having more C++ knowledge than sense.
> There is an adage in the halls of everything Stroustrup that one needs to
> think about how resource allocation will be unwound if an exception is
> thrown.  This gets watered down to the mantra "Don't throw exceptions from
> within constructors."  Does this carry over to Python?  I'm trying to
> develop a Pythonistic mindset as opposed to carrying over old baggage...

If you mean __init__, that's not a constructor, so you should set your
mind at rest :)   It's more properly an "initializer", the instance has
already been constructed when it's called.

Hi all,

I have a problem trying to match items in a dict and pandas series in

I have a dict ( called city_dict )of cities and city_id's ; for each city (
which is a key in the dict ), a unique city_id is a value in that dict.

So for example, city_dict = { "New York" : 1001, "LA" : 1002, "Chicago" : 1003 }.
New York is a key, 1001 is a value.

Now I have a pandas Series called dfCities. In this series is a bunch of
cities, including the cities in city_dict.

My goal is to replace the cities in dfCities with the city_id's in a brand
new csv file. So if dfCities has New York in it, I want to replace it with
its value in the dictionary, so 1001.

Approaches I've tried - checking to see if the keys  match the cities in
dfCities in a  'if in' statement ( such as "if city_dict.keys() in
dfSeries"), and then doing a straight replace ( can't do that since series
are ambiguous in truth values).  Therefore I tried using .any() for Pandas
series (since .all() would strictly want all values in dfCities to match,
and all values don't match )

Afterwards, tried to directly match the series with keys and the clarified
truth value series, but dict_keys are unhashable, so I had to convert the
keys to str and see if I could compare strings ( with a stringified
dfCities )

Then I realized that even if I can get an if statement to start checking
    (if dfCities.str.contains(keyss).any(): ) (keyss being the stringified
version of the keys for city_dict ), I don't know how to build an approach
to cross check the values of city_dict with the cities in dfCities ( I have
a vague notion that I should check if the keys of city_dict match with
dfCities, and then replace the cities in dfCities with the values of
city_dict in a new csv file output. However, I don't know how to replace
data in a Series with values of a dict ).

So I would like to ask the community what approach I can take to build to
that piece of the puzzle. I feel I have most of the solution, but I'm
missing something.
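
One possible approach, sketched under assumptions: the sample cities (including "Boston", which has no id) are invented, and the output goes to an in-memory buffer rather than a real csv file. Series.map accepts a dict and looks up each element, giving NaN where a city is not a key; mapping with a function that calls dict.get keeps the original name instead.

```python
import io

import pandas as pd

city_dict = {"New York": 1001, "LA": 1002, "Chicago": 1003}
dfCities = pd.Series(["New York", "Boston", "LA", "Chicago"], name="city")

# Element-wise lookup: cities missing from the dict become NaN.
city_ids = dfCities.map(city_dict)

# Alternative: fall back to the original name when there is no id.
mixed = dfCities.map(lambda city: city_dict.get(city, city))

# Write the result out; a filename could be passed instead of a buffer.
buffer = io.StringIO()
mixed.to_csv(buffer, index=False)
print(buffer.getvalue())
```

No explicit "if key in series" test is needed, since the lookup is applied to every element.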

Thanks for reading and I appreciate the help.

Peter Otten, while responding to one of my questions in the past,
mentioned something in passing that apparently has been mulling around
in the back of my head.  I don't recall his exact words, but he
essentially said that I should be testing the public interface to my
classes, but not the methods only used internally by the class and not
meant to be publicly accessible.  Is this generally how I should be
viewing testing?  Would someone be willing to expand at some length on
this topic?


On 16/07/2019 22:56, Mats Wichmann wrote:

>> thrown.  This gets watered down to the mantra "Don't throw exceptions from
>> within constructors."  Does this carry over to Python?  

> If you mean __init__, that's not a constructor, so you should set your
> mind at rest :)   It's more properly an "initializer", the instance has
> already been constructed when it's called.

FWIW The true constructor is __new__() and it's quite rarely overridden
by application programmers. But if you do, it's not common that you'd do
anything that would merit an exception. __new__ pretty much just sets
up the structure of the object ready for initialisation by __init__.
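
The order of the two stages can be observed directly. A minimal sketch; the class name Tracked and its calls list are invented for the demonstration and serve no other purpose.

```python
class Tracked:
    calls = []  # class-level log of which stage ran, in order

    def __new__(cls, *args, **kwargs):
        # Stage 1: create the bare instance.
        cls.calls.append("__new__")
        return super().__new__(cls)

    def __init__(self, value):
        # Stage 2: the instance already exists; fill it in.
        self.calls.append("__init__")
        self.value = value


obj = Tracked(42)
print(Tracked.calls)
```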

Incidentally, this two stage construction/initialisation is also found
in other OOP languages like Smalltalk and Objective C (and Lisp?).

Alan G
Author of the Learn to Program web site
Follow my photo-blog on Flickr at:

On 16/07/2019 23:41, boB Stepp wrote:

> essentially said that I should be testing the public interface to my
> classes, but not the methods only used internally by the class and not
> meant to be publicly accessible.  

I suspect he meant that you should publish the tests for the API
but not necessarily for the internal/private methods.

You should definitely test all code you write, but how formally
you test the private stuff is up to you. But publishing the
public API tests allows clients to run them too.
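
As an illustration of testing through the public interface, here is a sketch with a made-up ShoppingCart class. The private helper _normalise is never called by the tests; it is exercised indirectly through add() and count(), so it can be renamed or rewritten without touching the tests.

```python
import unittest


class ShoppingCart:
    def __init__(self):
        self._items = {}

    def _normalise(self, name):
        # Internal detail: not part of the public interface.
        return name.strip().lower()

    def add(self, name, quantity=1):
        key = self._normalise(name)
        self._items[key] = self._items.get(key, 0) + quantity

    def count(self, name):
        return self._items.get(self._normalise(name), 0)


class ShoppingCartTests(unittest.TestCase):
    def test_add_accumulates(self):
        cart = ShoppingCart()
        cart.add("Apple")
        cart.add("apple ", 2)
        self.assertEqual(cart.count("APPLE"), 3)

    def test_unknown_item_counts_zero(self):
        self.assertEqual(ShoppingCart().count("pear"), 0)
```

Saved as a module, the tests can be run with python -m unittest.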

Alan G
Author of the Learn to Program web site
Follow my photo-blog on Flickr at:

On Tue, Jul 16, 2019 at 04:29:15PM -0500, James Hartley wrote:
> I ask this having more C++ knowledge than sense.
> There is an adage in the halls of everything Stroustrup that one needs to
> think about how resource allocation will be unwound if an exception is
> thrown.  This gets watered down to the mantra "Don't throw exceptions from
> within constructors."  Does this carry over to Python?  I'm trying to
> develop a Pythonistic mindset as opposed to carrying over old baggage...

No, it is perfectly safe to raise exceptions from within the Python 
constructors, whether you are using __new__ (the true constructor) or 
__init__ (the initialiser).

The only tricky part is if you allocate resources external to the 
object, like this:

class Weird(object):
    openfiles = []
    def __new__(cls, fname):
        f = open(fname)
        cls.openfiles.append(f)
        # New instance:
        instance = super().__new__(cls)
        if condition:
            raise ValueError
        return instance

Even if the __new__ constructor fails, I've kept a reference to an open 
file in the class. (I could have used a global variable instead.) That 
would be bad. But notice I had to work hard to make this failure mode, 
and write the code in a weird way. The more natural way to write that^1 
would be:

class Natural(object):
    def __init__(self, fname):
        self.openfile = open(fname)
        if condition:
            raise ValueError

Now if there is an exception, the garbage collector will collect the 
instance and close the open file as part of the collection process. 
That might not be immediately, for example Jython might not close the 
file until interpreter shutdown. But the earlier example will definitely 
leak an open file, regardless of which Python interpreter you use, while 
the second will only leak if the garbage collector fails to close open 
files.

Here's a better example that doesn't depend on the quirks of the garbage 
collector:

import random

class Leaky(object):
    instances = []
    def __init__(self):
        self.instances.append(self)
        if random.random() < 0.1:
            raise ValueError

This will hold onto a reference to the instance even if the initialiser 
(constructor) fails. But you normally wouldn't do that.

class NotLeaky(object):
    def __init__(self):
        if random.random() < 0.1:
            raise ValueError

try:
    x = NotLeaky()
except ValueError:
    pass

Now either the call to NotLeaky succeeds, and x is bound to the 
instance, or it fails, and x is *not* bound to the instance. With no 
references to the newly-created instance, it will be garbage collected.
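
One way to watch this happen is with a weak reference, which observes an 
object without keeping it alive. This is only a sketch: the "fail" 
parameter and the "seen" list are made up for the demonstration, and the 
timing of collection is a CPython detail (other interpreters may collect 
later).

```python
import gc
import weakref

seen = []  # holds weak references only, so it does not keep instances alive


class NotLeaky:
    def __init__(self, fail):
        # Record a weak reference so we can check later whether the
        # instance still exists. "fail" is a hypothetical switch that
        # forces the initialiser to raise.
        seen.append(weakref.ref(self))
        if fail:
            raise ValueError("initialiser failed")


try:
    x = NotLeaky(fail=True)
except ValueError:
    pass

gc.collect()  # usually unnecessary on CPython; included to be thorough
print(seen[0]() is None)  # a dead weak reference returns None when called
```

If the weak reference is dead, nothing else was holding on to the 
half-initialised instance.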

^1 Actually that's not too natural either. It is not usually a good idea 
to hold onto an open file when you aren't actively using it, as the 
number of open files is severely constrained on most systems. 


On 16Jul2019 23:49, Alan Gauld <alan.gauld at> wrote:
>On 16/07/2019 22:56, Mats Wichmann wrote:
>>> thrown.  This gets watered down to the mantra "Don't throw 
>>> exceptions from
>>> within constructors."  Does this carry over to Python?
>> If you mean __init__, that's not a constructor, so you should set your
>> mind at rest :)   It's more properly an "initializer", the instance has
>> already been constructed when it's called.
>FWIW The true constructor is __new__() and its quite rarely overridden
>by application programmers. But if you do, it's not common that you'd do
>anything that would merit an exception. __new__ pretty much just sets
>up the structure of the object ready for initialisation by __init__.
>Incidentally, this two stage construction/initialisation is also found
>in other OOP languages like Smalltalk and Objective C (and Lisp?).

And to return to the OP's question:

The __init__ method (and arguably __new__ if you touch it - very rare) 
is like other Python code: resource allocation should normally get 
unwound as objects become unreferenced. So raising an exception should 
be a pretty safe thing to do.

That is a simplification. Of course if you implement an object with side 
effects _outside_ the Python object space (maybe it opened a scratch 
file to support something), it is your responsibility to ensure release 
in the object's __del__ method. But an object that just allocates a 
bunch of lists or dicts or the like? Python will clean that up for you.

That said, I try to do cheap initialisation before expensive 
initialisation. So allocating locks, opening files, starting worker 
threads: these come at the bottom of the __init__ method.

Also, it is _ROUTINE_ to raise exceptions from __init__: like any other 
method we _expect_ you to raise ValueError if the initialiser parameters 
are insane (negatively sized arrays, etc etc).

So in Python, raising exceptions in __init__ is normal: it shouldn't 
happen when your programme is running correctly of course, but it is the 
_preferred_ action when your initialiser cannot complete correctly.


  x = Foo(....)

After this assignment we expect "x" to be a usable instance of Foo. We 
don't put special checks; what would such checks look like? (There are 
some answers for that, but they're all poor.)

So raising an exception is what happens if __init__ fails.
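
A small sketch of both habits together: validate the cheap things first 
and raise ValueError, then acquire the expensive or external resource 
last. The class name ScratchBuffer and its attributes are invented for 
illustration only.

```python
import tempfile


class ScratchBuffer:
    """Hypothetical class: a sized buffer backed by a scratch file."""

    def __init__(self, size):
        # Cheap checks first: nothing has been allocated yet, so raising
        # here leaves nothing behind to clean up.
        if size < 0:
            raise ValueError("size must be non-negative, got %r" % (size,))
        self.size = size
        # Expensive or external resources last.
        self.scratch = tempfile.TemporaryFile()

    def close(self):
        self.scratch.close()


try:
    ScratchBuffer(-1)
except ValueError as e:
    print("rejected:", e)

buf = ScratchBuffer(10)
buf.close()
```

Because the scratch file is opened only after the parameters have been 
accepted, a bad call never opens it in the first place.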

Cameron Simpson <cs at>

On 7/16/19 4:41 PM, boB Stepp wrote:
> Peter Otten, while responding to one of my questions in the past,
> mentioned something in passing that apparently has been mulling around
> in the back of my head.  I don't recall his exact words, but he
> essentially said that I should be testing the public interface to my
> classes, but not the methods only used internally by the class and not
> meant to be publicly accessible.  Is this generally how I should be
> viewing testing?  Would someone be willing to expand at some length on
> this topic?

Test everything (within reason).

If you practice TDD, say, where your tests are the contract for how an
interface shall behave, that makes a lot more sense for public APIs,
which shall not change or be flagged immediately by your unit tests,
than for internal functions which should be able to change within reason
to suit implementation needs.
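
As a rough illustration of the idea (the Temperature class is made up 
for this example), the test below pins down only the public method. The 
private helper _scale() is exercised indirectly, so it can be renamed or 
rewritten without touching the test.

```python
import unittest


class Temperature:
    """Hypothetical class with one public method and one private helper."""

    def __init__(self, celsius):
        self.celsius = celsius

    def fahrenheit(self):
        # Public interface: this is the contract the tests describe.
        return self._scale(self.celsius) + 32

    @staticmethod
    def _scale(value):
        # Internal detail: free to change as long as fahrenheit() still works.
        return value * 9 / 5


class TemperatureTest(unittest.TestCase):
    def test_fahrenheit(self):
        self.assertEqual(Temperature(100).fahrenheit(), 212)
        self.assertEqual(Temperature(0).fahrenheit(), 32)
```

Saved in a file, this can be run with "python -m unittest" followed by 
the file name.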

On 16/07/19 12:47 PM, Alan Gauld via Tutor wrote:
> On 15/07/2019 23:34, Mats Wichmann wrote:
>> Rule #1: it's all opinion in the end...

Not quite. Different jurisdictions (remember, this list has an 
international membership!) have different understandings of (even, 
respect for) copyrights and permissions.

I live in a jurisdiction where whatever I write ("create") is mine - or 
my employer's. It is not even necessary to claim or "exert" copyright!

However, under the 'different strokes...' rule, I still include a 
copyright/license statement - if only to avoid misunderstandings in 
places that have other requirements and to at least 'make an effort' to 
communicate with those who have no such concepts - either in society, or 
(un)enshrined in law.

>> The common practice is that licence/copyright text is included as a
>> comment in the code, not in a docstring.
> I'd second that opinion. I don't like losing the copyright stuff
> to a separate file - too easy to get lost. But I certainly don't
> want it in my help() output either.
> A comment solves both for the downside of some initial scrolling
> when reading or editing the file

Some disagreement here.
(but nothing worth fighting-over!)

One line offers plenty of space to exert a claim (such can be very 
simple and does not need to be lawyer-speak!) which should also refer to 
the template's/package's external file or web-page. The latter giving as 
much space for whatever you (or your legal representative(s) ) might 
want to say!

There can be quite an accumulation of 'paper-work' at the top of 
modules, which then has to be scrolled-through before we can get 
stuck-in to function/class/__main__ code - even with an editor's 
code-folding assistance.

Should it be left to the (far) end of the file? Would it lessen any 
legal implication?

Aside from possibly irritating 'the good guys', does such really 'stop' 
a determined rapscallion?
(...desperate student, lazy 'professional', corporate 'raider'...?)

Regards =dn

On 17/07/2019 21:01, David L Neil wrote:

> One line offers plenty of space to exert a claim (such can be very 
> simple and does not need to be lawyer-speak!) which should also refer to 
> the template's/package's external file or web-page. 

Yes, I've seen that and if the lawyer speak is very verbose it's
a good compromise.

> Aside from possibly irritating 'the good guys', does such really 'stop' 
> a determined rapscallion?

Nothing will stop a determined rapscallion (love that phrase! ;-)
But it lets the good guys know who to contact at least if they
do need to.
For example, in my last book the publishers required me to
get a disclaimer from the author of some open source files
even though they clearly stated they could be used for any
purpose. Having the copyright notice with email link made
that easy.

Alan G

I am using Python3.6

Working with C/Python embedding:

I have to call a shared library (.so) that I created using PyObjects.

Please advise.

Jesse Ibarra

On 18/07/19 10:08 AM, Alan Gauld via Tutor wrote:
> On 17/07/2019 21:01, David L Neil wrote:
>> One line offers plenty of space to exert a claim (such can be very
>> simple and does not need to be lawyer-speak!) which should also refer to
>> the template's/package's external file or web-page.
> Yes, I've seen that and if the lawyer speak is very verbose its
> a good compromise.
>> Aside from possibly irritating 'the good guys', does such really 'stop'
>> a determined rapscallion?
> Nothing will stop a determined rapscallion(love that phrase! ;-)
> But it lets the good guys know who to contact at least if they
> do need to.
> For example, in my last book the publishers required me to
> get a disclaimer from the author of some open source files
> even though they clearly stated they could be used for any
> purpose. Having the copyright notice with email link made
> that easy.

Open source:
I've had the same - even for short "shazzam" or "drum-roll-please" 
sound-clips (which advertisers use all the time - who can name the 
pieces of music without saying "The lady loves Milk Tray" or "British 
Airways"?). That said, internationally there are many definitions and 
variations of "fair use" - and some jurisdictions don't even recognise 
such a thing! (I think the British law is quite 'tight').

I refused such a request/instruction, suggesting that the publishers AND 
their lawyers should enter 'the real world' and learn to understand 
(and embrace) "open" concepts. In response to the inevitable grumpy 
push-back, I pointed-out that I am an author/content-producer and not 
legally-trained (not quite true, but they don't know that) so why on 
earth would they take MY advice...

On 17/07/2019 18:03, Jesse Ibarra wrote:
> I am using Python3.6
> Working with C/Python embedding:
> I have to call a shared library (.so) that I created using PyObjects.
To be honest embedding is quite an advanced topic and the tutor
list is generally targeted at beginners so you might find
limited help here. (Although technically it does fall within
our remit, it's just that it doesn't come up too often!)

I'd suggest you post your queries on the main python list where
there will likely be far more experts available.

Alan G

On 16 Jul 2019 23:31, Daniel Bosah <dbosah at> wrote:

Hi all,

I have a problem trying to match items in a dict and pandas series in
Python.

I have a dict ( called city_dict )of cities and city_id's ; for each city (
which is a key in the dict ), a unique city_id is a value in that dict.

So for example, city_dict = {"New York": 1001, "LA": 1002, "Chicago": 1003}.
New York is a key, 1001 is a value.

Now I have a pandas Series called dfCities. In this series is a bunch of
cities, including the cities in city_dict.

My goal is to replace the cities in dfCities with the city_id's in a brand
new csv file. So if dfCities has New York in it, I want to replace it with
its value in the dictionary, so 1001.

=====? check out It accepts a dict, see also examples below.
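
The link in the reply above did not survive, so it is not clear which 
method was meant. One pandas method that does accept a dict is 
Series.map, sketched here with the names from the question. "Boston" 
and the series name "city" are added only to show what happens to a 
city that is not in the dict.

```python
import pandas as pd

city_dict = {"New York": 1001, "LA": 1002, "Chicago": 1003}
dfCities = pd.Series(["New York", "Chicago", "LA", "Boston"], name="city")

# Series.map looks each value up in the dict. Cities that are not keys
# in city_dict come back as NaN instead of raising an error.
city_ids = dfCities.map(city_dict)
print(city_ids)

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = city_ids.to_csv(index=False)
print(csv_text)
```

Passing a file name as the first argument of to_csv would write the new 
csv file directly.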

I'm trying to pass arguments to list.

This script needs to generate csv files. I want to pass the version and
build number to the script and it should be passed to the list.

My script below is blowing index out of range after adding the args.
However, if I hardcode the version, it works fine.
Kindly advise.

# User Args
version = sys.argv[1]
build = sys.argv[2]

add_file = (
     'Product-'+version + 'Package (Build' + build + ')',
     'Product-'+version + 'Service Package (Build' + build +  ')',
)

Traceback (most recent call last):
  File "/Users/atamsekar/Projects/PulseSO/trunk/swrelease/", line 5, in <module>
    version = sys.argv[1]
IndexError: list index out of range

Process finished with exit code 1


Anirudh Tamsekar

On 18/07/2019 19:34, Anirudh Tamsekar wrote:

> # User Args
> version = sys.argv[1]
> build = sys.argv[2]
> add_file = (...
> )

> Traceback (most recent call last):
>   File "/Users/atamsekar/Projects/PulseSO/trunk/swrelease/",
> line 5, in <module>
>     version = sys.argv[1]
> IndexError: list index out of range

The obvious thing to do is add a print statement before the assignments

print(sys.argv)

and check that it contains your expected values.

If that doesn't help show us your full code (including import
statements) and your full execution output, including where you
call the program.

Alan G

On Thu, Jul 18, 2019 at 11:34:09AM -0700, Anirudh Tamsekar wrote:

> My script below is blowing index out of range after adding the args.

> version = sys.argv[1]

> Traceback (most recent call last):
>   File "/Users/atamsekar/Projects/PulseSO/trunk/swrelease/",
> line 5, in <module>
>     version = sys.argv[1]
> IndexError: list index out of range

The obvious question is, are you passing command line arguments to your 
script?

How are you calling your script? If you are running it from an IDE you 
need to tell the IDE to include command line arguments.

If you run it like this, from the operating system's command line:


then there are no command line arguments and sys.argv[1] will be out of 
range. You need to run something like this:

    python argument1 argument2

Try putting:

    print(sys.argv)

at the start of your script and see what is there.
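
Going one step further, the script can check the argument count itself 
and stop with a usage message instead of an IndexError. This is only a 
sketch: parse_args and the script name "release.py" are made-up names, 
and the explicit lists stand in for the real sys.argv.

```python
def parse_args(argv):
    """Return (version, build), or exit with a usage hint if they are missing."""
    # argv[0] is the script name, so two user arguments means length 3.
    if len(argv) != 3:
        raise SystemExit("usage: %s VERSION BUILD" % argv[0])
    return argv[1], argv[2]


# Simulated command lines; "release.py" is a placeholder script name.
print(parse_args(["release.py", "9.1", "1234"]))

try:
    parse_args(["release.py"])  # what an IDE run without arguments looks like
except SystemExit as e:
    print(e)
```

In the real script the call would be parse_args(sys.argv), after 
importing sys.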


<mhysnm1964 at> writes:

> The web page is not going to have any fancy structure or visual effects at
> all. I am more than happy to hand-code the HTML/JavaScript. I have had a
> quick search and found a range of modules. Most of them indicate they are
> for CMS web sites and look far to complex for my needs. 
> If someone could point myself to a module and possibly a tutorial for the
> module. I would be grateful. Hopefully this is not to of a open question.

It's a good question. You have an answer that directs you to the Python

If you want to go beyond that, the Python Wiki has many suggestions
<URL:> and for frameworks
specifically <URL:> (some may
be out of date though). Each has a short description, you can use that
to gauge whether you want to explore.

David L Neil <PyTutor at> writes:

> There can be quite an accumulation of 'paper-work' at the top of
> modules, which then has to be scrolled-through before we can get
> stuck-in to function/class/__main__ code - even with an editor's
> code-folding assistance.
> Should it be left to the (far) end of the file? Would it lessen any
> legal implication?

I am more commonly using:

* Top of file has a brief (3-line) declaration, in a comment, that this
  is free software, you are free to modify and redistribute under
  certain conditions; see the end of the file.

* End of file has full copyright information: copyright statements
  (years and holders), standard boilerplate grant of freedoms,
  disclaimer of warranty, and specific filename where the full legal
  license document is to be found.

> Aside from possibly irritating 'the good guys', does such really
> 'stop' a determined rapscallion?

The good guys also need to have a clear idea of exactly who claims what,
and exactly what freedoms they have in exactly what work when they
receive it.

Files have a tendency to migrate from work to work, and so it's not
uncommon to have multiple works combined with different declarations
that apply. It's good to attach this information directly in the file.

I see no good reason why it needs to be all at the top of the file,
though. To my eye, a very brief "you're free to do this set of things,
see the end of this file for details" is notifying the reader without
over-burdening them every time they open the file.

Hello together,

I try to write a Python tool but after hours of trying, reading and looking for help nothing and no one would help me. So this is my most desperate call for help. I attached the project and my required files. If there is someone who would help me not losing my scholarship and so being excluded from university you'll be my hero.

Best regards


Hello all,

I have a date time string that looks like the following.

0    2015-07-01 00:01:44.538420-08:00
1    2015-07-01 00:27:58.717530-08:00
2    2017-07-01 07:07:48.391376-08:00

I have tried the following two different methods, both did not work.
Method one: pandas
import pandas as pd
stamp = pd.to_datetime(my_string, format='%Y%m%d %H:%M:%S')

Method two: datetime package
from datetime import datetime
datetime.strptime(my_string, '%Y-%m-%d %H:%M:%S')

Are both ways suppose to work? Also, does it matter if there are decimals
after seconds? Thanks a lot!

Try python-dateutil - I found it to be one of the better python modules 
for dealing with dates. You may have to replace or strip some of the 


On 7/19/19 7:44 PM, C W wrote:
> Hello all,
> I have a date time string that looks like the following.
> 0    2015-07-01 00:01:44.538420-08:00
> 1    2015-07-01 00:27:58.717530-08:00
> 2    2017-07-01 07:07:48.391376-08:00
> I have tried the following two different methods, both did not work.
> Method one: pandas
> import pandas as pd
> stamp = pd.to_datetime(my_string, format='%Y%m%d %H:%M:%S')
> Method two: datetime package
> from datetime import datetime
> datetime.strptime(my_string, '%Y-%m-%d %H:%M:%S')
> Are both ways suppose to work? Also, does it matter if there are decimals
> after seconds? Thanks a lot!

On Jul 20, 2019 7:56 AM, <aliqua at> wrote:
> Hello together,
> I try to write a Python tool but after hours of trying , reading and
> looking for help nothing an no one would help me. So this is my most
> desperate call for help. I attached the project and my required files.

Unfortunately the tutor list does not forward attachments. If the files are
not too big just include them in the body of the email. Otherwise you'll
have to store them in the cloud and give us a link.

Be sure to reply all so we all get to see your problem.

Bob gailer

On 25 Jun 2019 15:50, stephen.m.smith at wrote:


I have written a 'program' that does some reasonable screen scraping off of
a specific website. The program has gotten too large so I have tried to
segment it into logical pieces (tkinter logic as a start) but I am having
problems. Specifically I need to pass several dictionaries to the module
(imported code) that validates some user selection and into the code that
navigates through the website.

====> Hi, not sure if you could use it here, but I was triggered by the term "validation":
* Tkinter allows you to define a validatecommand,
* You can also define traces. These are functions that are triggered when a Tkinter var (e.g. StringVar) is changed.
* You can also bind events to a function. For example a <FocusOut> event from an Entry checks the entry contents and colors it red if it's invalid.

It's been a while since I've used this but those tricks may come in handy!


On 7/19/19 4:53 AM, aliqua at wrote:
> Hello together,
> I try to write a Python tool but after hours of trying , reading and looking for help nothing an no one would help me. So this is my most desperate call for help. I attached the project and my required files. If there is someone who would help me not loosing my scholarship and so beeing excluded from university you'll be my hero.
> Best regards
> Hanna

Hanna, this is a bit over the top.

If you send code - with descriptions of specific problems and what
you've tried, not just a huge blob of code with a non-specific "it
doesn't work" - people here will happily try to help.  There are other
places you can look, like Stack Overflow, again it will require asking
targeted questions.

But you really can't expect us, as a band of spare-time volunteers, to
bail you out of a real or imagined situation where if you can't solve
the problem you'll be sent down. If it's that dire, there is usually
paid tutoring available at most institutions or hanging around them, and
potential loss of a scholarship ought to make that investment worthwhile.

On Fri, Jul 19, 2019 at 10:44:36PM -0400, C W wrote:
> Hello all,
> I have a date time string that looks like the following.
> 0    2015-07-01 00:01:44.538420-08:00
> 1    2015-07-01 00:27:58.717530-08:00
> 2    2017-07-01 07:07:48.391376-08:00

I assume that the leading number and spaces "0    " etc are NOT part of 
the strings.

> I have tried the following two different methods, both did not work.
> Method one: pandas
> import pandas as pd
> stamp = pd.to_datetime(my_string, format='%Y%m%d %H:%M:%S')
> Method two: datetime package
> from datetime import datetime
> datetime.strptime(my_string, '%Y-%m-%d %H:%M:%S')
> Are both ways suppose to work?

Not unless the string format matches the actual string. You can't expect 
to convert a string unless it matches the format.

> Also, does it matter if there are decimals
> after seconds?

Of course it matters.

Did you read the error message? The single most important skill for a 
programmer is to READ THE ERROR MESSAGE and pay attention to what it 
tells you went wrong:

py> datetime.strptime('2015-07-01 00:01:44.538420-08:00', '%Y-%m-%d %H:%M:%S')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/", line 510, in _strptime_datetime
    tt, fraction = _strptime(data_string, format)
  File "/usr/local/lib/python3.5/", line 346, in _strptime
    data_string[found.end():])
ValueError: unconverted data remains: .538420-08:00

See how the error message tells you that it couldn't convert the string 
because there is data left over at the end. The first thing to do is 
handle the microseconds.

Googling gets the answer: use "%f" as the code for fractional seconds.

py> datetime.strptime('2015-07-01 00:01:44.538420-08:00', '%Y-%m-%d %H:%M:%S.%f')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/", line 510, in _strptime_datetime
    tt, fraction = _strptime(data_string, format)
  File "/usr/local/lib/python3.5/", line 346, in _strptime
    data_string[found.end():])
ValueError: unconverted data remains: -08:00

Now we're making progress! The error message has changed. Now you just 
need to decide what the "-08:00" part means, and change the format 
string appropriately.


From: penelope at
Date: Sat, 20 Jul 2019 17:09:39 +0200
Subject: [Tutor] Stylometry Help

Hello Bob,

thanks for your reply. I hope you can help me...the Project description is attached. Please let me know if you can afford some time and coding skills.

Best regards


On 20 July 2019 16:36:08 CEST, Bob Gailer <bgailer at> wrote:
>On Jul 20, 2019 7:56 AM, <aliqua at> wrote:
>> Hello together,
>> I try to write a Python tool but after hours of trying , reading and
>> looking for help nothing an no one would help me. So this is my most
>> desperate call for help. I attached the project and my required files.
>Unfortunately the tutor list does not forward attachments. If the files are
>not too big just include them in the body of the email. Otherwise you'll
>have to store them in the cloud and give us a link.
>Be sure to reply all so we all get to see your problem.
>Bob gailer

Time for me to ask something...

I've got a scenario where I need to pass data in both directions of a
multiprocessing program.  Specifically what I'm working with is a test
runner from a homegrown testing harness (aside having nothing to do with
the question: if it was me, I would build this starting with pytest, but
the codebase predates pytest and I'm only willing to do so much surgery
at the moment!). The workflow is: the main program either uses its
arguments to figure out what tests to run or scans for them, building up
a list of files which is then modified into a list of Test class instances.

This is passed to a set of workers through a multiprocessing.Queue.
Each worker pulls items off the queue and runs them, which causes the
Test instance to be updated with results. This is easy enough and looks
just like the threaded version of the runner; since everything is
gathered and enqueued before the workers are fired up, finding when
things are done is easy enough. For example, can use the sentinel trick
of writing a None for each worker at the end so each one in turn will
pick up one of those and quit. That's not even necessary, it seems.

The mp version (which I'm doing just as a personal exercise) has a
difference: the main program wants to go through the results and produce
a summary report, which means it needs access to the modified Test
instances. Unlike the threaded version, the list of instances is not
shared between the processes. I figured I'd send the modified instances
back through another queue which the main process reads.  So now after
all this verbiage, the question: how does the main process, acting in a
consumer role on the resultq, read this queue effectively in the
presence of multiple producers?

If it does q.get in a loop, it will eventually block when the queue is
empty.  If it does q.get_nowait, it _might_ see an empty queue when some
tests are pending that haven't been written to the queue yet (especially
since the pickling that takes place on the queue apparently introduces a
little delay)

A somewhat simplified view of what I have now is:

Queue, JoinableQueue imported from multiprocessing

    queue = JoinableQueue()
    resultq = Queue()
    procs = [RunTest(queue=queue, resultq=resultq) for _ in range(jobs)]
    total_num_tests = len(tests)
    for t in tests:
        queue.put(t)
    for p in procs:
        p.daemon = True
        p.start()

    # collect back the modified test instances
    tests = []
    for t in iter(resultq.get, None):
        tests.append(t)
        # because q.get() blocks, we need a way to break out
        # q.empty() is claimed not to be reliable in mp.
        if len(tests) >= total_num_tests:
            break

This actually works fine based on knowing how many tests there are and
assuming when we've collected the same number, everything is done.  But
it doesn't sound entirely "clean" to me. What if a test is "lost" due to
an error, so its modified Test instance is never written to the result
queue?  Could use a sentinel, but who's going to write the sentinel -
this seems to suffer from the same problem as the non-blocking get, in
that a worker could write a sentinel when *it* is done, but other
workers might still be running their last test and we'll conclude the
queue has been drained a bit too early.  I did play some with trying to
set up a barrier across the workers, so they're all finished before each
writes a sentinel, but that still blocked in the q.get (mea culpa: I'm
violating the guidelines and not posting code for that - I didn't save
that particular hack).

I've read a little bit about multiprocessing.Manager and about
multiprocessing.Pool to look for other approaches, both seemed to have
limitations that made them awkward for this case, probably based on me
failing to understand something.

So... what else should I be trying to make this a little cleaner?


-- mats

Thanks a lot Steven. The %f is what I was missing.

The "-08:00" is the UTC offset (Pacific time, i.e. California, USA), which I
believe is %z.
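
For the record, a minimal sketch of the full format string. It assumes 
Python 3.7 or later: as far as I know, earlier versions of strptime do 
not accept the colon inside "-08:00" for %z.

```python
from datetime import datetime, timedelta

raw = "2015-07-01 00:01:44.538420-08:00"

# %f matches the fractional seconds, %z the UTC offset.
stamp = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f%z")
print(stamp)
print(stamp.utcoffset() == timedelta(hours=-8))

# Since this string is already in ISO 8601 form, fromisoformat
# (also Python 3.7+) parses it without a format string.
print(stamp == datetime.fromisoformat(raw))
```

The result is a timezone-aware datetime, so it can be compared with or 
converted to other offsets safely.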


On Sat, Jul 20, 2019 at 7:50 PM Steven D'Aprano <steve at> wrote:

> On Fri, Jul 19, 2019 at 10:44:36PM -0400, C W wrote:
> > Hello all,
> >
> > I have a date time string that looks like the following.
> >
> > 0    2015-07-01 00:01:44.538420-08:00
> > 1    2015-07-01 00:27:58.717530-08:00
> > 2    2017-07-01 07:07:48.391376-08:00
> I assume that the leading number and spaces "0    " etc are NOT part of
> the strings.
> > I have tried the following two different methods, both did not work.
> > Method one: pandas
> > import pandas as pd
> > stamp = pd.to_datetime(my_string, format='%Y%m%d %H:%M:%S')
> >
> > Method two: datetime package
> > from datetime import datetime
> > datetime.strptime(my_string, '%Y-%m-%d %H:%M:%S')
> >
> >
> > Are both ways suppose to work?
> Not unless the string format matches the actual string. You can't expect
> to convert a string unless it matches the format.
> > Also, does it matter if there are decimals
> > after seconds?
> Of course it matters.
> Did you read the error message? The single most important skill for a
> programmer is to READ THE ERROR MESSAGE and pay attention to what it
> tells you went wrong:
> py> datetime.strptime('2015-07-01 00:01:44.538420-08:00', '%Y-%m-%d
> %H:%M:%S')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python3.5/", line 510, in
> _strptime_datetime
>     tt, fraction = _strptime(data_string, format)
>   File "/usr/local/lib/python3.5/", line 346, in _strptime
>     data_string[found.end():])
> ValueError: unconverted data remains: .538420-08:00
> See how the error message tells you that it couldn't convert the string
> because there is data left over at the end. The first thing to do is
> handle the microseconds.
> Googling gets the answer: use "%f" as the code for fractional seconds.
> py> datetime.strptime('2015-07-01 00:01:44.538420-08:00', '%Y-%m-%d
> %H:%M:%S.%f')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python3.5/", line 510, in
> _strptime_datetime
>     tt, fraction = _strptime(data_string, format)
>   File "/usr/local/lib/python3.5/", line 346, in _strptime
>     data_string[found.end():])
> ValueError: unconverted data remains: -08:00
> Now we're making progress! The error message has changed. Now you just
> need to decide what the "-08:00" part means, and change the format
> string appropriately.
> --
> Steve
> _______________________________________________
> Tutor maillist  -  Tutor at
> To unsubscribe or change subscription options:

On Tue, 23 Jul 2019 at 00:29, Mats Wichmann <mats at> wrote:
> I've got a scenario where I need to pass data in both directions of a
> multiprocessing program.
> So... what else should I be trying to make this a little cleaner?

I think if possible it is best to set these things up as an acyclic
pipeline so the basic problem here is the two-way communication. How
about using separate processes for putting things on the worker input
queue and reading from the worker output queue?

It probably makes sense for the master process that will outlive the
others to be the one that collects the output so maybe spawn a process
at the start whose job is putting all the test instances on the input
queue. Then spawn the workers who will read the input queue and write
to the output queue. Then the master process reads from the output
queue until it's done. I think this eliminates any need for
bidirectional communication which generally simplifies things.
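
On the narrower question of when the result queue is drained, one 
common pattern (not the feeder-process layout described above) is to 
have every worker write its own sentinel and have the reader stop only 
after it has counted one sentinel per worker. A rough sketch follows, 
with a squaring function standing in for "run the test"; the names 
worker, collect_results and run_all are invented for this example.

```python
import multiprocessing as mp

SENTINEL = None  # marker meaning "no more work" / "this worker is done"


def worker(taskq, resultq):
    """Process tasks until a sentinel arrives, then announce completion."""
    try:
        for n in iter(taskq.get, SENTINEL):
            resultq.put((n, n * n))  # stand-in for running a test
    finally:
        # Exactly one sentinel per worker, even if a task raised.
        resultq.put(SENTINEL)


def collect_results(resultq, num_workers):
    """Read results until every worker has sent its sentinel."""
    results = []
    finished = 0
    while finished < num_workers:
        item = resultq.get()
        if item is SENTINEL:
            finished += 1
        else:
            results.append(item)
    return results


def run_all(tasks, jobs=2):
    taskq, resultq = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(taskq, resultq))
             for _ in range(jobs)]
    for p in procs:
        p.start()
    for t in tasks:
        taskq.put(t)
    for _ in procs:
        taskq.put(SENTINEL)  # one stop marker per worker
    results = collect_results(resultq, jobs)  # drain before joining
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    print(sorted(run_all(range(5))))
```

A result that was never written simply does not show up in the list, 
rather than leaving the reader waiting for a count it will never reach. 
A worker that is killed outright would still leave the reader blocked, 
so a timeout on resultq.get() may be worth adding.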


Hi All,

Need one help in understanding generator expression/comprehensions.

This is my sample code.

# This code creates a generator and not a tuple comprehension.
my_square = (num * num for num in range(11))
print(my_square)  # <generator object <genexpr> at 0x7f3c838c0ca8>
# We can iterate over the square generator like this.
print(next(my_square))  # Prints the value 0,1,4....
print("Stop Iteration")
# Another iteration
for x in my_square:
    print(x)  # This prints nothing.

Does the generator exhaust its values when we run the iterator once?
Lastly, is there any specific reason for not having tuple comprehensions?

Have checked this link, but could not understand the reason?
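
A short sketch of the behaviour being asked about: a generator hands 
out each value once, so a second pass finds nothing left. It also shows 
that a tuple can still be built by passing a generator expression to 
tuple().

```python
squares = (num * num for num in range(5))

first = next(squares)   # consumes the first value only
rest = list(squares)    # consumes everything that is left
again = list(squares)   # nothing left: the generator is exhausted

# There is no tuple comprehension syntax, but this has the same effect.
as_tuple = tuple(num * num for num in range(5))

print(first, rest, again, as_tuple)
```

To iterate more than once, either recreate the generator or store its 
values in a list or tuple first.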



Hello everyone.  I'm thinking through a short program I want to write 
that will 'par2'/generate ECCs for all of my work files which branch out 
from a single directory and number approximately 15,000.  Specifically:
1) day one:
  - create a mirror copy of the directory tree empty of all files (there 
are a bunch of ways in bash of doing this).
  - recurse down the directory tree which has the files and run a par2 
create calculation on each file which generates approximately 10 *.par2 
fileblocks.  I will then copy the *.par2 fileblocks to the mirror 
directory tree into the same position as the 'principal' file.  Therefore 
assuming 10 *.par2 fileblocks for every actual file, the mirror tree 
will have around 150,000 *.par2 fileblocks (space and CPU time are a 
2) day two:
  - for each file in the primary directory, par2 verify it with respect 
to its corresponding *.par2 fileblocks in the mirror tree.  If it's ok, 
move on to the next file, if not, repair it, generate a new set of 
*.par2 fileblocks and copy them over to the mirror.
3) day three:
  - same as day two, ongoing.

I'm aware that most par2 programs need the file and *.par blocks to be 
in the same location but let's assume I find a way around this.  Also, I 
believe it would be possible to par2 the top directory (which will give 
me work1.par2 - work10.par2) but the problem is that, performed this way, the 
blocks treat all files as a single whole so if I detect corruption, I 
have no way of locating which file.

I'm considering two ways of doing this:

Option A:
- This seems the most obvious if somewhat inelegant: define a few 
functions, and incorporate them into a for loop which will be applied to 
each file as described in 1) - 3) above.

Option B:
- I'm afraid my thinking is not entirely clear regarding this option, but 
somehow I import metadata for every (primary) file into a list (I think 
all that's needed is file name and location), perhaps even a nested list 
although I'm not sure if that provides an advantage.  Then I apply the 
operations for 1) - 3) above sequentially per list item, the assumption 
being the list data and my home made functions will be sufficient.
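
As a rough sketch, the two options can be combined: build the list of file metadata once (Option B), then loop over it applying a function (Option A). The names list_files and process_all are hypothetical, and the par2 step is left as a caller-supplied function because no par2 command line is given in this thread.

```python
import os


def list_files(primary_root, mirror_root):
    """Return (file_path, mirror_dir) pairs for every file under primary_root."""
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(primary_root):
        # Same relative position in the mirror tree as in the primary tree.
        rel_dir = os.path.relpath(dirpath, primary_root)
        mirror_dir = os.path.normpath(os.path.join(mirror_root, rel_dir))
        for name in sorted(filenames):
            pairs.append((os.path.join(dirpath, name), mirror_dir))
    return pairs


def process_all(pairs, action):
    """Apply action(file_path, mirror_dir) to each pair, one at a time."""
    for file_path, mirror_dir in pairs:
        action(file_path, mirror_dir)
```

A flat list of pairs is probably enough here; a nested list would not add anything, since each file is handled independently.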

I've found various par2 programs on PyPi and possibly pyFileFixity could 
be used but in this instance I'd rather give it a go myself.  For 
various reasons I can't use ZFS which would, of course, negate the need 
for doing any of this.  It seems this would be my consolation prize :)

Hi Animesh,
Unfortunately the list server/email has removed the formatting from your 
sample, but no matter...

On 24/07/19 5:06 AM, Animesh Bhadra wrote:
> # This code creates a generator and not a tuple comprehensions.
> my_square =(num *num fornum inrange(11))
> print(my_square) # <generator object <genexpr> at 0x7f3c838c0ca8>
> # We can iterate over the square generator like this.
> try:
> whileTrue:
> print(next(my_square)) # Prints the value 0,1,4....
> exceptStopIterationasSI:
> print("Stop Iteration")
> # Another iteration
> forx inmy_square:
> print(x) # This prints nothing.
> Does the generator exhausts its values when we run the iterator once?

Yes. A generator takes a "lazy" approach: each value is not calculated 
until next() requests it, whereas a list comprehension calculates all 
its values at one time (and requires the storage space to hold them). 
Notice that there is a yield and a next facility, but when the generator 
terminates, StopIteration is raised. There is no 'start again' command!

The Python docs are informative:

If you have an actual reason for running through a generator expression 
more than once, then consider return-ing it from a function/class (which 
will then be directly accessible to the for-loop/next method).
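
A minimal sketch of that suggestion, assuming a hypothetical function named squares: the generator is exhausted after one pass, but calling the function again hands back a fresh one.

```python
def squares(limit=11):
    """Return a brand-new generator each time it is called."""
    return (num * num for num in range(limit))


gen = squares()
first_pass = list(gen)    # consumes every value
second_pass = list(gen)   # the same generator is now exhausted

fresh_pass = list(squares())  # calling the function again starts over
```

Here second_pass is empty because gen has nothing left to give, while fresh_pass holds the full set of squares again.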

> Lastly any specific reason for not having a tuple comprehensions?
> Have checked this link, but could not understood the reason?

I don't know.

Have you understood the differences between lists and tuples - 
specifically "mutability" and "immutability"?

Let's take a list comprehension. If you 'unwind it', can you reproduce 
it as a multi-line for-loop? Yes, but before the loop the 'target' list 
must be defined/declared to be a list; and within the loop the list is 
appended with the 'generated' values.

Ok? (sorry, don't know if this will be new to you, or not)

Now, instead of a list, try using a tuple? How do you append() to a tuple?
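
The 'unwinding' just described might look like this (a small sketch; the variable names are only for illustration):

```python
values = range(5)

# The comprehension...
squares = [x * x for x in values]

# ...unwound into a loop: the target list must exist first, then it grows.
unwound = []
for x in values:
    unwound.append(x * x)

# A tuple is immutable, so it has no append() to grow it in place.
tuple_has_append = hasattr((), "append")

# Building a tuple in a loop means creating a new tuple on every pass.
as_tuple = ()
for x in values:
    as_tuple = as_tuple + (x * x,)
```

The last loop works, but it throws away and rebuilds the tuple each time round, which is why a tuple is a poor 'target' for this pattern.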

Yes, many people have confused generator expressions - 
surrounded/"delimited" by parentheses, ie ( and ), with tuples.
However, try this little demonstration:

>>> a, b = 1, 2
>>> a
1
>>> b
2
>>> a, b = ( 1, 2 )
>>> a
1
>>> b
2
>>> ( a, b ) = ( 1, 2 )
>>> a
1
>>> b
2
>>> type( a )
<class 'int'>
>>> type( ( 1, 2 ) )
<class 'tuple'>

The principle here is known as "tuple unpacking". The two constants 
(right-hand side) are arranged as a tuple, as are the two variables (a 
and b/left-hand side), regardless of the presence/absence of the 
parentheses.

Clarifying the difference/similarity in appearance between a generator 
expression and a tuple, it might help to think that it is the comma(s) 
which make it a tuple!

Regards =dn

On 7/23/19 11:06 AM, Animesh Bhadra wrote:
> Hi All,
> Need one help in understanding generator expression/comprehensions.
> This is my sample code.
> # This code creates a generator and not a tuple comprehensions.
> my_square =(num *num fornum inrange(11))
> print(my_square) # <generator object <genexpr> at 0x7f3c838c0ca8>
> # We can iterate over the square generator like this.
> try:
> whileTrue:
> print(next(my_square)) # Prints the value 0,1,4....
> exceptStopIterationasSI:
> print("Stop Iteration")

is this code you were actually running? because it won't work... an
except needs to be matched with a try, it can't match with a while.

you *can* consume the values your generator expression generates by
doing a bunch of next's, but why would you?  Instead, just iterate over
it (every generator is also an iterator, although not vice versa):

for s in my_square:
    print(s)

you don't have to manually catch the StopIteration here, because that's
just handled for you by the loop.

> # Another iteration
> forx inmy_square:
> print(x) # This prints nothing.
> Does the generator exhausts its values when we run the iterator once?
> Lastly any specific reason for not having a tuple comprehensions?
> Have checked this link, but could not understood the reason?
> Regards,
> Animesh
On Tue, Jul 23, 2019 at 10:36:01PM +0530, Animesh Bhadra wrote:
> Hi All,
> Need one help in understanding generator expression/comprehensions.
> This is my sample code.

Lots of missing spaces in your code! Don't forget to hit the space bar 
between words :-)

Also missing indentation, which is probably Gmail being "helpful".

I'm just going to fix the obvious typos without comment.

> # This code creates a generator and not a tuple comprehensions.
> my_square =(num*num for num in range(11))
> print(my_square) # <generator object <genexpr> at 0x7f3c838c0ca8>
> # We can iterate over the square generator like this.
> try:
>     while True:
>         print(next(my_square)) # Prints the value 0,1,4....
> except StopIteration as SI:
>     print("Stop Iteration")
> # Another iteration
> for x in my_square:
>     print(x) # This prints nothing.

> Does the generator exhausts its values when we run the iterator once?


> Lastly any specific reason for not having a tuple comprehensions?

(1) History and (2) because tuple comprehensions are not useful enough 
to dedicate syntax to them.

Originally, Python introduced *list comprehensions* using square bracket 
syntax:

    [expression for x in values]

which created a list. If I remember correctly, I think this was about 
Python 2.0. 

Tuples are mostly intended for *small* numbers of heterogeneous values, 
like a struct or record in other languages:

    # Typical example of tuple
    (1, "Hello", 4.5)

while lists are intended for potentially large numbers of homogeneous 
values, like an array of data:

    [2.3, 4.5, 0.9, 7.2, 9.3, 4.5, 6.1, 0.2, 8.7, ... # and many more

(heterogeneous = "different types", homogeneous = "same types")

That makes lists a really good fit for comprehensions, and tuples a bad 
fit. So it was decided that the need for a direct tuple comprehension 
was pretty low. If you wanted a tuple, you could easily get one:

    tuple([expression for x in values])

and that was good enough without wasting useful syntax on something that 
would hardly ever be needed.

It would be a tiny bit more expensive to run (first you create a list, 
then copy it to a tuple, and then garbage collect the list) but so what? 
Not every rare operation needs to be optimized.

A few years later, generator comprehensions were added as a lazy version 
of list comprehensions. List comprehensions produce all the values up 
front, while generator comprehensions produce them one at a time as 
needed. The syntax chosen used round brackets ( ... ) instead of square 
brackets [ ... ] but otherwise was identical.
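
To make the comparison concrete, here is a small sketch (the sample values are made up) of getting a tuple both ways: through an intermediate list, and by feeding a generator expression straight to tuple().

```python
values = [1, 2, 3, 4]

# Via an intermediate list: build the list, then copy it into a tuple.
from_list = tuple([x * 10 for x in values])

# Via a generator expression: values are produced one at a time,
# and no intermediate list is built.
from_genexp = tuple(x * 10 for x in values)

# A bare generator expression computes nothing until it is consumed.
lazy = (x * 10 for x in values)
first = next(lazy)
rest = list(lazy)
```

When a generator expression is the only argument to a call, the extra pair of parentheses can be dropped, as in the from_genexp line.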

What other syntax could have been chosen? Curly brackets are used for 
dicts (Python 2 and 3) and sets (Python 3 only), so they are unsuitable, 
and later on in Python 3 we gained dict and set comprehensions as well.

Had we wasted the ( ... ) syntax on tuples, which hardly anyone would 
have ever used, we would have missed out on the super-useful lazy 
generator comprehensions.

> Have checked this link, but could not understood the reason?

Don't believe everything you read on Stackoverflow. There's lots of 
utter nonsense on that page, for example:

    "The answer is obviously because tuple syntax and parenthesis are 

is rubbish. The parser can tell the difference between tuples and 
parentheses. Tuples use *commas*, not parentheses, except for the 
empty tuple () which is a special case. There's no ambiguity in Python.

The top rated answer

    "parentheses were already taken for ... generator expressions"

is incorrect, because generator expressions came second, not first. 
Python could have added tuple comprehensions at the same time it added 
list comprehensions, but the developers decided not to:

- it wasn't clear how useful comprehensions would be; they were really 
useful in Haskell, but people weren't sure if Python developers would 
take to them (they sure did!);

- list comprehensions are clearly useful, but tuple comprehensions not 
so much. Not every single use has to get its own special syntax.

Of course, *later on* once Python had generator comprehensions using 
parentheses, we couldn't easily add tuple comprehensions because 
there's no good syntax left. But why would we want them?


From steve at  Tue Jul 23 21:22:34 2019
From: steve at (Steven D'Aprano)
Date: Wed, 24 Jul 2019 11:22:34 +1000
Subject: [Tutor] Python Generator expressions

On Wed, Jul 24, 2019 at 11:26:26AM +1200, David L Neil wrote:

> Clarifying the difference/similarity in appearance between a generator 
> expression and a tuple, it might help to think that it is the comma(s) 
> which make it a tuple!

David makes an excellent point here. Except for the special case of the 
empty tuple, it is *commas* that create tuples, not parentheses. The 
round brackets are used for grouping, and are optional:

py> a = 1, 2, 3
py> type(a)
<class 'tuple'>

The only times you *need* parens around a non-empty tuple is to group 
the items or to avoid ambiguity, e.g. a tuple inside a tuple:

a = 1, 2, (3, 4, 5), 6
assert len(a) == 4

or when passing a literal tuple as argument to a function:

function(1, 2, 3, 4, 5, 6)    # six arguments
function(1, 2, (3, 4, 5), 6)  # four arguments
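
A few more cases along the same lines, as a quick sketch (count_args is a hypothetical helper used only to count arguments):

```python
single = 1,          # the trailing comma makes a one-item tuple
grouped = (1)        # parentheses alone only group: this is the int 1
empty = ()           # the special case: an empty tuple needs no comma
nested = 1, 2, (3, 4, 5), 6


def count_args(*args):
    """Return how many positional arguments were passed."""
    return len(args)


six = count_args(1, 2, 3, 4, 5, 6)
four = count_args(1, 2, (3, 4, 5), 6)
```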


Hi tutors on Tutor,

I'm Taishi from Japan working in a data analytics team.

Currently, I'm trying to analyse purchase data of a fashion brand.
However, mysterious KeyErrors started occurring continuously when I was coding, and I haven't been able to get that fixed.

Although I asked this question on Stack Overflow and another platform, I haven't got any solution.

I'm suspecting that Anaconda might keep raising the error, or simply there are bugs in the code.
However, I can't be sure what the real cause is.

Do you think you can help me out?
If yes, please give me some assistance.

Taishi Kawamura

On 24/07/19 3:21 PM, TAISHI KAWAMURA wrote:
> Hi tutors on Tutor,
> I'm Taishi from Japan working in a data analytics team.
> Currently, I'm trying to analyse purchase data of a fashion brand.
> However, mysterious KeyErrors started occurring continuously when I was coding, and I haven't been able to get that fixed.
> Although I asked this question on Stack Over Flow and another platform, I haven't got any solution.
> I'm suspecting that Anaconda might keep raising the error, or simply there are bugs in the codes.
> However, I can't be sure what the real cause is.
> Do you think you can help me out?
> If yes, please give me some assistance.

People here are happy to help, but you'll need to show us the relevant 
code and perhaps some data if that's possibly the source of the problem.

Regards =dn

Hi Taishi, and welcome.

On Wed, Jul 24, 2019 at 03:21:03AM +0000, TAISHI KAWAMURA wrote:

> I'm suspecting that Anaconda might keep raising the error, or simply 
> there are bugs in the codes. However, I can't be sure what the real 
> cause is.

Almost certainly it will be a bug in your code, or unexpected values in 
your data. It is possible that it is a bug in Anaconda, but not very 
likely.

Start by removing Anaconda from the problem: what happens if you run 
your code using the plain Python interpreter, without Anaconda involved? 
From the operating system command line, run something like this:

    # Windows CMD.EXE
    py path/to/

    # Linux or Mac OS shell
    python path/to/

Of course the example command you run will depend on the version of 
Python you are using, and the name and path to your script.

If the error goes away without Anaconda involved, then it is *possible* 
that it is a bug in Anaconda. But I expect the error will remain.

Then show us the full traceback of the error. Copy and paste the FULL 
exception message, starting with the line "Traceback" and ending with 
the KeyError and error message, something like this but probably longer:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: (1, 2, 3)

Once you know the line where the error is occurring, you could introduce 
some debugging code:

    try:
        result = data[key]
    except KeyError:
        print("missing key:", repr(key))
        raise

or use the debugger to investigate why the key is missing.

You should also read this:

It is written for Java programmers, but it applies to Python too.


Can anyone provide some simple example/s or two or three of using weakref?
I'm baffled and not seeing any documentation that is meaningful. My
interest is to minimize memory usage (generally speaking, overall) and am
wondering if this might help.

On 7/24/19 1:57 PM, Sarah Hembree wrote:
> Can anyone provide some simple example/s or two or three of using weakref?
> I'm baffled and not seeing any documentation that is meaningful. My
> interest is to minimize memory usage (generally speaking, overall) and am
> wondering if this might help.

The answer is "maybe", and as a guess it is "probably not" going to give
significant memory savings - but it depends on circumstances.

It would help if you could describe a bit more of the scenario that
causes you to worry about memory usage.

I looked for documentation on __weakref__ since you said there didn't
seem to be anything meaningful and agree there's not much to go on - it
mostly seems to be something you have to fiddle with if you're using
__slots__.  On the other hand, the weakref module, which is what you're
more likely to directly use, seems reasonably well documented.

There are sure to be people with more expertise here who can explain
this better... but the two prominent uses of weakref seem to be in
caching, and in avoiding reference loops (or rather avoiding problems
related to them).  In the former case, if the things you put in your
cache are weak references, they can be garbage-collected by the Python
interpreter as needed, but before that happens you can refer to them.
Where it perhaps makes sense is where you have an object that is
expensive that you're doing some processing on, and you also add it,
with a weak reference to a list of recent objects.  Once you're done...
well, I find that what I'm saying here is actually said better by the
documentation so I'll stop, just read there:

The latter case, reference cycles, where things refer to each other in a
way that potentially nothing gets released, ought to eventually get
caught by the interpreter which has a garbage collection algorithm for
this, but you can speed things along by being explicit. Something like
this perhaps:

import ctypes
import weakref

# Use ctypes to access unreachable object by memory address.
class PyObject(ctypes.Structure):
    _fields_ = [("refcnt", ctypes.c_long)]

d1 = {}
d2 = {}
d1['obj2'] = d2
d2['obj1'] = d1

daddr = id(d1)
print("d1 ref count:", PyObject.from_address(daddr).refcnt)
## output: d1 ref count: 2

del d1, d2
print("d1 ref count:", PyObject.from_address(daddr).refcnt)
## output: d1 ref count: 1

## There's a non-zero refcount even though both dictionaries
## have been removed: d1 couldn't go away completely because
## it was still being referenced in d2, and d2 couldn't go away
## completely because it was still being referenced in d1,
## which couldn't go away because...

## We can change this by marking the values as weak references:

n1 = weakref.WeakValueDictionary()
n2 = weakref.WeakValueDictionary()
n1['obj2'] = n2
n2['obj1'] = n1

naddr = id(n1)
print("n1 ref count:", PyObject.from_address(naddr).refcnt)
## output: n1 ref count: 1

del n1, n2
print("n1 ref count:", PyObject.from_address(naddr).refcnt)
## output: n1 ref count: 0
## :: weakrefs don't contribute to the refcount.
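
For the caching use mentioned earlier, a minimal sketch might look like the following. The Expensive class and the "report" key are made up for illustration, and the immediate cleanup shown relies on CPython's reference counting.

```python
import gc
import weakref


class Expensive:
    """Stand-in for an object that is costly to build (hypothetical)."""

    def __init__(self, name):
        self.name = name


cache = weakref.WeakValueDictionary()

obj = Expensive("report")
cache["report"] = obj            # the cache holds only a weak reference

found_while_alive = cache.get("report") is obj

del obj                          # drop the last strong reference
gc.collect()                     # not needed on CPython, but harmless
found_after_del = cache.get("report")
```

While something else still holds the object, the cache can hand it back; once the last strong reference goes, the entry disappears by itself and the lookup returns None.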

Does this help at all?

On 7/25/19 9:58 AM, NITESH KUMAR wrote:
> Dear Sir/Madam
> I want to make Autocomplete searchbox using database .Please suggest me the
> code for this.

Your question leaves many more questions...

A searchbox implies you're using a graphical toolkit. Which one do you
intend to use (there are many available for Python)?  Or is your gui a
web browser maybe?

What do you mean "search... using database"?  Do you want the expanded
text to be used to search a database?  Or do you want to feed unexpanded
entered text to a database to get it to give you back completion
possibilities? Where would that data be coming from?  Are you just trying
to replicate / replace Elasticsearch?

and so on....  it's likely you can find some existing code for this on
the web if you search a bit.  It's a pretty specialized problem, don't
know if anyone here has any specific places to point you to.  Which
isn't really what this list is for anyway.

The normal model here is you try something and ask for help if you can't
get it to work. Something like this:

 -  Send a code example that illustrates your problem - If possible,
make this a minimal example rather than an entire application.
 -  Details on how you attempted to solve the problem on your own.
 -  Full version information - for example, "Python 3.6.4 with 1.0.0a". Mention the operating system (Linux, Mac, Windows)
as well.
 -  The full traceback if your code raises an exception - Do not curate
the traceback as you may inadvertently exclude information crucial to
solving your issue.

On Thu, Jul 25, 2019 at 07:04:52PM +0000, Mahima Daya wrote:

> hi there, please could you assist.

Yes, certainly. If you have questions about Python, you can ask 
concrete, specific questions. If you ask vague questions like "please 
help" expect to be ignored or told to do your own homework.

You should start by learning how to ask good questions about your code. 
Please read this:

It is written mostly for Java programmers but it applies equally to any 
programming language, including Python.

And remember that asking people to work on your assignments without 
crediting them is almost certainly a violation of your university's 
standards of academic ethics.


On 25/07/2019 20:04, Mahima Daya wrote:
> hi there, please could you assist.

With what?
Your subject tells us absolutely nothing.
Your message gives us no clues.

The fact that you are posting to the Python tutor list suggests
you might have a question about Python programming. But what
that is we cannot even begin to speculate.

Please ask again with a concise description of your task,
what you have tried and the outcome.

Include any code, include the full text of any error messages.
Tell us the OS and programming language (including versions)
that you are using.

We will not do your homework for you. We expect to see your
effort and a specific question that we can answer.

Alan G
Author of the Learn to Program web site
Follow my photo-blog on Flickr at:

On 25/07/2019 16:58, NITESH KUMAR wrote:
> I want to make Autocomplete searchbox using database .Please suggest me the
> code for this.

Since you tell us next to nothing we can only make wild suggestions.
Try to find a project that does the same thing - ideally one written
in Python (assuming that you are using python?) and see how it does it.

Failing that provide us with a lot more detail.
What kind of application is it? - Desktop? GUI? command line? Web based?
Mobile app?

What tools are you using - specifically any web or GUI toolkits.

What OS and python versions are you using?

What kind of database?

How do you anticipate this would work? Give us some examples?
Alan G

> On Jul 25, 2019, at 18:23, Alan Gauld via Tutor <tutor at> wrote:
> On 25/07/2019 20:04, Mahima Daya wrote:
>> hi there, please could you assist.
> Include any code, include the full text of any error messages.
> Tell us the OS and programming language (including versions)
> that you are using.
> We will not do your homework for you. We expect to see your
> effort and a specific question that we can answer.

It also looks like you tried to attach some files that have code in them.   Attachments won't come through.  You will need to post any code in-line in the body of the email if we are going to be able to see it.

David Rock
david at

Dabo has an AutoComplete method.  You could review how it was done.


On 7/25/2019 4:26 PM, Alan Gauld via Tutor wrote:
> On 25/07/2019 16:58, NITESH KUMAR wrote:
>> I want to make Autocomplete searchbox using database .Please suggest me the
>> code for this.
> Since you tell us next to noting we can only make wild suggestions.
> Try to find a project that does the same thing - ideally one written
> in Python (assuming that you are using python?) and see how it does it.
> Failing that provide us with a lot more detail.
> What kind of application is it? - Desktop? GUI? command line? Web based?
> Mobile app?
> What tools are you using - specifically any web or GUI toolkits.
> What OS and python versions are you using?
> What kind of database?
> How do you anticipate this would work? Give us some examples?

This should be a slow ball pitch.  Unfortunately, I haven't stumbled across
a reasonable answer yet.

On occasion, I put long URL's into comments/docstrings simply to document
where I found specific information.  However, to be a good disciple of
PEP8, anything which can't fit within 72 characters needs to be split
across multiple lines.  Since a number of you seem to be prolific Python
coders, what opinion do you have about splitting URL's in


On 7/29/19 2:36 PM, James Hartley wrote:
> This should be a slow ball pitch.  Unfortunately, I haven't stumbled across
> a reasonable answer yet.
> On occasion, I put long URL's into comments/docstrings simply to document
> where I found specific information.  However, to be a good disciple of
> PEP8, anything which can't fit within 72 characters needs to be split
> across multiple lines.  Since a number of you seem to be prolific Python
> coders, what opinion do you have about splitting URL's in
> comments/docstrings?

One person's opinion:


It's not worth being so dogmatic that you split if something like a URL
by itself on a line goes a little long... even if it goes 105 chars, say
(there's not a magic number).

And if the URL is really long and full of crud, rather than simple and
readable, consider feeding it to a shortener - its visual clarity is
already compromised, after all.

James Hartley schreef op 29/07/2019 om 22:36:
> On occasion, I put long URL's into comments/docstrings simply to document
> where I found specific information.  However, to be a good disciple of
> PEP8, anything which can't fit within 72 characters needs to be split
> across multiple lines.  Since a number of you seem to be prolific Python
> coders, what opinion do you have about splitting URL's in
> comments/docstrings?
In my opinion this is one of the cases where this quote from PEP 8 applies:

"However, know when to be inconsistent -- sometimes style guide 
recommendations just aren't applicable. When in doubt, use your best 
judgment."

My preference would be not to split the URLs. The reader doesn't need to 
see the whole URL to understand the structure of the code; the full URL 
is really only needed to copy/paste it into a browser.

On 30/07/19 8:36 AM, James Hartley wrote:
> On occasion, I put long URL's into comments/docstrings simply to document
> where I found specific information.  However, to be a good disciple of
> PEP8, anything which can't fit within 72 characters needs to be split
> across multiple lines.  Since a number of you seem to be prolific Python
> coders, what opinion do you have about splitting URL's in
> comments/docstrings?

Assuming the use-case for including a URL-comment is to be able to 
locate the resource again, I'd leave it integral so that it is ready for 
a quick copy-paste. Otherwise you'd have to re-assemble the URL in the 
browser, from multiple comment-lines in the code...

In addition to PEP-8, please consider 'the Zen of Python' (practicality 
and purity).

	python3 -c "import this"

Regards =dn

I have been using various iterations of a solitaire scorekeeper
program to explore different programming thoughts.  In my latest
musings I am wondering about -- in general -- whether it is best to
store calculated data values in a file and reload these values, or
whether to recalculate such data upon each new run of a program.  In
terms of my solitaire scorekeeper program is it better to store "Hand
Number, Date, Time, Score, Total Score" or instead, "Hand Number,
Date, Time, Score"?  Of course I don't really need to store hand
number since it is easily determined by its row/record number in its
csv file.

In this trivial example I cannot imagine there is any realistic
difference between the two approaches, but I am trying to generalize
my thoughts for potentially much more expensive calculations, very
large data sets, and what is the likelihood of storage errors
occurring in files.  Any thoughts on this?



On Tue, Jul 30, 2019 at 11:24 AM boB Stepp <robertvstepp at> wrote:
> In this trivial example I cannot imagine there is any realistic
> difference between the two approaches, but I am trying to generalize
> my thoughts for potentially much more expensive calculations, very
> large data sets, and what is the likelihood of storage errors
> occurring in files.  Any thoughts on this?

As with many things in programming, it comes down to how much time you
want to trade for space.  If you have a lot of space and not much
time, store the calculated values.  If you have a lot of time (or the
calculation time is negligible) and not much space, recalculate every
time.  If you have plenty of both, store it and recalculate it anyway
:).  Storing the information can also be useful for offline debugging.


On Tue, Jul 30, 2019 at 12:05 PM Zachary Ware
<zachary.ware+pytut at> wrote:
> On Tue, Jul 30, 2019 at 11:24 AM boB Stepp <robertvstepp at> wrote:
> > In this trivial example I cannot imagine there is any realistic
> > difference between the two approaches, but I am trying to generalize
> > my thoughts for potentially much more expensive calculations, very
> > large data sets, and what is the likelihood of storage errors
> > occurring in files.  Any thoughts on this?
> As with many things in programming, it comes down to how much time you
> want to trade for space.  If you have a lot of space and not much
> time, store the calculated values.  If you have a lot of time (or the
> calculation time is negligible) and not much space, recalculate every
> time.  If you have plenty of both, store it and recalculate it anyway

What is the likelihood of file storage corruption?  I have a vague
sense that in earlier days of computing this was more likely to
happen, but nowadays?  Storing and recalculating does act as a good
data integrity check of the file data.


On 30/07/2019 17:21, boB Stepp wrote:

> musings I am wondering about -- in general -- whether it is best to
> store calculated data values in a file and reload these values, or
> whether to recalculate such data upon each new run of a program.  

It depends on the use case.

For example a long running server process may not care about startup
delays because it only starts once (or at least very rarely) so either
approach would do but saving diskspace may be helpful so calculate the
values.

On the other hand a data batch processor running once as part of a
chain working with high data volumes probably needs to start quickly.
In which case do the calculations take longer than reading the
extra data? Probably, so store in a file.

There are other options too such as calculating the value every
time it is used - only useful if the data might change
dynamically during the program execution.

It all depends: how much data? How often is it used?
How often would it be calculated? How long does the process
run for? etc.

Alan G

On 30/07/2019 18:20, boB Stepp wrote:

> What is the likelihood of file storage corruption?  I have a vague
> sense that in earlier days of computing this was more likely to
> happen, but nowadays?  Storing and recalculating does act as a good
> data integrity check of the file data.

No it doesn't! You are quite likely to get a successful calculation
using nonsense data and therefore invalid results. But they look
valid - a number is a number...

Checking data integrity is what checksums are for.
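
As one possible illustration (the function names here are hypothetical, and SHA-256 is just one reasonable choice of checksum), a file's digest can be recorded when it is written and compared again before it is read:

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_digest):
    """True if the file's current digest matches the one recorded earlier."""
    return file_sha256(path) == expected_digest
```

The recorded digest has to be kept somewhere other than inside the file it protects, for example in a small companion file.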

Alan G

On 7/30/19 5:58 PM, Alan Gauld via Tutor wrote:
> On 30/07/2019 17:21, boB Stepp wrote:
>> musings I am wondering about -- in general -- whether it is best to
>> store calculated data values in a file and reload these values, or
>> whether to recalculate such data upon each new run of a program.  
> It depends on the use case.
> For example a long running server process may not care about startup
> delays because it only starts once (or at least very rarely) so either
> approach would do but saving diskspace may be helpful so calculate the
> values.
> On the other hand a data batch processor running once as part of a
> chain working with high data volumes probably needs to start quickly.
> In which case do the calculations take longer than reading the
> extra data? Probably, so store in a file.
> There are other options too such as calculating the value every
> time it is used - only useful if the data might change
> dynamically during the program execution.
> It all depends on how much data?, how often it is used?,
> how often would it be calculated? How long does the process
> run for? etc.

Hey, boB - I bet you *knew* the answer was going to be "it depends" :)

There are two very common classes of application that have to make this
very decision - real databases, and their toy cousins, spreadsheets.

In the relational database world - characterized by very long-running
processes (like: unless it crashes, runs until reboot, and maybe even
beyond that - if you have a multi-node replicated or distributed DB it
may survive failure of one point) - if a field is calculated it's not
stored. Because - what Alan said: in an RDBMS, data are _expected_ to
change during runtime. And then for performance reasons, there may be
some cases where it's precomputed and stored to avoid huge delays when
the computation is expensive. That world even has a term for that: a
materialized view (in contrast to a regular view).  It can get pretty
tricky: you need something that causes the materialized view to update
when data has changed; for databases that don't natively support the
behavior you then have to fiddle with triggers and hopefully it works
out.  More enlightened now?
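
The stored-versus-recomputed trade-off can be sketched in plain Python. This
is only a toy analogy to a materialized view, not how any real database does
it: the class name ScoreTable and its methods are made up for illustration.
The cached total is thrown away whenever the underlying data changes, which
is roughly the job a trigger would do.

```python
class ScoreTable:
    """Hypothetical container: raw scores plus a cached ("materialized") total."""

    def __init__(self, scores=()):
        self._scores = list(scores)
        self._total = None  # None means "stale, must be recomputed"

    def add(self, score):
        self._scores.append(score)
        self._total = None  # acts like a trigger: any change invalidates the cache

    @property
    def total(self):
        if self._total is None:
            # In a real application this would be the expensive computation.
            self._total = sum(self._scores)
        return self._total


table = ScoreTable([10, -5, 20])
first = table.total    # computed once and cached
table.add(7)
second = table.total   # recomputed, because add() invalidated the cache
```

Forgetting to invalidate the cache in even one place that changes the data is
exactly the kind of bug that makes this approach tricky.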

On Tue, Jul 30, 2019 at 7:05 PM Alan Gauld via Tutor <tutor at> wrote:
> On 30/07/2019 18:20, boB Stepp wrote:
> > What is the likelihood of file storage corruption?  I have a vague
> > sense that in earlier days of computing this was more likely to
> > happen, but nowadays?  Storing and recalculating does act as a good
> > data integrity check of the file data.
> No it doesn't! You are quite likely to get a successful calculation
> using nonsense data and therefore invalid results. But they look
> valid - a number is a number...

Though I may be dense here, for the particular example I started with
the total score in a solitaire game is equal to the sum of all of the
preceding scores plus the current one.  If the data in the file
somehow got mangled, it would be an extraordinary coincidence for
every row to yield a correct total score if that total score was
recalculated from the corrupted data.
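
As a rough sketch of that idea, the check below recomputes the running total
and compares it with the stored one. The sample rows, the column order (Hand
Number, Date, Time, Score, Total Score) and the function name find_bad_rows
are assumptions made up for illustration, not the actual scorekeeper program.

```python
import csv
import io

# Hypothetical file contents: Hand Number, Date, Time, Score, Total Score
SAMPLE = """\
1,2019-07-30,17:21,10,10
2,2019-07-30,17:40,-5,5
3,2019-07-30,18:02,20,25
"""


def find_bad_rows(lines):
    """Return hand numbers whose stored total differs from the recomputed one."""
    bad = []
    running = 0
    for hand, _date, _time, score, stored_total in csv.reader(lines):
        running += int(score)
        if running != int(stored_total):
            bad.append(int(hand))
            # Resynchronise so a single bad row is only reported once.
            running = int(stored_total)
    return bad


good = find_bad_rows(io.StringIO(SAMPLE))
# Simulate corruption: the score -5 in hand 2 becomes -7.
corrupted = find_bad_rows(io.StringIO(SAMPLE.replace(",-5,5", ",-7,5")))
```

This only catches a row where the score and the total disagree; it cannot say
which of the two values is the damaged one.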

But the underlying question that I am trying to answer is how
likely/unlikely is it for a file to get corrupted nowadays?  Is it
worthwhile verifying the integrity of every file in a program, or, at
least, every data file accessed by a program every program run?  Which
leads to your point...

> Checking data integrity is what checksums are for.

When should this be done in normal programming practice?


On Tue, Jul 30, 2019 at 7:26 PM Mats Wichmann <mats at> wrote:
> On 7/30/19 5:58 PM, Alan Gauld via Tutor wrote:
> > On 30/07/2019 17:21, boB Stepp wrote:
> >
> >> musings I am wondering about -- in general -- whether it is best to
> >> store calculated data values in a file and reload these values, or
> >> whether to recalculate such data upon each new run of a program.
> >
> > It depends on the use case.
> >
> > For example a long running server process may not care about startup
> > delays because it only starts once (or at least very rarely) so either
> > approach would do but saving diskspace may be helpful so calculate the
> > values.
> >
> > On the other hand a data batch processor running once as part of a
> > chain working with high data volumes probably needs to start quickly.
> > In which case do the calculations take longer than reading the
> > extra data? Probably, so store in a file.
> >
> > There are other options too such as calculating the value every
> > time it is used - only useful if the data might change
> > dynamically during the program execution.
> >
> > It all depends on how much data?, how often it is used?,
> > how often would it be calculated? How long does the process
> > run for? etc.
> Hey, boB - I bet you *knew* the answer was going to be "it depends" :)

You are coming to know me all too well! ~(:>))

I just wanted to check with the professionals here whether my thinking
(concealed behind the questions I asked) was correct or, if not, where
I am off.

> There are two very common classes of application that have to make this
> very decision - real databases, and their toy cousins, spreadsheets.
> In the relational database world - characterized by very long-running
> processes (like: unless it crashes, runs until reboot. and maybe even
> beyond that - if you have a multi-mode replicated or distributed DB it
> may survive failure of one point) - if a field is calculated it's not
> stored. Because - what Alan said: in an RDBMS, data are _expected_ to
> change during runtime. And then for performance reasons, there may be
> some cases where it's precomputed and stored to avoid huge delays when
> the computation is expensive. That world even has a term for that: a
> materialized view (in contrast to a regular view).  It can get pretty
> tricky, you need something that causes the materialized view to update
> when data has changed; for databases that don't natively support the
> behavior you then have to fiddle with triggers and hopefully it works
> out.  More enlightened now?

Not more enlightened, perhaps, but more convinced than ever on how
difficult it is to manage the complexity of real world programs.

On 31/7/19 2:21 am, boB Stepp wrote:
> I have been using various iterations of a solitaire scorekeeper
> program to explore different programming thoughts.  In my latest
> musings I am wondering about -- in general -- whether it is best to
> store calculated data values in a file and reload these values, or
> whether to recalculate such data upon each new run of a program.  In
> terms of my solitaire scorekeeper program is it better to store "Hand
> Number, Date, Time, Score, Total Score" or instead, "Hand Number,
> Date, Time, Score"?  Of course I don't really need to store hand
> number since it is easily determined by its row/record number in its
> csv file.
> In this trivial example I cannot imagine there is any realistic
> difference between the two approaches, but I am trying to generalize
> my thoughts for potentially much more expensive calculations, very
> large data sets, and what is the likelihood of storage errors
> occurring in files.  Any thoughts on this?
> TIA!
From a scientific viewpoint, you want to keep the raw data, so you can
perform other calculations that you may not have thought of yet. But
that's not got much to do with programming ;)

On 31/07/2019 03:02, boB Stepp wrote:

> preceding scores plus the current one.  If the data in the file
> somehow got mangled, it would be an extraordinary coincidence for
> every row to yield a correct total score if that total score was
> recalculated from the corrupted data.

True, but the likelihood of that happening is vanishingly small.
What is much more likely is that a couple of bits in the
entire file will be wrong. So a 5 becomes a 7, for example.
Remember that the data in the file is character based
(assuming it's a text file), not numerical. The conversion
to numbers happens when you read it. The conversion is more
likely to detect corrupted data than any calculations you perform.
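
A small sketch of what the conversion can and cannot catch. The helper name
parse_score is made up for illustration; the two corrupted values are single
bit flips of the character "5".

```python
def parse_score(text):
    """Return the score as an int, or None if the text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None


clean = "25"
# Flip one bit in the last character: "5" (0x35) becomes "7" (0x37).
still_a_digit = "2" + chr(ord("5") ^ 0x02)
# Flip a different bit: "5" (0x35) becomes "u" (0x75).
not_a_digit = "2" + chr(ord("5") ^ 0x40)

ok = parse_score(clean)               # valid number
flipped = parse_score(still_a_digit)  # wrong, but still parses as a number
mangled = parse_score(not_a_digit)    # conversion fails, corruption detected
```

The second case is the dangerous one: the conversion succeeds, so the wrong
value passes silently.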

> But the underlying question that I am trying to answer is how
> likely/unlikely is it for a file to get corrupted nowadays?  

It is still quite likely. Not as much as it was 40 years ago,
but still very much a possibility. Especially if the data
is stored/accessed over a network link. It is still very
much a real issue for anyone dealing with critical data.

> worthwhile verifying the integrity of every file in a program, or, at
> least, every data file accessed by a program every program run?  Which
> leads to your point...

Anything critical should go in a database. That will be much
less likely to get corrupted since most RDBMS systems include
data cleansing and verification as part of their function.
Also, for working with large volumes of data (where corruption
risk rises just because of the volumes) a database is a more
effective way of storing data anyway.

>> Checking data integrity is what checksums are for.
> When should this be done in  normal programming practice?

Any time you have a critical piece of data in a text file.
If it is important to know that the data has changed (for
any reason, not just data corruption) then use a checksum.
Certainly if it's publicly available or you plan on shipping
it over a network a checksum is a good idea.
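
A minimal sketch of the idea using the standard library's hashlib. SHA-256 is
just one reasonable choice of hash, and the sample bytes and the function name
checksum are made up for illustration.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


original = b"1,2019-07-30,17:21,10\n2,2019-07-30,17:40,-5\n"
stored_digest = checksum(original)  # save this alongside the data

# Simulate a single changed character in the stored data.
damaged = original.replace(b",-5", b",-7")

unchanged_ok = checksum(original) == stored_digest
damage_detected = checksum(damaged) != stored_digest
```

In practice you would read the file in binary mode, compute the digest, and
compare it with the value stored when the file was written.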

Alan G
Author of the Learn to Program web site
Follow my photo-blog on Flickr at:

Could anyone please let me know the difference between decorators and
inheritance in Python?

Both are used to add functionality to a method, so why does Python
have two separate mechanisms for doing the same kind of work?

Thank you,