[Tutor] Collating date data from a csv file

David L Neil PyTutor at DancesWithMice.info
Sat May 11 23:20:58 EDT 2019


Hi Dave,

I also volunteer to do PAT safety testing during my "20% time". 
Clambering around Snowdonia as a boy, I eschewed* the Rheilffordd yr 
Wyddfa/SMR in favor of shanks' pony...

* OK, I was made to...! For the good of my soul???


On 9/05/19 8:04 AM, Dave Hill wrote:
> I have a csv file which details the results of equipment tests, I carry 
> out PAT testing as a volunteer at a heriatge railway in N. Wales. I want 
> to extract how many items were tested on each test day. So far I have 
> generated a List of test dates, but I am now stalled at how to 
> efficiently count numbers tested on each date.
> 
> Can I have a list of tuples, where one item is the date and the second 
> the count?
> 
> or is there a better construct?
> 
> Thanks in advance,
> 
> Dave
> 
> For completeness, I have listed below an extract from a target file, 
> where the 10 digit number is the UNIX timestamp
> 
> 182     1515001232
>      Toaster     13     2000     1
> 183     1515001259        Contact Grill     13     2000     1
> 245     1515001367
>      3G Cube Adaptor     13     0     1
> 246     1515001396         13A IEC Lead     5     0     1
> 248     1515001415
>      Worktop Light     3     30     1
> 420     1515001440
>      Fly killer     0     0     1
> 424     1515001461
>      Dairy fridge     13     0     1
> 427     1513277293        Fire     13     0     1
> 429     1515001489
>      Toaster Avanti     13     0     1


When you say "target file", is this coming off the tester via a link 
cable to your PC, or are you capturing by hand to a spreadsheet?

A tactic which many people 'miss' is that a workbook may contain 
multiple spreadsheets, and that the data on one spreadsheet may be 
auto-magically 'copied' onto another. Thus if the above is data coming 
off the PAT into one spreadsheet, I would immediately create a more 
meaningful sheet, 'for human consumption', which has column headings and 
converts (re-formats) the timestamp into a readable date (as suggested 
elsewhere), but is otherwise pretty-much a direct copy. We now have a 
sheet used for data capture/computer processing and something separate 
(and prettier) as a report/presentation for people.

 From the spec, above, we are only interested in the date. Remember that 
considering the whole timestamp only makes life confusing. So convert 
them (only) to dates. These can be strings because Python compares 
strings as easily as dates!  The time component could be retained if 
sequence (of testing) might be important.

The sad reality is that a daily count could be accomplished in either 
LO-Writer or MS-Excel. No reason why you shouldn't use Python though.

(Assuming that the data appears in (forward or reverse) date sequence) 
Read-in the data sheet/CSV file, row-by-row, taking note of the date of 
the first data-entry, and starting to count from one. Then increment for 
each row where the date matches. When the dates don't match, report, 
reset the counter, and note the new date.

How will you lay-out and present this report? Another spreadsheet? 
Screen? Paper?

When you say "count numbers tested on each date", the above method will 
let you know a (single) daily total of tests-performed.

Did you (also) mean that you want to track how many of tests were 
performed within categories of devices, eg how many toasters on the one 
day? In which case, further design consideration is required, eg which 
devices fit into which category and how to match "Toaster" with "Toaster 
Avanti"...

-- 
Regards =dn


More information about the Tutor mailing list