[Tutor] Type annotation errors
DL Neil
PyTutor at danceswithmice.info
Fri Jun 5 05:13:40 EDT 2020
> In terms of following along with Beazley's course material this would be
> cheating. Classes have not been presented yet. Even very simple ones.
> ~(:>))
>
>> PPS: For someone who knows a little C the obvious fix for your version
>> is to cast the union to the expected type:
This is a little difficult to rationalise:
- we are not trying to get ahead of the course material
- but we are prepared to launch into complex (if not complicated) mypy
Apologies, I can see the 'why' but can't help but feel it is only that
imbalance which has led to the problem...
I'd like to discuss the (actual) problem, and then dispute the
claim that follows from it:
[from another part of the thread]
<<<
In this exercise set he wishes to create a module fileparse which can
accept a csv-like file and allow one to obtain a list of dictionaries (one
per file row) if the file has headers or a list of tuples if not,
representing the collection of records. Additionally he wishes to allow
for setting arguments to choose only certain data columns (If the file has
headers), do type conversions by providing a list of functions to do the
conversions, and setting a different delimiter than the default comma. At
the end of the exercises he comments:
"If you’ve made it this far, you’ve created a nice library function that’s
genuinely useful. You can use it to parse arbitrary CSV files, select out
columns of interest, perform type conversions, without having to worry too
much about the inner workings of files or the csv module."
>>>
I'm firmly joining with others who have suggested that rather than
"nice" it is actually quite 'ugly'. The routine, as-is, violates the
Single Responsibility Principle (SRP). It is trying to deal with CSV
files that have column headings AND those which don't. That's not a
horrendous crime, per se (but see later). However, the idea that the
routine will output either a list of dicts or a list of tuples, most
certainly is a major transgression!
The Zen of Python says "Simple is better than complex. Complex is better
than complicated.". Which of the three describes that code?
To be fair, you weren't intending to discuss the course
materials/output, and IIRC there has been no explanation as to how the
calling routine 'knows' whether this .CSV has headers or not. Similarly,
it does not say how it will deal with the two-format output issue when
it comes time to actually use the extracted-data.
Regarding the first, wouldn't it make sense both to ascertain whether
headers are present and to note any headings, as a single task?
Now, instead of varying the function's output according to the presence
of headers (or not), the data could be extracted (only) as a tuple.
Lastly, when it comes time to further-process the extracted-data, it can
be paired/zipped with the headings, if-possible, as-required...
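To make that concrete, here is a minimal sketch of the idea (not the course's code). The names read_table and Row are my own inventions for illustration. The headings are noted once, the data always comes back as tuples, and any pairing-up is left to whoever uses the result:

```python
import csv
import io
from typing import Iterable, List, Optional, Tuple

Row = Tuple[str, ...]   # hypothetical alias: one record, always a tuple


def read_table(
    lines: Iterable[str], has_headers: bool, delimiter: str = ","
) -> Tuple[Optional[Row], List[Row]]:
    """Return (headings or None, data rows); the row type never varies."""
    reader = csv.reader(lines, delimiter=delimiter)
    headings: Optional[Row] = None
    if has_headers:
        first = next(reader, None)          # an empty file has no headings
        headings = tuple(first) if first is not None else None
    rows = [tuple(row) for row in reader if row]   # skip blank lines
    return headings, rows


# Made-up sample data, standing in for an open file.
sample = io.StringIO("name,shares\nAA,100\nIBM,50\n")
headings, rows = read_table(sample, has_headers=True)

# Pairing with the headings happens later, only if and when it is wanted.
records = [dict(zip(headings, row)) for row in rows] if headings else rows
```

The return annotation is now a single, fixed shape, so there is no Union for mypy to complain about and nothing to cast.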
Alternatively, write the basic function (no headings) and then add a
decorator to handle headings - separation in a different fashion.
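A rough sketch of the decorator route, again with invented names (parse_rows, with_headings, parse_records), and assuming the first row holds the headings whenever the decorated version is used:

```python
import csv
import functools
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[str, ...]


def parse_rows(lines: Iterable[str], delimiter: str = ",") -> List[Row]:
    """The basic function: every non-blank line becomes a tuple."""
    return [tuple(row) for row in csv.reader(lines, delimiter=delimiter) if row]


def with_headings(
    parser: Callable[..., List[Row]]
) -> Callable[..., List[Dict[str, str]]]:
    """Wrap a row parser so that its first row is used as the headings."""
    @functools.wraps(parser)
    def wrapper(lines: Iterable[str], delimiter: str = ",") -> List[Dict[str, str]]:
        rows = parser(lines, delimiter=delimiter)
        if not rows:
            return []
        headings, *data = rows
        return [dict(zip(headings, row)) for row in data]
    return wrapper


# The headings-aware variant is built from the basic one, not merged into it.
parse_records = with_headings(parse_rows)
```

Each of the two callables has one job and one return type; the caller chooses between them by name, instead of passing a flag which changes what comes back.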
To those, apply the Zen?
Now apply mypy to the function (even design-level stubs)?
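For comparison, some design-level stubs. These are my own guesses at signatures, not the course's: parse_csv_as_is stands in for the two-format routine, and read_headings/read_rows for the separated version.

```python
from typing import Dict, Iterable, List, Tuple, Union


# Hypothetical stand-in for the as-is design: the return type is a Union,
# so every caller must narrow (or cast) it before using the result.
def parse_csv_as_is(
    lines: Iterable[str], has_headers: bool = True
) -> Union[List[Dict[str, str]], List[Tuple[str, ...]]]: ...


# Hypothetical separated design: each function has exactly one return type.
def read_headings(lines: Iterable[str]) -> Tuple[str, ...]: ...


def read_rows(lines: Iterable[str]) -> List[Tuple[str, ...]]: ...
```

With the first stub, mypy has no way of knowing which half of the Union a given call yields; with the other two there is nothing left for it to query.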
Per the comment "nice library function", I'm hoping that you will later
report that the course builds-upon this/these function(s) and makes them
even more useful/"nice". Speaking for myself, and because I have
frequent needs to extract data from worksheets or .CSV files, I have a
bunch of classes ready for re-use (sub-classing per file/application)
and find them *very* useful.
As a general rule, I find that the greatest re-use is of simple classes
rather than those more complex or complicated. The complexity comes when
the simple 'framework' of a base-class is adapted and expanded to suit
the application. So, SRP rules!
It might be a little early to throw at you what is possibly the hardest
part of "SOLID" to understand and implement by-habit (but I know you've
used multiple languages over the years, before tackling Python): the
Dependency Inversion Principle (DIP) wherein we say that "details should
depend on abstractions" not "abstractions depend upon details".
In this mode, we abstract the process of taking data from a worksheet.
Then, isn't the absence/presence of headings, a 'detail'?
(just as another hint for the course's future: that we might only wish
to extract specific columns ("project") or rows ("select")...) Each of
these 'details' refines the extraction process rather than calls for an
entirely different method of extraction/different presentation of the
results!
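One way this could look, using classes (so, yes, 'cheating' as far as the course goes). All of the names below (RowSource, CsvRows, SkipHeadings, PickColumns, load) are invented for this sketch and the data is made up:

```python
import csv
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[str, ...]


class RowSource(ABC):
    """The abstraction: anything which can yield rows as tuples."""
    @abstractmethod
    def rows(self) -> Iterator[Row]: ...


class CsvRows(RowSource):
    """Detail: the rows come from CSV-formatted lines."""
    def __init__(self, lines: Iterable[str], delimiter: str = ",") -> None:
        self._lines = lines
        self._delimiter = delimiter

    def rows(self) -> Iterator[Row]:
        for row in csv.reader(self._lines, delimiter=self._delimiter):
            if row:
                yield tuple(row)


class SkipHeadings(RowSource):
    """Detail: the first row is headings; note it, then pass on the rest."""
    def __init__(self, source: RowSource) -> None:
        self._source = source
        self.headings: Row = ()

    def rows(self) -> Iterator[Row]:
        it = self._source.rows()
        self.headings = next(it, ())
        yield from it


class PickColumns(RowSource):
    """Detail: keep only the columns at the given positions."""
    def __init__(self, source: RowSource, indexes: Sequence[int]) -> None:
        self._source = source
        self._indexes = indexes

    def rows(self) -> Iterator[Row]:
        for row in self._source.rows():
            yield tuple(row[i] for i in self._indexes)


def load(source: RowSource) -> List[Row]:
    """Depends only on the abstraction, never on any of the details."""
    return list(source.rows())


data = ["name,shares,price", "AA,100,32.2", "IBM,50,91.1"]
with_heads = SkipHeadings(CsvRows(data))
result = load(PickColumns(with_heads, [0, 2]))
```

Here load() never changes, and neither does the shape of its output. Headings and column-picking are layered on as refinements of the one extraction process.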
--
Regards =dn