On Sun, May 24, 2020 at 6:56 PM Steven D'Aprano <steve@pearwood.info> wrote:
>> I use bash a lot, and writing something like this is common:
>>
>>     cat data | sort | cut -d';' -f6 | grep ^foo | sort -r | uniq -c
>
> And today's "Useless Use Of cat Award" goes to... :-)
>
>     sort data | ...
>
> (What is it specifically about cat that is so attractive? I almost
> certainly would have done exactly what you did, even knowing that sort
> will take a file argument.)
This is probably going afield, since it is a bash thing, not a Python thing. But I can actually answer this quite specifically.

When I write a pipeline like that, I usually do not do it in one pass. I write a couple of the stages, look at what I have, and then add more stages until I get it right. Many of the commands in the pipeline can take a file argument (not just sort, but also cut, grep, uniq... everything I used in the example). But I find fairly often that I need to add a step BEFORE what I initially thought was the first processing step. And then I have to remove the filename as an argument of that no-longer-first step. Rinse and repeat.

With `cat` I know it does nothing, and I won't have to change it later (well, OK, sometimes I want -n or -s). So it is a completely generic "data" object... sort of like how I would write "fluent programming" starting with a Pandas DataFrame, for example, and calling chains of methods.
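
To make the editing cost concrete, here is a minimal sketch of the two styles. The sample file contents and the extra `grep -v '^#'` stage are assumptions invented for illustration; only the sort/cut/grep/uniq stages come from the pipeline under discussion.

```shell
#!/bin/sh
# Hypothetical sample input standing in for "data": semicolon-delimited,
# six fields, plus one comment line that a later-added stage will strip.
data=$(mktemp)
printf '%s\n' \
  '# comment line' \
  'a;b;c;d;e;foo1' \
  'a;b;c;d;e;bar' \
  'a;b;c;d;e;foo2' \
  'a;b;c;d;e;foo1' > "$data"

# First draft, file-argument style: the first stage names the file.
with_arg=$(sort "$data" | cut -d';' -f6 | grep '^foo' | sort -r | uniq -c)

# First draft, cat style: cat is the fixed source, every stage reads stdin.
with_cat=$(cat "$data" | sort | cut -d';' -f6 | grep '^foo' | sort -r | uniq -c)

# Now a new FIRST stage is needed (here: drop comment lines).
# File-argument style: the filename has to move from sort to the new grep.
moved_arg=$(grep -v '^#' "$data" | sort | cut -d';' -f6 | grep '^foo' | sort -r | uniq -c)

# cat style: the new stage is simply spliced in after cat; nothing else changes.
inserted=$(cat "$data" | grep -v '^#' | sort | cut -d';' -f6 | grep '^foo' | sort -r | uniq -c)

printf '%s\n' "$inserted"
rm -f "$data"
```

All four variants produce the same counts for this input; the only difference is how much of the command line has to be touched when the first stage changes.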