On Mon, Oct 26, 2020 at 8:44 AM Cameron Simpson wrote:
On 24Oct2020 13:37, Dan Sommers <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
On 2020-10-24 at 12:29:01 -0400, Brian Allen Vanderburg II via Python-ideas
wrote: ... Find can output its filenames in null-terminated lines since it is possible to have newlines in a filename (yuck) ...
Spaces in filenames are just as bad, and much more common:
But much easier to handle in simple text listings, which are newline delimited.
You're really running into a horrible behaviour from xargs, which is one reason why GNU parallel exists.
I don't consider the behaviour horrible, and xargs isn't the only thing to do this - other tools can be put into zero-termination mode too. But it's pretty rare to consume huge amounts of data this way (normally it'll just be a list of file names), so what I would do is simply read the entire thing, then split on "\0". It's not like reading a gigabyte of log file, where you really want to work line by line and not read more than you need; a list of file names will easily fit into memory.

If you actually DO need to read null-terminated records from a file that's too big for memory, it's probably worth rolling your own buffering: read a chunk at a time and split off the complete records. It's not hugely difficult, and it's a good exercise to do now and then.

And yes, I can see the temptation to get Python to do it, but newline handling is already such a weird cross-platform mess that I don't think it needs to be made more complicated :)

ChrisA
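For what it's worth, a minimal sketch of both approaches described above - the function name and chunk size here are just illustrative, not anything from the stdlib:

```python
import io

def iter_null_terminated(f, chunk_size=8192):
    """Yield byte records from a binary file object, split on NUL bytes.

    Hand-rolled buffering: read a chunk at a time, split off the
    complete records, and carry any incomplete tail into the next read.
    """
    buf = b""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        parts = buf.split(b"\0")
        buf = parts.pop()  # last piece may be an incomplete record
        yield from parts
    if buf:  # trailing record with no final NUL
        yield buf

# The common case: the whole thing fits in memory, so just read and split.
# (find -print0 emits a trailing NUL, hence the empty last element.)
data = b"plain.txt\0with space.txt\0with\nnewline.txt\0"
names = data.split(b"\0")[:-1]
```

The generator only matters for genuinely huge inputs; for a typical `find -print0 | my_script` pipeline, the two-line read-and-split version is all you need.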