[Python-Dev] pathlib - current status of discussions

Stephen J. Turnbull stephen at xemacs.org
Tue Apr 12 01:28:51 EDT 2016


Donald Stufft writes:

 > I think yes and yes [__fspath__ and fspath should be allowed to
 > handle bytes, otherwise] it seems like making it needlessly harder
 > to deal with a bytes path

It's not needless.  This kind of polymorphism makes it hard to review
code locally.  Once bytes get a foothold inside a text application,
they metastasize altogether too easily, and you end up with TypeErrors
or UnicodeErrors quite far from the origin.  Debugging often requires
tracing data flows over hill and over dale while choking from the
dusty trail, or band-aids like a top-level "except UnicodeError:
log_and_quarantine(bytes)".  I can't prove that returning bytes from
these APIs is a big risk in this sense, but I can't see a way to prove
that it's not, either, given that their point is duck-typing, and
therefore they may be generalized in the future, and by third parties.
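
A minimal sketch of the failure mode described above, not taken from the
original thread. The helper names `list_names` and `format_report` are
hypothetical; the only real API relied on is that `os.listdir()` returns
bytes names when handed a bytes path. The point is that the TypeError
surfaces in the formatting code, not at the call where bytes entered.

```python
import os
import tempfile

def list_names(directory):
    # os.listdir() is polymorphic: str path in -> str names out,
    # bytes path in -> bytes names out.
    return os.listdir(directory)

def format_report(names):
    # Written with text in mind; nothing here hints that bytes may arrive.
    return ", ".join(names)

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.txt"), "w").close()
    text_names = list_names(tmp)                # list of str
    bytes_names = list_names(os.fsencode(tmp))  # list of bytes

report = format_report(text_names)  # fine: all str

try:
    format_report(bytes_names)      # blows up here, far from the origin
    leaked = False
except TypeError:
    leaked = True
```

In a real program the two functions would typically live in different
modules, which is what makes this kind of error hard to review locally.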

I understand that there are applications where it's bytes all the way
down, but by the very nature of computing, there are also systems
where bytes must be decoded to text.  For historical reasons (the encoding
Tower of Babel), it's very error-prone to do that on demand.  Best
practice is to do the conversion as close to the boundary as possible,
and process only text internally.
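
One way to picture "convert at the boundary" is sketched below. This is
an illustration under assumptions, not a proposal from the thread: the
function names `read_boundary` and `describe` are hypothetical, and
`os.fsdecode()` is used as the conversion because it accepts either str
or bytes and always returns str.

```python
import os

def read_boundary(raw_path):
    # Boundary: raw_path may be str or bytes (e.g. from a bytes-based API).
    # os.fsdecode() returns str either way, using the filesystem encoding.
    return os.fsdecode(raw_path)

def describe(path):
    # Internal code: assumes text only and never checks for bytes.
    head, tail = os.path.split(path)
    return "file " + tail + " in " + (head or ".")

# Both spellings of the same path are normalized once, at the edge.
from_bytes = describe(read_boundary(b"data/report.txt"))
from_text = describe(read_boundary("data/report.txt"))
```

With this arrangement a decoding problem, if any, shows up in
`read_boundary`, next to the source of the bytes, rather than somewhere
inside the text-processing code.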

In text applications, "bytes as carcinogen" is an apt metaphor.

Now, I'm not Dutch, so I can't tell you it's obvious that the risk to
text-processing applications is more important than the inconvenience
to byte-shoveling applications.  But there is a need to be
parsimonious with polymorphism.


