
On Tue, Mar 29, 2016 at 10:21 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> Unless I'm missing something, pathlib doesn't support bytes filenames. So the question doesn't come up. Any path you can use with pathlib is a valid Python string, because you can only create paths using Python strings.
>
> So if I have a Linux file b'\xD8\x01', I can't create a path object to work with it.

Actually you can. Here's an annotated interactive session illustrating byte smuggling:

>>> # Create a file using a bytes name
... with open(b'\xD8\x01',"w") as f: f.write("Demo text\n")
...
10
>>> import os, pathlib
>>> # By default, you get text strings.
... os.listdir()
['\udcd8\x01']
>>> os.listdir(".")
['\udcd8\x01']
>>> # But you can request byte strings.
... os.listdir(b".")
[b'\xd8\x01']
>>> # pathlib works with text strings the same way.
... list(pathlib.Path(".").glob("*"))
[PosixPath('\udcd8\x01')]
>>> # And you can cast that to str and open it.
... open(str(pathlib.Path("\uDCD8\x01"))).read()
'Demo text\n'

So it looks like the only missing piece of the puzzle is a way to construct a Path from a bytes object, which would apply the same transformation of \xD8 to \uDCD8 that os.listdir() already applies when it returns text strings.
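
For what it's worth, here is a rough sketch of what that missing piece might look like. It assumes a POSIX system, where os.fsdecode() uses the filesystem encoding with the surrogateescape error handler. The helper name path_from_bytes is made up for illustration and is not part of pathlib. The first half spells the transformation out explicitly with UTF-8 so the result doesn't depend on the locale.

```python
import os
import pathlib

raw = b"\xd8\x01"

# The transformation spelled out: undecodable bytes are "smuggled" into the
# string as lone surrogates (0xD8 becomes U+DCD8) and can be recovered later.
text = raw.decode("utf-8", errors="surrogateescape")
assert text == "\udcd8\x01"
assert text.encode("utf-8", errors="surrogateescape") == raw


def path_from_bytes(name: bytes) -> pathlib.PurePosixPath:
    """Hypothetical helper, not a pathlib API: build a path from a bytes name.

    os.fsdecode() applies the same decoding that os.listdir() uses for its
    text results (assumption: POSIX, surrogateescape error handler).
    """
    return pathlib.PurePosixPath(os.fsdecode(name))


p = path_from_bytes(raw)

# os.fsencode() reverses the decoding, so the original bytes come back intact.
round_tripped = os.fsencode(str(p))
assert round_tripped == raw
```

The sketch uses PurePosixPath so that it touches nothing on disk; swapping in pathlib.Path should give a path you can hand to open() as in the session above.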
ChrisA