On 21 August 2014 12:16, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Nick Coghlan writes:
One idea I had along those lines is a surrogatereplace error handler (http://bugs.python.org/issue22016) that emitted an ASCII question mark for each smuggled byte, rather than propagating the encoding problem.
Please, don't.
"Smuggled bytes" are not independent events. They tend to be correlated *within* file names, and this handler would generate names whose human semantics get lost (and there *are* human semantics, otherwise the name would be str(some_counter)). They tend to be correlated across file names, and this handler will generate multiple files with the same munged name (and again, the differentiating human semantics get lost).
If you don't know the semantics of the intended file names, you can't generate good replacement names. This has to be an application-level function, and often requires user intervention to get good names.
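To make the collision problem described above concrete, here is a minimal sketch. It assumes two hypothetical Latin-1 encoded file names that were decoded as UTF-8 with 'surrogateescape'. The helper `replace_smuggled` is an illustration invented for this example, not the handler proposed in the issue and not a standard library function.

```python
import re

# Bytes smuggled through 'surrogateescape' land in U+DC80..U+DCFF.
_SMUGGLED = re.compile('[\udc80-\udcff]')


def replace_smuggled(name):
    """Hypothetical helper: swap each smuggled byte for an ASCII '?'."""
    return _SMUGGLED.sub('?', name)


# Two distinct (hypothetical) Latin-1 file names; neither is valid UTF-8.
raw_names = [b'caf\xe9.txt', b'caf\xe8.txt']

# Decoding keeps the offending bytes as lone surrogates, so the names stay distinct.
decoded = [raw.decode('utf-8', 'surrogateescape') for raw in raw_names]

# Replacing each smuggled byte with '?' throws that distinction away.
munged = [replace_smuggled(name) for name in decoded]

print(len(set(decoded)), len(set(munged)))
```

Both names collapse to the same string once the smuggled bytes are replaced, so writing files under the munged names would silently overwrite one with the other.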
If you want to provide helper functions that applications can use to clean names explicitly, that might be OK.
Yeah, I was thinking in the context of reproducing sys.stdout's behaviour in Python 2, but that reproduces the bytes faithfully, so 'surrogateescape' already offers exactly the behaviour we want (sys.stdout will have surrogateescape enabled by default in 3.5). I'll keep pondering the question of possible helper functions in the "string" module.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
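As a rough illustration of what "reproduces the bytes faithfully" means here, the sketch below round-trips a hypothetical non-UTF-8 file name through 'surrogateescape'. The sample bytes are an assumption chosen for the example.

```python
# Hypothetical file name in Latin-1; the byte 0xE9 is not valid UTF-8 here.
raw = b'caf\xe9.txt'

# Decoding smuggles the undecodable byte into the str as a lone surrogate.
text = raw.decode('utf-8', errors='surrogateescape')

# Encoding with the same handler restores the original byte unchanged.
restored = text.encode('utf-8', errors='surrogateescape')

# A strict encode refuses the lone surrogate instead of guessing.
strict_failed = False
try:
    text.encode('utf-8')
except UnicodeEncodeError:
    strict_failed = True

print(restored == raw, strict_failed)
```

The original bytes come back intact, which is the property a '?'-replacing handler would give up.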