I think we should break up this problem into several parts:<br><br>(1) Should the default UTF-8 decoder fail if it gets an illegal byte sequence?<br><br>It&#39;s probably OK for the default decoder to be lenient in some way (see below).<br>


<br>(2) Should the default UTF-8 encoder for file system operations be allowed to generate illegal byte sequences?<br><br>I think that&#39;s a definite no; if I set the encoding for a device to UTF-8, I never want Python to try to write illegal UTF-8 strings to my device.<br>


<br>(3) What kind of representation should the UTF-8 decoder return for illegal inputs?<br><br>There are actually several choices: (a) it could guess what the actual encoding is and use that, (b) it could return a valid unicode string that indicates the illegal characters but does not re-encode to the original byte sequence, or (c) it could return some kind of non-standard representation that encodes back into the original byte sequence.<br>
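<br>As a rough illustration of (3b) and (3c), here is a minimal sketch using the error handlers in Python 3.1 and later, where PEP 383 is available as "surrogateescape". The file name bytes are an invented example, and "replace" is only one possible way to realize (3b).<br>

```python
raw = b"caf\xe9.txt"  # invented example: Latin-1 bytes, not valid UTF-8

# A strict decoder fails on the illegal byte 0xE9.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# (3b) A valid Unicode string that marks the bad byte with U+FFFD.
# It does not re-encode to the original byte sequence.
replaced = raw.decode("utf-8", errors="replace")

# (3c) A non-standard representation: the lone surrogate U+DCE9 stands in
# for byte 0xE9, and it encodes back into the original byte sequence.
escaped = raw.decode("utf-8", errors="surrogateescape")
round_tripped = escaped.encode("utf-8", errors="surrogateescape")
```

With (3b) the original bytes are lost but everything written later is valid UTF-8; with (3c) the bytes survive, at the price of a string that is not valid Unicode text.<br>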


<br>PEP 383 violates (2), and I think that&#39;s a bad thing.<br><br>I think the best solution would be to use (3a) and fall back to (3b) if that doesn&#39;t work.  If people try to write those strings, they will always get written as correctly encoded UTF-8 strings.<br>
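<br>The (3a)-then-(3b) strategy could be sketched roughly as follows. The helper name lenient_decode and the choice of cp1252 as the guessed encoding are assumptions made for illustration only; a real implementation would need a better guessing strategy.<br>

```python
def lenient_decode(raw, guesses=("cp1252",)):
    """Hypothetical helper: strict UTF-8 first, then guessed encodings (3a),
    then U+FFFD replacement (3b). The result is always a valid Unicode string."""
    for encoding in ("utf-8",) + tuple(guesses):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # No guess worked: mark the illegal bytes instead of preserving them.
    return raw.decode("utf-8", errors="replace")


guessed = lenient_decode(b"caf\xe9.txt")   # 0xE9 is legal in cp1252
fallback = lenient_decode(b"bad\x81")      # 0x81 is undefined in cp1252

# Either way, writing the string back yields correctly encoded UTF-8.
guessed_bytes = guessed.encode("utf-8")
fallback_bytes = fallback.encode("utf-8")
```

Neither result can reproduce the original illegal bytes, which is the trade-off this approach accepts.<br>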


<br>If people really want the option of (3c), then I think encoders related to the file system should by default reject those strings as illegal, because the potential problems from writing them are just too serious.  Printing routines and UI routines could of course display them without error (but with some clear indication).<br>
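<br>For illustration, this is roughly how that split looks with the standard codecs in Python 3.1 and later: the strict UTF-8 encoder rejects a (3c)-style string, while a display path can use "backslashreplace" to show it with a visible marker. The file name is an invented example, and using "backslashreplace" for display is just one possible choice.<br>

```python
# A (3c)-style name: byte 0xE9 is carried as the lone surrogate U+DCE9.
name = b"caf\xe9.txt".decode("utf-8", errors="surrogateescape")

# Default (strict) encoding refuses to produce illegal UTF-8.
try:
    name.encode("utf-8")
    rejected = False
except UnicodeEncodeError:
    rejected = True

# A printing or UI routine can still show the name, with the odd
# character clearly indicated as an escape sequence.
shown = name.encode("ascii", errors="backslashreplace").decode("ascii")
```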


<br>There is yet another option, which is arguably the &quot;right&quot; one: make the results of os.listdir() instances of a string subclass that keeps track of where they came from.  If you write back to the same device, it just writes the same byte sequence.  But if you write to another device and the byte sequence is illegal according to that device&#39;s encoding, you get an error.<br>
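<br>A very rough sketch of such a string subclass follows. The class name OriginName, the method bytes_for, and the device labels are all hypothetical; os.listdir() does not return anything like this, and a real design would have to deal with devices whose encodings differ.<br>

```python
class OriginName(str):
    """Hypothetical str subclass that remembers its source bytes and device."""

    def __new__(cls, raw, device, encoding="utf-8"):
        # The visible text is a valid Unicode string; bad bytes show as U+FFFD.
        self = super().__new__(cls, raw.decode(encoding, errors="replace"))
        self.raw = raw
        self.device = device
        return self

    def bytes_for(self, device, encoding="utf-8"):
        """Return the bytes to write to `device`, or raise if they are illegal."""
        if device == self.device:
            return self.raw            # same device: same byte sequence
        self.raw.decode(encoding)      # raises UnicodeDecodeError if illegal
        return self.raw


name = OriginName(b"caf\xe9.txt", device="disk_a")
same_device = name.bytes_for("disk_a")
try:
    name.bytes_for("disk_b")
    other_device_error = False
except UnicodeDecodeError:
    other_device_error = True
```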


<br>Tom<br><br><br>