
On Wed, Nov 16, 2016 at 11:22:49PM -0800, Glyph Lefkowitz wrote:
> However, is it really a regression to have py3 support for Words that just doesn't support other encodings yet? It strikes me that this is just a bug, and that we should just fall back from UTF-8 to latin-1 in this scenario. But adding that fallback is a small additional fix (perhaps one that should be slated for 16.6.0 if you want to make it).
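The fallback proposed above can be sketched as follows. This is a minimal illustration assuming a hypothetical `decode_irc_line` helper applied to each received line; it is not twisted.words' actual API. Because latin-1 assigns a character to every byte value, the fallback never raises, which also means text sent in some other 8-bit encoding arrives as mojibake rather than as an error:

```python
def decode_irc_line(raw: bytes) -> str:
    """Hypothetical helper: decode one IRC line, UTF-8 first, then latin-1.

    latin-1 maps every byte 0x00-0xFF to a code point, so the fallback
    cannot fail; it silently mislabels text in any other 8-bit encoding.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


print(decode_irc_line("café".encode("utf-8")))           # café
print(decode_irc_line("café".encode("latin-1")))         # café
print(decode_irc_line("привет".encode("windows-1251")))  # ïðèâåò (mojibake)
```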
Falling back to latin-1 will address the most obvious issue exposed by the client in the re-opened ticket, but it will not fix the general issue. Note that my sample was heavily biased towards European servers. IRC servers in other regions might prefer a different 8-bit encoding, such as windows-1251 or Big5, and a single server will often see a long tail (or at least a tail) of different 8-bit encodings.

Listing all channels on a server, as the example script does, cannot be done with an implementation that decodes input as text prior to parsing it. It's even possible to use chardet to detect encodings.

IRC's encoding situation mirrors that of file systems on POSIX, where a given path's components can be in multiple encodings. I believe at least part of the reason FilePath's paths are bytes, even though surrogateescape exists, is that Unicode paths on POSIX systems would make FilePath unusable for perfectly valid use cases. We can pretend that IRC has a defined encoding, but doing so will make Words unusable for perfectly valid use cases.
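As a rough illustration of parsing before decoding, here is a sketch built on a hypothetical `RawMessage` type and `parse_raw_line` helper (neither is part of twisted.words). It splits a line on the ASCII delimiters only and leaves every field as bytes, so a client listing channels can keep names it cannot decode and decide per field how, or whether, to decode. The sample RPL_LIST (322) line and the encodings in it are invented for illustration:

```python
from typing import List, NamedTuple, Optional


class RawMessage(NamedTuple):
    """Hypothetical container for an IRC message whose fields are still bytes."""
    prefix: Optional[bytes]
    command: bytes
    params: List[bytes]


def parse_raw_line(line: bytes) -> RawMessage:
    """Split one IRC line (CRLF already stripped) without decoding it.

    Only the ASCII delimiters b" " and b":" are inspected, so the parse
    works whichever 8-bit encoding the remaining bytes happen to use.
    """
    prefix = None
    if line.startswith(b":"):
        prefix, _, line = line[1:].partition(b" ")
    # The first " :" starts the trailing parameter, which may contain spaces.
    head, sep, trailing = line.partition(b" :")
    parts = head.split()
    if not parts:
        raise ValueError("line has no command")
    params = parts[1:]
    if sep:
        params.append(trailing)
    return RawMessage(prefix, parts[0], params)


# Made-up RPL_LIST reply: channel name in latin-1, topic in windows-1251.
msg = parse_raw_line(b":irc.example.net 322 me #caf\xe9 12 :Topic \xef\xf0\xe8\xe2\xe5\xf2")
channel = msg.params[1]  # stays bytes, so it can be sent back verbatim
# Display code may decode lossily; surrogateescape round-trips the exact bytes.
shown = channel.decode("utf-8", "surrogateescape")
assert shown.encode("utf-8", "surrogateescape") == channel
```

Keeping fields as bytes pushes the encoding decision to the point where the caller knows (or guesses) what a particular field contains, much as FilePath leaves POSIX paths as bytes.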
_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python