Barry Warsaw writes:
I really hope you do this, but note that it would be very helpful to have guidelines and recommendations even for advanced, knowledgeable Python developers.
I have participated in many discussions in various forums with other Python developers where genuine differences of opinion or experience lead to different solutions. It would be very helpful to point to a document and say "here are the best practices for your [application|library] as recommended by core Python experts in Unicode handling."
I'll see what I can do, but going beyond the level of Paul Moore's use case for *best practices* is difficult for the reasons elaborated elsewhere (by others as well as myself): basic Unicode handling is no harder than ASCII handling as long as everything is Unicode.
So the real answer is to insist on valid Unicode for your text I/O; failing that, on text labeled *as* text *with* an encoding[1]; and failing that (or failing validation of the input), to reject the input.[2]
If that's not acceptable -- all too often it is not -- you're in a world of pain, and the solutions are going to be ad hoc. The WSGI folks will not find the solutions proposed for email acceptable, and vice versa.
Something like the format Nick proposed, where the tradeoffs are described, would be useful, I guess. But the tradeoffs have to be made ad hoc.

Footnotes:
[1] Of course it's OK if these are implicitly labeled by requirements or defaults of a higher-level protocol.
[2] This is the Unicode party line, of course. But it's really the only generally applicable advice.
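As a rough illustration of the "insist, else use the label, else reject" policy described above, here is a minimal sketch. The helper name `decode_text`, its `declared_encoding` parameter, and the choice of UTF-8 as the unlabeled default are assumptions made for the example, not anything proposed in this thread; a real protocol would supply its own default and its own way of reporting the rejection.

```python
from typing import Optional


def decode_text(data: bytes, declared_encoding: Optional[str] = None) -> str:
    """Hypothetical helper: strict decode, or reject the input.

    Assumption: unlabeled input is required to be UTF-8. If the sender
    (or a higher-level protocol) labels the encoding, use that label.
    Nothing is guessed and nothing is silently replaced.
    """
    encoding = declared_encoding or "utf-8"
    try:
        # errors="strict" makes invalid byte sequences raise instead of
        # being replaced or smuggled through.
        return data.decode(encoding, errors="strict")
    except LookupError as exc:
        # The label names a codec Python does not know about.
        raise ValueError(f"unknown encoding label: {encoding!r}") from exc
    except UnicodeDecodeError as exc:
        # The bytes are not valid in the expected encoding: reject.
        raise ValueError(f"input is not valid {encoding}") from exc
```

For instance, `decode_text(b"caf\xe9")` is rejected with `ValueError`, while `decode_text(b"caf\xe9", "latin-1")` returns `"café"`. What to do when rejection is not acceptable is exactly the ad hoc part, and this sketch does not attempt it.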