<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Jan 9, 2014 at 5:00 PM, Chris Barker <span dir="ltr"><<a href="mailto:chris.barker@noaa.gov" target="_blank">chris.barker@noaa.gov</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">On Thu, Jan 9, 2014 at 1:45 PM, Antoine Pitrou <span dir="ltr"><<a href="mailto:solipsis@pitrou.net" target="_blank">solipsis@pitrou.net</a>></span> wrote:<br>


<div class="gmail_extra"><div class="gmail_quote">


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div>> latin-1 guaranteed to work with any binary data, and round-trip accurately?<br>


<br>

</div><div class="im">Yes, it is.<br>

<div><br>

> and will surrogateescape work for arbitrary binary data?<br>

<br>

</div>Yes, it will.<br></div></blockquote><div><br></div><div>Then maybe this is really a documentation issue, after all.</div><div><br></div><div>I know I learned something.</div></div></div></div></blockquote><div><br>


</div><div>I think the other issue is that everyone is talking about keeping the data from the file in a single object. If you slice it up into pieces and decode the parts as necessary, that also solves the issue. So if you had an HTTP header you could do::</div>


<div><br></div><div>  raw_header, body = data.split(b'\r\n\r\n', 1)  # Split on the first blank line only.</div><div>  header = raw_header.decode('ascii')  # Or whatever HTTP headers are encoded in.</div><div><br></div><div>That might not easily handle ASCII text interspersed with binary data (such as Kristján's "phone number in the middle of stuff" example), but it will deal with the problem for data like the HTTP example. And if the numbers were delimited by clean markers, the same approach would probably still work.</div>
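<div><br></div><div>To make the slicing idea concrete, here is a rough sketch, not a real HTTP parser. The function names, the sample bytes, and the TEL= and ; markers are all made up for illustration, and it assumes the text parts are plain ASCII:</div>

```python
def split_http_message(data):
    """Split raw bytes into a decoded header string and an undecoded body."""
    # partition() splits on the first blank line only, so a body that
    # happens to contain b'\r\n\r\n' is left intact.
    raw_header, sep, body = data.partition(b'\r\n\r\n')
    if not sep:
        raise ValueError('no blank line separating header from body')
    # Assumption: ASCII is enough for the headers in this example.
    return raw_header.decode('ascii'), body


def decode_marked_field(data, start, end):
    """Decode only the ASCII text between two byte markers (hypothetical helper)."""
    _, _, rest = data.partition(start)
    field, _, _ = rest.partition(end)
    return field.decode('ascii')


# Made-up sample data: a text header followed by arbitrary binary.
message = b'Content-Type: application/octet-stream\r\n\r\n\xff\x00\r\n\r\n\xfe'
header, body = split_http_message(message)

# Made-up record with a phone number between clean markers.
record = b'\x00\x9cTEL=555-0100;\xff\xfe'
phone = decode_marked_field(record, b'TEL=', b';')
```

<div>Only the pieces known to be text ever get decoded; the body and the bytes around the phone number stay as bytes, so nothing has to survive a round trip through str.</div>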


</div></div></div>