<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Thu, 17 Mar 2016 at 07:56 Guido van Rossum &lt;<a href="mailto:guido@python.org">guido@python.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Thu, Mar 17, 2016 at 5:04 AM, Serhiy Storchaka &lt;<a href="mailto:storchaka@gmail.com" target="_blank">storchaka@gmail.com</a>&gt; wrote:<br>

>> Should we recommend that everyone use tokenize.detect_encoding()?<br>

><br>

> Likely. However the interface of tokenize.detect_encoding() is not very<br>

> simple.<br>

<br>

I just found that out yesterday. You have to give it a readline()<br>

function, which is cumbersome if all you have is a (byte) string and<br>

you don't want to split it on lines just yet. And the readline()<br>

function raises SyntaxError when the encoding isn't right. I wish<br>

there were a lower-level helper that just took a line and told you<br>

what the encoding in it was, if any. Then the rest of the logic can be<br>

handled by the caller (including the logic of trying up to two lines).<br></blockquote><div><br></div><div>Since this is for mypy, my guess is that you only want to know the encoding, but if you simply need to decode the bytes of a source file, then importlib.util.decode_source() will handle that for you.</div></div></div>
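<div>For what it's worth, here is a rough sketch of both approaches when all you have is a bytes object. The sample source and the sniff_encoding() helper name are made up for illustration; wrapping the bytes in io.BytesIO is just one way to get a readline() callable without splitting the data yourself.</div>

```python
import importlib.util
import io
import tokenize

# Hypothetical sample: Latin-1 encoded source with a PEP 263 coding cookie.
source_bytes = b"# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"


def sniff_encoding(data: bytes) -> str:
    """Return the encoding tokenize detects from a BOM or coding cookie."""
    # io.BytesIO supplies the readline() callable that detect_encoding() wants.
    # detect_encoding() raises SyntaxError if the declaration is invalid.
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


# Option 1: only learn the encoding and leave decoding to the caller.
encoding = sniff_encoding(source_bytes)

# Option 2: let importlib detect the encoding and decode in one step
# (it also applies universal newline translation).
text = importlib.util.decode_source(source_bytes)
```

<div>The first option only reports the encoding, which is probably what a tool like mypy wants; the second hands back a str directly.</div>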