<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:p="urn:schemas-microsoft-com:office:powerpoint" xmlns:a="urn:schemas-microsoft-com:office:access" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882" xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="#RowsetSchema" xmlns:b="urn:schemas-microsoft-com:office:publisher" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:c="urn:schemas-microsoft-com:office:component:spreadsheet" xmlns:odc="urn:schemas-microsoft-com:office:odc" xmlns:oa="urn:schemas-microsoft-com:office:activation" xmlns:html="http://www.w3.org/TR/REC-html40" xmlns:q="http://schemas.xmlsoap.org/soap/envelope/" xmlns:rtc="http://microsoft.com/officenet/conferencing" xmlns:D="DAV:" xmlns:Repl="http://schemas.microsoft.com/repl/" xmlns:mt="http://schemas.microsoft.com/sharepoint/soap/meetings/" xmlns:x2="http://schemas.microsoft.com/office/excel/2003/xml" xmlns:ppda="http://www.passport.com/NameSpace.xsd" xmlns:ois="http://schemas.microsoft.com/sharepoint/soap/ois/" xmlns:dir="http://schemas.microsoft.com/sharepoint/soap/directory/" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:dsp="http://schemas.microsoft.com/sharepoint/dsp" xmlns:udc="http://schemas.microsoft.com/data/udc" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sub="http://schemas.microsoft.com/sharepoint/soap/2002/1/alerts/" xmlns:ec="http://www.w3.org/2001/04/xmlenc#" xmlns:sp="http://schemas.microsoft.com/sharepoint/" xmlns:sps="http://schemas.microsoft.com/sharepoint/soap/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:udcs="http://schemas.microsoft.com/data/udc/soap" xmlns:udcxf="http://schemas.microsoft.com/data/udc/xmlfile" xmlns:udcp2p="http://schemas.microsoft.com/data/udc/parttopart" xmlns:wf="http://schemas.microsoft.com/sharepoint/soap/workflow/" xmlns:dsss="http://schemas.microsoft.com/office/2006/digsig-setup" xmlns:dssi="http://schemas.microsoft.com/office/2006/digsig" xmlns:mdssi="http://schemas.openxmlformats.org/package/2006/digital-signature" xmlns:mver="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns:mrels="http://schemas.openxmlformats.org/package/2006/relationships" xmlns:spwp="http://microsoft.com/sharepoint/webpartpages" xmlns:ex12t="http://schemas.microsoft.com/exchange/services/2006/types" xmlns:ex12m="http://schemas.microsoft.com/exchange/services/2006/messages" xmlns:pptsl="http://schemas.microsoft.com/sharepoint/soap/SlideLibrary/" xmlns:spsl="http://microsoft.com/webservices/SharePointPortalServer/PublishedLinksService" xmlns:Z="urn:schemas-microsoft-com:" xmlns:st="" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 12 (filtered medium)">
<style>
<!--
/* Font Definitions */
@font-face
        {font-family:Wingdings;
        panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
tt
        {mso-style-priority:99;
        font-family:"Courier New";}
span.pre
        {mso-style-name:pre;}
span.EmailStyle20
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page Section1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.Section1
        {page:Section1;}
-->
</style>
<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang=EN-US link=blue vlink=purple>
<div class=Section1>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>This is one of those bugs that it’s simply not clear that
it can be fixed at all. The problem is that we have four different things
to try and be compatible with:<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>unicode(some_unicode_string)<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>unicode(some_ascii_string)<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>str(some_unicode_string)<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>str(some_ascii_string)<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>But in IronPython we don’t know whether some_*_string is
Unicode or ASCII because they’re always Unicode. We also don’t
know if we’re calling unicode or str because they’re also the same
thing. So we have 4 possible behaviors in CPython but there can
only be 1 behavior in IronPython. Ultimately we need to pick which behaviors
we want to be incompatible with </span><span style='font-size:11.0pt;
font-family:Wingdings;color:#1F497D'>L</span><span style='font-size:11.0pt;
font-family:"Calibri","sans-serif";color:#1F497D'> Maybe now that we have
bytes we should look at changing which one we picked so that if you replace str
with bytes we could match CPython. But most likely this problem, and
other subtle Unicode issues like it, won’t be completely solvable until
IronPython 3k. <o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<div style='border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt'>
<div>
<div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'>
<p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span
style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>
users-bounces@lists.ironpython.com [mailto:users-bounces@lists.ironpython.com] <b>On
Behalf Of </b>Vernon Cole<br>
<b>Sent:</b> Thursday, December 17, 2009 11:06 AM<br>
<b>To:</b> Discussion of IronPython<br>
<b>Subject:</b> [IronPython] x = unicode(someExtendedUnicodeString) fails.<o:p></o:p></span></p>
</div>
</div>
<p class=MsoNormal><o:p> </o:p></p>
<p class=MsoNormal>I just tripped over this one and it took some time to figure
out what in blazes was going on. You may want to watch for it when porting
CPython code.<br>
<br>
I was cleaning up an input argument using <br>
s = unicode(S.strip().upper())<br>
where S is the argument supplying the value I need to convert.<br>
<br>
When I handed the function a genuine unicode string, such as in:<br>
assert Roman(u'\u217b') == 12 #unicode Roman number
'xii' as a single charactor<br>
IronPython complains with:<br>
UnicodeEncodeError: ('unknown', '\x00', 0, 1, '')<br>
<br>
The Python manual says:<o:p></o:p></p>
<p class=MsoNormal>If no optional parameters are given, <span class=pre><span
style='font-size:10.0pt;font-family:"Courier New";color:white;background:#3399FF'>unicode</span></span><span
class=pre><span style='font-size:10.0pt;font-family:"Courier New"'>()</span></span>
will mimic the behaviour of <span class=pre><span style='font-size:10.0pt;
font-family:"Courier New"'>str()</span></span> except that it returns <span
style='color:white;background:#3399FF'>Unicode</span> strings instead of 8-bit
strings. More precisely, if <em>object</em> is a <span style='color:white;
background:#3399FF'>Unicode</span> string or subclass it will return that <span
style='color:white;background:#3399FF'>Unicode</span> string without any
additional decoding applied.<o:p></o:p></p>
<div>
<p class=MsoNormal style='margin-bottom:12.0pt'><br>
It turns out that this was already reported on codeplex as:<br>
<a href="http://ironpython.codeplex.com/WorkItem/View.aspx?WorkItemId=15372">http://ironpython.codeplex.com/WorkItem/View.aspx?WorkItemId=15372</a>
<br>
but the reporting party did not catch the fact that he had located an
incompatibility with documented behavior.<br>
It has been setting on a back burner for some time. <br>
<br>
Others may want to join me in voting this up. Meanwhile I will add an
unneeded exception handler to my own code.<br>
--<br>
Vernon Cole<o:p></o:p></p>
</div>
</div>
</div>
</body>
</html>