Hello Hauke,<div><br></div><div>I think you are mistaken about the meaning of the re.LOCALE flag for whitespace. It is not the intersection but the union of the locale's space characters with the ASCII space characters.</div><div><br>
</div><div><span style>For \S, </span><span style>with the re.LOCALE flag set, it will match </span>[^ \t\n\r\f\v]<span style> plus any non-whitespace characters defined by that locale. </span></div><div><font color="#222222" face="arial, sans-serif"><br>
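To make the union-versus-intersection distinction concrete, here is a small pure-Python set model of what \S would match under each reading of the space set. It assumes a made-up locale whose only extra space character is U+00A0 (NO-BREAK SPACE); `HYPOTHETICAL_LOCALE_SPACE` and `non_space` are illustrative names, and the sketch models only the set logic under discussion, not the actual behavior of the re module or _sre.c.

```python
# A pure set model of the two readings argued about in this thread.
# It does not call the re module; it only compares set operations.

# The ASCII whitespace set quoted in the docs: [ \t\n\r\f\v]
ASCII_SPACE = set(" \t\n\r\f\v")

# Hypothetical: pretend the current locale also treats U+00A0 as space.
# This is an assumption for illustration, not data from a real locale.
HYPOTHETICAL_LOCALE_SPACE = {"\u00a0"}


def non_space(chars, space):
    """Return the characters in `chars` that \\S would match, given `space` as the set for \\s."""
    return {c for c in chars if c not in space}


sample = set("a \u00a0\tb")

# Union reading: a character is space if ASCII *or* the locale says so.
union_space = ASCII_SPACE | HYPOTHETICAL_LOCALE_SPACE
# Intersection reading: only characters both sets agree on count as space.
intersection_space = ASCII_SPACE & HYPOTHETICAL_LOCALE_SPACE

union_result = sorted(non_space(sample, union_space))
intersection_result = sorted(non_space(sample, intersection_space))

print(union_result)         # -> ['a', 'b']
print(intersection_result)  # tab, space and U+00A0 all count as "non-space" here
```

In this toy setup the two space sets share nothing, so under the intersection reading \S would even match tab and space, which is why the choice of wording matters.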
</font></div><div><font color="#222222" face="arial, sans-serif"><br></font></div><div><br></div><div>+ In case both ``re.LOCALE`` and ``re.UNICODE`` are specified alongside,<br>+ these character classes will behave as if the union was given.<br>
<div class="gmail_quote"><br></div><div class="gmail_quote">Where did you find this logic? As far as I can see, the locale flag is checked first and then the unicode flag.</div><div class="gmail_quote"><br></div><div class="gmail_quote">In Modules/_sre.c:</div>
<div class="gmail_quote"><br></div><div class="gmail_quote">if (pattern->flags & SRE_FLAG_LOCALE)</div><div class="gmail_quote">&nbsp;&nbsp;&nbsp;&nbsp;state->lower = sre_lower_locale;</div><div class="gmail_quote">else if (pattern->flags & SRE_FLAG_UNICODE)</div>
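The precedence in the quoted excerpt can be mirrored in a few lines of Python. This is a hypothetical model, not code from _sre.c: `lowercase_strategy` and its return labels are made up, and because the excerpt stops after the `else if` line, the final fallback is a placeholder rather than what the C code actually does. Note also that the excerpt selects the lowercasing function (`state->lower`); the model says nothing about how the character classes themselves are chosen.

```python
import re


def lowercase_strategy(flags):
    """Hypothetical model of the if / else-if chain quoted from _sre.c.

    LOCALE is tested first, so when both flags are present the
    UNICODE branch is never reached.
    """
    if flags & re.LOCALE:
        return "locale"
    elif flags & re.UNICODE:
        return "unicode"
    # Placeholder: the quoted excerpt is truncated, so this branch is an assumption.
    return "neither"


print(lowercase_strategy(re.LOCALE | re.UNICODE))  # -> locale
print(lowercase_strategy(re.UNICODE))              # -> unicode
```

Only the order of the tests matters here: swapping the two conditions would make UNICODE win when both flags are set.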
<div><br></div><div><br></div><div>I am going ahead with the changes I suggested previously and will also open a bug report, where further discussion and changes can be tracked. Yes, sometimes doc changes need discussion and iteration too. :( </div>
<div><br></div><div>-- </div><div>Senthil</div><div><br></div><div class="gmail_quote"><br></div><div class="gmail_quote"><br></div><div class="gmail_quote">On Fri, Mar 9, 2012 at 6:12 AM, Hauke Rehr <span dir="ltr"><<a href="mailto:homo_laber@yahoo.de">homo_laber@yahoo.de</a>></span> wrote:<br>
</div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><table cellspacing="0" cellpadding="0" border="0"><tbody><tr><td valign="top" style="font:inherit">
Hello again,<br><br>I can’t agree with your rewrite either, sorry. Here is my suggestion, based on yours:<div class="im"><br><br> + When the :const:`LOCALE` and :const:`UNICODE` flags are not specified,<br>+ matches any non-whitespace character; this is equivalent to the set<br>
</div>+ ``[^ \t\n\r\f\v]``. With :const:`LOCALE`, it will match those elements of the above set<br>+ not defined as space in the current locale. If :const:`UNICODE` is set, those elements<br>+ of ``[^ \t\n\r\f\v]`` not marked as space in the Unicode character properties database<br>
+ will be matched.<br><br>Unless I have misunderstood the meaning of \S (that is, anything but \s), this should be correct.<br>The same applies to \W:<br><br>+ this will match anything other than ``[0-9_]`` not classified as<br>
+ alphanumeric in the Unicode character properties database.<br><br><br>For the additional sentence, I’d prefer:<br><br>+ In case both ``re.LOCALE`` and ``re.UNICODE`` are specified alongside,<br>+ these character classes will behave as if the union was given.<br>
<br>since that is the logic behind it.<br><br>Hauke<br><br>--- Senthil Kumaran <i><<a href="mailto:senthil@uthcode.com" target="_blank">senthil@uthcode.com</a>></i> wrote on <b>Fri, 9 Mar 2012:<br></b><blockquote style="border-left:2px solid rgb(16,16,255);margin-left:5px;padding-left:5px">
<b><br>From: Senthil Kumaran <<a href="mailto:senthil@uthcode.com" target="_blank">senthil@uthcode.com</a>><br>Subject: Re: [docs] copy&waste problem<br>To: "Hauke Rehr" <<a href="mailto:homo_laber@yahoo.de" target="_blank">homo_laber@yahoo.de</a>><br>
CC: <a href="mailto:docs@python.org" target="_blank">docs@python.org</a><br>Date: Friday, 9 March 2012, 09:18<br><br></b><div><div class="h5"><div><b>Hello Hauke,<br><br>Yeah, it was pretty confusing. Thanks for catching this. How does this<br>
change sound?<br><br>- When the :const:`LOCALE` and :const:`UNICODE` flags are not specified, matches<br>- any non-whitespace character; this is equivalent to the set ``[^ \t\n\r\f\v]``<br>- With :const:`LOCALE`, it will match any character not in this set, and not<br>- defined as space in the current locale. If :const:`UNICODE` is set, this will<br>
- match anything other than ``[ \t\n\r\f\v]`` and characters marked as space in<br>- the Unicode character properties database.<br>+ When the :const:`LOCALE` and :const:`UNICODE` flags are not specified,<br>
+ matches any non-whitespace character; this is equivalent to the set<br>+ ``[^ \t\n\r\f\v]``. With :const:`LOCALE`, it will match the above set and any<br>+ non-space character in the current locale. If :const:`UNICODE` is set, it will match the<br>
+ above set ``[^ \t\n\r\f\v]`` and characters not marked as space in the<br>+ Unicode character properties database.<br><br> ``\w``<br> When the :const:`LOCALE` and :const:`UNICODE` flags are not specified, matches<br>@@ -381,8 +381,8 @@<br> any non-alphanumeric character; this is equivalent to the set ``[^a-zA-Z0-9_]``.<br> With :const:`LOCALE`, it will match any character not in the set ``[0-9_]``, and<br>
not defined as alphanumeric for the current locale. If :const:`UNICODE` is set,<br>- this will match anything other than ``[0-9_]`` and characters marked as<br>- alphanumeric in the Unicode character properties database.<br>+ this will match anything other than ``[0-9_]`` plus characters classified as<br>+ not alphanumeric in the Unicode character properties database.<br><br>
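On Python 3 the UNICODE wording for \S and \W can be checked directly: str patterns use Unicode-aware classes by default, and re.ASCII restricts \s and \w to their ASCII sets. This sketch does not exercise re.LOCALE, which Python 3.6 and later accept only for bytes patterns, and the sample string is an arbitrary choice for illustration.

```python
import re

# Sample: NO-BREAK SPACE (U+00A0) and e-acute (U+00E9) are the interesting characters.
text = "a\u00a0b-\u00e9_1"

# Default for str patterns in Python 3: Unicode-aware classes (the re.UNICODE behavior).
unicode_non_space = re.findall(r"\S", text)
unicode_non_word = re.findall(r"\W", text)

# re.ASCII: \s is [ \t\n\r\f\v] and \w is [a-zA-Z0-9_].
ascii_non_space = re.findall(r"\S", text, re.ASCII)
ascii_non_word = re.findall(r"\W", text, re.ASCII)

# U+00A0 is whitespace in the Unicode database, so Unicode \S skips it ...
print(unicode_non_space)
# ... but it is not in [ \t\n\r\f\v], so ASCII \S matches it.
print(ascii_non_space)
# U+00E9 is alphanumeric in the Unicode database, so Unicode \W skips it ...
print(unicode_non_word)
# ... but it is outside [a-zA-Z0-9_], so ASCII \W matches it.
print(ascii_non_word)

# \S is the exact complement of \s: every character matches exactly one of them.
for ch in text:
    assert bool(re.fullmatch(r"\s", ch)) != bool(re.fullmatch(r"\S", ch))
```

The final loop is a quick check of the "\S is anything but \s" reading for this sample only; it is not a proof for every code point.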
<br>I hope the rewrite is less confusing.<br><br>We could also include this sentence somewhere:<br><br>If both re.LOCALE and re.UNICODE are specified together, re.LOCALE<br>would be matched first and then re.UNICODE.<br><br><br>-- <br>Senthil<br><br></b></div></div></div></blockquote></td></tr></tbody></table>
</blockquote></div><br></div>