Hello Hauke,<div><br></div><div>I think you are mistaken about what the re.LOCALE flag means for whitespace. It is not the intersection but the union of the locale&#39;s space characters with the ASCII space characters.</div><div><br>
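<div>A minimal sketch of the union point, under some assumptions: it uses Python 3, where re.ASCII stands in for the case without flags and the default Unicode mode stands in for re.UNICODE. The helper name classify is made up for this example. re.LOCALE is left out because its result depends on the system locale.</div>
<pre>
```python
import re

# Non-breaking space: whitespace in the Unicode database,
# but not a member of the ASCII set [ \t\n\r\f\v].
NBSP = "\u00a0"

def classify(ch, flags=0):
    """Return (matches_whitespace_class, matches_non_whitespace_class) for one character."""
    is_space = bool(re.fullmatch(r"\s", ch, flags))
    is_non_space = bool(re.fullmatch(r"\S", ch, flags))
    return (is_space, is_non_space)

# ASCII-only matching: NBSP is outside the whitespace set, so \S matches it.
ascii_result = classify(NBSP, re.ASCII)

# Unicode matching (the default for str patterns in Python 3, an assumption
# of this sketch): the whitespace set grows, so \s matches NBSP instead.
unicode_result = classify(NBSP)

print(ascii_result, unicode_result)
```
</pre>
<div>The whitespace class only gains members when a flag widens it, which is why its complement \S can only lose members.</div>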

</div><div><span style>For \S, </span><span style>with the LOCALE flag set, it will match </span>[^ \t\n\r\f\v]<span style> plus any non-whitespace characters defined by that locale. </span></div><div><font color="#222222" face="arial, sans-serif"><br>

</font></div><div><font color="#222222" face="arial, sans-serif"><br></font></div><div><br></div><div>+   In case both ``re.LOCALE`` and ``re.UNICODE`` are specified alongside,<br>+   these character classes will behave as if the union was given.<br>

<div class="gmail_quote"><br></div><div class="gmail_quote">Where did you find this logic? As far as I can see, the locale flag is checked first and then the Unicode flag.</div><div class="gmail_quote"><br></div><div class="gmail_quote">In Modules\_sre.c:</div>

<div class="gmail_quote"><br></div><div class="gmail_quote">if (pattern-&gt;flags &amp; SRE_FLAG_LOCALE)</div><div class="gmail_quote">        state-&gt;lower = sre_lower_locale;</div><div class="gmail_quote">    else if (pattern-&gt;flags &amp; SRE_FLAG_UNICODE)</div>
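<div class="gmail_quote"><br></div><div class="gmail_quote">The branch order in that excerpt can be modeled with a small Python sketch. This is only an illustration of the if / else-if precedence, not the actual implementation; pick_case_mode and its string return values are hypothetical names, and nothing is modeled beyond the two branches quoted here.</div>
<pre>
```python
import re

def pick_case_mode(flags):
    """Hypothetical model of the quoted chain: LOCALE is tested before UNICODE."""
    if flags & re.LOCALE:
        return "locale"
    elif flags & re.UNICODE:
        return "unicode"
    # The quoted C excerpt stops before this point, so no further branch is modeled.
    return None

# With both flags set, the first test wins and the UNICODE branch is never reached.
both = pick_case_mode(re.LOCALE | re.UNICODE)
only_unicode = pick_case_mode(re.UNICODE)

print(both, only_unicode)
```
</pre>
<div class="gmail_quote">In this model the two flags are not combined into a union; whichever test comes first decides.</div>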

<div><br></div><div><br></div><div>I am going ahead with the changes as I suggested previously and am also opening a bug report. Further discussion and changes can be tracked there. Yeah, sometimes doc changes go through discussions and iterations too. :( </div>

<div><br></div><div>-- </div><div>Senthil</div><div><br></div><div class="gmail_quote"><br></div><div class="gmail_quote"><br></div><div class="gmail_quote">On Fri, Mar 9, 2012 at 6:12 AM, Hauke Rehr <span dir="ltr">&lt;<a href="mailto:homo_laber@yahoo.de">homo_laber@yahoo.de</a>&gt;</span> wrote:<br>

</div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><table cellspacing="0" cellpadding="0" border="0"><tbody><tr><td valign="top" style="font:inherit">

Hello again,<br><br>I can’t agree with your rewrite either, sorry - my suggestion based on yours:<div class="im"><br><br> +   When the :const:`LOCALE` and :const:`UNICODE` flags are not specified,<br>+   matches any non-whitespace character; this is equivalent to the set ``[^<br>

</div>+   \t\n\r\f\v]`` With :const:`LOCALE`, it will match those elements of the above set<br>+   not defined as space in the current locale. If :const:`UNICODE` is set, those elements<br>+   of ``[^ \t\n\r\f\v]`` not marked as space in the Unicode character properties database<br>

+   will be matched.<br><br>If I don’t get the meaning of \S (that is: anything but \s) wrong, this should be correct.<br>The same applies to \W:<br><br>+   this will match anything other than ``[0-9_]`` not classified as<br>

+  

 alphanumeric in the Unicode character properties database.<br><br><br>For the additional sentence, I’d prefer:<br><br>+   In case both ``re.LOCALE`` and ``re.UNICODE`` are specified alongside,<br>+   these character classes will behave as if the union was given.<br>

<br>since that is the logic behind it.<br><br>Hauke<br><br>--- Senthil Kumaran <i>&lt;<a href="mailto:senthil@uthcode.com" target="_blank">senthil@uthcode.com</a>&gt;</i> wrote on <b>Fri, 9 Mar 2012:<br></b><blockquote style="border-left:2px solid rgb(16,16,255);margin-left:5px;padding-left:5px">

<b><br>From: Senthil Kumaran &lt;<a href="mailto:senthil@uthcode.com" target="_blank">senthil@uthcode.com</a>&gt;<br>Subject: Re: [docs] copy&amp;waste problem<br>To: &quot;Hauke Rehr&quot; &lt;<a href="mailto:homo_laber@yahoo.de" target="_blank">homo_laber@yahoo.de</a>&gt;<br>

CC: <a href="mailto:docs@python.org" target="_blank">docs@python.org</a><br>Date: Friday, 9 March 2012, 09:18<br><br></b><div><div class="h5"><div><b>Hello Hauke,<br><br>Yeah, it was pretty confusing. Thanks for catching this. How does this<br>

change sound?<br><br>-   When the :const:`LOCALE` and :const:`UNICODE`

 flags are not<br>specified, matches<br>-   any non-whitespace character; this is equivalent to the set ``[^<br>\t\n\r\f\v]``<br>-   With :const:`LOCALE`, it will match any character not in this set, and not<br>-   defined as space in the current locale. If :const:`UNICODE` is<br>

set, this will<br>-   match anything other than ``[ \t\n\r\f\v]`` and characters marked<br>as space in<br>-   the Unicode character properties database.<br>+   When the :const:`LOCALE` and :const:`UNICODE` flags are not specified,<br>

+   matches any non-whitespace character; this is equivalent to the set ``[^<br>+   \t\n\r\f\v]`` With :const:`LOCALE`, it will match the above set and any<br>+   non-space character in the current locale. If :const:`UNICODE` is set, the<br>

+   above set ``[^ \t\n\r\f\v]`` and characters not marked as

 space in the<br>+   Unicode character properties database.<br><br> ``\w``<br>    When the :const:`LOCALE` and :const:`UNICODE` flags are not<br>specified, matches<br>@@ -381,8 +381,8 @@<br>    any non-alphanumeric character; this is equivalent to the set<br>

``[^a-zA-Z0-9_]``.<br>    With :const:`LOCALE`, it will match any character not in the set<br>``[0-9_]``, and<br>    not defined as alphanumeric for the current locale. If<br>:const:`UNICODE` is set,<br>-   this will match anything other than ``[0-9_]`` and characters marked as<br>

-   alphanumeric in the Unicode character properties database.<br>+   this will match anything other than ``[0-9_]`` plus characters classified as<br>+   not alphanumeric in the Unicode character properties database.<br><br>

<br>Hope the rewrite is less confusing.<br><br>We can also include this sentence

 somewhere.<br><br>If both re.LOCALE and re.UNICODE are specified together, then<br>re.LOCALE is matched first and then re.UNICODE.<br><br><br>-- <br>Senthil<br><br></b></div></div></div></blockquote></td></tr></tbody></table>

</blockquote></div><br></div>