[docs] copy&waste problem

Senthil Kumaran senthil at uthcode.com
Mon Mar 12 04:39:05 CET 2012


The tracker issue I created for this discussion is here:
http://bugs.python.org/issue14258

I have made the changes too. Thanks for spotting the mistake.

-- 
Senthil

On Sun, Mar 11, 2012 at 08:09:11PM -0700, Senthil Kumaran wrote:
> Hello Hauke,
> 
> I think you have misunderstood the meaning of the re.LOCALE flag for
> whitespace. It is not the intersection but the union of the locale's space
> characters with the ASCII space characters.
> 
> For \S, with the LOCALE flag set, it will match [^ \t\n\r\f\v] plus any
> non-whitespace characters defined by that locale.
> 
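A minimal sketch of how \s and \S relate, for anyone who wants to check the
behaviour. It assumes Python 3, where str patterns are Unicode-aware by
default and re.ASCII restricts the classes to the ASCII sets (the docs under
discussion here are for 2.x, so treat this as an approximation). The helper
name `matches` is only illustrative.

```python
import re

ASCII_SPACE = " \t\n\r\f\v"
NBSP = "\u00a0"  # NO-BREAK SPACE: whitespace in Unicode, outside ASCII


def matches(pattern, ch, flags=0):
    """Return True if the single character ch matches pattern (illustrative helper)."""
    return re.fullmatch(pattern, ch, flags) is not None


# In ASCII mode, \s covers exactly the six characters of [ \t\n\r\f\v].
ascii_ws = {chr(i) for i in range(128) if matches(r"\s", chr(i), re.ASCII)}

# In Unicode mode, \s still accepts every ASCII space character (a superset).
unicode_keeps_ascii = all(matches(r"\s", ch) for ch in ASCII_SPACE)

# \S is the complement of \s, so widening \s narrows \S.
nbsp_is_space_unicode = matches(r"\s", NBSP)
nbsp_is_nonspace_ascii = matches(r"\S", NBSP, re.ASCII)
nbsp_is_nonspace_unicode = matches(r"\S", NBSP)
```

In this sketch the no-break space counts as whitespace only in Unicode mode,
so \S accepts it in ASCII mode and rejects it in Unicode mode.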
> 
> 
> +   In case both ``re.LOCALE`` and ``re.UNICODE`` are specified alongside,
> +   these character classes will behave as if the union was given.
> 
> Where did you find this logic? As far as I can see, the locale flag is
> checked first and then unicode.
> 
> In Modules/_sre.c:
> 
>     if (pattern->flags & SRE_FLAG_LOCALE)
>         state->lower = sre_lower_locale;
>     else if (pattern->flags & SRE_FLAG_UNICODE)
> 
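The order of the checks in the quoted C fragment can be modelled in a few
lines of Python. This only illustrates the if / else-if ordering:
`pick_lower` and the labels it returns are hypothetical, and the final
fallback is an assumption, since the quoted fragment stops before that branch.

```python
import re


def pick_lower(flags):
    """Hypothetical model of the branch order in the quoted fragment."""
    if flags & re.LOCALE:
        # Checked first, so it wins whenever LOCALE is set.
        return "locale"
    elif flags & re.UNICODE:
        # Only reached when LOCALE is not set.
        return "unicode"
    # Assumed fallback; this branch is not shown in the quoted fragment.
    return "default"


both = pick_lower(re.LOCALE | re.UNICODE)
```

With both flags set, the LOCALE branch is taken and the UNICODE branch is
never reached. This models only the lowercasing function selected in that
fragment, not the character classes.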
> 
> I am going ahead with the changes as I suggested previously and am also
> opening a bug report. Further discussions and changes can be tracked there.
> Yeah, sometimes doc changes go through discussions and iterations too. :(
> 
> -- 
> Senthil
> 
> 
> 
> On Fri, Mar 9, 2012 at 6:12 AM, Hauke Rehr <homo_laber at yahoo.de> wrote:
> 
>     Hello again,
> 
>     I can’t agree with your rewrite either, sorry - my suggestion based on
>     yours:
> 
> 
>     +   When the :const:`LOCALE` and :const:`UNICODE` flags are not specified,
>     +   matches any non-whitespace character; this is equivalent to the set ``[^
>     +   \t\n\r\f\v]`` With :const:`LOCALE`, it will match those elements of the above set
>     +   not defined as space in the current locale. If :const:`UNICODE` is set, those elements
>     +   of ``[^ \t\n\r\f\v]`` not marked as space in the Unicode character properties database
>     +   will be matched.
> 
>     If I don’t get the meaning of \S (that is: anything but \s) wrong, this
>     should be correct.
>     The same applies to \W:
> 
>     +   this will match anything other than ``[0-9_]`` not classified as
>     +   alphanumeric in the Unicode character properties database.
> 
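The same complement relationship can be checked for \w and \W. A small sketch,
again assuming Python 3 semantics (re.ASCII for the ASCII-only sets,
Unicode-aware matching by default for str patterns); the helper `matches` is
illustrative only.

```python
import re
import string


def matches(pattern, ch, flags=0):
    """Return True if the single character ch matches pattern (illustrative helper)."""
    return re.fullmatch(pattern, ch, flags) is not None


# In ASCII mode, \w covers exactly [a-zA-Z0-9_].
ascii_word = {chr(i) for i in range(128) if matches(r"\w", chr(i), re.ASCII)}
expected_word = set(string.ascii_letters + string.digits + "_")

E_ACUTE = "\u00e9"  # a letter outside ASCII

# Unicode mode classifies it as a word character; ASCII mode does not.
word_unicode = matches(r"\w", E_ACUTE)
word_ascii = matches(r"\w", E_ACUTE, re.ASCII)

# \W is the complement of \w in each mode.
nonword_unicode = matches(r"\W", E_ACUTE)
nonword_ascii = matches(r"\W", E_ACUTE, re.ASCII)
```

Here the accented letter moves from \W (ASCII mode) to \w (Unicode mode), so
the set matched by \W shrinks as the set matched by \w grows.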
> 
>     For the additional sentence, I’d prefer:
> 
>     +   In case both ``re.LOCALE`` and ``re.UNICODE`` are specified alongside,
>     +   these character classes will behave as if the union was given.
> 
>     because that is the logic behind it.
> 
>     Hauke
> 
>     --- Senthil Kumaran <senthil at uthcode.com> wrote on Fri, 9 Mar 2012:
> 
> 
>         From: Senthil Kumaran <senthil at uthcode.com>
>         Subject: Re: [docs] copy&waste problem
>         To: "Hauke Rehr" <homo_laber at yahoo.de>
>         CC: docs at python.org
>         Date: Friday, 9 March 2012, 09:18
> 
>         Hello Hauke,
> 
>         Yeah, it was pretty confusing. Thanks for catching this. How does this
>         change sound?
> 
>         -   When the :const:`LOCALE` and :const:`UNICODE` flags are not specified, matches
>         -   any non-whitespace character; this is equivalent to the set ``[^ \t\n\r\f\v]``
>         -   With :const:`LOCALE`, it will match any character not in this set, and not
>         -   defined as space in the current locale. If :const:`UNICODE` is set, this will
>         -   match anything other than ``[ \t\n\r\f\v]`` and characters marked as space in
>         -   the Unicode character properties database.
>         +   When the :const:`LOCALE` and :const:`UNICODE` flags are not specified,
>         +   matches any non-whitespace character; this is equivalent to the set ``[^
>         +   \t\n\r\f\v]`` With :const:`LOCALE`, it will match the above set and any
>         +   non-space character in the current locale. If :const:`UNICODE` is set, the
>         +   above set ``[^ \t\n\r\f\v]`` and characters not marked as space in the
>         +   Unicode character properties database.
> 
>         ``\w``
>             When the :const:`LOCALE` and :const:`UNICODE` flags are not specified, matches
>         @@ -381,8 +381,8 @@
>             any non-alphanumeric character; this is equivalent to the set ``[^a-zA-Z0-9_]``.
>             With :const:`LOCALE`, it will match any character not in the set ``[0-9_]``, and
>             not defined as alphanumeric for the current locale. If :const:`UNICODE` is set,
>         -   this will match anything other than ``[0-9_]`` and characters marked as
>         -   alphanumeric in the Unicode character properties database.
>         +   this will match anything other than ``[0-9_]`` plus characters classified as
>         +   not alphanumeric in the Unicode character properties database.
> 
> 
>         Hope the rewrite is less confusing.
> 
>         We can also include this sentence somewhere.
> 
>         If both re.LOCALE and re.UNICODE are specified together, re.LOCALE
>         would be matched first and then re.UNICODE.
> 
> 
>         --
>         Senthil
> 
> 
> 
> 

