[Tutor] Regular expression - I

Santosh Kumar rhce.san at gmail.com
Wed Feb 19 04:36:43 CET 2014


Thank you all. I got it. :)
I need to read more between the lines.


On Wed, Feb 19, 2014 at 4:25 AM, spir <denis.spir at gmail.com> wrote:

> On 02/18/2014 08:39 PM, Zachary Ware wrote:
>
>> Hi Santosh,
>>
>> On Tue, Feb 18, 2014 at 9:52 AM, Santosh Kumar <rhce.san at gmail.com>
>> wrote:
>>
>>>
>>> Hi All,
>>>
>>> In the example below, Case I works as expected.
>>>
>>> Case I:
>>> In [41]: string = "<H*>test<H*>"
>>>
>>> In [42]: re.match('<H\*>',string).group()
>>> Out[42]: '<H*>'
>>>
>>> But why is the raw string (the 'r' prefix) not working as expected?
>>>
>>> Case II:
>>>
>>> In [43]: re.match(r'<H*>',string).group()
>>> ---------------------------------------------------------------------------
>>> AttributeError                            Traceback (most recent call last)
>>> <ipython-input-43-d66b47f01f1c> in <module>()
>>> ----> 1 re.match(r'<H*>',string).group()
>>>
>>> AttributeError: 'NoneType' object has no attribute 'group'
>>>
>>> In [44]: re.match(r'<H*>',string)
>>>
>>
>> It is working as expected, but you're not expecting the right thing
>> ;).  Raw strings don't escape anything; they just prevent backslash
>> escapes from expanding.  Case I works because "\*" is not an escape
>> sequence that Python recognizes (unlike "\n" or "\t"), so it leaves
>> the backslash in place:
>>
>>     >>> '<H\*>'
>>     '<H\*>'
>>
>> The equivalent raw string is exactly the same in this case:
>>
>>     >>> r'<H\*>'
>>     '<H\*>'
>>
>> The raw string you provided doesn't have the backslash, and Python
>> will not add backslashes for you:
>>
>>     >>> r'<H*>'
>>     '<H*>'
>>
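To see why the unescaped pattern finds nothing, it may help to look at what it does match. This is only a small sketch; the variable names and the sample strings "<>", "<H>" and "<HHH>" are made up for illustration. Without the backslash, "*" acts as a quantifier on the "H" before it:

```python
import re

text = "<H*>test<H*>"
unescaped = r'<H*>'   # regex meaning: '<', then zero or more 'H', then '>'

# After '<' and 'H' the pattern wants '>', but the text has a literal '*'.
no_match = re.match(unescaped, text)

# Strings the unescaped pattern does accept in full.
samples = ("<>", "<H>", "<HHH>")
accepted = [s for s in samples if re.fullmatch(unescaped, s)]

print(no_match)   # None
print(accepted)   # ['<>', '<H>', '<HHH>']
```

So the raw string is passed through untouched, and it is the regex engine that reads "*" as "repeat the previous item".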
>> The purpose of raw strings is to prevent Python from recognizing
>> backslash escapes.  For example:
>>
>>     >>> path = 'C:\temp\new\dir' # Windows paths are notorious...
>>     >>> path   # it looks mostly ok... [1]
>>     'C:\temp\new\\dir'
>>     >>> print(path)  # until you try to use it
>>     C:      emp
>>     ew\dir
>>     >>> path = r'C:\temp\new\dir'  # now try a raw string
>>     >>> path   # Now it looks like it's stuffed full of backslashes [2]
>>     'C:\\temp\\new\\dir'
>>     >>> print(path)  # but it works properly!
>>     C:\temp\new\dir
>>
>> [1] Count the backslashes in the repr of 'path'.  Notice that there is
>> only one before the 't' and the 'n', but two before the 'd'.  "\d" is
>> not a special character, so Python didn't do anything to it.  There
>> are two backslashes in the repr of "\d", because that's the only way
>> to distinguish a real backslash; the "\t" and "\n" are actually the
>> TAB and LINE FEED characters, as seen when printing 'path'.
>>
>> [2] Because they are all real backslashes now, so they have to be
>> shown escaped ("\\") in the repr.
>>
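The two footnotes can also be checked by counting characters instead of reading the repr. A minimal sketch, assuming the same path text as in the session above; the variable names are only illustrative:

```python
# '\\d' is spelled out here because '\d' is not a recognized escape and
# newer Python versions warn about it; the resulting string is the same.
plain = 'C:\temp\new\\dir'   # '\t' becomes TAB, '\n' becomes LINE FEED
raw = r'C:\temp\new\dir'     # every backslash stays a backslash

plain_len, raw_len = len(plain), len(raw)
plain_slashes, raw_slashes = plain.count('\\'), raw.count('\\')

print(plain_len, plain_slashes)   # 13 1
print(raw_len, raw_slashes)       # 15 3
```

The plain literal is two characters shorter because "\t" and "\n" each collapsed into a single control character, leaving only one real backslash.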
>> In your regex, since you're looking for, literally, "<H*>", you'll
>> need to backslash escape the "*" since it is a special character *in
>> regular expressions*.  The regex engine has to see "\*", so the
>> backslash itself must make it through Python's string processing
>> intact.  To avoid having to keep track of what's special to Python as
>> well as to regular expressions, the easiest way to do that is a raw
>> string:
>>
>>     >>> re.match(r'<H\*>', string).group()
>>     '<H*>'
>>
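As a quick check of the point above, here is a short sketch comparing the raw literal with a plain literal whose backslash is doubled by hand. The variable names are illustrative, and re.findall is used only to show that both tags in the string are found:

```python
import re

text = "<H*>test<H*>"

raw_pattern = r'<H\*>'       # raw literal: backslash kept exactly as typed
doubled_pattern = '<H\\*>'   # plain literal: backslash escaped by hand

same = raw_pattern == doubled_pattern
first = re.match(raw_pattern, text).group()
all_tags = re.findall(doubled_pattern, text)

print(same)       # True
print(first)      # <H*>
print(all_tags)   # ['<H*>', '<H*>']
```

Both spellings hand the regex engine the same five characters; the raw form just avoids having to count backslashes.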
>> I hope this makes some amount of sense; I've had to write it up
>> piecemeal and will never get it posted at all if I don't go ahead and
>> post :).  If you still have questions, I'm happy to try again.  You
>> may also want to have a look at the Regex HowTo in the Python docs:
>> http://docs.python.org/3/howto/regex.html
>>
>
> In addition to all this:
> * You may be confusing raw strings with "regex escaping" (a utility
> function that escapes special regex characters for you).
> * For simplicity, always use raw strings for regex patterns (as in your
> second example); this does not save you from escaping special
> characters, but you only have to do it once!
>
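The "regex escaping" helper mentioned in the first point is, as far as I can tell, re.escape from the standard library. A minimal sketch of using it to match the tag literally; the variable names are illustrative, and the exact escaped text is not shown because which characters get a backslash differs between Python versions:

```python
import re

text = "<H*>test<H*>"
literal = "<H*>"   # the exact text we want to find

# re.escape backslash-escapes regex metacharacters such as '*',
# so the result can be used as a pattern that matches the text literally.
pattern = re.escape(literal)
found = re.match(pattern, text).group()

print(found)   # <H*>
```

This is handy when the text to search for comes from somewhere else (user input, a file) and cannot be escaped by hand.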
>
> d



-- 
D. Santosh Kumar
RHCE | SCSA
+91-9703206361


Every task has an unpleasant side... but you must focus on the end result
you are producing.