<div class="gmail_quote">On Tue, Jan 17, 2012 at 3:07 AM, Chris Kavanagh <span dir="ltr">&lt;<a href="mailto:ckava1@msn.com">ckava1@msn.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


Hey guys, girls, hope everyone is doing well.<br>

<br>

Here&#39;s my question, when using Regular Expressions, the docs say when using parenthesis, it &quot;captures&quot; the data. This has got me confused (doesn&#39;t take much), can someone explain this to me, please??<br>


<br>

Here&#39;s an example to use. It&#39;s kinda long, so, if you&#39;d rather provide your own shorter ex, that&#39;d be fine. Thanks for any help as always.<br></blockquote><div><br></div><div>Here&#39;s a quick example:</div>


<div><br></div><div>import re</div><div><br></div><div>data = &#39;Wayne Werner fake-phone: 501-555-1234, fake-SSN: 123-12-1234&#39;</div><div># With parentheses: two capture groups</div><div><div>parsed = re.search(r&#39;(\d{3})-(\d{3}-\d{4})&#39;, data)</div></div>


<div>print(parsed.group())</div><div>print(parsed.groups())</div><div><br></div><div># Same pattern without parentheses: nothing is captured</div><div><div>parsed = re.search(r&#39;\d{3}-\d{3}-\d{4}&#39;, data)</div></div><div>print(parsed.group())</div><div>print(parsed.groups())</div>
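<div><br></div><div>To make the difference concrete, here is the same pair of searches as a self-contained sketch, with the output I would expect shown in the comments. The names with_groups and without_groups are just for illustration and are not part of the snippet above.</div>

```python
import re

data = 'Wayne Werner fake-phone: 501-555-1234, fake-SSN: 123-12-1234'

# Parentheses create two capture groups: the first three digits and the rest.
with_groups = re.search(r'(\d{3})-(\d{3}-\d{4})', data)
print(with_groups.group())    # whole match: 501-555-1234
print(with_groups.groups())   # ('501', '555-1234')
print(with_groups.group(2))   # just the second group: 555-1234

# Same pattern without parentheses: it still matches, but captures nothing.
without_groups = re.search(r'\d{3}-\d{3}-\d{4}', data)
print(without_groups.group())   # 501-555-1234
print(without_groups.groups())  # ()
```

<div>Note that the fake SSN never matches either pattern, because its middle block has only two digits.</div>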


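<div><br></div><div>You&#39;ll notice that the first pattern, the one with parentheses, lets you access the individual captured pieces using the .groups() method, while the second pattern matches the same text but its .groups() comes back empty. This makes pulling out the individual pieces pretty easy. Of course, capturing isn&#39;t just for storing the results. You can also refer back to a captured group later on, even within the same pattern. </div>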


<div><br></div><div>Let&#39;s say, for some fictitious reason you want to find every letter that appears as a double in some data. If you were to do this the &quot;brute force&quot; way you&#39;d pretty much have to do something like this:</div>


<div><br></div><div>found = []</div><div>for i in range(len(data)-1):</div><div>   # compare each character with its right-hand neighbour</div><div>   if data[i] == data[i+1]:</div><div>      if data[i] not in found:</div><div>        found.append(data[i])</div><div>print(found)</div><div>


<br></div><div>The regex OTOH looks like this:</div><div><div><br></div><div>In [29]: data = &#39;aaabababbcacacceadbacdb&#39;</div><div><div><br></div><div>In [32]: parsed = re.findall(r&#39;([a-z])\1&#39;, data)</div><div>


<br></div><div>In [33]: parsed</div><div>Out[33]: [&#39;a&#39;, &#39;b&#39;, &#39;c&#39;]</div></div><div><br></div></div><div>Now, that example was super contrived and also simple. Very few real-world applications will be as simple as that one - usually you have much crazier specifications, like find every person who has blue eyes AND blue hair, but only if they&#39;re left handed. Assuming you had data that looked like this:</div>


<div><br></div><div>Name    Eye Color    Hair Color   Handedness     Favorite type of potato</div><div>Wayne    Blue             Brown            Dexter             Mashed</div><div>Sarah      Blue             Blonde           Sinister            Spam(?)</div>


<div>Kane       Green          White             Dexter             None</div><div>Kermit     Blue             Blue               Sinister            Idaho</div><div><br></div><div><br></div><div>You could parse out the data using captures and backreferences [1].</div>
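<div><br></div><div>As a rough sketch of what that could look like (the pattern and the names table, pattern and matches are my own assumptions, not something from the original data source): capture the eye color, then use a backreference to demand the same text again in the hair color column, and require the literal word Sinister for the left-handers.</div>

```python
import re

# The table from above as a multi-line string (whitespace-separated columns).
table = """\
Name    Eye Color    Hair Color   Handedness     Favorite type of potato
Wayne    Blue             Brown            Dexter             Mashed
Sarah      Blue             Blonde           Sinister            Spam(?)
Kane       Green          White             Dexter             None
Kermit     Blue             Blue               Sinister            Idaho
"""

# Group 1 captures the name, group 2 captures the eye color "Blue".
# \2 then requires the hair color to be exactly the text group 2 captured,
# and the literal "Sinister" restricts the match to left-handed people.
pattern = re.compile(r'^(\w+)\s+(Blue)\s+\2\s+Sinister\b', re.MULTILINE)

matches = [m.group(1) for m in pattern.finditer(table)]
print(matches)  # ['Kermit']
```

<div>This assumes every column value is a single word, which happens to hold for the first four columns of this data.</div>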


<div><br></div><div>HTH,</div><div>Wayne</div><div><br></div><div>[1] In this situation, of course, regex is overkill. It&#39;s easier to just .split() and compare. But if you&#39;re parsing something really nasty like EDI then sometimes a regex is just the best way to go[2].</div>
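<div><br></div><div>For comparison, a minimal sketch of the .split() approach mentioned in [1]. The list name rows and the variable names are hypothetical, and it again assumes the first four columns never contain spaces.</div>

```python
# Data rows only; the header line is left out on purpose.
rows = [
    "Wayne    Blue             Brown            Dexter             Mashed",
    "Sarah      Blue             Blonde           Sinister            Spam(?)",
    "Kane       Green          White             Dexter             None",
    "Kermit     Blue             Blue               Sinister            Idaho",
]

left_handed_blue = []
for row in rows:
    # split() with no arguments splits on any run of whitespace.
    name, eyes, hair, hand = row.split()[:4]
    if eyes == hair == "Blue" and hand == "Sinister":
        left_handed_blue.append(name)

print(left_handed_blue)  # ['Kermit']
```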


<div><br></div><div>[2] When people start to understand regexes they&#39;re like the proverbial man who only has a hammer. As Jamie Zawinski said [3], &quot;Some people, when confronted with a problem, think “I know, I&#39;ll use regular expressions.” Now they have two problems.&quot; I&#39;ve come across very few occasions where regexes were actually useful, and it&#39;s usually extracting very specifically formatted data (money, phone numbers, etc.) from copious amounts of text. I&#39;ve not yet had a need to actually process words with them, especially in Python.</div>


<div><br></div><div>[3]<a href="http://regex.info/blog/2006-09-15/247">http://regex.info/blog/2006-09-15/247</a></div></div>