regex pattern to extract repeating groups
MRAB
python at mrabarnett.plus.com
Mon Aug 27 20:53:09 EDT 2018
On 2018-08-28 00:58, Malcolm wrote:
> On 28/08/2018 7:09 AM, John Pote wrote:
>> On 26/08/2018 00:55, Malcolm wrote:
>>> I am trying to understand why regex is not extracting all of the
>>> characters between two delimiters.
>>>
>>> The complete string is the xmp IFD data extracted from a .CR2 image
>>> file.
>>>
>>> I do have a workaround, but it's messy and possibly not future proof.
>> Do you mean future-proofing your workaround, or that Canon's .CR2 raw
>> image files might change? I guess .CR2 won't change, but Canon have
>> brought out the new .CR3 raw image file, for which I needed to upgrade
>> my photo editing suite (at least, I didn't, but used their tool to
>> convert .CR3s from the camera to the digital negative format, which
>> many photo editors can handle). Can send you a sample .CR3 if you want
>> to compare.
>>
>> Regards,
>> John
> John
>
> Thank you.
>
> Some background
> The application is for personal use. While I'm familiar with python
> generally (and thanks to all who post code and answer questions), this
> is the first time I have used structs to read a binary file, xml parsers
> to parse some of the RFD contents and re.
>
> First
> I have now discovered that when I print the return value of re.search,
> the match='...' field truncates the matched characters. To see/get all
> found characters I need to use the span as indexes into the original
> string. I'm not sure if this is mentioned in the re documentation, but
> all the samples I've seen on the web use only small strings. This was
> the cause of my question.
>
re.search returns a "match object". When you print it, you get what is
basically a summary. If you want the matched portion of the string, use
the match object's .group method:
[snip]
re_pattern = r'( *<dc:.*</dc:)'
x = re.search(re_pattern, data, re.DOTALL)
print(x.group())
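
For anyone wanting to try this without the original file, here is a
minimal self-contained sketch. The contents of "data" are made up for
illustration (the real XMP block was snipped above), so treat the sample
string and the names "summary", "matched" and "via_span" as assumptions.
It compares the printed summary, .group(), and slicing by .span():

```python
import re

# Hypothetical stand-in for the XMP data; NOT the poster's real input.
data = (
    "<rdf:Description>\n"
    "  <dc:creator>Example Name</dc:creator>\n"
    "  <dc:rights>Example rights statement that is fairly long</dc:rights>\n"
    "</rdf:Description>\n"
)

re_pattern = r'( *<dc:.*</dc:)'
x = re.search(re_pattern, data, re.DOTALL)

# Printing the match object itself only shows a summary of the match.
summary = repr(x)
print(summary)

# .group() returns the whole matched text.
matched = x.group()
print(matched)

# Slicing the original string with the span gives the same text.
start, end = x.span()
via_span = data[start:end]
```

Both "matched" and "via_span" hold the full text, so the span-slicing
workaround is not wrong, just more work than calling .group().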