How to escape strings for re.finditer?
Jen Kris
jenkris at tutanota.com
Mon Feb 27 19:39:57 EST 2023
string.count() only tells me there are N instances of the string; it does not say where they begin and end, as does re.finditer.
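
For reference, here is a minimal sketch of the re approach. The thread does not show the final code, so it assumes the fix was to escape only the search pattern and leave the searched text alone. The strings are the ones from my example quoted below; the names pattern and spans are just for illustration.

```python
import re

a = "X - abc_degree + 1 + qq + abc_degree + 1"
b = "abc_degree + 1"

# Escape only the thing being searched for, so "+" is treated literally.
# The text being searched (a) is left untouched.
pattern = re.escape(b)

# Collect the (start, end) positions of every occurrence.
spans = [(m.start(), m.end()) for m in re.finditer(pattern, a)]
print(spans)  # [(4, 18), (26, 40)]
```

The number of spans matches a.count(b), but here each one also carries its start and end position.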
Feb 27, 2023, 16:20 by bobmellowood at gmail.com:
> Would string.count() work for you then?
>
> On Mon, Feb 27, 2023 at 5:16 PM Jen Kris via Python-list <python-list at python.org> wrote:
>
>>
>> I went to the re module because the specified string may appear more than once in the string (in the code I'm writing). For example:
>>
>> a = "X - abc_degree + 1 + qq + abc_degree + 1"
>> b = "abc_degree + 1"
>> q = a.find(b)
>>
>> print(q)
>> 4
>>
>> So it correctly finds the start of the first instance, but not the second one. The re code finds both instances. If I knew that the substring occurred only once then the str.find would be best.
>>
>> I changed my re code after MRAB's comment, and it now works.
>>
>> Thanks much.
>>
>> Jen
>>
>>
>> Feb 27, 2023, 15:56 by cs at cskk.id.au:
>>
>> > On 28Feb2023 00:11, Jen Kris <jenkris at tutanota.com> wrote:
>> >
>> >> When matching a string against a longer string, where both strings have spaces in them, we need to escape the spaces.
>> >>
>> >> This works (no spaces):
>> >>
>> >> import re
>> >> example = 'abcdefabcdefabcdefg'
>> >> find_string = "abc"
>> >> for match in re.finditer(find_string, example):
>> >>     print(match.start(), match.end())
>> >>
>> >> That gives me the start and end character positions, which is what I want.
>> >>
>> >> However, this does not work:
>> >>
>> >> import re
>> >> example = re.escape('X - cty_degrees + 1 + qq')
>> >> find_string = re.escape('cty_degrees + 1')
>> >> for match in re.finditer(find_string, example):
>> >>     print(match.start(), match.end())
>> >>
>> >> I've made several other attempts based on my research, but still no match.
>> >>
>> >
>> > You need to print those strings out. You're escaping the _example_ string, which would make it:
>> >
>> > X - cty_degrees \+ 1 \+ qq
>> >
>> > because `+` is a special character in regexps and so `re.escape` escapes it. But you don't want to mangle the string you're searching! After all, the text above does not contain the string `cty_degrees + 1`.
>> >
>> > My secondary question is: if you're escaping the thing you're searching _for_, then you're effectively searching for a _fixed_ string, not a pattern/regexp. So why on earth are you using regexps to do your searching?
>> >
>> > The `str` type has a `find(substring)` function. Just use that! It'll be faster and the code simpler!
>> >
>> > Cheers,
>> > Cameron Simpson <cs at cskk.id.au>
>> > --
>> > https://mail.python.org/mailman/listinfo/python-list
>> >
>>
>>
>
>
> --
> **** Listen to my CD at http://www.mellowood.ca/music/cedars ****
> Bob van der Poel ** Wynndel, British Columbia, CANADA **
> EMAIL: bob at mellowood.ca
> WWW: http://www.mellowood.ca
>
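
A footnote on Cameron's str.find suggestion quoted above: str.find accepts an optional start index, so a loop can report every occurrence without regexps. This is only a sketch. find_all is a hypothetical helper name, not something in the standard library, and it reports non-overlapping matches only.

```python
def find_all(text, sub):
    """Yield (start, end) for each non-overlapping occurrence of sub in text."""
    if not sub:
        # An empty substring matches everywhere; treat it as no matches.
        return
    pos = text.find(sub)
    while pos != -1:
        end = pos + len(sub)
        yield pos, end
        # Resume the search just past the match we reported.
        pos = text.find(sub, end)


a = "X - abc_degree + 1 + qq + abc_degree + 1"
b = "abc_degree + 1"
print(list(find_all(a, b)))  # [(4, 18), (26, 40)]
```

Because no regexp is involved, nothing needs escaping.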