Ask for help on using re
Jach Feng
jfong at ms4.hinet.net
Fri Aug 6 22:23:55 EDT 2021
jak wrote on Friday, August 6, 2021 at 4:10:05 PM [UTC+8]:
> On 05/08/2021 11:40, Jach Feng wrote:
> > I want to distinguish between numbers with/without a dot attached:
> >
> >>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
> >>>> re.compile(r'ch \d{1,}[.]').findall(text)
> > ['ch 1.', 'ch 23.']
> >>>> re.compile(r'ch \d{1,}[^.]').findall(text)
> > ['ch 23', 'ch 4 ', 'ch 56 ']
> >
> > I can guess why the 'ch 23' appears in the second list. But how to get rid of it?
> >
> > --Jach
> >
> import re
> t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
> r = re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M)
>
> res = r.findall(t)
>
> dot = [x[0] for x in res if x[0] != '']
> udot = [x[1] for x in res if x[1] != '']
>
> print(f"dot: {dot}")
> print(f"undot: {udot}")
>
> out:
>
> dot: ['ch 1.', 'ch 23.']
> undot: ['ch 4', 'ch 56']
So the result depends on the order of the alternatives in the pattern?
>>> import re
>>> t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>> re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M).findall(t)
[('ch 1.', ''), ('ch 23.', ''), ('', 'ch 4'), ('', 'ch 56')]
>>> re.compile(r'(ch +\d+)|(ch +\d+\.)', re.M).findall(t)
[('ch 1', ''), ('ch 23', ''), ('ch 4', ''), ('ch 56', '')]
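As far as I understand it, the re module tries the alternatives of `|` from left to right and keeps the first one that matches, not the longest one, which would explain the difference above. A minimal sketch of a way to avoid depending on that order, assuming a negative lookahead is acceptable here (the names with_dot and without_dot are only for illustration):

```python
import re

text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

# Alternatives are tried left to right; the first branch that matches wins,
# so the dotted branch is never reached when it comes second.
first_wins = re.findall(r'ch +\d+|ch +\d+\.', text)

# Numbers followed by a dot.
with_dot = re.findall(r'ch +\d+\.', text)

# Numbers not followed by a dot. The lookahead also rejects a following
# digit, so backtracking cannot shorten 'ch 23.' to 'ch 2'.
without_dot = re.findall(r'ch +\d+(?![.\d])', text)

print(first_wins)
print(with_dot)
print(without_dot)
```

The `\d` inside the lookahead is what removes the stray 'ch 23' from the original question: without it, the engine backtracks by one digit and finds a position where the next character is not a dot.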
--Jach