Ask for help on using re

Fri Aug 6 22:23:55 EDT 2021

On Friday, August 6, 2021 at 4:10:05 PM [UTC+8], jak wrote:
> On 05/08/2021 11:40, Jach Feng wrote: 
> > I want to distinguish between numbers with/without a dot attached: 
> > 
> >>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n' 
> >>>> re.compile(r'ch \d{1,}[.]').findall(text) 
> > ['ch 1.', 'ch 23.'] 
> >>>> re.compile(r'ch \d{1,}[^.]').findall(text) 
> > ['ch 23', 'ch 4 ', 'ch 56 '] 
> > 
> > I can guess why the 'ch 23' appears in the second list. But how to get rid of it? 
> > 
> > --Jach 
> >
> import re
> t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
> r = re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M) 
> 
> res = r.findall(t) 
> 
> dot = [x[0] for x in res if x[0] != ''] 
> udot = [x[1] for x in res if x[1] != ''] 
> 
> print(f"dot: {dot}") 
> print(f"undot: {udot}") 
> 
> out: 
> 
> dot: ['ch 1.', 'ch 23.'] 
> undot: ['ch 4', 'ch 56']
So the result can be influenced by the order of the alternatives in the re pattern?

>>> import re
>>> t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>> re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M).findall(t)
[('ch 1.', ''), ('ch 23.', ''), ('', 'ch 4'), ('', 'ch 56')]

>>> re.compile(r'(ch +\d+)|(ch +\d+\.)', re.M).findall(t)
[('ch 1', ''), ('ch 23', ''), ('ch 4', ''), ('ch 56', '')]
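
The two results above seem consistent with how Python's re handles `|`: alternatives are tried from left to right, and the first one that lets the overall match succeed wins, so the engine does not look for the longest alternative. When `ch +\d+` comes first it always succeeds wherever the dotted form would, and the second group is never reached. A minimal sketch of that behaviour, plus a negative lookahead that avoids depending on the order at all (the names dotted_first, plain_first and no_dot are my own, chosen only for illustration):

```python
import re

t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

# Dotted alternative first: it is tried before the plain one at each position.
dotted_first = re.compile(r'(ch +\d+\.)|(ch +\d+)')
# Plain alternative first: it succeeds whenever the dotted one would,
# so the second group never gets a chance to match.
plain_first = re.compile(r'(ch +\d+)|(ch +\d+\.)')

with_dot = [m.group(1) for m in dotted_first.finditer(t) if m.group(1)]
without_dot = [m.group(2) for m in dotted_first.finditer(t) if m.group(2)]
shadowed = [m.group(2) for m in plain_first.finditer(t) if m.group(2)]

# Order-independent alternative: the number must not be followed by
# another digit or a dot, which also stops backtracking from 'ch 23.' to 'ch 2'.
no_dot = re.compile(r'ch +\d+(?![\d.])').findall(t)

print(with_dot)
print(without_dot)
print(shadowed)
print(no_dot)
```

The re.M flag is left out here because neither pattern uses `^` or `$`, so it would have no effect on these matches.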

--Jach