Code improvement question
Mike Dewhirst
miked at dewhirst.com.au
Tue Nov 14 22:41:20 EST 2023
On 15/11/2023 10:25 am, MRAB via Python-list wrote:
> On 2023-11-14 23:14, Mike Dewhirst via Python-list wrote:
>> I'd like to improve the code below, which works. It feels clunky to me.
>>
>> I need to clean up user-uploaded files the size of which I don't know in
>> advance.
>>
>> After cleaning they might be as big as 1 MB but that would be super rare.
>> Perhaps only for testing.
>>
>> I'm extracting CAS numbers, which follow the pattern xx-xx-x up to
>> xxxxxxx-xx-x, e.g. 1012300-77-4
>>
>> def remove_alpha(txt):
>>     """ r'[^0-9\- ]':
>>     [^...]: Match any character that is not in the specified set.
>>     0-9: Match any digit.
>>     \: Escape character.
>>     -: Match a hyphen.
>>     Space: Match a space.
>>     """
>>     cleaned_txt = re.sub(r'[^0-9\- ]', '', txt)
>>     bits = cleaned_txt.split()
>>     pieces = []
>>     for bit in bits:
>>         # minimum size of a CAS number is 7 so drop smaller clumps of digits
>>         pieces.append(bit if len(bit) > 6 else "")
>>     return " ".join(pieces)
>>
>>
>> Many thanks for any hints
>>
> Why don't you use re.findall?
>
> re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]\b', txt)
I think I can see what you did there but it won't make sense to me - or
whoever looks at the code - in future.
That answers your specific question. However, I am in awe of people who
can just "do" regular expressions and I thank you very much for what
would have been a monumental effort had I tried it.
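
For whoever reads the code later, one way to make a findall pattern self-documenting is re.VERBOSE, which allows whitespace and comments inside the pattern. This is only a sketch: the names CAS_PATTERN and find_cas_numbers are made up for illustration, and the final group is written as a single digit to match the xx-xx-x format described above.

```python
import re

# Hypothetical names; the pattern assumes the CAS layout described in this
# thread: 2 to 7 digits, a hyphen, 2 digits, a hyphen, 1 digit.
CAS_PATTERN = re.compile(
    r"""
    \b          # word boundary: do not start in the middle of a longer number
    [0-9]{2,7}  # first group: two to seven digits
    -           # literal hyphen
    [0-9]{2}    # second group: exactly two digits
    -           # literal hyphen
    [0-9]       # final group: a single digit
    \b          # word boundary: do not stop in the middle of a longer number
    """,
    re.VERBOSE,
)


def find_cas_numbers(txt):
    """Return every CAS-shaped token found in txt, in order of appearance."""
    return CAS_PATTERN.findall(txt)


print(find_cas_numbers("Water 7732-18-5, compound 1012300-77-4"))
# ['7732-18-5', '1012300-77-4']
```

This only checks the shape of each token; it does not confirm that the number is a real CAS registry entry.
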
That little re.sub() came from ChatGPT and I can understand it without
too much effort because it came documented.
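
For comparison, the re.sub() approach can be tightened a little without changing the idea. The sketch below is an assumption about what "less clunky" might look like, not a tested replacement: it filters out short clumps instead of appending empty strings, so the result has no runs of extra spaces.

```python
import re


def remove_alpha(txt):
    """Strip everything but digits, hyphens and spaces, then keep long clumps."""
    # Remove any character that is not a digit, a hyphen or a space.
    cleaned_txt = re.sub(r"[^0-9\- ]", "", txt)
    # The shortest CAS number (xx-xx-x) is 7 characters, so drop shorter clumps.
    return " ".join(bit for bit in cleaned_txt.split() if len(bit) > 6)


print(remove_alpha("Formaldehyde 50-00-0 and page 12 of 1012300-77-4"))
# 50-00-0 1012300-77-4
```

One thing to watch with this approach: newlines and tabs are not in the allowed set, so they are deleted too, and numbers on adjacent lines get glued together. For example, remove_alpha("50-00-0\n64-17-5") returns "50-00-064-17-5".
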
I suppose ChatGPT is the answer to this thread. Or everything. Or will be.
Thanks
Mike