Code improvement question
Mike Dewhirst
miked at dewhirst.com.au
Tue Nov 14 18:14:10 EST 2023
I'd like to improve the code below, which works. It feels clunky to me.
I need to clean up user-uploaded files whose size I don't know in
advance. After cleaning they might be as big as 1 MB, but that would be
super rare, perhaps only for testing.
I'm extracting CAS numbers, which follow the pattern xx-xx-x up to
xxxxxxx-xx-x, e.g. 1012300-77-4.
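If the end goal is the CAS numbers themselves, it may be simpler to match the pattern directly than to strip characters and split. This is a sketch, not a drop-in replacement. It assumes a CAS number is 2 to 7 digits, a hyphen, 2 digits, a hyphen and one check digit, delimited by word boundaries. It also applies the standard CAS check-digit rule to filter out typos. `CAS_RE`, `cas_check_digit_ok` and `find_cas_numbers` are hypothetical names chosen for illustration, and the `list[str]` annotation needs Python 3.9 or later.

```python
import re

# Assumed shape: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit,
# standing alone as a "word" in the text.
CAS_RE = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")


def cas_check_digit_ok(first: str, middle: str, check: str) -> bool:
    """Validate the CAS check digit.

    Take the digits before the check digit from right to left, multiply
    each by its position (1, 2, 3, ...), sum them and take the result
    modulo 10. That must equal the check digit.
    """
    digits = first + middle
    total = sum(pos * int(d) for pos, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(check)


def find_cas_numbers(text: str) -> list[str]:
    """Return every well-formed CAS number in text with a valid check digit."""
    return [
        m.group(0)
        for m in CAS_RE.finditer(text)
        if cas_check_digit_ok(*m.groups())
    ]


print(find_cas_numbers("Water 7732-18-5, ethanol 64-17-5, typo 64-17-6, eg 1012300-77-4"))
# prints: ['7732-18-5', '64-17-5', '1012300-77-4']
```

One caveat: `\b` treats a hyphen as a boundary, so a longer hyphenated code that contains a valid-looking CAS number (for example `ABC-7732-18-5`) would still produce a match.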
import re


def remove_alpha(txt):
    r"""Keep only the characters that can appear in a CAS number.

    r'[^0-9\-\s]':
        [^...]: Match any character that is not in the specified set.
        0-9: Match any digit.
        \-: Match a literal hyphen (escaped so it is not read as a range).
        \s: Match any whitespace, including newlines, so numbers on
            separate lines are not run together.
    """
    cleaned_txt = re.sub(r'[^0-9\-\s]', '', txt)
    bits = cleaned_txt.split()
    pieces = []
    for bit in bits:
        # The shortest CAS number, xx-xx-x, is 7 characters including the
        # hyphens, so drop smaller clumps of digits
        if len(bit) > 6:
            pieces.append(bit)
    return " ".join(pieces)
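Here is a more compact sketch of the same cleaning step, assuming the goal is unchanged. The regex is compiled once, the length test moves into a generator expression, and `min_len` (a hypothetical parameter) replaces the hard-coded 6. This version treats any whitespace, including newlines, as a separator. It also keeps only the long clumps, so dropped clumps leave no extra spaces in the result. `remove_alpha_compact` is an illustrative name, not an existing function.

```python
import re

# Compiled once at import time rather than on every call.
# Keeps digits, hyphens and any whitespace; everything else is removed.
NON_CAS_CHARS = re.compile(r"[^0-9\-\s]")


def remove_alpha_compact(txt: str, min_len: int = 7) -> str:
    """Strip everything except digits, hyphens and whitespace, then keep
    only clumps of at least min_len characters, joined by single spaces."""
    cleaned = NON_CAS_CHARS.sub("", txt)
    return " ".join(bit for bit in cleaned.split() if len(bit) >= min_len)


print(remove_alpha_compact("CAS 7732-18-5 (water), lot 42\n64-17-5"))
# prints: 7732-18-5 64-17-5
```

For uploads of up to about 1 MB, reading the whole file into one string and calling this once should not cause memory problems.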
Many thanks for any hints
Cheers
Mike
More information about the Python-list mailing list