splitting a string into 2 new strings

Andrew Dalke adalke at mindspring.com
Wed Jul 2 21:17:13 CEST 2003

> I'm assuming that these are chemical compounds, so you're not limited to
> one-character symbols.

The problem is underspecified.  Usually 2-character symbols (or 3-character
ones for some elements with high atomic number, and not assuming the newer
IUPAC names like "Dubnium", which was also called Unnilpentium (Unp) or,
depending on your political persuasion, Joliotium (Jl) or Hahnium (Ha)) have
the first letter capitalized and the rest in lower case.

> re_pat = re.compile('([A-Z]+)(\d+)')

So this should be written ([A-Z][A-Za-z]*)(\d+), where I explicitly allow
both lower and upper case trailing letters to be more accepting.  (In some
systems, "CU" is "1 carbon + 1 uranium" and in others it's an alternate way
to write "1 copper".  Though I suspect it's not allowed in the OP's problem.)
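
A minimal sketch of how the two patterns differ, assuming each input is a
single symbol followed by a count (such as "Cu2").  The names split_term,
OLD_PAT and NEW_PAT are made up for illustration, and re.fullmatch needs a
Python much newer than the one current when this was posted.

```python
import re

# Pattern quoted above: capital letters only, then digits.
OLD_PAT = re.compile(r'([A-Z]+)(\d+)')

# Suggested pattern: one capital, then any mix of letters, then digits.
NEW_PAT = re.compile(r'([A-Z][A-Za-z]*)(\d+)')

def split_term(text, pat=NEW_PAT):
    """Return (symbol, count) for a string like 'Cu2', or None if no match.

    Hypothetical helper: assumes the whole string is one symbol plus a count.
    """
    m = pat.fullmatch(text)
    if m is None:
        return None
    return m.group(1), int(m.group(2))

print(split_term("Cu2"))            # mixed case is accepted
print(split_term("CU12"))           # all caps is accepted too
print(split_term("Unp3"))           # 3-character symbol
print(split_term("cu2"))            # no leading capital, so None
print(split_term("Cu2", OLD_PAT))   # the capitals-only pattern rejects "Cu2"
```

Whether "CU" then means copper or carbon plus uranium is still up to the
caller; the pattern only does the splitting.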

                    dalke at dalkescientific.com
