Cutting slices
dn
PythonList at DancesWithMice.info
Sun Mar 5 19:28:01 EST 2023
On 06/03/2023 11.59, aapost wrote:
> On 3/5/23 17:43, Stefan Ram wrote:
>> The following behaviour of Python strikes me as being a bit
>> "irregular". A user tries to chop off sections from a string,
>> but does not use "split" because the separator might become
>> more complicated so that a regular expression will be required
>> to find it. But for now, let's use a simple "find":
>> |>>> s = 'alpha.beta.gamma'
>> |>>> s[ 0: s.find( '.', 0 )]
>> |'alpha'
>> |>>> s[ 6: s.find( '.', 6 )]
>> |'beta'
>> |>>> s[ 11: s.find( '.', 11 )]
>> |'gamm'
>> |>>>
>>
>> . The user always inserted the position of the previous find plus
>> one to start the next "find", so he uses "0", "6", and "11".
>> But the "a" is missing from the final "gamma"!
>> And it seems that there is no numerical value at all that
>> one can use for "n" in "string[ 0: n ]" to get the whole
>> string, isn't it?
>>
>>
>
> I would agree with 1st part of the comment.
>
> Just noting that string[11:], string[11:None], as well as string[11:16]
> work ... as well as string[11:324242]... lol..
To expand on the above, answering the OP's second question: the numeric
value is len( s ).
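As a rough illustration of why the last slice lost its "a" (using the same s as the OP): find() reports "not found" by returning -1, and -1 as the end of a slice means "stop before the last character". The variable names below are made up for the sketch.

```python
s = 'alpha.beta.gamma'

# find() signals "not found" with -1 rather than raising an exception.
end = s.find('.', 11)

# As a slice end, -1 means "stop before the last character".
truncated = s[11:end]

# len(s) (or None, or leaving the end out) reaches the end of the string.
whole_tail = s[11:len(s)]
whole_string = s[0:len(s)]
```

So the numeric n the OP asked about is len( s ), 16 for this string.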
If the repetitive process is required, try something like this inside a loop:
>>> start_index = 11  # to cure the issue raised
>>> try:
...     s[ start_index:s.index( '.', start_index ) ]
... except ValueError:
...     s[ start_index:len( s ) ]
...
'gamma'
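Wrapped in an actual loop, the same try/except idea might look like the sketch below. cut_fields() is a hypothetical helper name, not anything from the standard library, and it assumes a non-empty separator.

```python
def cut_fields(s, sep='.'):
    """Hypothetical helper: return the pieces of s between occurrences of sep."""
    fields = []
    start_index = 0
    while True:
        try:
            end_index = s.index(sep, start_index)
        except ValueError:
            # No further separator: take everything up to the end.
            fields.append(s[start_index:len(s)])
            return fields
        fields.append(s[start_index:end_index])
        # Resume just past the separator that was found.
        start_index = end_index + len(sep)


fields = cut_fields('alpha.beta.gamma')
```

The index() call could later be swapped for a regular-expression search without changing the shape of the loop.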
However, if the objective is to split, then use the function built for
the purpose:
>>> s.split( "." )
['alpha', 'beta', 'gamma']
(yes, the OP says this won't work - but doesn't show why)
If life must be more complicated, but the next separator can be
predicted, then split()'s close relative is partition().
NB both split() and partition() can be used on the sub-strings produced by
an earlier split() or ... ie there may be no reason to work strictly
from left to right.
I can't really help further with this, because the information above only
shows multiple "." characters, and not how multiple separators might be
interpreted.
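For what it's worth, a small sketch of partition() and its right-hand sibling rpartition(), plus partition() applied to the pieces of an earlier split(). The 'a=1;b=2' data is invented purely for illustration, since the OP's real separators are unknown.

```python
s = 'alpha.beta.gamma'

# partition() splits at the first occurrence of the separator and
# always returns a 3-tuple: (before, separator, after).
head, sep, rest = s.partition('.')

# rpartition() works from the right-hand end instead.
front, sep_r, tail = s.rpartition('.')

# When the separator is absent, the whole string comes back as the
# first element and the other two are empty strings.
no_sep = 'gamma'.partition('.')

# Pieces from an earlier split() can be partitioned again.
# The 'key=value' data here is made up for illustration.
pairs = [item.partition('=') for item in 'a=1;b=2'.split(';')]
```

Because partition() never raises when the separator is missing, there is no -1 or ValueError case to handle.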
A straight-line approach might be to use maketrans() and translate() to
convert all the separators to a single character, eg white-space, which
can then be split using any of the previously-mentioned methods.
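A minimal sketch of that idea, assuming (purely for illustration) that the separators are ".", ";" and "|", and that the data itself contains no white-space:

```python
# Assumed for illustration: the separators are '.', ';' and '|'.
text = 'alpha.beta;gamma|delta'

# Map every separator character onto a single space.
table = str.maketrans('.;|', '   ')
normalised = text.translate(table)

# split() with no argument splits on runs of white-space.
fields = normalised.split()
```

This only suits single-character separators; a multi-character separator would need a different approach.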
If the problem is sufficiently complicated and the OP is prepared to go
whole-hog, then the Python Standard Library's tokenize module or various
parser libraries may be worth consideration...
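A very rough sketch with tokenize follows. Note the big assumption: tokenize is built for Python source code, so this only works when the data happens to look like valid Python tokens, as the OP's sample string does.

```python
import io
import tokenize

s = 'alpha.beta.gamma'

# generate_tokens() wants a readline-style callable returning str lines.
tokens = tokenize.generate_tokens(io.StringIO(s).readline)

# Keep only the name tokens; the '.' separators arrive as OP tokens.
names = [tok.string for tok in tokens if tok.type == tokenize.NAME]
```

For data that is not Python-like, a dedicated parser library is likely the better fit.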
--
Regards,
=dn