string processing - some problems whenever I have to parse a more complex string
tjreedy at udel.edu
Wed Oct 22 00:03:41 CEST 2014
On 10/21/2014 10:32 AM, CWr wrote:
> Hello together,
> currently I have to parse a string in an atomic way. Normally - in this case too - I have a counter variable to keep the current position inside the string. So far, I think this is the most flexible solution to do some lookarounds inside the string if necessary. Subroutines will be fed the underlying data and the current position. A subroutine returns a tuple of the new position and the result. But I would like to process subroutines with the same flexibility (slicing/lookaround) but without returning the new position every time.
> Is there any implementation like C++ StringPiece class?
I am going to guess that this is a string view class that encapsulates a
piece of an underlying string. Otherwise there is no point.
A view class depends on a primary, independently accessible class for
its data. There are two main categories. A subview gives the primary
class interface to a part of the primary data. Numpy has array subviews,
and I presume you are talking about string subviews here. An altview
class gives an alternative interface to the primary data. Dict views
are an example.
If the primary object is mutable, one reason to use a view instead of a
copy is to keep the data for two objects synchronized. This does not
apply to strings.
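As a small illustration of the synchronization point, here is a sketch
using a dict keys view. The dict contents and variable names are made up
for the example.

```python
# A dict view stays synchronized with its primary dict; a copy does not.
primary = {'a': 1, 'b': 2}
keys_view = primary.keys()       # altview: set-like interface to the keys
keys_copy = list(primary)        # independent snapshot of the keys

primary['c'] = 3                 # mutate the primary object

view_sees_c = 'c' in keys_view   # the view reflects the change
copy_sees_c = 'c' in keys_copy   # the copy is stale
print(view_sees_c, copy_sees_c)  # True False
```

Since a str can never change, there is nothing for a string view to stay
synchronized with.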
Another reason is to save memory space. The downside is that the
primary data cannot be erased until *both* objects are deleted.
Moreover, if the primary data is small or the subview data is a small
fraction of the primary data, the memory saving is small. So small
subviews that persist after the primary object may end up costing more
memory than they save. This is one reason Python does not have string
subviews. The numpy array view use case is large subarrays of large
arrays that have to persist through a calculation anyway.
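For bytes (not str), the standard library's memoryview behaves like such
a subview, and it can be used to see the trade-off described above: the
slice copies nothing, but it keeps the whole primary object alive. This
is only a sketch; the buffer size and the slice bounds are arbitrary.

```python
data = bytes(1000000)            # large primary object (arbitrary size)
view = memoryview(data)[10:20]   # subview: no bytes are copied
copy = data[10:20]               # ordinary slice: a new 10-byte object

# The view holds a reference to the primary object, so the million bytes
# cannot be freed while `view` exists, although only 10 are of interest.
print(view.obj is data)          # True
print(len(view), len(copy))      # 10 10
print(bytes(view) == copy)       # True
```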
Another reason Python lacks sequence subviews is that the extra data
needed for a contiguous slice are only the start and stop indexes.
These can easily be manipulated directly without wrapping them in a
class. And anyone who does want a method interface can easily create a
class to their liking.
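A minimal sketch of what such a class might look like, assuming the
chop() behavior asked about in the original post. StringSlice, peek()
and skip_until() are hypothetical names, not an existing library; the
class only wraps the start and stop indexes and never copies the string
until str() is called.

```python
class StringSlice:
    """Hypothetical cursor over a string: start/stop indexes, no copying."""

    def __init__(self, text, start=0, stop=None):
        self.text = text
        self.start = start
        self.stop = len(text) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def __str__(self):
        return self.text[self.start:self.stop]   # copies only on request

    def chop(self, n=1):
        """Drop n items from the front (n > 0) or -n from the back (n < 0)."""
        if abs(n) > len(self):
            raise ValueError('cannot chop more items than remain')
        if n >= 0:
            self.start += n
        else:
            self.stop += n

    def peek(self, offset=0):
        """Look at an item relative to the start without consuming it."""
        if not 0 <= offset < len(self):
            raise IndexError('peek outside the current slice')
        return self.text[self.start + offset]


def skip_until(s, char):
    """Example subroutine: advances s in place, so no position is returned."""
    while len(s) and s.peek() != char:
        s.chop(1)


s = StringSlice('abcdef')
s.chop(1)            # 'b' is the new first item
s.chop(-1)           # 'e' is the new last item
skip_until(s, 'e')
print(str(s))        # e
```

Because the object is mutated in place, a subroutine that fails partway
has already moved the position; saving s.start before the call and
restoring it on error is one way to handle that.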
To answer your question, I tried
and did not find anything. 'view' matches the generic use of 'view', as
well as 'views', 'viewed', 'viewer', 'review', and 'preview'.
The third answer here
has a StringView class that could be modified to work on 3.x by removing
the unneeded use of buffer.
> Or something like the following behavior:
>>>> s = StringSlice('abcdef')
s = 'abcdef'
a, b = 0, len(s) # s start, s end
> StringSlice('abcdef') at xxx
>>>> s.chop(1) # chop the first item
>>>> s # 'b' is the new first item
a += 1
>>>> s.chop(-1) # chop the last item
b -= 1
>>>> while s != 'e':
while s[a] != 'e':
    a += 1
> Subroutines could chop the number of processed items internally if no error occurs.
> Another possibility would be to chop the current item manually. But I don't know how efficient this is in case of large strings.
>>>> while string:
>     c = string[0]
>     # process it ...
>     string = string[1:]
This is extremely bad, as it replaces the O(n) processing (below) with
O(n*n) processing: each string[1:] copies the entire rest of the string.
In general, the right way to linearly process any iterable is
    for item in iterable:
or, if the index is also needed,
    for index, item in enumerate(iterable):
or even, for sequences (but not when the first option above suffices),
    for index in range(len(sequence)):
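To make the difference concrete, here is a sketch that counts the
characters copied by the chopping loop and shows a linear loop that still
allows lookahead through the index. The function names and the
(item, next item) task are made up for illustration.

```python
def chop_loop(string):
    """The quoted pattern: returns (items, number of characters copied)."""
    items, copied = [], 0
    while string:
        items.append(string[0])
        string = string[1:]          # copies everything that remains
        copied += len(string)
    return items, copied


def index_loop(string):
    """Linear alternative with lookahead via the index; nothing is copied."""
    pairs = []
    for index, item in enumerate(string):
        nxt = string[index + 1] if index + 1 < len(string) else None
        pairs.append((item, nxt))
    return pairs


items, copied = chop_loop('abcdef')
print(copied)               # 15, i.e. 5+4+3+2+1+0 = n*(n-1)/2 for n = 6
print(index_loop('abc'))    # [('a', 'b'), ('b', 'c'), ('c', None)]
```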
Terry Jan Reedy
More information about the Python-list