[Tutor] text processing lines variable content

Wed Feb 6 13:51:58 EST 2019

On 06/02/2019 19:07, Mark Lawrence wrote:

> That's going to be a lot of work slicing and dicing the input lists. 
> Perhaps a chunked recipe like this 
> https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.chunked 
> would be better.

The length of the text chunks varies from a single character to a list 
of ~30 3D vectors.
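
One way to consume chunks of varying length is to turn the split line into an iterator and take as many items as each step needs. This is only a sketch: the helper take() and the line layout (a count, then that many vectors, then one value) are assumptions for illustration, not the real vorodat.vol format.

```python
from itertools import islice

def take(tokens, n):
    """Consume and return the next n items from the iterator `tokens`."""
    return list(islice(tokens, n))

# Hypothetical line layout: a count, that many vectors, then a single value.
line = "3 (1,2,3) (4,5,6) (7,8,9) 0.5".split()
tokens = iter(line)

count = int(take(tokens, 1)[0])     # the count decides the next chunk size
vectors = take(tokens, count)       # variable-length chunk
radius = float(take(tokens, 1)[0])  # single trailing value
```

Because the iterator remembers its position, no slicing of the original list is needed.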

>> I'd like to adapt the order in which 
>> the functions are applied, but how?
> 
> I suspect that you're trying to overcomplicate things. What's wrong 
> with a simple if/elif chain, a switch based on a dict or similar?
> 

You mean create a list order=[a,b,e,d...] and then:
if a in order:
   f_vector_array(a, 3)
elif b in order:
   f_value(max_radius)

That would run the proper function, but not in the right order?

>>
>> for i, line in enumerate(open("vorodat.vol",'r')):
>>    points = i+1
> 
> enumerate takes a start argument so you shouldn't need the above line.

points is needed later on in the program and I don't know beforehand how 
many lines I have.
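
As far as I understand the suggestion, enumerate with start=1 leaves the line count in the loop variable after the loop, without knowing the number of lines beforehand. A small sketch, with io.StringIO and made-up lines standing in for open("vorodat.vol"):

```python
import io

# Stand-in for open("vorodat.vol"); the three lines are made-up sample data.
infile = io.StringIO("line one\nline two\nline three\n")

points = 0  # stays 0 if the file is empty
for points, line in enumerate(infile, start=1):
    pass  # process the line here

print(points)  # number of lines read
```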

>> I thought about putting the functions in a dict and then creating a list 
>> with the proper order, but can't get it to work.
> 
> Please show us your code and exactly why it didn't work.
> 

def f_vector_array(outlist, length):
   rv = pop_left_slice(line, length)
   rv = [f'<{i[1:-1]}>' for i in rv]  #i format is: '(1.234,2.345,3.456)'
   rv = ",".join(rv)
   outlist.append(f"  //label: {lbl}\n  array[{length}]"+"{\n "+rv+"\n }\n")

functions={
  'a':f_number(num_vertex),
  'b':f_vector_array(rel_vertex,v)
}
where rel_vertex is the list to move the processed data to and v is 
the amount of text to chop off the front of the line. v is not known when 
defining the dictionary. v comes from another function, 
v=f_number(num_vertex), that also should live in the dict.

Then loop over order=[a,b,e,d...] for each line.
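
A minimal sketch of how this could look if the dict stores the functions themselves instead of calling them while the dict is built. Everything here is an assumption for illustration: the body of pop_left_slice, the shared state dict that carries v from one function to the next, and the sample lines.

```python
def pop_left_slice(tokens, length):
    """Remove and return the first `length` items of the list `tokens`."""
    chunk = tokens[:length]
    del tokens[:length]
    return chunk

def f_number(tokens, state, outlist):
    # Read one integer and remember it as v for the functions that follow.
    value = int(pop_left_slice(tokens, 1)[0])
    outlist.append(value)
    state["v"] = value

def f_vector_array(tokens, state, outlist):
    # The chunk length v is only known at call time, via the shared state.
    vectors = pop_left_slice(tokens, state["v"])
    outlist.append(",".join(f"<{vec[1:-1]}>" for vec in vectors))

num_vertex, rel_vertex = [], []

# The dict holds (function, target list) pairs; nothing is called here.
functions = {
    "a": (f_number, num_vertex),
    "b": (f_vector_array, rel_vertex),
}
order = ["a", "b"]  # change this list to change the processing order

def process(line):
    tokens = line.split()
    state = {}  # fresh per line
    for key in order:
        func, outlist = functions[key]
        func(tokens, state, outlist)

process("2 (1.0,2.0,3.0) (4.0,5.0,6.0)")
process("1 (7.0,8.0,9.0)")
```

Since every function takes the same three arguments, the loop does not need to know which function needs v and which one produces it.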

> 
> I'm not absolutely sure what you're saying here, but would something 
> like the SortedList from 
> http://www.grantjenks.com/docs/sortedcontainers/ help?

Maybe this explains it better, assume the split input lines:
line1=[a,b,c,d,e,f,...]
line2=[a,b,c,d,e,f,...]
line3=[a,b,c,d,e,f,...]
...
line100000=...

all data on position a should go to list a

a=[a1,a2,a3,...a_n]
b=[b1,b2,b3,...b_n]
c=[c1,c2,c3,...c_n]
etc.

This is what, for example, the function f_vector_array(a, 3) does.
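
For the simple case where every line has the same number of fields, the regrouping itself can be sketched with zip. The three short lines are made-up sample data, and this does not cover the variable-length chunks in the real file:

```python
# Made-up sample data: three split input lines with three fields each.
lines = [
    ["a1", "b1", "c1"],
    ["a2", "b2", "c2"],
    ["a3", "b3", "c3"],
]

# zip(*lines) pairs up the items at the same position across all lines.
a, b, c = (list(column) for column in zip(*lines))
```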

All these lists have to be written to a single file, and each list contains 
100000 items. Instead of keeping it all in memory, I could write a1 straight 
to a temp file A rather than putting it in a list first, b1 to a temp file 
B, etc. In the next loop a2 goes to file A, b2 to file B, etc. When all lines 
are processed, combine the files A, B, C ... into a single file. Or is there 
a more practical way? Speed is not important.
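
A rough sketch of the temp file idea, assuming one item per output line and a "// name" header before each block. Both are made up for illustration, as is the function name write_columns:

```python
import io
import shutil
import tempfile

def write_columns(rows, names, out):
    """Write each column of `rows` to its own temp file, then join them in `out`."""
    temps = {name: tempfile.TemporaryFile(mode="w+") for name in names}
    try:
        # First pass: item k of every row goes to temp file k.
        for row in rows:
            for name, item in zip(names, row):
                temps[name].write(f"{item}\n")
        # Second pass: copy the temp files into the output, one after another.
        for name in names:
            temps[name].seek(0)
            out.write(f"// {name}\n")
            shutil.copyfileobj(temps[name], out)
    finally:
        for tmp in temps.values():
            tmp.close()

# Made-up sample data; io.StringIO stands in for the real output file.
rows = [("a1", "b1"), ("a2", "b2")]
result = io.StringIO()
write_columns(rows, ["A", "B"], result)
```

Only one row is held in memory at a time, and the temp files are removed automatically when they are closed.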

ingo