[Tutor] text processing lines variable content

Mark Lawrence breamoreboy at gmail.com
Wed Feb 6 15:45:27 EST 2019


On 06/02/2019 18:51, ingo janssen wrote:
> 
> On 06/02/2019 19:07, Mark Lawrence wrote:
> 
>> That's going to be a lot of work slicing and dicing the input lists.
>> Perhaps a chunked recipe like this 
>> https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.chunked 
>> would be better.
> 
> The length of the text chunks varies from a single character to a list 
> of ~30 3D vectors.

So what? You still don't need to chop the front off the list; just
process the data.
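For example, here is a minimal sketch that assumes each line is whitespace-separated and starts with a count telling how many items follow. That layout and the name read_fields are made up for illustration. Wrap the tokens in an iterator and take what each field needs with next() and itertools.islice, so nothing is ever removed from the front of a list, however long or short the chunks are:

```python
from itertools import islice


def read_fields(line):
    """Split one line into a count field and that many following items.

    Hypothetical layout: '<n> item_1 ... item_n <anything else...>'.
    """
    tokens = iter(line.split())       # consuming from an iterator copies nothing
    n = int(next(tokens))             # first token says how many items follow
    items = list(islice(tokens, n))   # take exactly n tokens, whatever n is
    rest = list(tokens)               # whatever is left on the line
    return n, items, rest


print(read_fields("3 (1,2,3) (4,5,6) (7,8,9) 42"))
# → (3, ['(1,2,3)', '(4,5,6)', '(7,8,9)'], ['42'])
```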

> 
>>> I'd like to adapt the order in that the functions are applied, but how?
>>
>> I suspect that you're trying to overcomplicate things; what's wrong
>> with a simple if/elif chain, a switch based on a dict, or similar?
>>
> 
> You mean create a list with the order=[a,b,e,d...]
> if a in order:
>    f_vector_array(a, 3)
> elif b in order:
>    f_value(max_radius)
> 
> that would run the proper function, but not in the right order?

Again I've no idea what you're saying here.

> 
>>>
>>> for i, line in enumerate(open("vorodat.vol",'r')):
>>>    points = i+1
>>
>> enumerate takes a start argument, so you shouldn't need the above line.
> 
> points is needed later on in the program and I don't know beforehand how 
> many lines I have.

Now you tell us :-(
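Even so, you don't need a separate counter. Python's for-loop variable stays bound after the loop finishes, so with start=1 it already holds the number of lines. The sketch below uses a placeholder function name, process_lines, and feeds it an in-memory file so it runs anywhere. The `points = 0` line covers an empty input, where the loop body never runs:

```python
import io


def process_lines(lines):
    """Number lines from 1 and return how many there were."""
    points = 0  # stays 0 if the input is empty
    for points, line in enumerate(lines, start=1):
        pass  # replace with the real per-line processing of `line`
    return points  # the loop variable keeps its last value


sample = io.StringIO("line one\nline two\nline three\n")
print(process_lines(sample))  # → 3
```

With the real file, `with open("vorodat.vol") as f: points = process_lines(f)` would also close the file when done.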

> 
>>> I thought about putting the functions in a dict and then create a 
>>> list with the proper order, but can't get it to work.
>>
>> Please show us your code and exactly why it didn't work.
>>
> 
> def f_vector_array(outlist, length):
>    rv = pop_left_slice(line, length)
>    rv = [f'<{i[1:-1]}>' for i in rv]  #i format is: '(1.234,2.345,3.456)'
>    rv = ",".join(rv)
>    outlist.append(f"  //label: {lbl}\n  array[{length}]"+"{\n "+rv+"\n }\n")
> 
> functions={
>   'a':f_number(num_vertex),
>   'b':f_vector_array(rel_vertex,v)
> }
> where rel_vertex is the list to move the processed data to, and v is
> the amount of text to chop off the front of the line. v is not known when
> defining the dictionary; it comes from another function,
> v=f_number(num_vertex), that should also live in the dict.

You don't need to specify the arguments in the dict; just give the
function name and pass the arguments when you call it.
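A sketch of what that could look like. The dict stores the function objects, with no parentheses, so nothing runs when it is built. v is passed in only after f_number has produced it. The token iterator and the extra lbl parameter are assumptions. They stand in for the global `line`, the global `lbl` and the `pop_left_slice` helper, none of which were shown. The output formatting is copied from the code above:

```python
from itertools import islice


def f_number(tokens):
    """Read one integer field from the token iterator."""
    return int(next(tokens))


def f_vector_array(tokens, length, outlist, lbl):
    """Read `length` vectors like '(1.2,3.4,5.6)' and append them as an array.

    `tokens` and `lbl` are hypothetical parameters replacing the globals.
    """
    vectors = [f"<{v[1:-1]}>" for v in islice(tokens, length)]
    body = ",".join(vectors)
    outlist.append(f"  //label: {lbl}\n  array[{length}]{{\n {body}\n }}\n")


# Function objects only: no call happens here, so no arguments are needed yet.
functions = {
    "a": f_number,
    "b": f_vector_array,
}

rel_vertex = []
tokens = iter("2 (1,2,3) (4,5,6)".split())
v = functions["a"](tokens)                           # v is only known now...
functions["b"](tokens, v, rel_vertex, "rel_vertex")  # ...and is passed at call time
print(rel_vertex[0])
```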
> 
> and then loop over order=[a,b,e,d...] for each line.
> 

What has a loop order got to do with using a dict?
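The two are separate jobs that happen to combine well. The order list says which fields come in which sequence, and the dict says which function handles each field. Here is a sketch in which the field names ("count", "vectors", "volume") are hypothetical. Each handler takes the shared token iterator and a dict of results so far, so a count read early can set the length of a later field:

```python
from itertools import islice


def read_count(tokens, state):
    state["n"] = int(next(tokens))  # a count that a later field depends on


def read_vectors(tokens, state):
    state["vectors"] = list(islice(tokens, state["n"]))  # length known only at run time


def read_volume(tokens, state):
    state["volume"] = float(next(tokens))


handlers = {"count": read_count, "vectors": read_vectors, "volume": read_volume}


def parse_line(line, order):
    """Apply the handlers to one line in the sequence given by `order`."""
    tokens = iter(line.split())
    state = {}
    for key in order:                 # the order list decides the sequence...
        handlers[key](tokens, state)  # ...the dict decides which function runs
    return state


print(parse_line("2 (0,0,1) (1,0,0) 0.5", ["count", "vectors", "volume"]))
# → {'n': 2, 'vectors': ['(0,0,1)', '(1,0,0)'], 'volume': 0.5}
```

Changing the field order for a different file then means changing only the order list, not the functions.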

>>
>> I'm not absolutely sure what you're saying here, but would something 
>> like the SortedList from 
>> http://www.grantjenks.com/docs/sortedcontainers/ help?
> 
> Maybe this explains it better. Assume the split input lines:
> line1=[a,b,c,d,e,f,...]
> line2=[a,b,c,d,e,f,...]
> line3=[a,b,c,d,e,f,...]
> ...
> line100000=...
> 
> All data at position a should go to list a:
> 
> a=[a1,a2,a3,...a_n]
> b=[b1,b2,b3,...b_n]
> c=[c1,c2,c3,...c_n]
> etc.
> 
> This is what, for example, the function f_vector_array(a, 3) does.

Why bother? Just have a list of lists and index on the position. Or are
we talking at cross purposes?
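Something like the following sketch turns per-line fields into one list per position with zip(). It assumes every line has the same number of whitespace-separated fields, because zip() silently stops at the shortest line. It also reads the whole input into memory. For 100000 lines of modest width that is often acceptable, but it is worth checking:

```python
def to_columns(lines):
    """Turn per-line fields into one list per position (a list of lists).

    Assumes every line has the same number of whitespace-separated fields;
    zip() silently stops at the shortest line otherwise.
    """
    rows = (line.split() for line in lines)
    return [list(column) for column in zip(*rows)]


columns = to_columns(["a1 b1 c1", "a2 b2 c2", "a3 b3 c3"])
print(columns[0])  # → ['a1', 'a2', 'a3']
print(columns[2])  # → ['c1', 'c2', 'c3']
```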

> 
> All these lists have to be written to a single file, and each list contains
> 100000 items. Instead of keeping it all in memory, I could write a1 to a
> temp file A instead of putting it in a list first, and b1 to a temp file
> B, etc. In the next loop, a2 goes to file A, b2 to file B, etc. When all lines
> are processed, combine the files A, B, C ... into a single file. Or is there
> a more practical way? Speed is not important.

What is your definition of "combine the files A,B,C ... to a single file"?
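If "combine" means writing all of A, then all of B, and so on into one output file, a sketch of the temp-file idea could look like this. It is one option, not the only one. The function name and the n_columns parameter are hypothetical. Each field is streamed to its own tempfile.TemporaryFile, and the temp files are then appended to the output in order with shutil.copyfileobj. A TemporaryFile is deleted when it is closed:

```python
import io
import shutil
import tempfile


def write_columns_then_combine(rows, n_columns, out):
    """Stream field i of every row to temp file i, then append the files to `out`."""
    temps = [tempfile.TemporaryFile(mode="w+") for _ in range(n_columns)]
    try:
        for row in rows:
            for temp, field in zip(temps, row):
                temp.write(field + "\n")   # one item per line in its column's file
        for temp in temps:
            temp.seek(0)                   # rewind before reading the column back
            shutil.copyfileobj(temp, out)  # append this column's data to the output
    finally:
        for temp in temps:
            temp.close()                   # closing also deletes the temp file


out = io.StringIO()
write_columns_then_combine([["a1", "b1"], ["a2", "b2"]], 2, out)
print(out.getvalue())
```

With a real output file, `out` would be something like `open("combined.txt", "w")`. That file name is only an example.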

> 
> ingo

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


