[Tutor] Fwd: Python text file read/compare

Alan Gauld learn2program at gmail.com
Thu Oct 6 17:00:30 EDT 2022


Forwarding to tutor list. Please always use Reply-All or Reply-List
when responding to tutor posts.

================

> The data is logged in Timestamps which starts at 488 milliseconds.

That still doesn't help much.
If we look at your data file, which field(s) represent the time stamps?
All of them or just some? Here is your sample data:


-1.75, 1.08, 10.35, -0.10, -0.01, -0.01, 23.19, *488*
-1.75, 1.12, 10.39, -0.10, -0.01, -0.01, 23.20, *521*

9.65, -1.31, -1.95, -0.11, -0.06, -0.02, 22.05, *15339436*

I see 488 as the last field in the first line, but the next line
has 521, so it is getting bigger, not smaller. And the third is much,
much bigger.

But what about the other numbers? Are they also timestamps of
some kind (deltas on the last field, maybe?) or do they represent
something else?

If you only need to extract the last field then it is very easy,
just split and use a -1 index:

ts = line.split()[-1]

And if they are always integers rather than actual times, comparison
is trivial too: just subtract them.
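A minimal sketch of that idea, assuming the last comma-separated field is always an integer count of milliseconds (the function name last_timestamp is made up for illustration). Splitting on ',' rather than whitespace keeps the trailing commas off the other fields, and int() ignores the surrounding spaces and newline.

```python
def last_timestamp(line):
    """Return the last comma-separated field of a data line as an int.

    Assumes the final field is an integer number of milliseconds;
    int() strips the leading space and trailing newline itself.
    """
    return int(line.split(",")[-1])


first = "-1.75, 1.08, 10.35, -0.10, -0.01, -0.01, 23.19, 488\n"
second = "-1.75, 1.12, 10.39, -0.10, -0.01, -0.01, 23.20, 521\n"

# Integer timestamps can simply be subtracted to get the gap in ms.
gap = last_timestamp(second) - last_timestamp(first)
print(gap)  # 33
```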

> some point the number supposed to drop below this number..
> As the data is large,I am trying to find out when that happens
> and what number it is so I can split the file to two for the
> moment so I can work on the two separate file.

It is easier to just keep loading each line into the new
file until you reach one with zero (or less?). That way
you don't need to store your data anywhere except in
the final files.
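A minimal sketch of that approach, assuming every data line ends in an integer timestamp and that a new file should start whenever the timestamp drops below the previous line's value. The function name split_on_reset and the part_N.txt output names are hypothetical. Only the current output file and the previous timestamp are kept in memory.

```python
def split_on_reset(src_path, prefix="part"):
    """Copy src_path into prefix_1.txt, prefix_2.txt, ... starting a new
    output file whenever a line's timestamp is lower than the one before.

    Assumes data lines end in an integer timestamp; lines whose last
    field is not an integer (blank lines, headers) are skipped.
    Returns the list of file names written.
    """
    names = []
    out = None
    previous = None
    try:
        with open(src_path) as src:
            for line in src:
                try:
                    ts = int(line.split(",")[-1])
                except ValueError:
                    continue  # e.g. a blank line or "time stamps"
                # Open the first file, or a new one when the timestamp drops.
                if out is None or ts < previous:
                    if out is not None:
                        out.close()
                    names.append(f"{prefix}_{len(names) + 1}.txt")
                    out = open(names[-1], "w")
                out.write(line)
                previous = ts
    finally:
        if out is not None:
            out.close()  # always close the last output file
    return names
```

Called as split_on_reset("hfx-test-.txt"), it would write part_1.txt, part_2.txt and so on into the current directory.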


> I hope this helps you to understand what I am trying to
> do and kindly assist me with clear explanation how I can proceed.

Only slightly, it depends on my assumptions above.

Can you use a file search tool (or text editor?) to find where
the timestamps reach zero? It would be most useful to see the
lines just before and just after that point.
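If no search tool is handy, Python can do the search. This sketch assumes, as above, that the timestamp is the integer last field and that the interesting point is where it falls below the previous value; find_resets is a hypothetical name. It reads one line at a time, so a large file is not loaded into memory.

```python
def find_resets(path):
    """Yield (line_number, line_before, line_at_drop) for every place
    where the last-field timestamp falls below the previous timestamp.

    Line numbers are 1-based, as in most text editors. Lines whose
    last field is not an integer (blank lines, headers) are skipped.
    """
    previous_ts = None
    previous_line = None
    with open(path) as f:
        for number, line in enumerate(f, start=1):
            try:
                ts = int(line.split(",")[-1])
            except ValueError:
                continue
            if previous_ts is not None and ts < previous_ts:
                yield number, previous_line, line
            previous_ts = ts
            previous_line = line
```

For example, looping with for number, before, at_drop in find_resets("hfx-test-.txt") and printing each tuple shows where every drop happens and the lines around it.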


Alan G.




On Tue., Oct. 4, 2022, 5:42 a.m. Alan Gauld via Tutor, <tutor at python.org> wrote:

    On 03/10/2022 21:14, samira khoda wrote:

    > Thank you very much for your feedback. I just started working on
    > the file again. Unfortunately I am still not getting anywhere with
    > the modifications I made to the codes. I don't know when I can
    > create the new file and write to it and close it.

    Can you write a program to simply split your input file(s) after every N
    lines? i.e. something like:

    open the input
    open output
    linecount = 0
    for line in input
        linecount += 1
        if linecount > N
           close output
           open new output
           linecount = 1
        write line to output
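Translated into Python, that outline might look like the sketch below. The function name and the chunk_N.txt output names are made up for illustration. The with statement closes the input automatically, and the finally clause makes sure the last output file is closed too.

```python
def split_every_n(src_path, n, prefix="chunk"):
    """Copy src_path into prefix_1.txt, prefix_2.txt, ... with at most
    n lines in each output file. Returns the names written."""
    names = []
    out = None
    linecount = 0
    try:
        with open(src_path) as src:
            for line in src:
                linecount += 1
                # Start a new file before the first line and after every n lines.
                if out is None or linecount > n:
                    if out is not None:
                        out.close()
                    names.append(f"{prefix}_{len(names) + 1}.txt")
                    out = open(names[-1], "w")
                    linecount = 1
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return names
```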

    If you can get that to work then you can replace the line

    if linecount > N

    with

    if zerotimestamp(line)

    And write a function to determine if the timestamp
    indicates a new file is needed.

    But get the basic structure in place first, don't
    try to solve both problems at once.
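One possible shape for that function, as a sketch only. It assumes the timestamp is the integer last field and that a new file is needed when it falls below the previous line's timestamp. Because a drop can only be spotted by comparing with the line before, this version takes the previous timestamp as a second argument, unlike the one-argument name in the outline.

```python
def zerotimestamp(line, previous_ts):
    """Return True if this line's timestamp means a new file is needed.

    Hypothetical helper: previous_ts is the timestamp of the line
    before, or None for the very first line of the input.
    """
    ts = int(line.split(",")[-1])
    return previous_ts is not None and ts < previous_ts
```

In the splitting loop the test then becomes if zerotimestamp(line, previous_ts):, with previous_ts updated after each line is written.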

    > Basically as I mentioned before I need to find where
    > the timestamps jump down to zero or the lowest number then start
    > time so I can split the file and make a new file. ***For your
    > information the timestamps are in milliseconds***

    I think you will need to explain how that works. Your sample
    data below is not clear. You need to show us a sample where the
    timestamp is not zero then one that is. And explain which field
    defines that.


    >
    > kyk=( 'hfx-test-.txt'  , 'r')

    Note that kyk is just a tuple of 2 strings.


    >
    > line_numbers=[]
    > i=0
    > current_time=""
    > count=1
    >
    > for line in kyk:

    So this loop will just iterate over the 2 strings 'hfx-test-.txt' and 'r'.
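A quick way to see the difference. The first part repeats the original line; the commented part shows what was probably intended (calling open()), left commented out because it needs hfx-test-.txt to exist in the current directory.

```python
# Without open(), the parentheses just build a tuple of two strings.
kyk = ('hfx-test-.txt', 'r')
print(type(kyk).__name__)  # tuple
for item in kyk:
    print(item)            # hfx-test-.txt, then r

# Calling open() gives a file object; looping over it yields lines.
# with open('hfx-test-.txt', 'r') as kyk:
#     for line in kyk:
#         print(line, end='')
```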

    >     if i < 488:
    >         i=0
    >         line_numbers.append(current_time)
    >         current_time=""
    >     else:
    >         current_time+=line
    >         i+=1

    This doesn't make much sense to me. Where does the magic
    number 488 fit in? First time through the loop i will
    always be zero so you will always start the list with
    an empty string in line_numbers. (BTW line_numbers is a
    very misleading name since it appears to be a list of times?)

    And because i is always 0 you never go into the else part
    so it never gets increased. So the for loop executes twice
    storing 2 empty strings in the line_numbers list.

    Even if you opened the file and iterated over it you would
    still just get a blank string for each line in the file.

    > **************this is where It does not write to the file*****
    >
    > opf=open("kyk_txt_new_file.txt","w")
    >
    >     if current_time <start_time:
    >
    >         splitlines(True).add(line_number);
    >
    >     opf.write("this is the timestamps after the line splits.")
    >
    >     kyk.close()


    And this makes even less sense.
    Where is start_time coming from?
    What is splitlines()? Where is it defined?
    And the only thing you write to the file is the
    message; you never write any data?

    And you close kyk but not opf?
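For contrast, a minimal sketch that writes the data lines themselves to the output file and closes both files automatically. It copies everything and leaves out the timestamp test, which is a separate problem; copy_data is a made-up name.

```python
def copy_data(src_path, dst_path):
    """Write every line of src_path to dst_path.

    Both files are opened in one with statement, so both are closed
    when the block ends, even if an error occurs part-way through.
    """
    with open(src_path) as kyk, open(dst_path, "w") as opf:
        for line in kyk:
            opf.write(line)  # write the data itself, not just a message
```

Called as copy_data("hfx-test-.txt", "kyk_txt_new_file.txt"), it needs no explicit close() calls.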

    > Here are the first and last few lines of my text file data.
    >
    > time stamps
    > -1.75, 1.08, 10.35, -0.10, -0.01, -0.01, 23.19, *488*
    > -1.75, 1.12, 10.39, -0.10, -0.01, -0.01, 23.20, *521*
    >
    > 9.65, -1.31, -1.95, -0.11, -0.06, -0.02, 22.05, *15339436*
    > 9.56, -1.32, -1.97, -0.10, -0.00, -0.01, 22.05, *15339495*

    Does a zero timestamp appear in any of those lines?
    This is just a set of numbers. Are they all timestamps?
    If so, why is the last number much bigger than the rest?
    And is the 488 at the end of line 1 significant in being
    the same as the magic number in your code?


    >
    > I was also provided with the pseudocode below which I am trying
    > to follow if that helps to guide me along the way.

    The pseudo code makes some sense - although a for loop
    would be simpler than the while. And it does not appear
    to be doing what you describe as the required task.

    But your code is not even close to what it does.

    > -> load sourceFile (a copy of the raw data file)

    I think that should say open sourcefile, not load...

    > line_number = 0
    > start_time = 0
    > split_numbers = []
    > not_done = true
    >
    > while(not_done):
    >        ->read line from sourcefile
    >        ->split line on ','
    >        ->convert last item on line to unsigned long and store in current_time
    >
    >        if current_time < start_time:
    >               split_numbers.add(line_number)
    >               start_time = current_time
    >        if end_of_file:
    >               not_done = false;
    >
    > for s in split_numbers:
    >        ->create newfile
    >        for i = 0, i < s, i++:
    >               ->read line from sourcefile
    >               ->write line to newfile
    >
    >        ->close newfile
    >
    > ->Close sourcefile
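For comparison, a two-pass Python sketch that follows the shape of the pseudocode above. It is not a literal translation. It compares each timestamp with the previous line's (with start_time stuck at 0, current_time < start_time can never be true for unsigned values), lets enumerate count the lines (the pseudocode never increments line_number), copies the count of lines in each piece rather than s lines (s is an absolute line number), and writes whatever follows the last split as a final piece. The function names and the piece_N.txt output names are made up for illustration.

```python
from itertools import islice


def find_split_lines(path):
    """First pass: return the 0-based line numbers where the last-field
    timestamp is lower than the timestamp on the previous data line."""
    split_numbers = []
    previous = None
    with open(path) as source:
        for line_number, line in enumerate(source):
            try:
                current_time = int(line.split(",")[-1])
            except ValueError:
                continue  # skip headers or blank lines
            if previous is not None and current_time < previous:
                split_numbers.append(line_number)
            previous = current_time
    return split_numbers


def write_pieces(path, split_numbers, prefix="piece"):
    """Second pass: copy the file into pieces, each starting at a split
    line. The last piece runs to the end of the file."""
    starts = [0] + split_numbers
    ends = split_numbers + [None]
    names = []
    with open(path) as source:
        for start, end in zip(starts, ends):
            count = None if end is None else end - start
            names.append(f"{prefix}_{len(names) + 1}.txt")
            with open(names[-1], "w") as newfile:
                # islice takes exactly `count` lines (all remaining if None).
                newfile.writelines(islice(source, count))
    return names
```

Used together as write_pieces("hfx-test-.txt", find_split_lines("hfx-test-.txt")), it reads the source twice but never holds more than one line in memory.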

    --     Alan G
    Author of the Learn to Program web site
    http://www.alan-g.me.uk/
    http://www.amazon.com/author/alan_gauld
    Follow my photo-blog on Flickr at:
    http://www.flickr.com/photos/alangauldphotos




    _______________________________________________
    Tutor maillist  -  Tutor at python.org
    To unsubscribe or change subscription options:
    https://mail.python.org/mailman/listinfo/tutor


