[Tutor] string delimiters
Alan Gauld
alan.gauld at btinternet.com
Wed Jun 3 22:29:48 CEST 2015
On 03/06/15 21:13, richard kappler wrote:
> I was trying to keep it simple, you'd think by now I'd know better. My
> fault and my apology.
>
> It's definitely not all dates and times, the data and character types
> vary. This is the output from my log parser script which you helped on
> the other day. There are essentially two types of line:
>
> Tue Jun 2 10:22:42 2015<usertag1
> name="SE">SE201506012200310389PS01CT1407166S0011.40009.00007.6IN
> 000000000018.1LB000258]C10259612019466862270088094]L0223PDF</usertag1>
> Tue Jun 2 10:22:43 2015<usertag1
> name="SE">SE0389icdim01307755C0038.20033.20012.0IN1000000000
> 0032]C10259612804038813568089577</usertag1>
>
> I have to do several things:
> the first type can be of variable length, everything after the ] is an
> identifier that I have to separate, some lines have one, some have
> more than one, variable length, always delimited by a ]
So why not just split by ']'?
identifiers = line.split(']')[1:] # lose the first one
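To make that concrete, here is a minimal sketch using the two sample payloads quoted above. It assumes the wrapped pieces of each sample join into one line, and that the closing </usertag1> tag should be dropped before splitting; the function name split_identifiers is just made up for illustration.

```python
def split_identifiers(line, closing_tag="</usertag1>"):
    """Return the ]-delimited identifiers that follow the first field."""
    body = line.rstrip("\n")
    # Assumption: the closing tag is not part of the last identifier.
    if body.endswith(closing_tag):
        body = body[:-len(closing_tag)]
    return body.split("]")[1:]  # lose the first one


first = ('Tue Jun 2 10:22:42 2015<usertag1 name="SE">'
         'SE201506012200310389PS01CT1407166S0011.40009.00007.6IN'
         '000000000018.1LB000258]C10259612019466862270088094]L0223PDF'
         '</usertag1>')
second = ('Tue Jun 2 10:22:43 2015<usertag1 name="SE">'
          'SE0389icdim01307755C0038.20033.20012.0IN1000000000'
          '0032]C10259612804038813568089577</usertag1>')

print(split_identifiers(first))   # ['C10259612019466862270088094', 'L0223PDF']
print(split_identifiers(second))  # ['C10259612804038813568089577']
```

A line with no ] in it simply gives an empty list, so there is no need to special-case it.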
> and finally, I have to break these apart and put a descriptor with each.
>
Nope. I don't understand that.
Break what apart? And how do you 'put a descriptor with each'?
What is a descriptor for that matter?!
> While I was waiting for a response to this, I put together a script to
> start figuring things out (what could possibly go wrong?!?!?! :-) )
>
> and I can't post the exact script but the following is the guts of it:
>
> f1 = open('unformatted.log', 'r')
> f2 = open('formatted.log', 'a')
>
> for line in f1:
>     for tag in ("icdm"):
>         if tag in line:
>             # + and so on to format the lines with icdm in them,
>             # including adding 14 x's for the missing timestamp
>             newline = 'log datestamp:' + line[0:24]
>             f2.write(newline) # write the formatted output to the new log
>         else:
>             # + and so on to format the non-icdm lines
>             newline = 'log datestamp:' + line[0:24]
>             f2.write(newline)
>
So this checks each line for the 4 tags: i,c,d and m.
If the tag is in the line it does the if clause, including writing to f2.
If the tag is not in the line it does the else, which also writes to f2.
So you always write 4 lines to f2. Is that correct?
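The reason there are 4 tags is that parentheses alone do not make a tuple: ("icdm") is just the string "icdm", and a for loop over a string visits one character at a time. A one-element tuple needs a trailing comma. A short sketch (the variable names are only for illustration):

```python
tags_as_written = ("icdm")   # parentheses only: still a plain str
tags_as_tuple = ("icdm",)    # trailing comma: a one-element tuple

print(type(tags_as_written).__name__)  # str
print(list(tags_as_written))           # ['i', 'c', 'd', 'm']
print(type(tags_as_tuple).__name__)    # tuple
print(list(tags_as_tuple))             # ['icdm']
```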
> The problems are:
> 1. for some reason this iterates over the 24 line file 5 times, and it
> writes the 14 x's to every file, so my non-icdm code (the else:) isn't
> getting executed. I'm missing something basic and obvious but have no
> idea what.
That's not what I'd expect. I'd expect it to write 4 lines out for every
input line.
What gets written depends on how many of the 4 tags are found in
the line.
Since we only have partial code we don't know what the formatted lines
look like.
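One way to check the count is to run the same loop structure against an in-memory file. This is only a sketch: the two input lines are made-up stand-ins (not real log data), the text written in each branch is a placeholder for the formatting we have not seen, and one_write_per_line is a hypothetical restructuring that assumes the intent was to test for the whole substring "icdm" once per line.

```python
import io

# Made-up stand-ins for log lines; only the 24-character timestamp prefix
# and the presence or absence of "icdm" matter for this check.
SAMPLE_LINES = [
    'Tue Jun  2 10:22:42 2015<usertag1 name="SE">SE2015PS01</usertag1>\n',
    'Tue Jun  2 10:22:43 2015<usertag1 name="SE">SE0389icdm0130</usertag1>\n',
]


def as_posted(lines, out):
    """Same loop structure as the quoted script, placeholder formatting."""
    for line in lines:
        for tag in ("icdm"):  # a plain string, so tag is 'i', 'c', 'd', 'm'
            if tag in line:
                out.write('log datestamp:' + line[0:24] + ' (if branch)\n')
            else:
                out.write('log datestamp:' + line[0:24] + ' (else branch)\n')


def one_write_per_line(lines, out):
    """Hypothetical restructuring: one substring test, one write per line."""
    for line in lines:
        if "icdm" in line:
            out.write('log datestamp:' + line[0:24] + ' (icdm line)\n')
        else:
            out.write('log datestamp:' + line[0:24] + ' (non-icdm line)\n')


posted, fixed = io.StringIO(), io.StringIO()
as_posted(SAMPLE_LINES, posted)
one_write_per_line(SAMPLE_LINES, fixed)

print(len(posted.getvalue().splitlines()))  # 8: four writes per input line
print(len(fixed.getvalue().splitlines()))   # 2: one write per input line
```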
> 2. I still don't know how to handle the differences in the end of the
> non-icdm files (potentially more than one identifier ] delimited as
> described above).
I'm not clear on this yet either.
I suspect that once you clarify what you are trying to do you will know
how to do it...
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos