parsing tab and newline delimited text
MRAB
python at mrabarnett.plus.com
Tue Aug 3 23:05:56 EDT 2010
elsa wrote:
> Hi,
>
> I have a large file of text I need to parse. Individual 'entries' are
> separated by newline characters, while fields within each entry are
> separated by tab characters.
>
> So, an individual entry might have this form (in printed form):
>
> Title date position data
>
> with each field separated by tabs, and a newline at the end of data.
> So, I thought I could simply open a file, read each line in in turn,
> and parse it....
>
> f=open('MyFile')
> line=f.readline()
> parts=line.split('\t')
>
> etc...
>
> However, 'data' is a fairly random string of characters. Because the
> files I'm processing are large, there is a good chance that in every
> file, there is a data field that might look like this:
>
> 899998dlKKlS\lk3#kdf\nllllKK99
>
> or like this:
>
> LLLSDKJJJdkkf334$\ttttks)))K99
>
> so, you see the random strings '\n' and '\t' are stopping me from
> being able to parse my file correctly. Any
> suggestions on how to overcome this problem would be greatly
> appreciated.
>
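When you say random strings '\n', etc., do you mean the backslash
character \ followed by the letter n? If so, then you don't have a
problem: those are two ordinary characters, and neither readline() nor
split('\t') treats them as a newline or a tab.

If, on the other hand, by '\n' you mean an actual newline character,
then, well, that's a newline character, and there's (probably) nothing
you can do about it, because nothing in the file distinguishes it from
the newline that ends an entry.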
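
To illustrate the first case, here is a minimal sketch. It assumes the
data fields contain a literal backslash followed by a letter; the sample
rows and the helper name parse_entries are made up for the example, not
taken from the original file.

```python
import io

def parse_entries(lines):
    """Split each line into tab-separated fields (hypothetical helper)."""
    entries = []
    for line in lines:
        # Remove only the real newline that terminates the entry,
        # then split on real tab characters.
        entries.append(line.rstrip("\n").split("\t"))
    return entries

# Made-up sample rows. The raw strings (r"...") keep the backslash and
# the following letter as two ordinary characters.
rows = [
    ["Title1", "2010-08-03", "12", r"899998dlKKlS\lk3#kdf\nllllKK99"],
    ["Title2", "2010-08-04", "34", r"LLLSDKJJJdkkf334$\ttttks)))K99"],
]

# Build the file contents: real tabs between fields, a real newline
# at the end of each entry.
text = "".join("\t".join(row) + "\n" for row in rows)

# io.StringIO stands in for the opened file; iterating it yields lines.
entries = parse_entries(io.StringIO(text))

for fields in entries:
    print(len(fields), fields[-1])
```

Each entry should come back as four fields with the data field intact,
since a backslash followed by n or t in the file is not the same
character as the newline or tab that the parser looks for.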