Simple Text Processing Help
John Machin
sjmachin at lexicon.net
Sun Oct 14 17:17:12 EDT 2007
On Oct 14, 11:48 pm, patrick.wa... at gmail.com wrote:
> Hi all,
>
> I started Python just a little while ago and I am stuck on something
> that is really simple, but I just can't figure out.
>
> Essentially I need to take a text document with some chemical
> information in Czech and organize it into another text file. The
> information is always EINECS number, CAS, chemical name, and formula
> in tables. I need to organize them into lines with | in between. So
> it goes from:
>
> 200-763-1 71-73-8
> nátrium-tiopentál C11H18N2O2S.Na
>
> to:
>
> 200-763-1|71-73-8|nátrium-tiopentál|C11H18N2O2S.Na
>
> but if I have a chemical like: kyselina močová
>
> I get:
> 200-720-7|69-93-2|kyselina|močová
> |C5H4N4O3|200-763-1|71-73-8|nátrium-tiopentál
>
> and then it is all off.
>
> How can I get Python to realize that a chemical name may have a space
> in it?
>
Your input file could be in one of THREE formats:
(1) fields are separated by TAB characters (represented in Python by
the escape sequence '\t', and equivalent to '\x09')
(2) fields are fixed width and padded with spaces
(3) fields are separated by a variable number of whitespace characters
(and the fields themselves can contain spaces).
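To make the difference concrete, here is a minimal sketch of how one line might be split under each of the three formats. The function names, the column widths and the sample lines are all made up for illustration; in particular, the rule used for format 3 (two or more whitespace characters between fields) is only an assumption and may not hold for your file.

```python
import re

def split_tabbed(line):
    # Format 1: exactly one TAB between fields; spaces inside a field survive.
    return line.rstrip('\r\n').split('\t')

def split_fixed(line, widths):
    # Format 2: slice at known column widths (hypothetical here), strip padding.
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields

def split_ragged(line):
    # Format 3: ASSUMES fields are separated by 2+ whitespace characters,
    # so a single space inside a chemical name is not treated as a separator.
    return re.split(r'\s{2,}', line.strip())

# Made-up sample lines, one per format.
tabbed = '200-720-7\t69-93-2\tkyselina močová\tC5H4N4O3\n'
fixed = ('200-720-7'.ljust(12) + '69-93-2'.ljust(12)
         + 'kyselina močová'.ljust(20) + 'C5H4N4O3')
ragged = '200-720-7   69-93-2   kyselina močová   C5H4N4O3'

row_tabbed = split_tabbed(tabbed)
row_fixed = split_fixed(fixed, (12, 12, 20, 10))
row_ragged = split_ragged(ragged)
joined = '|'.join(row_tabbed)
```

With these sample lines all three functions give the same four fields, and `joined` keeps the two-word name together in a single field.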
What makes you sure that you have format 3? You might like to try
something like
    # 'rb' reads raw bytes, so tabs, padding and line endings show up unaltered
    lines = open('your_file.txt', 'rb').readlines()[:4]
    print(lines)
    print([len(line) for line in lines])
This will print a *precise* representation of what is in the first
four lines, plus their lengths. Please show us the output.
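As a rough guide to reading that output, the sketch below turns it into a guess about the format. It is a heuristic only: `guess_format` is a hypothetical helper, and it assumes that any TAB means format 1 and that several lines of identical length mean fixed-width fields. Neither is guaranteed for a real file.

```python
def guess_format(lines):
    # Drop line endings so they do not distort the length comparison.
    stripped = [line.rstrip('\r\n') for line in lines]
    # Any TAB at all is taken as a sign of TAB-separated fields (assumption).
    if any('\t' in line for line in stripped):
        return 'tab'
    # Several lines of identical length suggest space-padded fixed-width
    # fields (assumption: the last field is padded as well).
    lengths = set(len(line) for line in stripped)
    if len(stripped) > 1 and len(lengths) == 1:
        return 'fixed'
    # Otherwise assume a variable amount of whitespace between fields.
    return 'ragged'
```

It takes a list of text lines such as the four read above, decoded to text first if they were read as bytes.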