Python editing .txt file
MRAB
python at mrabarnett.plus.com
Tue Jun 15 22:06:58 EDT 2010
187braintrust at berkeley.edu wrote:
> I am trying to write a program in Python that will edit .txt log files
> that contain regression output from R. Any thoughts or suggestions
> would be greatly appreciated.
>
> To get an idea of what I am trying to do, note that I include fixed
> effects in the R regressions, resulting in hundreds of extra lines per
> regression which I am not interested in right now. Basically, I want to
> save a shortened version of the .txt files in which the blocks of fixed
> effects coefficients are replaced by a line that says includes fixed
> effects for whatever variable it is.
>
> All the lines that are to be deleted start with the same seven characters
> -- 'factor(', as in 'factor(xyz)' where xyz is the variable name -- so my
> idea is to have Python copy each line to a new file if the first seven
> characters do not match 'factor('.
>
> That part I at least know how to approach. However, I am not sure how
> to approach adding the line that says, "includes fixed effects for xyz."
> The problem I am having is how to approach the following:
>
>
> 1. In the resulting file, I will be skipping blocks of lines, say
> anywhere from 10 to 500 or so, and inserting one line -- i.e.,
> whether it inserts the line needs to depend on whether it's the
> first line or one of the remaining 499 lines.
>
> 2. the xyz variable name is different lengths depending on what
> variable it is. For example, one block might be 'state' and another
> block might be 'yr'. Maybe I can use the fact that the var name
> starts after the first '(' and ends at the first ')' in the line? I
> think I can use the re module for this?
>
>
> Any suggestions on any aspect of this, but especially the latter part,
> would be greatly appreciated. Thank you.
>
How's this:
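# input_path and output_path are the names of your files.
input_file = open(input_path)
output_file = open(output_path, "w")
current = None  # variable name of the factor block being skipped, if any
for line in input_file:
    if line.startswith("factor("):
        # The variable name sits between the first "(" and the first ")".
        open_paren = line.find("(")
        close_paren = line.find(")")
        variable = line[open_paren + 1 : close_paren]
        if variable != current:
            # First line of a new block: write the summary line once.
            output_file.write("*** Factors for %s ***\n" % variable)
            current = variable
        # The remaining lines of the block are simply not copied.
    else:
        current = None
        output_file.write(line)
input_file.close()
output_file.close()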
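Since you asked about the re module: here is a minimal sketch of the same idea using a regular expression to pull out the variable name. It assumes the lines look something like 'factor(state)AL  0.10', which is my guess at the R output. The helper name collapse_factor_blocks and the sample lines are made up for illustration.

```python
import re

# Matches "factor(" at the start of a line and captures everything up to
# the first ")" as the variable name.
FACTOR_RE = re.compile(r"factor\(([^)]*)\)")


def collapse_factor_blocks(lines):
    """Yield lines, replacing each run of factor(xyz) lines with one summary.

    Hypothetical helper, not part of any library.
    """
    current = None  # variable name of the block being skipped, if any
    for line in lines:
        match = FACTOR_RE.match(line)
        if match is None:
            # Ordinary line: copy it and forget any previous block.
            current = None
            yield line
        elif match.group(1) != current:
            # First line of a new block: emit the summary once.
            current = match.group(1)
            yield "includes fixed effects for %s\n" % current
        # Otherwise the line belongs to a block already summarised: skip it.


# Made-up sample that only resembles R regression output.
sample = [
    "(Intercept)      1.20\n",
    "factor(state)AL  0.10\n",
    "factor(state)AK  0.20\n",
    "factor(yr)1991   0.30\n",
    "factor(yr)1992   0.40\n",
    "income           0.05\n",
]
result = list(collapse_factor_blocks(sample))
print("".join(result))
```

Because the helper accepts any iterable of lines, it should also work directly on an open file, for example output_file.writelines(collapse_factor_blocks(input_file)).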