[Tutor] Newbie Trouble Processing SRT Strings In Text File

Dave Angel davea at davea.name
Tue Nov 4 04:43:28 CET 2014


Please evaluate your email program.  Some of your newlines are
being lost in the paste into your email.

Matt Varner <matt.l.varner at gmail.com> Wrote in message:
> TL:DR - Skip to "My Script: "subtrans.py"
> 
> <beg>
> 
> Optional Links to (perhaps) Helpful Images:
> 1. The SRT download button:
> http://i70.photobucket.com/albums/i82/RavingNoah/Python%20Help/tutor1_zps080f20f7.png
> 
> 2. A visual comparison of my current problem (see 'Desire Versus
> Reality' below):
> http://i70.photobucket.com/albums/i82/RavingNoah/Python%20Help/newline_problem_zps307f8cab.jpg
> 
> ============
> The SRT File
> ============
> 
> The SRT file that you can download for every lesson that has a video
> contains the caption transcript data and is organized according to
> text snippets with some connecting time data.
> 
> ========================================
> Reading the SRT File and Outputting Something Useful
> ========================================
> 
> There may be a hundred different ways to read one of these file types.
> The reliable method I chose was to use a handy editor for the purpose
> called Aegisub.  It will open the SRT file and let me immediately
> export a version of it, without the time data (which I don't
> need...yet).  The result of the export is a plain-text file containing
> each string snippet and a newline character.
> 
> ==========================
> Dealing with the Text File
> ==========================
> 
> One of these text files can be anywhere between 130 to 500 lines or
> longer, depending (obviously) on the length of its attendant video.
> For my purposes, as a springboard for extending my own notes for each
> module, I need to concatenate each string with an acceptable format.
> My desire for this is to interject spaces where I need them and kill
> all the newline characters so that I get just one big lump of properly
> spaced paragraph text.  From here, I can divide up the paragraphs how
> I see fit and I'm golden...
> 
> ==============================
> My first Python script: Issues
> ==============================
> 
> I did my due diligence.  I have read the tutorial at www.python.org.

But did you actually try out and analyze each concept? There's a
difference between reading and studying.

> I went to my local library and have a copy of "Python Programming for
> the Absolute Beginner, 3rd Edition by Michael Dawson."  I started
> collecting what seemed like logical little bits here and there from
> examples found using Uncle Google, but none of the examples anywhere
> were close enough, contextually, to be automatically picked up by my
> dense 'noobiosity.'  For instance, when discussing string
> methods...almost all operations taught to beginners are done on
> strings generated "on the fly," directly inputted into IDLE, but not
> on strings that are contained in an external file.

While it's in the file, it's not a str; it's just bytes on disk.
Reading it in produces a string or a list of strings.  And once
they're created, you cannot tell whether they came from a file, a
literal, or some arbitrary expression.
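
Here's a small experiment you can run to convince yourself.  It's
only a sketch: the file name demo.txt and its one line of content
are made up for the test, and it writes into a throwaway temporary
directory.

```python
import os
import tempfile

# Hypothetical throwaway file, created only for this experiment.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("Let's get started.\n")

# Read it back: each element of the list is an ordinary str.
with open(path) as f:
    from_file = f.readlines()[0]

literal = "Let's get started.\n"

print(type(from_file))                 # <class 'str'>
print(from_file == literal)            # True
print(from_file.rstrip("\n").upper())  # str methods work as usual
```

Anything the tutorials do to a string typed into IDLE, you can do
to from_file.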

>  There are other
> examples for file operations, but none of them involved doing string
> operations afterward.  After many errors about not being able to
> directly edit strings in a file object, I finally figured out that
> lists are used to read and store strings kept in a file like the one
> I'm sourcing from...so I tried using that.  Then I spent hours
> unsuccessfully trying to call strings using index numbers from the
> list object (I guess I'm dense).  Anyhow, I put together my little
> snippets and have been banging my head against the wall for a couple
> of days now.
> 
> After many frustrating attempts, I have NEARLY produced what I'm
> looking to achieve in my test file.
> 
> ================
> Example - Source
> ================
> 
> My Test file contains just twelve lines of a much larger (but no more
> complex) file that is typical for the SRT subtitle caption file, of
> which I expect to have to process a hundred...or hundreds, depending
> on how many there are in all of the courses I plan to take
> (coincidentally, there is one on Python)
> 
> Line 01: # Exported by Aegisub 3.2.1
> Line 02: [Deep Dive]
> Line 03: [CSS Values & Units Numeric and Textual Data Types with
> Guil Hernandez]
> Line 04: In this video, we'll go over the
> Line 05: common numeric and textual values
> Line 06: that CSS properties can accept.
> Line 07: Let's get started.
> Line 08: So, here we have a simple HTML page
> Line 09: containing a div and a paragraph
> Line 10: element nested inside.
> Line 11: It's linked to a style sheet named style.css
> Line 12: and this is where we'll be creating our new CSS rules.
> 
> ========================
> My Script: "subtrans.py"
> ========================
> 
> # Open the target file, create file object
> f = open('tmp.txt', 'r')
> 
> # Create an output file to write the changed strings to
> o = open('result.txt', 'w')
> 
> # Create a list object that holds all the strings in the file object
> lns = f.readlines()
> 
> # Close the source file you no longer need now that you have
> # your strings
> f.close()
> 
> # Import sys to get at stdout (standard output) - "print" results will
> # be written to file
> import sys
> 
> # Associate stdout with the output file
> sys.stdout = o
> 

No, just use o.write directly.  Going through print is a waste of
your energy.


> # Try to print strings to output file using loopback variable (line)
> # and the list object
> for line in lns:
>     if ".\n" in line:
>         a = line.replace('.\n','.  ')
>         print(a.strip('\n'))
>     else:
>         b = line.strip('\n')
>         print(b + " ")
> 

Consider fixing up each string in your list, then joining them all
with

"".join(lns)

and doing just one o.write of the result.
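
Something along these lines, as a rough sketch.  The function name
join_snippets is one I made up, the sample lines are taken from your
test file, and the spacing rule (two spaces after a line that ends
in a period, one space otherwise) is my reading of what you describe.

```python
def join_snippets(lines):
    """Return the caption lines glued together as one paragraph."""
    pieces = []
    for line in lines:
        text = line.rstrip("\n")  # drop the newline that readlines() kept
        # Two spaces after a line-ending period, one space otherwise.
        pieces.append(text + ("  " if text.endswith(".") else " "))
    # One join at the end; trim the spacing left after the last line.
    return "".join(pieces).rstrip()


sample = [
    "In this video, we'll go over the\n",
    "common numeric and textual values\n",
    "that CSS properties can accept.\n",
    "Let's get started.\n",
]
paragraph = join_snippets(sample)
print(paragraph)
```

With the real data you would pass in the list from f.readlines()
and finish with a single o.write(paragraph).  Because only the end
of each line is tested, a period inside 'style.css' is left alone.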

> # Close your output file
> o.close()
> 
> =================
> Desire Versus Reality
> =================
> 
> The source file contains a series of strings with newline characters
> directly following whatever the last character in the snippet...with
> absolutely no whitespace.  This is a problem for me if I want to
> concatenate it back together into paragraph text to use as the
> jumping off point for my additional notes.  I've been literally taking
> four hours to type explicitly the dialogue from the videos I've been
> watching...and I know this is going to save me a lot of time and get
> me interacting with the lessons faster and more efficiently.
> However...
> 
> My script succeeds in processing the source file and adding the right
> amount of spaces for each line, the rule being "two spaces added
> following a period, and one space added following a string with no
> period in it" (technically, a period/newline pairing, which was the
> only way I could figure out to not target the period in 'example.css'
> or 'version 2.3.2').
> 
> But, even though it successfully kills these additional newlines that
> seem to form in the list-making process

They aren't extra, they're in the file.
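
You can see them with repr().  In this sketch io.StringIO stands in
for your real file so that it runs on its own; the two lines of text
are copied from your sample.

```python
import io

# io.StringIO behaves like an open text file, without touching the disk.
fake_file = io.StringIO(
    "element nested inside.\n"
    "It's linked to a style sheet named style.css\n"
)

lns = fake_file.readlines()
print(repr(lns[0]))  # 'element nested inside.\n'

# Stripping is something you do explicitly, after the read.
stripped = [line.rstrip("\n") for line in lns]
print(repr(stripped[0]))  # 'element nested inside.'
```

readlines() hands back each line exactly as stored, newline and all.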

> ...I end up with basically a
> non-concatenated file of strings...with the right spaces I need, but
> not one big chunk of text, like I expect using the s.strip('\n')
> functionality.

That's because you're using print(), which adds a trailing newline
by default.  To avoid that, the print function takes a keyword
parameter, end, which can suppress the newline (for example,
end="").

Note that you haven't explicitly addressed the file encodings for
 input or output.
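
For example, you can name the encoding in each open() call.  This
is only a sketch: UTF-8 is my assumption, since I don't know what
your exported files actually use, and the file name and text are
invented for the test.

```python
import os
import tempfile

# Hypothetical file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "encoded.txt")

# Say explicitly how the text should be encoded on the way out...
with open(path, "w", encoding="utf-8") as out:
    out.write("na\u00efve caf\u00e9\n")

# ...and how it should be decoded on the way back in.
with open(path, "r", encoding="utf-8") as src:
    text = src.read()

print(repr(text))
```

Without the encoding argument, open() falls back to a platform
default, which may not match the file.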

>
> 
> 


-- 
DaveA


