[Tutor] Text Processing/Command Line Redirection/XML Parsing etc. in Python.

Stefan Behnel stefan_ml at behnel.de
Mon Nov 28 09:42:36 CET 2011


Pritesh Ugrankar, 28.11.2011 07:56:
> First of all, my apologies for writing this very long post.

Welcome to the list. :)


> I have been through some related questions about this on Stack Overflow as
> well as googled it and found that Perl and Python are the two languages
> that offer most of what I need. As a SAN Administrator, I have very limited
> time to learn a scripting language, so I can concentrate on only one. Most
> of my questions below may make you think that I prefer Perl, but it's
> nothing like that... It's just that I tried learning Perl before for the
> things I want to do, and now I am wondering what advantages I would have
> if I tried Python.

There are two anecdotes that people from both camps frequently report. With 
Perl, people write their script, and then, several months later, they come 
back, look at it, don't understand it anymore, and rewrite it. With Python, 
people write their script, forget about it over time, write it again when 
they need it, and when they happen to find the old one and compare it to 
the new one, they find that both look almost identical.

It's all in the syntax.


> All my SAN Management Servers are Windows only.
>
> Following is what I am specifically looking at:
>
> 1) Consider the following output:
> symdev -sid 1234 list devs
> 0D62 Not Visible    ???:? 07C:D13 RAID-5        N/A     (DT) RW  187843
> 0D63 Not Visible    ???:? 08C:C11 RAID-5        N/A     (DT) RW  187843
> 0D64 Not Visible    ???:? 07C:C12 RAID-5        N/A     (DT) RW  62614
> 0D65 Not Visible    ???:? 08C:D14 RAID-5        N/A     (DT) RW  62614
> 0D66 Not Visible    ???:? 07C:D15 RAID-5        N/A     (DT) RW  31307
> 0D67 Not Visible    ???:? 08C:C13 RAID-5        N/A     (DT) RW  31307
> 0D68 Not Visible    ???:? 07C:C14 RAID-5        N/A     (DT) RW  31307
>
> What's given above is only a small part of the output. There are many other
> fields that appear, but I have left those out for brevity.
>
> The symdev command generates a list of devices that can be used for SAN
> allocation.
>
> What I want to do is, on the Windows machines, do something like a grep or
> awk so that the 10th field, which contains the size of the devices, will be
> filtered and I can generate output like this:
>
> Devices of 187 GB = 3
>
> Devices of 62 GB = 2
>
> Devices of 31 GB = 3
>
> Thing is, this output will differ on each storage box. Some may have 10
> devices, some may have 100....
>
> I can use grep or awk for Windows, but I am looking at the bigger picture here.
>
> What I want to do is some kind of filtering of the command line output
> so that it will count the types of devices and segregate them according to
> their size.

That's really easy. You open the file (see the open() function) and it 
returns a file object. You can iterate over it with a for-loop, and it will 
return each line as a string. Use the split() method on the string object 
to split the string by whitespace. That returns a list of separate fields. 
Then, pick the fields you want. In code:

     with open('thefile.txt') as f:
         for line in f:
             fields = line.split()
             if len(fields) >= 10:      # skip blank lines and short header lines
                 print(fields[9])       # the 10th field, for example
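Taking that one step further, the counting you describe can be done with collections.Counter. This is a minimal sketch under a few assumptions: that the size is always the 10th whitespace-separated field, that it is in megabytes, and (to match your reading of 187843 as "187 GB") that dividing by 1000 and truncating is what you want. The helper names count_device_sizes() and report() are made up for this example.

```python
from collections import Counter

# A few device lines copied from the question; on a real system these would
# come from a file or from the symdev process itself.
SAMPLE = """\
0D62 Not Visible    ???:? 07C:D13 RAID-5        N/A     (DT) RW  187843
0D63 Not Visible    ???:? 08C:C11 RAID-5        N/A     (DT) RW  187843
0D64 Not Visible    ???:? 07C:C12 RAID-5        N/A     (DT) RW  62614
0D65 Not Visible    ???:? 08C:D14 RAID-5        N/A     (DT) RW  62614
0D66 Not Visible    ???:? 07C:D15 RAID-5        N/A     (DT) RW  31307
0D67 Not Visible    ???:? 08C:C13 RAID-5        N/A     (DT) RW  31307
0D68 Not Visible    ???:? 07C:C14 RAID-5        N/A     (DT) RW  31307
"""


def count_device_sizes(lines, size_index=9):
    """Count how many devices there are of each size.

    Assumes the size is the whitespace-separated field at size_index
    (index 9 = the 10th field). Lines that are too short or whose size
    field is not a plain number (headers, blank lines) are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > size_index and fields[size_index].isdigit():
            counts[int(fields[size_index])] += 1
    return counts


def report(counts):
    """Turn the counts into lines like 'Devices of 187 GB = 2'.

    Assumption: sizes are in MB and are shown as whole GB by dividing
    by 1000 and truncating, as in the question's example.
    """
    return ["Devices of %d GB = %d" % (size // 1000, n)
            for size, n in sorted(counts.items(), reverse=True)]


for out in report(count_device_sizes(SAMPLE.splitlines())):
    print(out)
```

Because a file object is itself an iterable of lines, the same function works unchanged on the f in the with-block above: count_device_sizes(f).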

If you are not reading the output from a file but from a process you 
started, take a look at the subprocess module in the standard library.
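A minimal sketch of the subprocess route, assuming Python 3.7 or later (for the capture_output and text arguments). The symdev command line is the one from your question; since symdev will not exist on most machines, the runnable part uses the Python interpreter itself (sys.executable) as a stand-in that prints one device line. command_lines() is a hypothetical helper name.

```python
import subprocess
import sys


def command_lines(cmd):
    """Run cmd (a list of arguments) and return its standard output as lines."""
    # check=True raises CalledProcessError if the command exits with an error.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# On the SAN server this would be something like:
#     lines = command_lines(["symdev", "-sid", "1234", "list", "devs"])
# Here a Python one-liner stands in for symdev so the sketch runs anywhere.
fake_symdev = [sys.executable, "-c",
               "print('0D62 Not Visible ???:? 07C:D13 RAID-5 N/A (DT) RW 187843')"]
lines = command_lines(fake_symdev)
print(lines)
```

The returned list can then be processed line by line exactly like the lines of a file.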

http://docs.python.org/library/subprocess.html

Also take a look at string formatting for output.

http://docs.python.org/tutorial/inputoutput.html

http://docs.python.org/library/stdtypes.html#string-formatting-operations
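For instance, report lines like the ones in your question can be produced either with the % operator described on the second page or with the newer str.format() method; both give the same padded result here. The numbers in counts are purely illustrative.

```python
# Illustrative data: size in GB -> number of devices of that size.
counts = {187: 2, 62: 2, 31: 3}

report_lines = []
for gb in sorted(counts, reverse=True):
    # %-style formatting: %3d right-aligns the number in 3 characters.
    old_style = "Devices of %3d GB = %d" % (gb, counts[gb])
    # The same layout using str.format().
    new_style = "Devices of {:3d} GB = {}".format(gb, counts[gb])
    assert old_style == new_style
    report_lines.append(new_style)

print("\n".join(report_lines))
```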


> I tried Perl, but I found that the syntax was a little difficult to remember.
> This is again my own shortcoming as I am not a trained programmer. I only
> got to work on the script after a gap of many weeks, and by that time I
> had forgotten what the script was supposed to do, so I had to start from
> scratch... Maybe commenting will help :)

Yep, that's Perl at its best.


> Which language will generate Binary executable that is smaller in size and
> faster?

You usually don't do that. Instead, you'd install Python on all machines 
where you need it and then just run your code there.

If you really want to go through the hassle of building a self-contained
executable from each program you write, you will have to bundle the runtime
for either language with it, so it won't be small.


> 4) I also want to try out playing with XML output... The storage commands I
> use allow me to direct the output to XML format... Is Python better
> suited to this?

Absolutely. Python's standard library includes ElementTree (the
xml.etree.ElementTree module). You'll just love working with it.

http://docs.python.org/library/xml.etree.elementtree.html

A quick tutorial is here:

http://effbot.org/zone/element-index.htm
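To give an idea of what that looks like, here is a small sketch that counts devices by size from XML output. The XML layout below (devices, device, size_mb, name) is entirely hypothetical; the element and attribute names your storage tools actually produce will be different, so the iter()/findtext()/get() arguments would need adjusting.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical XML; real storage tools will use their own element names.
XML_TEXT = """\
<devices>
  <device name="0D62"><size_mb>187843</size_mb></device>
  <device name="0D63"><size_mb>187843</size_mb></device>
  <device name="0D64"><size_mb>62614</size_mb></device>
  <device name="0D66"><size_mb>31307</size_mb></device>
</devices>
"""

root = ET.fromstring(XML_TEXT)

# iter("device") walks every <device> element; findtext() returns the text
# of its <size_mb> child, or None if that child is missing.
counts = Counter()
for device in root.iter("device"):
    size = device.findtext("size_mb")
    if size is not None:
        counts[int(size)] += 1

for size, n in sorted(counts.items(), reverse=True):
    print("%s MB: %d device(s)" % (size, n))

# Attributes are read with .get(), e.g. the device names:
names = [device.get("name") for device in root.iter("device")]
```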


> A few more questions pop up, like: which will give me more freedom and be
> easier to maintain? Which scripting language is better from the
> employability point of view?
>
> I don't want to start with one language and six months or a year down the
> line think "Heck, this was better in the other one"... because I really can
> concentrate on only one language.

There are always certain types of problems that can be solved very 
beautifully in a particular language. That's why there's more than one 
language. You won't miss anything by choosing Python, though.

Stefan


