[Tutor] Text Processing/Command Line Redirection/XML Parsing etc. in Python

Pritesh Ugrankar pritesh.ugrankar at gmail.com
Mon Nov 28 08:56:06 CET 2011

First of all, my apologies for writing this very long post.

I have been through some related questions on Stack Overflow, have also
googled this, and found that Perl and Python are the two languages that
best offer what I need. As a SAN administrator, I have very limited time to
learn a scripting language, so I can concentrate on only one. Most of my
questions below may make you think that I prefer Perl, but it's nothing
like that. It's just that I tried learning Perl earlier to do the things I
want to do, and now I am wondering what advantages I will have if I try out
Python instead.

All my SAN management servers run Windows only.

Following is what I am specifically looking at:

1) Consider the following output:
symdev -sid 1234 list devs
0D62 Not Visible    ???:? 07C:D13 RAID-5        N/A     (DT) RW  187843
0D63 Not Visible    ???:? 08C:C11 RAID-5        N/A     (DT) RW  187843
0D64 Not Visible    ???:? 07C:C12 RAID-5        N/A     (DT) RW  62614
0D65 Not Visible    ???:? 08C:D14 RAID-5        N/A     (DT) RW  62614
0D66 Not Visible    ???:? 07C:D15 RAID-5        N/A     (DT) RW  31307
0D67 Not Visible    ???:? 08C:C13 RAID-5        N/A     (DT) RW  31307
0D68 Not Visible    ???:? 07C:C14 RAID-5        N/A     (DT) RW  31307

What's given above is only a small part of the output. There are many other
fields that appear, but I have left them out for brevity.

The symdev command generates a list of devices that can be used for SAN
storage allocation.

On the Windows machines, I want to do something like grep or awk to pick
out the 10th field, which contains the device size, so that I can generate
output like this:

Devices of 187 GB = 3

Devices of 62 GB = 2

Devices of 31 GB = 3

The thing is, this output differs on each storage box: some may have 10
devices, some may have 100.

I can use grep or awk for Windows, but I am looking at the bigger picture
here.

What I want is to filter the command-line output so that it counts the
devices and segregates them by size.

I tried Perl, but I found the syntax a little difficult to remember. Again,
this is my own shortcoming, as I am not a trained programmer. I only got
back to the script after a gap of many weeks, and by that time I had
forgotten what it was supposed to do, so I had to start from scratch. Maybe
commenting will help :)

I only got as far as storing each line of the output in an array, but
nothing beyond that, because my workload kept me really busy.

When I did that, each element of the array held one line of the output.
For example, this was one element:
0D62 Not Visible    ???:? 07C:D13 RAID-5        N/A     (DT) RW  187843

The following was the next element:
0D63 Not Visible    ???:? 08C:C11 RAID-5        N/A     (DT) RW  187843

and so on.
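In Python, that "array of lines" is simply a list, and splitting each line on whitespace gives a list of lists that behaves like rows and columns. A minimal sketch, assuming Python 3.7+ (for `subprocess.run(..., capture_output=True)`); `run_lines` and `to_table` are hypothetical helper names, and the demo runs a stand-in Python one-liner instead of `symdev`, which only exists on the SAN management servers.

```python
import subprocess
import sys


def run_lines(cmd):
    """Run a command and return its standard output as a list of lines."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def to_table(lines):
    """Split every non-blank line on whitespace: rows are lines, columns are fields."""
    return [line.split() for line in lines if line.strip()]


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; on the server this would be
    # something like ["symdev", "-sid", "1234", "list", "devs"].
    demo = [sys.executable, "-c",
            "print('0D62 Not Visible ???:? 07C:D13 RAID-5 N/A (DT) RW 187843')"]
    table = to_table(run_lines(demo))
    print(table[0][9])   # 10th field of the first row (indexes start at 0)
    print(table[0][-1])  # the same field, counted from the end of the row
```

Counting from the end with `[-1]` is handy when the number of leading fields varies between rows.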

What I wanted instead was a way to print and count the last field. I guess
I would have to use hashes in Perl. Most examples of hashes I have seen are
pre-created, but is there a way to create a hash on the fly? I don't know
in advance how many devices will be part of that hash; it will differ on
each storage box. Is there something like this available in Python that
will let me filter, add up, and print the last field, treating the output
as rows and columns? Is there a hash equivalent in Python?
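Python's built-in `dict` is the equivalent of a Perl hash, and new keys can be added on the fly, so the number of devices does not need to be known in advance. The sketch below uses `collections.Counter`, a `dict` subclass in which a missing key reads as 0. It assumes the size is always the last whitespace-separated field and that it is in MB (so integer division by 1000 gives the rough GB figure used above); `count_sizes` is a hypothetical helper name.

```python
from collections import Counter

# Sample rows copied from the symdev output above. On the server, this text
# would come from the command itself (e.g. captured with subprocess).
SYMDEV_OUTPUT = """\
0D62 Not Visible    ???:? 07C:D13 RAID-5        N/A     (DT) RW  187843
0D63 Not Visible    ???:? 08C:C11 RAID-5        N/A     (DT) RW  187843
0D64 Not Visible    ???:? 07C:C12 RAID-5        N/A     (DT) RW  62614
0D65 Not Visible    ???:? 08C:D14 RAID-5        N/A     (DT) RW  62614
0D66 Not Visible    ???:? 07C:D15 RAID-5        N/A     (DT) RW  31307
0D67 Not Visible    ???:? 08C:C13 RAID-5        N/A     (DT) RW  31307
0D68 Not Visible    ???:? 07C:C14 RAID-5        N/A     (DT) RW  31307
"""


def count_sizes(text):
    """Map each device size (the last field, as an int) to its device count."""
    sizes = Counter()  # dict subclass: a missing key reads as 0, so += 1 just works
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and any header whose last field is not a number
        if not fields or not fields[-1].isdigit():
            continue
        sizes[int(fields[-1])] += 1
    return sizes


if __name__ == "__main__":
    counts = count_sizes(SYMDEV_OUTPUT)
    # Largest size first; assumes sizes are in MB, so // 1000 gives truncated GB
    for size, n in sorted(counts.items(), reverse=True):
        print(f"Devices of {size // 1000} GB = {n}")
```

For the seven sample rows above this prints `Devices of 187 GB = 2`, `Devices of 62 GB = 2` and `Devices of 31 GB = 3`. A plain `dict` works too, using `sizes[size] = sizes.get(size, 0) + 1`.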

Note that I am giving Perl examples because I started with Perl first,
though personally I find Python syntax easier to understand. (Again, that
is my limitation, not the language's.)

2) Automate storage allocation. What's given below is only a small part of
what I want to do, with a brief output and explanation.

All storage devices on my storage boxes have hexadecimal LUN IDs.

Let's say I have free LUN IDs available between, say, 005* and 00A,
meaning the command output looks something like this:
symcfg list -sid 1234 -sa 04B -p 0 -addresses -available
Symmetrix ID: 000184501234
Director Device Name Attr Address
---------------------- ----------------------------- ---- --------------
Ident Symbolic Port Sym Physical VBUS TID LUN
------ -------- ---- ---- ----------------------- ---- --- ---
FA-4B 04B 0 - AVAILABLE 0 0 000 *
0029 /dev/rdsk/c1t0d1s2 0 0 001
0033 /dev/rdsk/c1t0d2s2 0 0 002
003D /dev/rdsk/c1t0d3s2 0 0 003
0046 Not Visible        0 0 004
- AVAILABLE             0 0 005 *
0075 Not Visible        0 0 00A
- AVAILABLE             0 0 00B *

When there is a "*", the LUN IDs are available from that address up to the
next listed hex number. That is, from 000* to 001 nothing else is
available, but from 005* to 00A I have 006 through 009 available in
addition to 005 itself. I want to redirect this output to an array, a hash,
or something similar, filter the last field, and then generate the LUN IDs
from 005 to 009 on the fly. Then, using some commands, I want to automate
allocating these LUN IDs to some of the free LUNs I found in the first
command's output.
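A sketch of that "*" logic in Python, under stated assumptions: the LUN is the last field (or the field just before a trailing "*"), LUN IDs are always three hex digits, and the final open-ended "*" range must be capped by a `max_lun` limit you supply, because the output does not show where it ends. `parse_luns`, `free_luns` and `max_lun` are hypothetical names. Hex handling is built in: `int("00A", 16)` parses a hex string and `format(n, "03X")` turns a number back into a zero-padded hex ID.

```python
import re

# Trimmed copy of the "symcfg ... -addresses -available" rows shown above.
SYMCFG_OUTPUT = """\
FA-4B 04B 0 - AVAILABLE 0 0 000 *
0029 /dev/rdsk/c1t0d1s2 0 0 001
0033 /dev/rdsk/c1t0d2s2 0 0 002
003D /dev/rdsk/c1t0d3s2 0 0 003
0046 Not Visible        0 0 004
- AVAILABLE             0 0 005 *
0075 Not Visible        0 0 00A
- AVAILABLE             0 0 00B *
"""

HEX_LUN = re.compile(r"^[0-9A-Fa-f]{3}$")  # assumption: LUNs are 3 hex digits


def parse_luns(text):
    """Return (lun_as_int, starts_free_range) pairs in the order they appear."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        marked = bool(fields) and fields[-1] == "*"
        if marked:
            fields = fields[:-1]  # drop the "*" so the LUN is the last field
        if fields and HEX_LUN.match(fields[-1]):
            rows.append((int(fields[-1], 16), marked))  # base 16 parses hex
    return rows


def free_luns(text, max_lun):
    """List free LUN IDs as 3-digit hex strings.

    A "*" row is free, and so is every LUN after it up to (not including)
    the next listed LUN. The last "*" range is open-ended, so it is capped
    at max_lun, a limit you have to choose yourself.
    """
    rows = parse_luns(text)
    free = []
    for i, (lun, marked) in enumerate(rows):
        if not marked:
            continue
        end = rows[i + 1][0] if i + 1 < len(rows) else max_lun + 1
        free.extend(format(n, "03X") for n in range(lun, end))
    return free


if __name__ == "__main__":
    print(free_luns(SYMCFG_OUTPUT, max_lun=0x00F))
```

With `max_lun=0x00F`, the sample yields 000, then 005 through 009, then 00B through 00F.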

Is Perl or Python better at manipulating hex values?

I know it's possible to redirect the above output to a text file, a CSV
file, or an XML file and then read and write those files, but is Perl or
Python better for that?
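Python's standard library covers both file formats: the `csv` module for CSV and `xml.etree.ElementTree` for XML. A minimal CSV round trip, assuming hypothetical column names `size_mb` and `devices` and per-size device counts like those described in question 1; `io.StringIO` stands in for a real file so the sketch runs anywhere.

```python
import csv
import io

# Hypothetical counts in the shape {size_in_MB: number_of_devices}
counts = {187843: 2, 62614: 2, 31307: 3}


def write_counts_csv(counts, out):
    """Write a header row and one row per size to any file-like object."""
    writer = csv.writer(out)
    writer.writerow(["size_mb", "devices"])
    for size, n in sorted(counts.items(), reverse=True):
        writer.writerow([size, n])


def read_counts_csv(src):
    """Read the same layout back into a dict, converting text fields to int."""
    reader = csv.DictReader(src)
    return {int(row["size_mb"]): int(row["devices"]) for row in reader}


if __name__ == "__main__":
    buf = io.StringIO()  # stands in for open("counts.csv", "w", newline="")
    write_counts_csv(counts, buf)
    print(buf.getvalue())
    buf.seek(0)  # rewind before reading back
    print(read_counts_csv(buf))
```

With a real file, open it with `newline=""` as the `csv` documentation recommends, so the module controls line endings itself.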

3) I want to generate performance reports, such as which LUN has the most
IOs. What would help here is a language that can create these graphs in
Excel: I run the script, and its output generates a graph in Excel.
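Ranking LUNs by IO needs nothing beyond the standard library; building the Excel chart itself would need a third-party package such as openpyxl or XlsxWriter, which this sketch does not use. The sample numbers and the `busiest_luns` helper are hypothetical, standing in for whatever your performance tool reports.

```python
from collections import defaultdict

# Hypothetical per-interval samples: (LUN ID, read IOs, write IOs).
# Real numbers would come from your performance tool's output.
SAMPLES = [
    ("005", 120, 30),
    ("006", 10, 5),
    ("005", 200, 50),
    ("00A", 90, 90),
    ("006", 15, 5),
]


def busiest_luns(samples, top=3):
    """Sum reads + writes per LUN and return (lun, total) pairs, busiest first."""
    totals = defaultdict(int)  # missing keys start at 0
    for lun, reads, writes in samples:
        totals[lun] += reads + writes
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    for lun, total in busiest_luns(SAMPLES):
        print(lun, total)
```

For these samples the ranking is 005 (400 IOs), 00A (180) and 006 (35); the ranked pairs could then be written to a CSV file or workbook for charting.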

Which language is better suited for my needs? I found Perl syntax a little
cryptic, but if Perl will be faster and better suited than Python, I am
ready to invest more time in Perl.

Which language will be faster for text processing/automation?

Which language will generate a binary executable that is smaller in size and

I played a little with Python too; today is my third day. I found the
syntax much easier to learn. I also came across cxfreeze, which creates
standalone binary executables. Is something like that available for Perl?

4) I also want to try working with XML output. The storage commands I use
let me direct their output to XML format. Is Python better suited for this?
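A minimal `xml.etree.ElementTree` sketch. The XML layout (`<devices>`, `<device>`, `<name>`, `<size_mb>`) is invented for illustration and will not match the real schema your storage commands emit; adjust the tag names after looking at an actual output file. `device_sizes` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML; the element names are made up for this example.
XML_OUTPUT = """\
<devices>
  <device><name>0D62</name><size_mb>187843</size_mb></device>
  <device><name>0D63</name><size_mb>187843</size_mb></device>
  <device><name>0D64</name><size_mb>62614</size_mb></device>
</devices>
"""


def device_sizes(xml_text):
    """Map each device name to its size by visiting every <device> element."""
    root = ET.fromstring(xml_text)  # for a file: ET.parse("out.xml").getroot()
    return {
        dev.findtext("name"): int(dev.findtext("size_mb"))
        for dev in root.iter("device")
    }


if __name__ == "__main__":
    print(device_sizes(XML_OUTPUT))
```

Because XML tags name each value, this avoids counting whitespace-separated columns, which is one advantage of asking the storage commands for XML output.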

A few more questions come up: which will give me more freedom and be easier
to maintain? Which scripting language is better from an employability point
of view?

I don't want to start with one language and, six months or a year down the
line, think "Heck, this was better in the other one", because I really can
concentrate on only one language.

My apologies in advance if any questions above seem to be dumb or naive.