[Tutor] 2016-02-01 Filter STRINGS in Log File and Pass as VARAIBLE within PYTHON script
Cameron Simpson
cs at zip.com.au
Mon Feb 1 05:50:41 EST 2016
On 01Feb2016 15:53, knnleow GOOGLE <knnleow at gmail.com> wrote:
>trying out on how to port my unix shell script to python.
>get more complicated than i expected.....: (
>i am not familiar with the modules available in python.
>anyone care to share how to better the clumsy approach below.
>regards,
>kuenn
>
> timestamp02 = time.strftime("%Y-%m-%d-%H%M%S")
> banIPaddressesFile = os.popen("cat
>/var/log/fail2ban.log| egrep ssh| egrep Ban| egrep " + myDate + "| awk
>\'{print $7}\'| sort -n| uniq >/tmp/banIPaddressesFile." +
>timestamp02).read()
First up, this is still essentially a shell script. You're constructing a shell
pipeline like this (paraphrased):
    cat /var/log/fail2ban.log
    | egrep ssh
    | egrep Ban
    | egrep myDate
    | awk '{print $7}'
    | sort -n
    | uniq
    >/tmp/banIPaddressesFile-timestamp
So really, you're doing almost nothing in Python. You're also writing
intermediate results to a temporary filename, then reading from it. Unless you
really need to keep that file around, you won't need that either.
Before I get into the Python side of things, there are a few small criticisms
of your shell script:
- it has a "useless cat"; this is a very common shell inefficiency where people
  put "cat filename | filter1 | filter2 ..." when they could more cleanly just
  go "filter1 <filename | filter2 | ..."
- you are searching for fixed strings; why are you using egrep? Just say "grep"
  (or even "fgrep" if you're old school - you're new to this so I presume not)
- you're using "sort -n | uniq", presumably because uniq requires sorted input;
  you are better off using "sort -u" here and skipping uniq. (Not "sort -un":
  with -n, uniqueness is judged on the numeric key, so distinct addresses that
  start with the same number can be collapsed into one.) I'd also point out
  that since these are IP addresses, "sort -n" doesn't really do what you want
  here: it compares only the leading number on each line, not each dotted
  field in turn.
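As an aside on that last point: if you later want the addresses in a sensible
order, a minimal Python sketch might look like the following. It assumes plain
IPv4 addresses held as strings; the function name sort_ip_addresses and the
sample addresses are made up for illustration. The ipaddress module is in the
standard library from Python 3.3 onward.

```python
import ipaddress

def sort_ip_addresses(addrs):
    """Return the unique addresses in addrs, ordered by numeric value."""
    # ipaddress.ip_address() parses each string so that comparison is done
    # on the address value rather than character by character.
    return sorted(set(addrs), key=ipaddress.ip_address)

# Made-up sample data, including one duplicate.
sample = ["192.0.2.10", "192.0.2.9", "10.0.0.1", "192.0.2.10"]
print(sort_ip_addresses(sample))
```

A plain string sort would put "192.0.2.10" ahead of "192.0.2.9"; sorting on the
parsed address avoids that. Mixing IPv4 and IPv6 addresses in one list would
raise a TypeError, so this sketch is only for the single-family case.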
So, to the Python:
You seem to want to read the file /var/log/fail2ban.log and for certain
specific lines, record column 7 which I gather from the rest of the code
(below) is an IP address. I gather you just want one copy of each unique IP
address.
So, to read lines of the file the standard idiom goes:
    with open('/var/log/fail2ban.log') as fail_log:
        for line in fail_log:
            ... process lines here ...
You seem to be checking for two keywords and a date in the interesting lines.
You can do this with a simple test:
    if 'ssh' in line and 'Ban' in line and myDate in line:
If you want the seventh column from the line (per your awk command) you can get
it like this:
    words = line.split()
    word7 = words[6]
because Python lists count from 0, therefore index 6 is the seventh word.
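To see the split in action, here is a small sketch. The log line is made up,
in the layout I am assuming fail2ban uses; check it against a real line from
your own log, since the column position depends on the exact format.

```python
# Made-up line in an assumed fail2ban.log layout; your real log may differ.
line = "2016-02-01 05:50:41,123 fail2ban.actions[1234]: WARNING [ssh] Ban 192.0.2.10"

# split() with no arguments breaks the line on runs of whitespace.
words = line.split()

# Index 6 is the seventh word, matching awk's $7.
word7 = words[6]
print(len(words), word7)
```

Note that a line with fewer than seven words would raise an IndexError at
words[6], so it is worth checking len(words) first if your log has short lines.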
You want the unique IP addresses, so I suggest storing them all in a set and
not bothering with a sort until some other time. So make an empty set before
you read the file:
    ip_addrs = set()
and add each address to it for the lines you select:
    ip_addrs.add(word7)
After you have read the whole file you will have the desired addresses in the
ip_addrs set.
Try to put all that together and come back with working code, or come back with
completed but not working code and specific questions.
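If you get stuck, here is one way the pieces might fit together, for
comparison with your own attempt. It is only a sketch: the function name
banned_ips, the my_date parameter (standing in for your myDate) and the sample
lines are all made up, and the log layout is an assumption.

```python
def banned_ips(lines, my_date):
    """Collect the unique seventh-column words from matching log lines."""
    ip_addrs = set()
    for line in lines:
        # Same three substring tests as the egrep stages of the pipeline.
        if 'ssh' in line and 'Ban' in line and my_date in line:
            words = line.split()
            # Skip lines too short to have a seventh word.
            if len(words) >= 7:
                ip_addrs.add(words[6])
    return ip_addrs

# Made-up lines in an assumed fail2ban.log layout.
sample_lines = [
    "2016-02-01 05:50:41,123 fail2ban.actions[1234]: WARNING [ssh] Ban 192.0.2.10",
    "2016-02-01 06:01:02,456 fail2ban.actions[1234]: WARNING [ssh] Ban 192.0.2.10",
    "2016-02-01 06:10:00,000 fail2ban.actions[1234]: WARNING [ssh] Unban 192.0.2.10",
    "2016-01-31 23:59:59,999 fail2ban.actions[1234]: WARNING [ssh] Ban 198.51.100.7",
    "2016-02-01 07:00:00,000 fail2ban.actions[1234]: WARNING [ssh] Ban 203.0.113.5",
]
print(sorted(banned_ips(sample_lines, "2016-02-01")))
```

Because the function takes any iterable of lines, you can pass it the open
file object directly, e.g. banned_ips(fail_log, myDate) inside the "with"
block, and no temporary file is needed.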
Cheers,
Cameron Simpson <cs at zip.com.au>