Comments on my first script?

John Salerno johnjsal at NOSPAMgmail.com
Thu Jun 12 17:10:44 CEST 2008


"Phillip B Oldham" <phillip.oldham at gmail.com> wrote in message 
news:7e3a7c92-6204-46dd-8df8-90218f2fb314 at 26g2000hsk.googlegroups.com...
> I'd like the community's thoughts/comments on what I've done;
> improvements I can make, "don'ts" I should be avoiding, etc. I'm not
> so much bothered about the resulting data - for the moment it meets my
> needs. But any comment is welcome!

I'm not expert, but here are a few thoughts. I hope they help.

> #!/usr/bin/env python
> ## Open a file containing a list of domains (1 per line),
> ## request and parse it's whois record and push to a csv
> ## file.

You might want to look into doc strings as a method of providing longer 
documentation like this about what your program does.

> dest = open('./whois.csv', 'w');

Semicolon!!!! :)

> def trim( txt ):
> x = []
> for line in txt.split("\n"):
> if line.strip() == "":
> continue
> if line.strip().startswith('WHOIS'):
> continue
> if line.strip().startswith('>>>'):
> continue
> if line.strip().startswith('%'):
> continue
> if line.startswith("--"):
> return ''.join(x)

Is all this properly indented? One thing you can do is put each of these on 
one line, since they are fairly simple:

if line.strip().startswith('WHOIS'): continue

although I still like proper indentation. But you have a lot of them so it 
might save a good amount of space to do it this way.

Also, just my personal preference, I like to be consistent with the type of 
quotes I use for strings. Here, you mix both single and double quotes on 
different lines.

> return "\n".join(x);

Semicolon!!!!  :) :)

> details = ['','','','','','','','','']

I don't have Python available to me right now, but I think you can do this 
instead:

details = [''] * 9

> except:
> continue

Non-specific except clauses usually aren't preferred since they catch 
everything, even something you might not want to catch.

> if domain == '':
> continue

You can say:

if not domain

instead of that equivalence test. But what does this if statement do?

> if rec.startswith("No whois server") == True:
> continue
>
> if rec.startswith("This TLD has no whois server") == True:
> continue

Like above, you don't need "== True" here.

> if domain.endswith(".net"):
> rec = clean_net(rec)
>
> if domain.endswith(".com"):
> rec = clean_net(rec)
>
> if domain.endswith(".tv"):
> rec = clean_net(rec)
>
> if domain.endswith(".co.uk"):
> rec = clean_co_uk(rec)
>
> if domain.endswith(".info"):
> rec = clean_info(rec)

Hmm, my first thought is to do something like this with all these if tests:

for extension in [<list all the extensions as strings here>]:
    rec = clean_net(extension)

But for that to work, you may need to generalize the clean_net function so 
it works for all of them, instead of having to call different functions 
depending on the extension.

Anyway, I hope some of that helps! 





More information about the Python-list mailing list