Comments on my first script?
John Salerno
johnjsal at NOSPAMgmail.com
Thu Jun 12 11:10:44 EDT 2008
"Phillip B Oldham" <phillip.oldham at gmail.com> wrote in message
news:7e3a7c92-6204-46dd-8df8-90218f2fb314 at 26g2000hsk.googlegroups.com...
> I'd like the community's thoughts/comments on what I've done;
> improvements I can make, "don'ts" I should be avoiding, etc. I'm not
> so much bothered about the resulting data - for the moment it meets my
> needs. But any comment is welcome!
I'm not expert, but here are a few thoughts. I hope they help.
> #!/usr/bin/env python
> ## Open a file containing a list of domains (1 per line),
> ## request and parse it's whois record and push to a csv
> ## file.
You might want to look into doc strings as a method of providing longer
documentation like this about what your program does.
> dest = open('./whois.csv', 'w');
Semicolon!!!! :)
> def trim( txt ):
> x = []
> for line in txt.split("\n"):
> if line.strip() == "":
> continue
> if line.strip().startswith('WHOIS'):
> continue
> if line.strip().startswith('>>>'):
> continue
> if line.strip().startswith('%'):
> continue
> if line.startswith("--"):
> return ''.join(x)
Is all this properly indented? One thing you can do is put each of these on
one line, since they are fairly simple:
if line.strip().startswith('WHOIS'): continue
although I still like proper indentation. But you have a lot of them so it
might save a good amount of space to do it this way.
Also, just my personal preference, I like to be consistent with the type of
quotes I use for strings. Here, you mix both single and double quotes on
different lines.
> return "\n".join(x);
Semicolon!!!! :) :)
> details = ['','','','','','','','','']
I don't have Python available to me right now, but I think you can do this
instead:
details = [''] * 9
> except:
> continue
Non-specific except clauses usually aren't preferred since they catch
everything, even something you might not want to catch.
> if domain == '':
> continue
You can say:
if not domain
instead of that equivalence test. But what does this if statement do?
> if rec.startswith("No whois server") == True:
> continue
>
> if rec.startswith("This TLD has no whois server") == True:
> continue
Like above, you don't need "== True" here.
> if domain.endswith(".net"):
> rec = clean_net(rec)
>
> if domain.endswith(".com"):
> rec = clean_net(rec)
>
> if domain.endswith(".tv"):
> rec = clean_net(rec)
>
> if domain.endswith(".co.uk"):
> rec = clean_co_uk(rec)
>
> if domain.endswith(".info"):
> rec = clean_info(rec)
Hmm, my first thought is to do something like this with all these if tests:
for extension in [<list all the extensions as strings here>]:
rec = clean_net(extension)
But for that to work, you may need to generalize the clean_net function so
it works for all of them, instead of having to call different functions
depending on the extension.
Anyway, I hope some of that helps!
More information about the Python-list
mailing list