comparing all values of a list to regex

Alex Martelli aleax at aleax.it
Fri Sep 27 04:07:24 EDT 2002


<posted & mailed>

Manuel Hendel wrote:
        ...
> This is the input: a "|"-separated text file with about 15000 lines
> of pop3 accounts:
> 
> |Number|String|Number|String(Domain)|String(Account)|String(Login/Email)|String(Password)|String|String|
> 
> These are the fields I care about.
> 
> String(Domain): This is the domain of the Account
> String(Account): This is the part in front of the @domain. This can
> also be a * for a catchall account.
> String(Login/Email): This is the local pop3-account or an email address,
> or both, comma-separated.
> String(Password): This is the password if String(Login/Email) is a
> pop3-account.
> 
> This should be the output:
> 
> Three text files. One with only forwardings (only email addresses in
> String(Login/Email)), one with only pop3-accounts in
> String(Login/Email), and one mixed, where String(Login/Email) has
> both pop3-accounts and email addresses.

OK, for each line you want fields line.split('|')[4:8], each field
stripped of surrounding whitespace (at least, you strip them in the
code you posted, though that's not in the specs you give above),
_plus_ you need to know the set of distinct domains and group the
entries by field 4 (domain); you build this in the code you posted,
although, again, I can't see any trace of that requirement in the specs.
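To see why the slice [4:8] picks out exactly the four fields of interest: a line that begins with '|' splits into an empty string first, which shifts every real field up by one index. A minimal sketch, using a made-up sample line (all values are hypothetical):

```python
# Hypothetical sample line in the format quoted above (values made up).
line = "|1|x|2|example.com|info|info,boss@example.org|secret|y|z|\n"

parts = line.split('|')
# The leading '|' yields an empty first element, so Domain, Account,
# Login/Email and Password end up at indexes 4, 5, 6 and 7.
print(parts[:4])   # ['', '1', 'x', '2']

domain, account, login_email, password = [f.strip() for f in parts[4:8]]
print(domain, account, login_email, password)
```

The trailing newline lands in the last element, outside the slice, so it never reaches the four fields.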

So we'll want an auxiliary function to tell us which of the three
"bins" to put an entry in, depending on the login/email field, e.g.:

def classify(login_email):
    l_e = login_email.split(',')
    assert 1 <= len(l_e) <= 2, "More than one comma in (%s)" % login_email
    if len(l_e) == 2: return 2          # both
    elif '@' in login_email: return 1   # email, I guess
    else: return 0                      # other case (local account, I guess)
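The classify above treats any two-item field as "both". If the mixed bin should instead depend on what each comma-separated item looks like, a variant along these lines might fit better. It is only a sketch: like the guess above, it assumes an item containing '@' is a forwarding address and anything else is a local pop3 account, and the name classify_items is hypothetical.

```python
def classify_items(login_email):
    """Return 0 (only local pop3 accounts), 1 (only forwarding addresses)
    or 2 (a mix of both) for a comma-separated Login/Email field.

    Assumption: an item containing '@' is a forwarding address,
    anything else is a local pop3 account name.
    """
    items = [item.strip() for item in login_email.split(',') if item.strip()]
    has_email = any('@' in item for item in items)
    has_local = any('@' not in item for item in items)
    if has_email and has_local:
        return 2          # mixed
    elif has_email:
        return 1          # forwardings only
    else:
        return 0          # local accounts only (or an empty field)

for value in ['info', 'boss@example.org', 'info,boss@example.org',
              'a@example.org,b@example.org']:
    print(value, classify_items(value))
```

Unlike the two-item rule, two forwarding addresses land in bin 1 here, and more than one comma is accepted rather than rejected by an assert.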


Now the rest, net of imports and file open/close ops:

classified = [ [] for i in range(3) ] # 3 separate empty lists
per_domain = {}                       # initially-empty dict

for line in inputfile:
    fields = [ field.strip() for field in line.split('|')[4:8] ]
    per_domain.setdefault(fields[0],[]).append(fields)
    classified[classify(fields[2])].append(fields)
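The loop can be exercised without real files by feeding a few made-up lines through io.StringIO. This is only a test harness under that assumption; the sample lines are invented, and a runnable copy of classify is included so the snippet works on its own.

```python
import io

def classify(login_email):
    l_e = login_email.split(',')
    assert 1 <= len(l_e) <= 2, "More than one comma in (%s)" % login_email
    if len(l_e) == 2:
        return 2                      # both
    elif '@' in login_email:
        return 1                      # email, I guess
    else:
        return 0                      # local account, I guess

# Made-up sample data standing in for the real 15000-line file.
inputfile = io.StringIO(
    "|1|a|1|example.com|info|info|pw1|x|y|\n"
    "|2|b|1|example.com|*|boss@example.org||x|y|\n"
    "|3|c|1|example.net|sales|sales,boss@example.org|pw3|x|y|\n"
)

classified = [[] for i in range(3)]   # 3 separate empty lists
per_domain = {}                       # domain -> list of field lists

for line in inputfile:
    fields = [field.strip() for field in line.split('|')[4:8]]
    per_domain.setdefault(fields[0], []).append(fields)
    classified[classify(fields[2])].append(fields)

print([len(bin_) for bin_ in classified])   # [1, 1, 1]
print(sorted(per_domain))                   # ['example.com', 'example.net']
```

Since io.StringIO iterates line by line just like an open text file, the same loop body works unchanged on the real input.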


Now you only need the output; presumably each line must be
output in the same way, e.g. with another auxiliary function:

def outline(fileobj, fields):
    fileobj.write('|'.join(fields))
    fileobj.write('\n')

so you only need to loop on each of the three lists of lists
in 'classified', and on the keys of dictionary 'per_domain'
(sort them too, if you wish, of course), in order to emit
the results to appropriate files.  Presumably your exact
specs are not quite as I tried to guess them from a mix of
what you wrote and what you coded, but I hope this outline
can still be useful to you.
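The output step described above might look roughly like this. It is a sketch under several assumptions: the bin file names (pop3.txt, forward.txt, mixed.txt), writing one file per domain named after the domain, and the use of a temporary directory are all hypothetical choices, and the small hand-written classified list stands in for what the loop would build.

```python
import os
import tempfile

def outline(fileobj, fields):
    fileobj.write('|'.join(fields))
    fileobj.write('\n')

# Hypothetical results of the classification loop (made-up data).
classified = [
    [['example.com', 'info', 'info', 'pw1']],                     # 0: pop3 only
    [['example.com', '*', 'boss@example.org', '']],               # 1: forwardings only
    [['example.net', 'sales', 'sales,boss@example.org', 'pw3']],  # 2: mixed
]
per_domain = {}
for bin_ in classified:
    for fields in bin_:
        per_domain.setdefault(fields[0], []).append(fields)

# Assumed output locations; substitute whatever the real specs require.
names = ['pop3.txt', 'forward.txt', 'mixed.txt']
outdir = tempfile.mkdtemp()

# One file per bin, one "|"-joined line per entry.
for name, entries in zip(names, classified):
    with open(os.path.join(outdir, name), 'w') as f:
        for fields in entries:
            outline(f, fields)

# One file per domain, visiting the domains in sorted order.
for domain in sorted(per_domain):
    with open(os.path.join(outdir, domain + '.txt'), 'w') as f:
        for fields in per_domain[domain]:
            outline(f, fields)
```

Using with blocks means each file is closed as soon as its loop finishes, so there is no separate close step to forget.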


Alex
