[Tutor] Better way to remove lines from a list?

boB Stepp robertvstepp at gmail.com
Tue May 12 19:33:38 EDT 2020


On Tue, May 12, 2020 at 11:59:18PM +0200, Peter Otten wrote:
>boB Stepp wrote:
>
>>   I have a test file with the following contents:
>>
>> ADR;TYPE=HOME:;;11601 Southridge Dr;Little Rock;AR;72212-1733;US;11601 Sout
>>   hridge Dr\nLittle Rock\, AR 72212-1733\nUS
>> ADR;TYPE=WORK:;;1912 Green Mountain Dr;Little Rock;AR;72212;US;1912 Green M
>>   ountain Dr\nLittle Rock\, AR 72212\nUS
>>   more meaningless stuff
>>   even more meaningless stuff
>> ADR:100;;4700 E McCain Blvd;North Little Rock;AR;72117;US;4700 E McCain Blv
>>   d\n100\nNorth Little Rock\, AR 72117\nUS

>I doubt that the extra stuff in the ADR lines is illegitimate and think that
>the best solution would be to find a tool that can parse the data as-is.

I have to disagree about the legitimacy of the data.  According to the
"vCard 3.0 format specification" at
https://www.evenx.com/vcard-3-0-format-specification, an example of a
properly formatted ADR property would be:

ADR;TYPE=dom,home,postal,parcel: ;;123 Main Street;Any Town;CA;91921;

where the bare semicolons indicate that the post office box, extended
address, and country name components, respectively, have been omitted.
There is no provision for appending additional components to the ADR
property.

I double-checked this against the official standard, RFC 2426
(https://tools.ietf.org/html/rfc2426#section-3.2.1).
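As a rough illustration of the seven-component layout described above, here
is a minimal sketch that maps an ADR value onto named components and reports
anything left over.  The names ADR_FIELDS and parse_adr_value are made up
for this example, and the split is naive: it assumes no component contains
a semicolon escaped as "\;".

```python
# Component order as listed in RFC 2426, section 3.2.1.
# The Python names themselves are illustrative, not part of any library.
ADR_FIELDS = (
    "po_box", "extended", "street", "locality",
    "region", "postal_code", "country",
)


def parse_adr_value(value):
    """Split the text after the colon of an ADR line into named components.

    Returns (fields, extra), where extra holds any components beyond the
    seven standard ones.  Naive: escaped semicolons are not handled.
    """
    parts = value.split(";")
    standard = parts[:len(ADR_FIELDS)]
    extra = parts[len(ADR_FIELDS):]
    # Pad with empty strings if trailing components were left off entirely.
    standard += [""] * (len(ADR_FIELDS) - len(standard))
    return dict(zip(ADR_FIELDS, standard)), extra


fields, extra = parse_adr_value(";;123 Main Street;Any Town;CA;91921;")
print(fields["street"], "| extra components:", extra)
```

Run on one of the Google-exported values, the duplicated address would show
up in extra, which is one way to spot lines that go beyond the standard.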

I have noted, though, that my full Google Contacts vCard file includes
quite a few extension fields beyond the standard vCard ones.  So perhaps
you are correct in the sense that Google intended to add this duplicate of
the address (with newlines embedded), but it is not the standard format.
The parser I have found expects the standard vCard format, though it does
parse extensions that properly begin with "X-".  Another possibility is
that Apple has some non-standard expectation for its accepted vCard
format.  The Google Contacts page states that the export in vCard format
is intended for iOS Contacts.

>However, practicality beats purity. So how about merging the line and then
>removing everything starting with the 8th semicolon? Like

># assuming that the colon after one of your ADRs is a typo

That eighth semicolon is no typo.  That is a direct copy and paste from a
Google Contacts export in vCard format.

>def cleaned(line):
>    if line.startswith("ADR;"):
>        line = ";".join(line.split(";")[:8])
>    return line + "\n"
>
>cleaned_text = "".join(
>    cleaned(line) for line in text.replace("\n ", "").splitlines()
>)
>
>where text is the complete file as a string.

However, this "practical" code looks useful to me.  Thanks!
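One thing to watch: the third sample line begins with "ADR:" rather than
"ADR;", so a startswith("ADR;") test would skip it, and counting semicolons
from the start of the line gives a different cut point when there are no
TYPE parameters.  A possible variation, offered only as a sketch, splits
each line at its first colon and keeps the first seven components of the
value.  The helper names unfold, clean_adr and clean_text are made up here,
and the code assumes that no parameter contains a colon and that folded
lines continue with a single leading space or tab.

```python
import re


def unfold(text):
    # Undo line folding: a line break followed by one space or tab
    # continues the previous line (assumption: exactly one such character).
    return re.sub(r"\r?\n[ \t]", "", text)


def clean_adr(line):
    # Split "NAME;PARAMS:VALUE" at the first colon.
    name, sep, value = line.partition(":")
    # The property name is whatever precedes the first ";" of the left part.
    if sep and name.split(";", 1)[0].upper() == "ADR":
        # Keep only the seven standard ADR components.
        value = ";".join(value.split(";")[:7])
        return name + sep + value
    return line


def clean_text(text):
    # text is the complete file as a string, as in the quoted code.
    return "".join(clean_adr(line) + "\n" for line in unfold(text).splitlines())
```

Lines that are not ADR properties pass through unchanged, apart from being
unfolded.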

-- 
Wishing you only the best,

boB Stepp

