[Tutor] extract uri from beautiful soup string

Norman Khine norman at khine.net
Mon Oct 15 00:10:12 CEST 2012


Hi, thanks. I changed the code to http://pastie.org/5059153

One problem remains: when I try to write assoc_data to a CSV file,
it croaks with

UnicodeEncodeError: 'ascii' codec can't encode character u'\xc7' in position 0:

Here is some sample data from the print:

[u'Social', u'Action9', u'ash-nimes at aol.com',
mise en place d'ateliers, d'animations hebdomadaires et ponctuelles
afin de lutter contre toutes les formes d'exclusion., Mme Liberté
Bisbal, 04.66.27.24.84, 3002 Rte de Courbessac, 04.66.27.24.84, 30000
NIMES, Madame BISBAL Liberté, 04.66.27.24.84,  ]
[u'Social', u'Adapei30', u'contact at adapi30.org', deux lieux d'echanges
et d'infos des publics concernes par le probleme du handicap mental
representation aupres de divers organismes d'etat et du departement.,
17b, RUE CHILDEBERT, 04.66.21.21.49, 30900 NIMES, Monsieur FLUTTE
Bernard,  ]
[u'Sport', u'Aero-club de nimes-courbessac', u'aeroclubnimes at free.fr',
promouvoir , de faciliter et d'organiser la pratique de l'aviation, 65
Aerodrome de Nimes Courbessac, 04.66.28.16.00, 30000 NIMES, Monsieur
VASSAL PATRICK,  ]

How do I change the code to take the latin-1 encoding into account?
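
For reference, a minimal sketch of one common way around this error,
assuming assoc_data is a list of rows like the ones printed above (the
pastie code is not reproduced here, so that is a guess). In Python 2 the
csv module only handles byte strings, so each unicode value has to be
encoded before it reaches the writer. The names encode_row and write_rows
are hypothetical helpers, not part of the original script, and UTF-8 is an
assumed default; pass encoding="latin-1" if the program reading the file
expects that instead.

```python
# -*- coding: utf-8 -*-
import csv
import io
import sys

PY2 = sys.version_info[0] == 2
# `unicode` only exists on Python 2; the conditional never evaluates it on 3.
text_type = unicode if PY2 else str  # noqa: F821


def encode_row(row, encoding="utf-8"):
    """Return a copy of row with every text value encoded to bytes.

    Non-text values (numbers, None, byte strings) are passed through as-is.
    """
    return [v.encode(encoding) if isinstance(v, text_type) else v
            for v in row]


def write_rows(path, rows, encoding="utf-8"):
    """Write rows (hypothetical stand-in for assoc_data) to a CSV file."""
    if PY2:
        # Python 2: csv wants a binary file and byte strings,
        # so encode every unicode value first.
        with open(path, "wb") as f:
            writer = csv.writer(f)
            for row in rows:
                writer.writerow(encode_row(row, encoding))
    else:
        # Python 3: csv works on text and the file object does the encoding.
        with io.open(path, "w", encoding=encoding, newline="") as f:
            csv.writer(f).writerows(rows)
```

Usage would be something like write_rows("assoc.csv", assoc_data). The
values printed without a u'' prefix above are probably BeautifulSoup
NavigableString objects; those subclass unicode, so the isinstance check
should catch them as well.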



On Sun, Oct 14, 2012 at 8:42 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On 15/10/12 05:05, Norman Khine wrote:
>
>> for page_no in pages:
>
> [...]
>
>>         try:
>>                 urllib2.urlopen(req)
>>         except urllib2.URLError, e:
>>                 pass
>>         else:
>>                 # do something with the page
>>                 doc = urllib2.urlopen(req)
>
>
> This is a bug. Just because urlopen(req) succeeded a moment ago doesn't
> mean that it will necessarily succeed now.
>
> This is an example of the general class of bug known as a "race condition":
>
> http://en.wikipedia.org/wiki/Race_condition
>
>
> Better to write this as:
>
>
> try:
>     doc = urllib2.urlopen(req)
>
> except urllib2.URLError, e:
>     pass
> else:
>     # do something with the page
>     [rest of code goes here]
>
>
>
>
> --
> Steven



-- 
%>>> "".join( [ {'*':'@','^':'.'}.get(c,None) or
chr(97+(ord(c)-83)%26) for c in ",adym,*)&uzq^zqf" ] )

