[Chicago] Chicago - Web Crawlers

Tathagata Dasgupta tathagatadg at gmail.com
Fri Oct 12 23:49:34 CEST 2012


While it is not Python, you might wanna look into
http://nutch.apache.org/ also ...

On Fri, Oct 12, 2012 at 4:28 PM, Philip Doctor
<Philip.S.Doctor at gmail.com> wrote:
> Hi Paul,
> You might strongly consider looking into Beautiful Soup for scraping in
> Python if you haven't already.  I've worked with it plenty of times and it
> beats the stuffing out of trying to regex it.
>
> http://www.crummy.com/software/BeautifulSoup/
>
> Good luck.
>
> -Phil
>
>
>
> On Fri, Oct 12, 2012 at 4:25 PM, Paul Wallenberg <p.wallenberg at gmail.com>
> wrote:
>>
>> Hi ChiPy,
>>
>> I work for LaSalle Network and hosted what used to be the "best meeting
>> ever" of ChiPy (until the following month). We were recently engaged on an
>> initiative that involves building web crawlers and/or working with web
>> scraping techniques to extract data from selected web sites.
>>
>> If you have had similar exposure, are well versed in Linux OS, and have
>> worked with search engine technologies like Lucene or Solr, please let me
>> know and advise if it would make sense for us to set up a time to chat.
>>
>> Thanks in advance for your time and interest.
>>
>> All my best,
>>
>> Paul
>>
>> Paul Wallenberg
>> Project Manager - Technology Services
>> LaSalle Network
>> pwallenberg at lasallenetwork.com
>> p. 312-413-1700
>> d. 312-924-3683
>> c. 847-738-3685
>>
>>
>> _______________________________________________
>> Chicago mailing list
>> Chicago at python.org
>> http://mail.python.org/mailman/listinfo/chicago
>>
>
>
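
As a rough illustration of the parser-over-regex point quoted above, here is a
minimal sketch. It uses the standard library's html.parser as a stand-in so it
runs without installing anything; Beautiful Soup wraps the same kind of parsing
in a friendlier API. The sample markup, the LinkCollector class name, and the
naive regex are all made up for this example.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper for this sketch)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Made-up markup with the kind of variation real pages tend to have:
# uppercase tags, single quotes, extra attributes, line breaks inside a tag.
SAMPLE_HTML = """
<p><A HREF='/jobs'>Jobs</A> and
<a class="ext"
   href="http://example.com/about">About</a></p>
"""

# A naive regex that assumes lowercase tags, double quotes, and href first.
naive_links = re.findall(r'<a href="([^"]+)">', SAMPLE_HTML)

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
parsed_links = collector.links

print("regex :", naive_links)
print("parser:", parsed_links)

# With Beautiful Soup installed, the rough equivalent would be:
#   from bs4 import BeautifulSoup
#   soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
#   links = [a["href"] for a in soup.find_all("a", href=True)]
```

The regex misses both links in this sample, while the parser picks up each one
regardless of case, quoting, or attribute order.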



-- 
Cheers,
T

