[BangPypers] Orglex Update

Tue Feb 12 11:54:44 CET 2008

Hi Everybody,

Hope things are going well. I wanted to send a quick update on our service
(the last one may have been many months ago) and share some of what we
learned while building it (it's Python/Django based). I also have a small
recruiting pitch at the end of the note :)

I know that there has been quite a bit of discussion on this mailing list
around machine learning, AI and other statistical approaches to text and
data mining, so I wanted to share our experience from building our service
( http://www.orglex.com ).

As some of you may know from my previous emails, we aggregate all types of
content (e.g. news, blogs, jobs, people) focused on industries and
organizations (you can see a hub here:
http://www.orglex.com/hubs/clinical-trials ). Our content aggregation is
completely automated and very relevant to the topic at hand (see another
example at www.orglex.com/hubs/venture-capital ). While we believe that we
have achieved a far greater degree of relevance than anything existing in
the market, we still need to make a lot of progress.

We also white-label our aggregated content, and we have received very
important validation from one of the leading technology and venture capital
blog networks, VentureBeat: http://www.venturebeat.com/vc-news/ .
Additionally, traffic to our site has been growing quite fast over the past
few months.

To achieve this level of relevance, we experimented with, implemented and
iterated on many purely algorithmic techniques (e.g. TF-IDF, Bayesian
methods) to assign and tag our content. However, we were not satisfied
with the relevance of the results and had to adopt a hybrid approach. One
of the issues with a purely algorithmic approach is that it works broadly
for generic content (e.g. using topical/document similarities, see
http://blogoscoped.com/archive/2006-07-28-n49.html ) but its relevance
decays for narrow topics.
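
For anyone who has not played with TF-IDF, here is a minimal sketch of the
kind of document similarity mentioned above. It is not our production code:
the function names (tfidf_vectors, cosine), the whitespace tokenizer and the
sample headlines are assumptions made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (plain TF-IDF)."""
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up sample headlines, for illustration only.
docs = [
    "clinical trial results for new drug",
    "venture capital funding round",
    "drug trial enters clinical phase",
]
vectors = tfidf_vectors(docs)
related = cosine(vectors[0], vectors[2])    # shares "clinical", "trial", "drug"
unrelated = cosine(vectors[0], vectors[1])  # shares no terms
```

With broad topics this separates documents reasonably well; with narrow
topics the shared vocabulary is small and the scores get noisy, which is
the decay described above.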

Our platform has many pieces to it, but some of the important elements are:
=>We use industry-specific semantic ontologies to help the system
understand the content appropriately. This is classic semantic web stuff (
http://www.semanticweb.com/article.php/3721831 ).
=>We understand the importance of relevant sources within an industry and
weight each source accordingly when evaluating its content.

Given this hybrid approach, we are able to keep relevance very high while
still automating the process.
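
To make the two elements above a bit more concrete, here is a rough sketch
of how ontology terms and source weights could be combined into one score.
Everything in it is hypothetical: the ONTOLOGY and SOURCE_WEIGHTS tables,
the domain names, the numbers, the threshold and the function names are
invented for illustration and are not how our platform is actually built.

```python
# Hypothetical ontology: hub name -> {term: weight}. Values are invented.
ONTOLOGY = {
    "clinical-trials": {"clinical trial": 3.0, "placebo": 2.0, "fda": 1.5},
    "venture-capital": {"venture capital": 3.0, "series a": 2.0,
                        "term sheet": 2.0},
}

# Hypothetical per-source weights; unknown sources get the default.
SOURCE_WEIGHTS = {"trusted-industry-blog.example": 1.5}
DEFAULT_SOURCE_WEIGHT = 1.0


def score_item(text, source, hub):
    """Sum the weights of ontology terms found in the text, then scale
    the total by the weight of the source."""
    lowered = text.lower()
    # Simple substring matching (an assumption; a real system would
    # need proper tokenization and phrase matching).
    term_score = sum(weight for term, weight in ONTOLOGY[hub].items()
                     if term in lowered)
    return term_score * SOURCE_WEIGHTS.get(source, DEFAULT_SOURCE_WEIGHT)


def assign_hub(text, source, threshold=2.0):
    """Return the best-scoring hub, or None if nothing clears the
    (arbitrary) threshold."""
    scores = {hub: score_item(text, source, hub) for hub in ONTOLOGY}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


hub = assign_hub("Startup closes Series A venture capital round",
                 "trusted-industry-blog.example")
```

The point of the sketch is only that curated knowledge (the ontology and
the source weights) sits alongside the automated scoring, so the pipeline
stays automatic while the narrow-topic relevance comes from the curated
side.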

We are still a small team, but given all this exciting progress, we are
looking to expand our team by 1-2 more people. We have a preference for
people with 0-2 years of experience but won't hold experience against folks
:)

Looking forward to hearing feedback from folks.

Nik