Analytics Behind LinkedIn
by Paul O'Rorke on Jul.21, 2009, under Meeting Notes
DJ Patil presented “Analytics Behind LinkedIn: A New Model for Analytics and Business Intelligence” at the SDForum BI SIG meeting on Tuesday, July 21st, 2009. Patil is the Chief Scientist at LinkedIn and plays several other important roles, including Sr. Director of Product Analytics and Chief Information Security Officer. He is in charge of analytics reporting on LinkedIn’s businesses, has some responsibility for P&L (profit and loss), and is also in charge of product analytics. While many companies, including startups, strive to be “data-driven,” Patil’s presentation focused on LinkedIn’s efforts to build products that leverage data.
How does LinkedIn make money? LinkedIn has been profitable since 2006 and makes money on a growing portfolio of products and services including:
- ads
- subscriptions
- job postings
- market research surveys
- white papers
- enterprise clients (at customers’ sites, supporting recruiting)
Analytics is very important to LinkedIn. The Analytics team is called the “A Team.” Patil cited an article (“Math Will Rock Your World,” Business Week, 1/2006) that described how analytics was revolutionizing a wide range of businesses. Today, widely available tools such as Google Analytics and emerging tools such as Mint for personal finance are raising the level of analytics usage in the general population. At LinkedIn, analytics is used in everything from relatively general tasks such as A/B testing to specialized LinkedIn products such as “TalentMatch,” a service developed by Monica Rogati that aims to find the most qualified applicants for a given job posting.
The organization at LinkedIn seems healthy and impressive. Analytics is part of the product organization rather than under the CTO, as it is in many other companies. This helps ensure that it is wired into the organization, makes important contributions, and gets good funding. Analytics is broken down into “product analytics” and “data insights and BI” (including A/B testing and reporting). One of the strategies being pursued is to “cut the biggest swaths first” and tackle the largest applications, such as search.
Not long ago, in order to stop being inundated by internal requests for custom reports, some simple tools were built to make roughly half of the reports “self service.” Instead of just telling internal customers “you’re on your own,” support shifted to an “office hours” model and training was provided, including classes on SQL. In addition to offloading many reports to the requesters, this had the added benefit that requesters became more knowledgeable about what they were requesting, how much effort is involved, and the computational requirements and problems that can arise in satisfying requests.
Decisions have to be made about whether to build or buy. Examples were given of both: in some cases the decision was made to build, and in others LinkedIn bought products or services. LinkedIn built its own lightweight reporting tool, “Reportal,” with one person-month of effort.
Decisions have to be made about how much data to include or show in a product. You don’t want to overwhelm users and “nobody uses advanced features.” Patil gave the example of MapQuest: originally it had a lot of bells and whistles, but customers did not use them, so the current version of the product is relatively simple.
Products and enabling technologies shown and discussed included:
- Aster Data (recently deployed)
- collaborative filtering (e.g., for serving ads)
- Cloudera Hadoop (just starting out in clusters of relatively small size, around 20 nodes)
- Lucene
- MicroStrategy (in the process of deployment)
- “People You May Know” (invented at LinkedIn)
- “People Who Viewed This Profile Also Viewed These Profiles” (originally built using SQL and delivered in a week using ad server technology)
- Oracle
- Prefuse, Processing
- SAS (not used for now because it is considered too expensive, but it may be used in the future if market segmentation is needed)
- Voldemort (an open source project developed in-house)
- “Who Viewed My Profile?” (the free version only shows partial data on viewers and up-selling to get more detail happens often)
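As a rough illustration of the idea behind a feature like “People Who Viewed This Profile Also Viewed These Profiles,” here is a minimal co-occurrence sketch in Python. Patil said the original was built using SQL; this is not LinkedIn’s implementation. The view log, the names in it, and the `also_viewed` function are all made up for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical view log: (viewer_id, viewed_profile_id) pairs.
VIEWS = [
    ("u1", "alice"), ("u1", "bob"), ("u1", "carol"),
    ("u2", "alice"), ("u2", "bob"),
    ("u3", "alice"), ("u3", "carol"),
    ("u4", "bob"), ("u4", "alice"),
]

def also_viewed(views, top_n=3):
    """For each profile, list the profiles most often viewed by the same viewers."""
    # Group the distinct profiles each viewer looked at.
    profiles_by_viewer = defaultdict(set)
    for viewer, profile in views:
        profiles_by_viewer[viewer].add(profile)

    # Count how many viewers looked at both profiles of every pair.
    co_views = defaultdict(Counter)
    for profiles in profiles_by_viewer.values():
        for a, b in combinations(sorted(profiles), 2):
            co_views[a][b] += 1
            co_views[b][a] += 1

    # Keep only the top_n most frequently co-viewed profiles per profile.
    return {p: [q for q, _ in c.most_common(top_n)] for p, c in co_views.items()}

recommendations = also_viewed(VIEWS)
print(recommendations["alice"])  # ['bob', 'carol']
```

The same counting can be expressed in SQL as a self-join of the view log on the viewer column followed by a GROUP BY on the profile pair, which is one plausible reason a first version of such a feature could be delivered quickly.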
LinkedIn faces huge data quality issues since it has so much data and it faces them up front. Extensive efforts have been made to increase standardization of terms in order to facilitate analysis because there are thousands of variants of titles and misspellings of company names!
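To make the standardization problem concrete, here is a minimal sketch that maps misspelled company names onto a canonical list using fuzzy string matching from Python’s standard library. The talk did not describe how LinkedIn actually does this; the canonical list, the `standardize` function, and the 0.8 similarity cutoff are assumptions chosen for illustration.

```python
import difflib

# Hypothetical canonical company names; a real list would be far larger.
CANONICAL_COMPANIES = ["LinkedIn", "Oracle", "Microsoft", "Google"]

def standardize(raw_name, canonical=CANONICAL_COMPANIES, cutoff=0.8):
    """Map a free-text name to its closest canonical form, or None if nothing is close."""
    # Compare in lowercase, but return the canonical spelling.
    lookup = {name.lower(): name for name in canonical}
    # Normalize case and collapse stray whitespace before matching.
    cleaned = " ".join(raw_name.lower().split())
    matches = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None

print(standardize("Linkedin"))      # LinkedIn
print(standardize("  Orcale "))     # Oracle
print(standardize("Acme Widgets"))  # None
```

Names that match nothing come back as `None` rather than being forced into a bucket, so they can be reviewed by hand or added to the canonical list.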
Patil showed some interesting applications of data available to LinkedIn:
- one can see large scale changes in the popularity of different fields (e.g., jobs involving “analytics” have made a comeback in recent years after having declined after the end of the lunar landings)
- one can see how frequently people change jobs in different regions and over time
- one can see large scale relationships between users across all the countries using LinkedIn
- one can do some kinds of career planning based on:
  - who is viewing your profile (e.g., senior managers or relatively junior people)
  - positions people have gone to from the position you are in
  - positions people have come from that led to the position you want
  - which credentials will help you get a job
  - which schools will give you the best return on your investment
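The “positions people have gone to” and “positions people have come from” ideas above can be sketched as simple transition counts over career histories. This is only an illustration in Python, not something shown in the talk. The career data and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical career histories: each list is one member's titles in chronological order.
CAREERS = [
    ["Analyst", "Senior Analyst", "Analytics Manager"],
    ["Analyst", "Data Scientist", "Analytics Manager"],
    ["Analyst", "Senior Analyst", "Director of Analytics"],
    ["Engineer", "Data Scientist"],
]

def transitions(careers):
    """Count every consecutive (from_title, to_title) pair across all histories."""
    counts = Counter()
    for titles in careers:
        counts.update(zip(titles, titles[1:]))
    return counts

def next_positions(counts, title):
    """Positions people moved to directly from `title`, most common first."""
    return Counter({to: n for (frm, to), n in counts.items() if frm == title}).most_common()

def previous_positions(counts, title):
    """Positions people held directly before `title`, most common first."""
    return Counter({frm: n for (frm, to), n in counts.items() if to == title}).most_common()

counts = transitions(CAREERS)
print(next_positions(counts, "Analyst"))
print(previous_positions(counts, "Analytics Manager"))
```

With this toy data, most people who started as “Analyst” moved next to “Senior Analyst,” and “Analytics Manager” was reached equally often from “Senior Analyst” and “Data Scientist.”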
Several of DJ Patil’s most humorous statements were also among the most informative or revealing about corporate culture or strategy. For example: “If you’re not cheating with data, you’re bullshitting yourself.” In this case, “cheating” refers to augmenting the data or processing it, for example by bucketing it, in order to get it into a usable form. To this end, instead of having an anti-NIH (Not Invented Here) strategy, LinkedIn is happy to use external sources of data such as census data, geocoding data, zip codes, or Wikipedia if they have already been cleaned or extensively worked on. A motto often followed, or question often asked, is: “How clever can we do this on the cheap?” Mechanical Turk is used extensively as one way of avoiding having to do work in-house.