<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Paul O'Rorke</title>
	<atom:link href="http://ororke.com/paul/blog/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://ororke.com/paul/blog</link>
	<description>On Intelligence &#38; Software</description>
	<lastBuildDate>Sat, 29 May 2010 06:56:41 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Salesforce&#8217;s Realtime Analytics</title>
		<link>http://ororke.com/paul/blog/?p=722</link>
		<comments>http://ororke.com/paul/blog/?p=722#comments</comments>
		<pubDate>Wed, 19 May 2010 00:37:56 +0000</pubDate>
		<dc:creator>Paul O&#39;Rorke</dc:creator>
				<category><![CDATA[Meeting Notes]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[SDForum BI SIG]]></category>

		<guid isPermaLink="false">http://ororke.com/paul/blog/?p=722</guid>
		<description><![CDATA[Salesforce&#8217;s CRM analytics architect, Donovan Schneider, presented an overview at the SDForum BI SIG meeting on May 18th, 2010.  Salesforce&#8217;s view of analytics is that it should deliver insight that is accessible to mere mortals, real-time, and trustworthy.
Salesforce&#8217;s analytics strategy is to start simple and grow from there.  Donovan counted dashboards and reports as analytics [...]]]></description>
			<content:encoded><![CDATA[<p>Salesforce&#8217;s CRM analytics architect, Donovan Schneider, presented an overview at the SDForum BI SIG meeting on May 18th, 2010.  Salesforce&#8217;s view of analytics is that it should deliver insight that is accessible to mere mortals, real-time, and trustworthy.</p>
<p><span id="more-722"></span>Salesforce&#8217;s analytics strategy is to start simple and grow from there.  Donovan counted dashboards and reports as analytics (in contrast to others who define analytics as what you have left when you subtract ETL and reporting from BI). He noted that Salesforce users have built three quarters of a million dashboards and dashboards are viewed 700,000 times per day.  There are over 12 million reports and 2.5 million reports are run each day.  In addition to dashboards and reports, Donovan included list views and search in the collection of analytics tools currently available.  Future additions to analytics will build from this collection instead of adding relatively high end analytics.</p>
<p>Donovan showed a nice architecture slide contrasting the traditional data-warehouse-based approach with the new cloud-based approach taken by Salesforce. He claimed that the data warehouse approach is out of date, too complicated, and too rigid.  Salesforce&#8217;s new approach takes advantage of their cloud-based force.com API, multi-tenant architecture, and platform, so it is easy to use, realtime, and flexible.</p>
<p>Donovan gave a brief review of Salesforce&#8217;s multi-tenant architecture. Craig Weissman, Salesforce&#8217;s CTO, gave a presentation on the multi-tenant architecture for SDForum&#8217;s Software Architecture SIG back when I helped organize that group. Click <a href="http://ororke.com/paul/blog/?p=174">here</a> for more on this topic.</p>
<p>Salesforce achieves realtime analytics and simplification by not having a data warehouse.  But unlike many companies using NoSQL data stores, Salesforce does use a traditional RDBMS: Oracle. Currently, there are around fifteen production databases, each containing around ten terabytes. Salesforce relies on ACID transactions because it supports companies doing business on its platform.  When transactions are committed, the changes are available immediately everywhere, and this supports the goal of realtime analytics.</p>
<p>Interestingly, Salesforce does not cache query results and the like because things change so frequently.  Salesforce is able to get away without caching and does without a data warehouse in part by making query execution efficient.  Donovan went into some depth on how queries are optimized.  Salesforce has its own optimizer and does its own indexing, in part because the multi-tenant architecture of its data often makes it impossible to directly use Oracle&#8217;s indexing and optimization.</p>
<p>In closing, Donovan said his top priorities now are adding analytical capabilities, improving scalability and usability, and supporting collaboration.  You can try out analytics for free as it is available in Salesforce&#8217;s sandbox along with a new report builder.  Donovan&#8217;s slides are available <a title="here" href="http://ororke.com/paul/blog/wp-content/uploads/2010/05/100518SalesforceAnalytics.pptx">here</a> and at <a href="http://sdforum.org/index.cfm?fuseaction=Page.viewPage&amp;pageId=620&amp;parentID=483&amp;nodeID=1">SDForum.org</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://ororke.com/paul/blog/?feed=rss2&amp;p=722</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Combining Performance and Decision Management</title>
		<link>http://ororke.com/paul/blog/?p=326</link>
		<comments>http://ororke.com/paul/blog/?p=326#comments</comments>
		<pubDate>Tue, 20 Apr 2010 18:59:10 +0000</pubDate>
		<dc:creator>Paul O&#39;Rorke</dc:creator>
				<category><![CDATA[Meeting Notes]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[BI]]></category>
		<category><![CDATA[SDForum BI SIG]]></category>

		<guid isPermaLink="false">http://ororke.com/paul/blog/?p=326</guid>
		<description><![CDATA[James Taylor, CEO of Decision Management Solutions, gave a talk on &#8220;Performance Management and Agility&#8221; at the monthly meeting of the SDForum BI SIG on Tuesday, April 20th.  He argued that traditional BI and performance management result in dashboards that measure and monitor like instrument clusters in cars.  But what is needed is [...]]]></description>
			<content:encoded><![CDATA[<p>James Taylor, CEO of <a href="http://decisionmanagementsolutions.com/">Decision Management Solutions</a>, gave a talk on &#8220;Performance Management and Agility&#8221; at the monthly meeting of the SDForum BI SIG on Tuesday, April 20th.  He argued that traditional BI and performance management result in dashboards that measure and monitor like instrument clusters in cars.  But what is needed is something more like the cockpits in airplanes:  there should be buttons and levers and so on that enable the &#8220;pilot&#8221; to act on the information presented by the dashboard.  James argued for combining performance management with decision management (a term he pioneered) so that information supports decision-making that leads to action.<br />
<span id="more-326"></span><br />
The analogy with airplanes, pilots, and cockpits provided several additional useful examples since planes have autopilots and pilots learn on simulators.  James claimed that decision management can help automate some decisions (providing an autopilot) and can allow for simulation supporting experimentation.</p>
<p>Agility was a key theme of the presentation and part of James&#8217; approach is intended to support quick actions.  When an event occurs, in some cases it is important to notice quickly and act almost immediately as the value of action based on the information about the event decreases rapidly as time passes.  With other events, you have more time to consider options.  But the key is to notice and act appropriately in a timely manner.  It isn&#8217;t enough just to make information available or notice things quickly or even in real time.  Sometimes noticing immediately isn&#8217;t even necessary or useful (e.g., when the required action would take a long time anyway).</p>
<p>James uses a three stage approach to improving operational decisions:</p>
<ol>
<li>discover the important decisions</li>
<li>build (automate) decision services</li>
<li>analyze decisions and create a closed loop between analytics and decision services.</li>
</ol>
<p>The slides for James&#8217;s presentation are available at the BI SIG&#8217;s web page at <a href="http://SDForum.org/BISIG">http://SDForum.org/BISIG</a> and <a href='http://ororke.com/paul/blog/wp-content/uploads/2010/04/100420Performance-Management-and-Agilty-SDForum.pptx'>here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://ororke.com/paul/blog/?feed=rss2&amp;p=326</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The Analytics Revolution</title>
		<link>http://ororke.com/paul/blog/?p=293</link>
		<comments>http://ororke.com/paul/blog/?p=293#comments</comments>
		<pubDate>Sat, 10 Apr 2010 06:59:00 +0000</pubDate>
		<dc:creator>Paul O&#39;Rorke</dc:creator>
				<category><![CDATA[Meeting Notes]]></category>

		<guid isPermaLink="false">http://ororke.com/paul/blog/?p=293</guid>
		<description><![CDATA[The first SDForum conference on analytics, &#8220;The Analytics Revolution,&#8221; was held in Mountain View on Friday, April 9th, 2010.  The conference focused on recent advances in analytics, new opportunities afforded by these advances, and ways companies can take advantage of the analytics revolution in progress.
Over 250 people attended. Some leading analytics gurus and data [...]]]></description>
			<content:encoded><![CDATA[<p>The first SDForum conference on analytics, &#8220;The Analytics Revolution,&#8221; was held in Mountain View on Friday, April 9th, 2010.  The conference focused on recent advances in analytics, new opportunities afforded by these advances, and ways companies can take advantage of the analytics revolution in progress.</p>
<p><span id="more-293"></span>Over 250 people attended. Some leading analytics gurus and data scientists gave presentations and participated in panel discussions.  This blog post links to summaries of the keynotes and panels.</p>
<p>There were four panel discussions at the conference:</p>
<ul>
<li><a href="http://ororke.com/paul/blog/?p=483">Competing on Analytics</a></li>
<li><a href="http://ororke.com/paul/blog/?p=489">Analyzing Big Data</a></li>
<li><a href="http://ororke.com/paul/blog/?p=573">New Frontiers</a></li>
<li><a href="http://ororke.com/paul/blog/?p=564">The Investor Perspective</a></li>
</ul>
<p>Keynote speakers representing an array of large companies with strong analytics capabilities gave presentations on a wide range of analytics efforts:</p>
<ul>
<li>Ronny Kohavi (Microsoft) &#8220;<a href="http://ororke.com/paul/blog/?p=530">Online Controlled Experiments:  Listening to the Customers, Not to the HiPPO</a>&#8221;</li>
<li>Sanjay Poonen (SAP) &#8220;<a href="http://ororke.com/paul/blog/?p=506">Leading the Analytics Revolution</a>&#8221;</li>
<li>Peter Norvig (Google) &#8220;<a href="http://ororke.com/paul/blog/?p=295">The Unreasonable Effectiveness of Data</a>&#8221;</li>
<li>Jeff Kreulen (IBM) &#8220;<a href="http://ororke.com/paul/blog/?p=556">Analytics:  An Applied Researcher&#8217;s Perspective</a>&#8221;</li>
<li>Jaap Suermondt (HP) &#8220;<a href="http://ororke.com/paul/blog/?p=560">Research in Analytics for Operational Impact at HP</a>&#8221;</li>
</ul>
<p>You can click on the links above to get quick summaries.  The keynote presentations and panel discussions were recorded, and the audio and video recordings are available individually from the pages above, at <a href="http://www.dyyno.com/channel/sdforum">dyyno.com</a>, or at SDForum <a href="http://sdforum.org/index.cfm?fuseaction=Page.ViewPage&amp;PageID=997">here</a>.  The conference was sponsored by Accel Partners, Dyyno, IBM, Impetus, Microsoft, and SAP. The organizing committee included Stacey Bishop (Scale Venture Partners), Jim Claussen (IBM), Lars Leckie (Hummer Winblad), Sonja London (Summit Software), Paul O&#8217;Rorke (Kraken Data), and Julia Queck (SAP).</p>
]]></content:encoded>
			<wfw:commentRss>http://ororke.com/paul/blog/?feed=rss2&amp;p=293</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analytics:  Competing on Analytics at the Highest Level</title>
		<link>http://ororke.com/paul/blog/?p=483</link>
		<comments>http://ororke.com/paul/blog/?p=483#comments</comments>
		<pubDate>Sat, 10 Apr 2010 05:37:59 +0000</pubDate>
		<dc:creator>Paul O&#39;Rorke</dc:creator>
				<category><![CDATA[Meeting Notes]]></category>

		<guid isPermaLink="false">http://ororke.com/paul/blog/?p=483</guid>
		<description><![CDATA[The &#8220;Competing on Analytics&#8221; panel at the SDForum Conference on &#8220;The Analytics Revolution&#8221; included people from companies using analytics to &#8220;compete at the highest level&#8221; according to the five-stage maturity model in the book &#8220;Competing on Analytics: The New Science of Winning.&#8221;  The panelists (DJ Patil, LinkedIn; Ken Rudin, Zynga; Neel Sundaresan, eBay; Kevin Weil, Twitter) [...]]]></description>
			<content:encoded><![CDATA[<p>The &#8220;Competing on Analytics&#8221; panel at the SDForum Conference on &#8220;The Analytics Revolution&#8221; included people from companies using analytics to &#8220;compete at the highest level&#8221; according to the five-stage maturity model in the book &#8220;<a href="http://www.amazon.com/Competing-Analytics-New-Science-Winning/dp/1422103323">Competing on Analytics: The New Science of Winning</a>.&#8221;  The panelists (DJ Patil, LinkedIn; Ken Rudin, Zynga; Neel Sundaresan, eBay; Kevin Weil, Twitter) represented a good mix from the relatively new Twitter to the larger, older, more established eBay.  David Steier, PriceWaterhouseCoopers, moderated the panel.</p>
<p><span id="more-483"></span>David began by asking:  how do the panelists use analytics to get a competitive advantage?  What is an example of something they found that surprised them?</p>
<p>Ken Rudin and Zynga use analytics to achieve their goals of increasing revenue, improving user retention, and increasing viral spread for their online social games including Farmville.  They collect 3-4 terabytes of data from their 40-50 million daily users.  Initially they were reactive:  producing reports in response to requests.  Now they use analytics as an integral part of game design:  they use A/B testing and experiments in development just as QA is an integral part of development.  Analysts are part of the design team.  An example of something that surprised them occurred when they analyzed Mafia game players.  There are two groups of players, the &#8220;crime jobbers&#8221; and the &#8220;fighters.&#8221;  They discovered that fighters spend twice as much because they are trying to compete with their friends: they purchase weapons to arm their mafia members so they can fight and defeat their friends&#8217; mafia gangs.  Since they discovered this, they have changed the game to encourage players to fight more instead of just doing crime jobs.</p>
<p>Kevin Weil mentioned that a key accomplishment of the analytics team at Twitter has been to get everyone to consider data (and to make it possible for them to do so) in all important decisions.  He described an example of how analytics surprised them that involved the social network underlying Twitter.  Many people use Twitter largely as an information source and their network of &#8220;following&#8221; links says something about their interests.  But the bidirectional links can be ignored when trying to determine their interests.  The unidirectional links (e.g., the people they follow who don&#8217;t follow them) are the ones that carry the most important incoming information.</p>
<p>David Steier asked the panelists:  How do you organize people to achieve the company&#8217;s goals?  The panelists&#8217; companies all have analytics teams, a team responsible for the company&#8217;s analytics platform, and analysts who work with other teams. But the way these groups work within the companies differs, and several companies are adopting innovative approaches.</p>
<p>DJ Patil and LinkedIn started out by looking at other companies such as Google and Yahoo.  Yahoo had analytics in a separate research organization and it was difficult or impossible to get it into products.  Google is driven by technology and bolts products on top (see also <a href="http://techcrunch.com/2010/05/15/facebook-google/">http://techcrunch.com/2010/05/15/facebook-google/</a>).  So to ensure that analytics is integrated into products at LinkedIn, the Analytics team is a substantial part (1/4th) of the product team and has its own designers and developers so it is easier to go straight to production or to integrate with existing products.</p>
<p>Neel Sundaresan (eBay) claimed that &#8220;everybody should be an analytics scientist.&#8221;  eBay has an analytics platform team that provides data to the rest of the company and &#8220;the data tells you what the product should be.&#8221;  With 200 million users and a billion searches per day, eBay gets tremendous amounts of data and product managers and developers and even some machine learning scientists need to learn to look at the data.</p>
<p>Ken Rudin (Zynga) argued that the whole idea of having an &#8220;analytics team&#8221; is flawed.  He asked:  &#8220;Does Microsoft have an internet division?&#8221;  Like the internet, analytics is or should be fundamental to everything in the company.  So his goal is to work himself out of a job by putting analysts in development and product teams and by training almost everyone in the company in analytics, starting with product managers, then engineers, and then quality assurance.</p>
<p>In summary, one of the key themes of the panel was that companies competing on analytics are finding innovative ways to integrate analytics throughout their companies.  One of the biggest problems companies face these days is that it is difficult to find good analytics people or data scientists.  Ken Rudin is looking for different kinds of people now as compared to five years ago.  Now he is looking for people with analytics and business abilities.  So for example, instead of just taking data that is given to you and looking for interesting patterns, you should be able to take a business goal like &#8220;increase player retention&#8221; and figure out how to do it, what data you need, and so on.</p>
<p>A recording of this panel is available at <a href="http://www.dyyno.com/channel/sdforum#vod=1969">dyyno.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://ororke.com/paul/blog/?feed=rss2&amp;p=483</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analytics:  Analyzing &#8220;Big Data&#8221;</title>
		<link>http://ororke.com/paul/blog/?p=489</link>
		<comments>http://ororke.com/paul/blog/?p=489#comments</comments>
		<pubDate>Sat, 10 Apr 2010 04:44:43 +0000</pubDate>
		<dc:creator>Paul O&#39;Rorke</dc:creator>
				<category><![CDATA[Meeting Notes]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[SDForum]]></category>

		<guid isPermaLink="false">http://ororke.com/paul/blog/?p=489</guid>
		<description><![CDATA[The panel on &#8220;Analyzing Big Data&#8221; at the SDForum Analytics Conference on &#8220;The Analytics Revolution&#8221; included representatives of two companies that analyze data on a petabyte scale (Joydeep Sen Sarma, Facebook and Joshua Klahr, Yahoo!) and two software companies that stand behind open source infrastructure components that are often used to build analytics platforms (Amr [...]]]></description>
			<content:encoded><![CDATA[<p>The panel on &#8220;Analyzing Big Data&#8221; at the SDForum Analytics Conference on &#8220;<a href="http://ororke.com/paul/blog/?p=293">The Analytics Revolution</a>&#8221; included representatives of two companies that analyze data on a petabyte scale (Joydeep Sen Sarma, Facebook and Joshua Klahr, Yahoo!) and two software companies that stand behind open source infrastructure components that are often used to build analytics platforms (Amr Awadallah, Cloudera/Hadoop and James Phillips, Northscale/Memcached and Membase).  The moderator, Owen Thomas of VentureBeat, started off by asking the panelists whether &#8220;big data&#8221; is a Silicon Valley phenomenon that will soon spread to the Fortune 500 and the rest of the world.</p>
<p><span id="more-489"></span>Amr Awadallah defined &#8220;big data&#8221; as 10 terabytes or more and noted that when Cloudera talks to prospective customers they often have &#8220;medium data&#8221; (less than 10 terabytes). Amr noted that individual nodes can have 4, 12, or soon even 24 TB of storage.  So &#8220;medium data&#8221; problems don&#8217;t require a cluster or Hadoop at all just to deal with the size of the data.  The smallest Hadoop cluster requires three nodes. Most of Cloudera&#8217;s customers run Hadoop clusters of tens of nodes, up to a few hundred at the high end.  Facebook and Yahoo! have clusters with thousands of nodes.</p>
<p>Joshua Klahr said that Yahoo! collects tens of terabytes per day of ad data and weblogs telling them what ads are successful and what content is relevant.  Since introducing social applications and features, they have seen a dramatic increase in data because they are collecting text generated by users rather than just clicks.</p>
<p>Joydeep Sen Sarma said Facebook has 400 million users and collects over 12 terabytes of compressed data per day.  Facebook has over two petabytes of data.  Joydeep noted that two facts have driven the collection of larger amounts of data and the shift toward open source platforms and tools: things can now be measured that could not be measured previously, and the value per byte of web companies&#8217; data is much lower than for earlier kinds of companies.</p>
<p>Relatively few companies have petabyte scale data sets.  But the issue is not so much the size of the data.  The real issue is complexity.  Things like Hadoop are important not just because they enable companies to work with &#8220;stupendous&#8221; data sets but also (and more importantly) because they enable companies to work with complex datasets including data that has not been organized into a RDBMS or schema, and including text and weblogs.</p>
<p>The consensus of the panel was that &#8220;big data&#8221; and associated technologies already are spreading and will continue to spread.  Silicon Valley companies may be early developers and early adopters of technology for analyzing &#8220;Big Data&#8221; but it is spreading from web companies to other sectors including banking, games, government, and telecommunications and from the West to the East Coast and overseas.</p>
<p>Owen invited the panelists to comment on the NoSQL movement. Perhaps surprisingly, several panelists identified with NoSQL came out against the term in one way or another.</p>
<p>Amr Awadallah (Cloudera) prefers &#8220;NOSQL&#8221; (Not Only SQL) to &#8220;NoSQL&#8221; since more than half of all analysts use SQL. NoSQL doesn&#8217;t make sense if it argues against having SQL.  The main issue for him is &#8220;agility&#8221;:  the ability to make changes quickly and to be flexible with regard to how things are done.  Amr breaks this down into two kinds of agility:  &#8220;agility of data types&#8221; and &#8220;agility of language.&#8221;</p>
<p>Amr explained his concept of &#8220;agility of data types&#8221;:  When traditional RDBMSs are used and rigid schemas are required, it is necessary to go through a DBA whenever a change in the schema is required (for example, to add a new column) and this can take a long time.  Ditto for loading new data into the schema:  ETL is required to load the data and this can take too long.  NoSQL approaches have the benefit that they allow operating without a schema or they allow for changing schemata easily and quickly.</p>
<p>Amr&#8217;s concept of &#8220;agility of language&#8221; is:  Taking a purely SQL-based approach with traditional RDBMSs is too inflexible.  Approaches based on Hadoop can go beyond SQL and allow the use of programming languages more powerful than SQL that accommodate the preferences of your developers (e.g., Java, Python, C, Perl).</p>
<p>James Phillips (Northscale) identified himself as a NoSQL advocate but said it is not about SQL:  it&#8217;s not about the query language.   The issues are really storage, scaling, and performance.  The ACID transaction guarantees provided by traditional RDBMSs come with performance and scalability costs and many applications don&#8217;t need the guarantees but rather need greater scalability and higher performance.</p>
<p>Joydeep said NoSQL is like a religion and he hates religion.  Although Hadoop is often considered to be part of NoSQL systems, his Hive project introduced a simplified version of SQL on top of Hadoop because many analysts prefer to work with SQL.  The real advance has not been to eliminate SQL but rather it is the breaking down or deconstruction of previously monolithic systems into separable components and layers. The components and layers include storage (e.g., the filesystem) and processing (e.g., indexing, mapreduce, query processing, and text processing). One can &#8220;rack and stack&#8221; and build systems out of the components according to one&#8217;s needs.</p>
<p>Joshua said that as a Product Manager, he uses Excel extensively.  And he said that Yahoo finds it easier to find SQL coders than MapReduce programmers.  The consensus was that Excel and SQL are here to stay.</p>
<p>Owen asked the panelists how we can avoid having a &#8220;data priesthood&#8221; and how we can promote the &#8220;democratization of data.&#8221;  Several panelists referred to the panel on &#8220;Competing on Analytics at the Highest Level&#8221; because the practices of the analytics competitors on that panel addressed this issue.  In addition, several other ways to make data usable across the company were mentioned, such as providing tools that make it easier for people with various backgrounds, knowledge, and skills to use data.  For example, Hadoop is written in Java but programmers more familiar with other languages like Python, as well as SQL programmers, can also use it (e.g., using Streaming or Hive).  Going forward, we will see more connections from Hadoop, Hive, and Pig to existing BI tools like MicroStrategy (e.g., through ODBC connectors under development by Cloudera and Facebook) and this should further &#8220;democratize the data.&#8221;</p>
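<p>To give a feel for how a Python programmer can work in the map and reduce style that Hadoop Streaming exposes, here is a minimal word count sketch.  It was not shown by the panel.  The function names and sample lines are hypothetical, and plain Python iterables stand in for the standard input and output streams that a real Streaming job would use.</p>
<pre>
```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reduce_pairs(sorted_pairs):
    """Reducer: sum the counts for each word; input must be sorted by word."""
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


# Hypothetical sample input, for illustration only.
sample = ["Big data is big", "data about data"]

# Sorting stands in for the shuffle step the framework performs between phases.
totals = dict(reduce_pairs(sorted(map_lines(sample))))
print(totals)
```
</pre>
<p>In an actual Streaming job the mapper and reducer would be separate scripts that read lines from standard input and write tab-separated key and value pairs to standard output, with the framework doing the sorting in between.</p>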
<p>A recording of the panel discussion is available at <a href="http://www.dyyno.com/channel/sdforum#vod=1972">dyyno.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://ororke.com/paul/blog/?feed=rss2&amp;p=489</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analytics:  Online Controlled Experiments &#8211; Listening to the Customers, Not to the HiPPO</title>
		<link>http://ororke.com/paul/blog/?p=530</link>
		<comments>http://ororke.com/paul/blog/?p=530#comments</comments>
		<pubDate>Sat, 10 Apr 2010 04:14:17 +0000</pubDate>
		<dc:creator>Paul O&#39;Rorke</dc:creator>
				<category><![CDATA[Meeting Notes]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[SDForum]]></category>

		<guid isPermaLink="false">http://ororke.com/paul/blog/?p=530</guid>
		<description><![CDATA[Ronny Kohavi (Microsoft) started out by telling a famous true story about Greg Linden&#8217;s experience moving a recommender to the shopping cart at Amazon.  A Senior VP of Marketing vetoed Greg&#8217;s proposal fearing that it would distract customers from checking out and paying for the items already in their shopping basket reducing conversion.  This is [...]]]></description>
			<content:encoded><![CDATA[<p>Ronny Kohavi (Microsoft) started out by telling a <a href="http://glinden.blogspot.com/2006/04/early-amazon-shopping-cart.html">famous true story</a> about Greg Linden&#8217;s experience moving a recommender to the shopping cart at Amazon.  A Senior VP of Marketing vetoed Greg&#8217;s proposal fearing that it would distract customers from checking out and paying for the items already in their shopping basket reducing conversion.  This is where the &#8220;HiPPO&#8221; in the title of Ronny&#8217;s presentation comes from.  It stands for the &#8220;Highest Paid Person&#8217;s Opinion&#8221; and sometimes for the person (e.g., the VP) holding the opinion.  The Amazon story had a happy ending because Jeff Bezos had established a corporate culture that allowed for experiments to be run so Greg was able to run an experiment to test the hypothesis of the HiPPO.  It turned out that conversions did indeed drop but the increased revenue due to customers purchasing recommended items was substantially greater than the loss.</p>
<p><span id="more-530"></span>The online controlled experiments advocated by Ronny, including A/B tests, are like the trials used to test drugs: they get at the causes of observed effects because all the experimental subjects are exposed to the same non-causal factors. Ronny ran the audience through a series of examples showing how difficult it is to make correct decisions about a series of alternative web page designs.  Since people are bad at evaluating proposals, especially in evaluating more novel innovations, it is important to test a lot of ideas, &#8220;fail fast,&#8221; and try again quickly instead of doing elaborate planning and preparation in advance.</p>
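<p>As a rough illustration of how the result of an A/B test might be evaluated, here is a minimal sketch of a two-proportion z-test on conversion rates.  It is my own example, not something from Ronny&#8217;s talk.  The function name and the visitor and conversion counts are made up, and a real experimentation platform would handle many more details (sample size planning, multiple metrics, and so on).</p>
<pre>
```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a is the control group, conv_b / n_b is the treatment group.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts for illustration only.
z, p = two_proportion_z_test(conv_a=200, n_a=10000, conv_b=250, n_b=10000)
print(f"z = {z:.2f}, p = {p:.4f}")
```
</pre>
<p>A small p-value suggests that the difference between the two variants is unlikely to be due to chance alone, which is the kind of evidence that can be weighed against the HiPPO.</p>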
<p>In Ronny&#8217;s years of experience, he has observed that people and organizations go through stages:  they go from hubris to getting insights through measurement followed by the &#8220;Semmelweis Reflex&#8221; and then fundamental understanding.  Initially, people are sure they know it all but then they realize it helps to do experiments and take measurements to get data.  The <a href="http://en.wikipedia.org/wiki/Semmelweis_reflex">Semmelweis reflex</a> is the reflex-like rejection of new knowledge because it contradicts entrenched beliefs.  It is named after <a href="http://en.wikipedia.org/wiki/Ignaz_Semmelweis">Ignaz Semmelweis</a> who proposed that doctors clean their hands to reduce the spread of infections but who was rejected in spite of the fact that he had data to support his claim.</p>
<p>Ronny&#8217;s &#8220;take home&#8221; points were:</p>
<ul>
<li>data trumps intuition &#8211; it&#8217;s hard to assess the value of ideas so do experiments;</li>
<li>get your organization to agree on what to optimize and use data to drive decisions.</li>
</ul>
<p>A video recording of Ronny&#8217;s presentation is available at <a href="http://www.dyyno.com/channel/sdforum#vod=1968">dyyno.com</a>.  More technical information including information on how to conduct experiments is available at <a href="http://www.exp-platform.com">http://www.exp-platform.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://ororke.com/paul/blog/?feed=rss2&amp;p=530</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analytics:  The Unreasonable Effectiveness of Data</title>
		<link>http://ororke.com/paul/blog/?p=295</link>
		<comments>http://ororke.com/paul/blog/?p=295#comments</comments>
		<pubDate>Sat, 10 Apr 2010 03:30:05 +0000</pubDate>
		<dc:creator>Paul O&#39;Rorke</dc:creator>
				<category><![CDATA[Meeting Notes]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[SDForum]]></category>

		<guid isPermaLink="false">http://ororke.com/paul/blog/?p=295</guid>
		<description><![CDATA[Peter Norvig focused on a major lesson learned at Google and elsewhere in recent years and gave a fascinating keynote presentation on &#8220;The Unreasonable Effectiveness of Data&#8221; at the SDForum conference on &#8220;The Analytics Revolution&#8221; April 9th, 2010.  The lesson is that data can be surprisingly effective:  it can be used to get [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/Peter_Norvig">Peter Norvig</a> focused on a major lesson learned at Google and elsewhere in recent years and gave a fascinating keynote presentation on &#8220;The Unreasonable Effectiveness of Data&#8221; at the SDForum conference on &#8220;<a href="http://ororke.com/paul/blog/?p=293">The Analytics Revolution</a>&#8221; April 9th, 2010.  The lesson is that data can be surprisingly effective:  adding more data can often improve performance more than improving the algorithms can.<br />
<span id="more-295"></span><br />
In contrast to Wigner&#8217;s &#8220;<a href="http://www.physik.uni-wuerzburg.de/fileadmin/tp3/QM/wigner.pdf">The Unreasonable Effectiveness of Mathematics in the Natural Sciences</a>,&#8221; Norvig&#8217;s presentation pointed out that in biology, natural language, and other complex domains, often it does not pay to strive for elegant mathematical formulas or compact, simple models or theories.  And it never pays to waste time trying for perfect models because, as George Box said, &#8220;&#8230;all models are wrong, but some are useful.&#8221;  Relatively simple methods can often be used to take advantage of ample data to build useful models.  The resulting models may be relatively complex, but sometimes the data seems to demand this, and more laborious methods of constructing models &#8220;by hand&#8221; may produce results that are at least as complex and more brittle.  Peter showed an example of a rule base for spelling correction taken from <a href="http://www.htdig.org/">HTDig</a>, and it looked very complex.  He pointed out that it would be difficult to extend that rule base to another language, whereas with a more data-driven approach it would be relatively easy:  you would just need a lot of examples in the new language.  Peter remarked that data-driven programming is the ultimate agile method.</p>
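<p>To make the contrast with a hand-written rule base concrete, here is a minimal sketch of a data-driven spelling corrector.  It is an illustration under my own assumptions (word counts from a corpus plus candidates one edit away), not code shown in the talk, and the tiny corpus is made up.</p>

```python
import re
from collections import Counter

def train(corpus_text):
    """Count word frequencies in example text; this is the whole 'model'."""
    return Counter(re.findall(r"[a-z]+", corpus_text.lower()))

def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in alphabet]
    inserts = [l + c + r for l, r in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def correct(word, counts):
    """Return the most frequent known word within one edit, else the word itself."""
    if word in counts:
        return word
    known = [w for w in edits1(word) if w in counts]
    return max(known, key=counts.get) if known else word

# Hypothetical toy corpus; a real system would train on far more text.
counts = train("the data the model the data")
print(correct("teh", counts), correct("dat", counts))
```

<p>Nothing in this sketch encodes rules about English beyond the alphabet, so moving to another language would mostly mean supplying a corpus (and alphabet) for that language.</p>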
<p>In many cases three steps need to be taken: choosing a representation language, encoding a model in that language, and then performing inference on the model.  Peter summarized his recommended approach with the acronym DINO:  Data In Non-parametric model Out.  Google&#8217;s Seti system for using machine learning to acquire models by learning from massive data sets is described by Simon Tong in the Google research blog at &#8220;<a href="http://googleresearch.blogspot.com/2010/04/lessons-learned-developing-practical.html">Lessons Learned Developing a Practical Large Scale Machine Learning System</a>.&#8221;</p>
<p><a href="http://www.hpl.hp.com/personal/Jaap_Suermondt/">Jaap Suermondt</a> gave a counterexample later in the day in his closing keynote.  In the example, unmanageable amounts of data needed to be processed to solve an optimization problem.  It was a linear programming problem but it turned out to be a special case that had a more efficient solution.  Even so, it turned out to be necessary to improve on that for the special problem at hand in order to get a solution in a reasonable time.  In this case, they had tons of data but it was just clutter until an improved algorithm was found that made it possible to get what was wanted out of the data.</p>
<p>Peter&#8217;s response to this counterexample was that Google also invests time in improving their algorithms.  They have a lot of nearest neighbor problems and need to avoid exhaustive searches for nearest neighbors, so they invest effort in locality-sensitive hashing, which results in a simple algorithm.  So they are not dogmatic.  Even so, the point is that it is surprisingly often the case that data is more important than programs.</p>
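<p>For readers unfamiliar with locality-sensitive hashing, here is a minimal sketch of one common variant, random-hyperplane hashing for cosine similarity.  This is my own illustration, not a description of what Google uses; the function names, the number of bits, and the sample vectors are all assumptions.</p>

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector is on its non-negative side."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0.0 else 0
        for plane in hyperplanes
    )

def build_index(vectors, hyperplanes):
    """Group item names into buckets keyed by their bit signature."""
    buckets = {}
    for name, vec in vectors.items():
        buckets.setdefault(signature(vec, hyperplanes), []).append(name)
    return buckets

def candidates(query, buckets, hyperplanes):
    """Look up only the query's own bucket instead of scanning every item."""
    return buckets.get(signature(query, hyperplanes), [])

# Hypothetical data: "a" and "b" point the same way, "c" points the opposite way.
planes = make_hyperplanes(num_bits=8, dim=3)
index = build_index({"a": [1, 2, 3], "b": [2, 4, 6], "c": [-1, -2, -3]}, planes)
print(candidates([3, 6, 9], index, planes))
```

<p>Vectors separated by a small angle tend to share a signature, so a lookup touches one bucket rather than the whole data set.  In practice several independent hash tables are usually combined to reduce missed neighbors.</p>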
<p>In trying to capture the gist of Peter&#8217;s presentation, I have skipped over a lot of great examples and interesting points.  A complete video recording of Peter&#8217;s presentation provided by Dyyno is available at &#8220;<a href="http://www.dyyno.com/channel/sdforum#vod=1984">Analytics Conference &#8211; Keynote &#8211; Peter Norvig</a>.&#8221;  <a href="http://www.computer.org/portal/web/csdl/doi/10.1109/MIS.2009.36">&#8220;The Unreasonable Effectiveness of Data&#8221;</a> also appears as an &#8220;expert opinion&#8221; article published in IEEE Intelligent Systems by Alon Halevy, Peter Norvig, and Fernando Pereira, pp. 8-12, March/April, 2009.  Seeds of the notion that more data can be better or more important than trying for better algorithms on smaller datasets appeared in an earlier presentation Peter gave at PARC Forum in 2006 on &#8220;<a href="http://www.parc.com/event/499/web-search-as-a-product-of-and-catalyst-for-ai.html">Web Search as a Product of and Catalyst for AI</a>.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://ororke.com/paul/blog/?feed=rss2&amp;p=295</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analytics:  Research in Analytics for Operational Impact at HP</title>
		<link>http://ororke.com/paul/blog/?p=560</link>
		<comments>http://ororke.com/paul/blog/?p=560#comments</comments>
		<pubDate>Sat, 10 Apr 2010 02:04:51 +0000</pubDate>
		<dc:creator>Paul O&#39;Rorke</dc:creator>
				<category><![CDATA[Meeting Notes]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[SDForum]]></category>

		<guid isPermaLink="false">http://ororke.com/paul/blog/?p=560</guid>
		<description><![CDATA[Jaap Suermondt (HP Labs) talked about how his lab uses analytics to support operations at HP. Jaap started with an example of procurement of disk drives.  HP ships two PCs per second and buys more disk drives than anyone else so they have to have accurate estimates of demand. This is roughly equivalent to predicting how [...]]]></description>
			<content:encoded><![CDATA[<p>Jaap Suermondt (HP Labs) talked about how his lab uses analytics to support operations at HP. Jaap started with an example of procurement of disk drives.  HP ships two PCs per second and buys more disk drives than anyone else so they have to have accurate estimates of demand. This is roughly equivalent to predicting how the economy and stock market will do. But by combining genetic algorithms and economic and statistical models, HP was able to predict that the economy would improve and demand would improve by five percentage points more than others predicted. This was the subject of a Business Week <a href="http://www.businessweek.com/magazine/content/09_25/b4136044140573.htm">article</a> and <a href="http://feedroom.businessweek.com/?fr_story=7f04419fd226cabaea1599e034e81a7d50e81f71">video</a> in 2009 that talked about how earlier genetic algorithms work was revived and applied to this problem.</p>
<p><span id="more-560"></span>HP also forecasts demand for labor so as to be able to avoid having to fire or hire people.  When people are overcommitted, attrition goes through the roof.  When people are idle, production costs are unnecessarily high and this is an issue in PC production because it is so competitive and the margins are low.</p>
<p>Another example involved maximizing the revenue from covered orders (RCO &#8211; Revenue Coverage Optimization). Trying to maximize revenue by individual products produces results worse than random.  It&#8217;s important to consider combinations because of dependencies between products (e.g., it may not be possible to assemble a highly profitable order for a customer because some relatively unprofitable item is unavailable). When HP ranked their products by importance to revenue coverage, they got a nice Pareto effect:  80% of the revenue with 25% of the products.  This work won the coveted INFORMS <a href="http://www3.informs.org/article.php?id=1586">Edelman prize in 2009</a>.</p>
<p>Interestingly, this example runs counter to the idea that a lot of data makes it a bad idea or unnecessary to improve one&#8217;s algorithm.  There was no shortage of data in this case and in fact there was so much data that the initial algorithm, an integer programming algorithm, was too slow to be useful. Reformulating the problem as a Lagrangian Relaxation problem shortened the solution time.  But the breakthrough in this case involved reformulating the problem as a bipartite graph flow problem and coming up with a new algorithm for solving this problem using the data in real time.</p>
<p>In addition to operations, Jaap&#8217;s lab covers collaborative filtering, customer segmentation, marketing analytics, personalization, recommendation, and so on. He views personalization as the killer app for consumer-facing services. He argued that it is crucial to lead with the customer experience and make it a win-win so that companies don&#8217;t face angry customers and backlash against privacy violations.</p>
<p>Another non-operations example Jaap described involved analytics work with Stanford Children&#8217;s hospital that has saved over thirty children&#8217;s lives. They analyzed patients&#8217; data to find indicators that children were in trouble and used them to trigger more rapid responses.  There is a huge number of preventable deaths (100,000!) in the United States each year so there is a lot of room for more work using analytics to help save lives.</p>
<p>The video for Jaap&#8217;s talk is available at <a href="http://www.dyyno.com/channel/sdforum#vod=1987">dyyno.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://ororke.com/paul/blog/?feed=rss2&amp;p=560</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analytics:  An Applied Researcher&#8217;s Perspective</title>
		<link>http://ororke.com/paul/blog/?p=556</link>
		<comments>http://ororke.com/paul/blog/?p=556#comments</comments>
		<pubDate>Sat, 10 Apr 2010 01:56:33 +0000</pubDate>
		<dc:creator>Paul O&#39;Rorke</dc:creator>
				<category><![CDATA[Meeting Notes]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[SDForum]]></category>

		<guid isPermaLink="false">http://ororke.com/paul/blog/?p=556</guid>
		<description><![CDATA[Jeffrey Kreulen emphasized the question &#8220;What is the business problem that you have to solve?&#8221; and noted that IBM likes to hire &#8220;T-shaped people,&#8221; with breadth and depth and also with experience in business as well as technology. Kreulen showed several interesting examples of the work of his group at IBM&#8217;s Almaden Research Center. One example [...]]]></description>
			<content:encoded><![CDATA[<p>Jeffrey Kreulen emphasized the question &#8220;What is the business problem that you have to solve?&#8221; and noted that IBM likes to hire &#8220;T-shaped people,&#8221; with breadth and depth and also with experience in business as well as technology. Kreulen showed several interesting examples of the work of his group at IBM&#8217;s Almaden Research Center. One example involved corporate brand and reputation analysis implemented in a system called COBRA. The system is used by IBM&#8217;s global professional services group and parts of it will be included in Cognos. The system uses sentiment analysis, taxonomies, text analytics, and influence analysis to listen to how people are talking about brands.</p>
<p><span id="more-556"></span>Another example involved applications of analytics to the US Patents database. IBM had earlier made search over this database available on the web, but now they are doing analytics on it with some customers, for example to find the strategic areas in which competitors are investing R&amp;D.</p>
<p>Jeffrey&#8217;s talk is available at <a href="http://www.dyyno.com/channel/sdforum#vod=1985">dyyno.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://ororke.com/paul/blog/?feed=rss2&amp;p=556</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analytics:  Leading the Analytics Revolution</title>
		<link>http://ororke.com/paul/blog/?p=506</link>
		<comments>http://ororke.com/paul/blog/?p=506#comments</comments>
		<pubDate>Sat, 10 Apr 2010 00:23:14 +0000</pubDate>
		<dc:creator>Paul O&#39;Rorke</dc:creator>
				<category><![CDATA[Meeting Notes]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[SDForum]]></category>

		<guid isPermaLink="false">http://ororke.com/paul/blog/?p=506</guid>
		<description><![CDATA[Sanjay Poonen gave a presentation on how SAP is &#8220;Leading the Analytics Revolution&#8221; at the first SDForum conference on analytics: &#8220;The Analytics Revolution&#8221; on 4/9/2010.  Sanjay claimed that although SAP is a large corporation, like Gerstner&#8217;s IBM, &#8220;the elephant can dance.&#8221;  SAP is a leader and the largest vendor in the analytics space and considered by [...]]]></description>
			<content:encoded><![CDATA[<p>Sanjay Poonen gave a presentation on how SAP is &#8220;Leading the Analytics Revolution&#8221; at the first SDForum conference on analytics: &#8220;<a href="http://ororke.com/paul/blog/?p=293">The Analytics Revolution</a>&#8221; on 4/9/2010.  Sanjay claimed that although SAP is a large corporation, like Gerstner&#8217;s IBM, &#8220;the elephant can dance.&#8221;  SAP is a leader and the largest vendor in the analytics space and considered by Gartner to be the most visionary. Their approach is to focus on helping businesses solve their business problems rather than focusing on technology.<span id="more-506"></span></p>
<p>Sanjay pointed to a recent issue of &#8220;The Economist&#8221; as evidence that business people and not just information technology professionals are becoming more aware of analytics and business intelligence.  The issue had a cover story about &#8220;<a href="http://www.economist.com/opinion/displaystory.cfm?story_id=15579717">The Data Deluge</a>&#8221; that talked about how businesses have more data than ever before and they are just beginning to exploit it using Business Intelligence software, one of the fastest-growing categories of software.</p>
<p>One of the ways SAP is leading its competitors is by pushing <a href="http://www.ondemand.com/businessintelligence/">Business Intelligence &#8220;on demand&#8221;</a>, SAP&#8217;s SaaS (Software as a Service) offering.  You can learn more and try out a free personal edition <a href="http://www.ondemand.com/businessintelligence/learnmore/">here</a>. SAP is also leading by leveraging and working on disruptive emerging technologies including cloud computing, in-memory computation, and mobile technologies.</p>
<p>Sanjay predicted that analytics would be used to close the loop between strategy and execution.  He said that people have tended to focus almost exclusively on either strategy or execution, but both are important.  He quoted Thomas Edison&#8217;s dictum &#8220;Vision without execution is hallucination.&#8221;  SAP wants to help companies go around the loop from strategizing to planning to executing to optimizing based on the results of execution.  They will do this by providing an analytics and BI platform at the center of the loop that supports their customers at each step.</p>
<p>An interesting example of an application of SAP&#8217;s analytics involved improving sustainability and reducing the negative environmental impacts of a corporation.  Several other examples involved specific verticals such as government and health care.  SAP believes that much of the future growth will come from targeting specific verticals, so it is partnering with a large number of smaller companies to build industry-specific solutions on SAP&#8217;s platforms, including BI on-demand.</p>
<p>Sanjay&#8217;s presentation is available at <a href="http://www.dyyno.com/channel/sdforum#vod=1971">dyyno.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://ororke.com/paul/blog/?feed=rss2&amp;p=506</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
