Banks are not as Slow as You Might Think
by Paul O'Rorke on Jan.14, 2010, under Meeting Notes
David Newman, a Strategic Planning Manager at Wells Fargo Bank, gave a presentation on applications of “semantic technology” for financial services at a Meetup in San Francisco on January 14th, 2010. His presentation contradicted the stereotypes of financial institutions in general and of Wells Fargo in particular as being cautious, behind the times, and slow to adopt advanced information technology.
New Uses for WordNet
by Paul O'Rorke on Jan.13, 2010, under Meeting Notes
At the SF Freebase Meetup on January 13th, 2010, Jamie Taylor briefly described WordNet, the lexical database of English developed by George Miller and colleagues at Princeton. Jamie recently loaded WordNet 3.0 into Freebase, an open semantic database created by Metaweb Technologies. The aim of the Freebase project is to enlist a global community in capturing much of the world’s knowledge, in a relatively coherent and well-structured form, in a huge database that everyone can access. WordNet is a powerful addition to the substantial body of information already in Freebase.
Web Mining in the Cloud
by Paul O'Rorke on Nov.01, 2009, under Meeting Notes
Ken Krugler, founder of Bixo Labs, Inc., gave an interesting presentation on elastic web data mining at the 2009 Silicon Valley Data Mining Camp. Ken’s session was part of the half-day “unconference” organized by the Bay Area ACM at the Hacker Dojo in Mountain View on Sunday, November 1st, 2009.
Analytics Behind LinkedIn
by Paul O'Rorke on Jul.21, 2009, under Meeting Notes
DJ Patil spoke on “Analytics Behind LinkedIn: A New Model for Analytics and Business Intelligence” at the SDForum BI SIG meeting on Tuesday, July 21st, 2009. Patil is the Chief Scientist at LinkedIn and holds several other important roles, including Sr. Director of Product Analytics and Chief Information Security Officer. He is in charge of analytics reporting on LinkedIn’s businesses (with some responsibility for P&L, i.e., profit and loss) as well as product analytics. While many companies, including startups, strive to be “data-driven,” Patil’s presentation focused on LinkedIn’s efforts to build products that leverage data.
Apache’s Mahout Project
by Paul O'Rorke on Apr.21, 2009, under Meeting Notes
Jeff Eastman gave a presentation on Mahout at the SDForum Business Intelligence Special Interest Group’s meeting on April 21st, 2009. Mahout is a collection of machine learning algorithms adapted for use on very large data sets using the Hadoop MapReduce platform. Jeff’s presentation, “BI Over Petabytes: Meet Apache Mahout,” gave a good introduction to Mahout and a snapshot of its current status. His slides are available in the SDForum Archives.
The Data Architecture of Force.com
by Paul O'Rorke on Mar.25, 2009, under Meeting Notes
Craig Weissman, Chief Software Architect (and newly announced CTO) of Salesforce.com, gave a presentation on “The Magic of Multitenancy: Under the Covers of the Data Architecture of Force.com” at the SDForum Software Architecture and Modeling Special Interest Group in Palo Alto on March 25th, 2009. The talk shared a lot of content with an earlier PARC Forum presentation by Todd McKinnon, former SVP for Software Development at Salesforce, because both derive from a presentation Craig gave at Dreamforce 2008. However, Craig went into more depth and detail, partly because he was better placed to do so, having invented some of the technology, and partly because of the relatively large number of software architects in the audience and the interactive nature of SDForum SIG meetings.
The Big Switch
by Paul O'Rorke on Mar.16, 2009, under Reviews
Nicholas Carr’s 2008 book “The Big Switch: Rewiring the World, from Edison to Google” follows up on his earlier book “Does IT Matter?”, which claimed that information technology is becoming a commodity like electricity and that IT no longer matters because it no longer provides a competitive advantage. “The Big Switch” claims that computing is turning into a utility and that the use of computers is undergoing a transformation similar to the one that occurred when the production of electricity was centralized. Today most companies run their own data centers, just as factories once ran their own electrical generators, but Carr argues that in the future nearly all computing will be supplied by utilities like the data centers that have emerged at Amazon, Google, and other next-generation internet service providers.
Hadoop Boot Camp
by Paul O'Rorke on Mar.06, 2009, under Meeting Notes
Scale Unlimited held its first public “Hadoop Boot Camp” at the Plug and Play Center in Redwood City on March 5th and 6th, 2009. Hadoop is an Apache open source project comprising a bundle of related sub-projects that support distributed computing using MapReduce. For many large distributable problems it is becoming a “virtual OS for your data center.” Yahoo is a major contributor and uses Hadoop extensively on large clusters. Using Hadoop, Yahoo won the 2008 Terabyte Sort benchmark (the first Java program and the first open-source program to win) on 910 nodes with two quad-core Xeons per node. Hadoop has been run on a two-thousand-node cluster, and the current design goal is 10,000 nodes.
Scale Unlimited is a new company specializing in Hadoop training; its principals, Chris Wensel and Stefan Groschupf, serve as friendly “drill sergeants.” Their two-day session combines lectures with hands-on labs and is a great way to learn a lot about Hadoop Core and related technologies in a short time. They strike a nice balance, keeping the material compact and concentrated without making it indigestible, opaque, or overwhelming.
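For readers new to MapReduce, the model Hadoop distributes across a cluster can be shown in miniature. The sketch below is a single-process Python illustration of the map, shuffle, and reduce phases using the classic word-count example; it does not use Hadoop’s API, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # In Hadoop, mappers and reducers would run in parallel on many nodes;
    # here every phase runs sequentially in one process (an assumption for illustration).
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop runs MapReduce", "MapReduce scales out", "Hadoop scales"]
    print(word_count(docs))
```

In a real Hadoop job the framework performs the shuffle itself, sorting keys and routing all values for a given key to a single reducer, which is what lets two small user-written functions scale to large clusters.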
Four Case Studies in R
by Paul O'Rorke on Feb.18, 2009, under Meeting Notes
Michael Driscoll and Jim Porzak organized an excellent panel on “Case Studies in R” at Predictive Analytics World at the Hotel Nikko on Mason Street in San Francisco on February 18th, 2009. True to form, they couldn’t resist having fun with the name “R” yet again: the actual title was “The R and Science of Predictive Analytics: Four Case Studies in R.” Jim and Michael organize the Bay Area useR Group, and this meeting was their 2009 kickoff. Driscoll is a principal at Dataspora, a business analytics startup. Michael chaired the session, and Jim, now at The Generations Network, gave a quick overview of R and served as one of the four panelists. The other three panelists were Bo Cowgill from Google, Itamar Rosenn from Facebook, and David Smith from Revolution Computing.
Business Intelligence on a Budget: Open Source BI
by Paul O'Rorke on Feb.17, 2009, under Meeting Notes
I gave a presentation titled “Business Intelligence on a Budget: Open Source Business Intelligence” (OSBI) at the monthly meeting of the SDForum BI Special Interest Group on February 17th, 2009. The presentation provided a quick snapshot of the current state of OSBI software. In general, open source software has become an accepted and integral part of many businesses’ core web applications and services, and the trend toward open source has recently accelerated in the BI space as well. With open source offerings reaching new levels of maturity and trustworthiness, and with total costs on the order of one tenth those of closed-source software, open source BI is expected to grow much faster than the overall BI market. The adoption of OSBI software is expected to triple by 2012. With budgets tightening in the ongoing recession, many companies would do well to consider whether they can save money by using OSBI software.