Tag: Hadoop
Flurry’s Mobile App Analytics
by Paul O'Rorke on Oct.21, 2010, under Meeting Notes
Peter Farago and Sean Byrnes gave a juicy and surprising presentation about Flurry’s mobile app analytics at the SDForum Business Intelligence Special Interest Group meeting on 10/19/2010 in Palo Alto. Their presentation, titled “Your Company’s Mobile App Blind Spot,” provided both business and technical insights.
Flurry made a big splash in the news when Steve Jobs got pissed off at them and called them out by name in an interview because they outed Apple’s iPad when it was still a closely guarded secret. (See a short video outtake of the interview at VentureBeat.) Apple responded by changing its legal agreements to exclude some third-party analytics and some advertising.
Web Mining in the Cloud
by Paul O'Rorke on Nov.01, 2009, under Meeting Notes
Ken Krugler, the founder of Bixo Labs, Inc., gave an interesting presentation on elastic web data mining at the 2009 Silicon Valley Data Mining Camp. His session was part of the half-day “unconference” organized by the Bay Area ACM at the Hacker Dojo in Mountain View on Sunday, November 1st, 2009.
Apache’s Mahout Project
by Paul O'Rorke on Apr.21, 2009, under Meeting Notes
Jeff Eastman gave a presentation on Mahout at the SDForum Business Intelligence Special Interest Group’s meeting on April 21st, 2009. Mahout is a collection of machine learning algorithms adapted for use on very large data sets using the Hadoop MapReduce platform. Jeff’s presentation, “BI Over Petabytes: Meet Apache Mahout,” gave a good introduction to Mahout and a snapshot of its current status. His slides are available here and in the SDForum Archives.
Hadoop Boot Camp
by Paul O'Rorke on Mar.06, 2009, under Meeting Notes
Scale Unlimited held its first public “Hadoop Boot Camp” at the Plug and Play Center in Redwood City on March 5th and 6th, 2009. Hadoop is an Apache open source project that bundles related sub-projects supporting distributed computing with MapReduce. It is becoming a “virtual OS for your data center” for many large distributable problems. Yahoo is a major contributor and uses Hadoop extensively on large clusters. Yahoo and Hadoop won the Terabyte sort benchmark contest in 2008 (the first Java and open-source entrant to win) using 910 nodes with two quad-core Xeons per node. Hadoop has been run on a 2,000-node cluster, and the current design goal is 10,000 nodes.
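For readers new to the MapReduce model mentioned above, here is a minimal single-machine sketch in plain Python. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration. It only shows the three steps that a framework like Hadoop distributes across many nodes: map each record to key/value pairs, group the pairs by key, and reduce each group to a result.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key; a real framework does this between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values seen for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop runs MapReduce jobs", "MapReduce jobs scale out"]
    print(word_count(docs))
```

Because each call to `map_phase` looks at one record and each call to `reduce_phase` looks at one key, the calls can in principle run on different machines, which is what makes the model attractive for large clusters.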
Scale Unlimited is a new company specializing in Hadoop training. Principals Chris Wensel and Stefan Groschupf serve as friendly “Drill Sergeants.” Their two-day training session includes hands-on labs as well as lectures, and it is a great way to learn a lot about Hadoop Core and related technologies in a short period of time. They strike a nice balance, keeping everything compact and concentrated without making it indigestible, opaque, or overwhelming.
Four Case Studies in R
by Paul O'Rorke on Feb.18, 2009, under Meeting Notes
Michael Driscoll and Jim Porzak organized an excellent panel on “Case Studies in R” at Predictive Analytics World at the Hotel Nikko on Mason Street in San Francisco, February 18th, 2009. Actually, they couldn’t resist having fun yet again with the name “R” and the actual title was “The R and Science of Predictive Analytics: Four Case Studies in R.” Jim and Mike organize the Bay Area useR Group and this meeting was their 2009 kickoff. Michael, a Principal in a business analytics startup called Dataspora, chaired the session. Jim, now at The Generations Network, gave a quick overview of R and served as one of the four panelists. The other three panelists were Bo Cowgill from Google, Itamar Rosenn from Facebook, and David Smith from Revolution Computing.
Functional Programming on the Rise
by Paul O'Rorke on Dec.07, 2008, under Reviews
According to Michael Swaine, Editor at Large of Dr. Dobb’s Journal, a paradigm shift is underway and functional programming is “on the verge of becoming a must-have skill.” In the cover story of the January 2009 issue of Dr. Dobb’s Journal, “It’s Time to Get Good at Functional Programming,” Swaine argues that functional programming is better suited to parallel computation than procedural and object-oriented programming and will be needed to more fully exploit multi-core and multi-CPU computer systems.
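A small Python sketch may help show why the functional style lends itself to parallel execution. It is not taken from the article; the names `square` and `sum_of_squares` are invented for illustration. Because `square` is a pure function with no shared state, the sequential built-in `map` can be swapped for a pool's `map` without changing the result. The sketch uses a thread pool to stay simple and portable; for CPU-bound work in CPython one would typically substitute a process pool to actually use multiple cores.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add


def square(x):
    """Pure function: the result depends only on the argument."""
    return x * x


def sum_of_squares(numbers, mapper=map):
    """Map then reduce; `mapper` can be any map-like callable."""
    return reduce(add, mapper(square, numbers), 0)


if __name__ == "__main__":
    data = range(1, 11)
    sequential = sum_of_squares(data)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Same computation, but the calls to square() are handed to a pool.
        concurrent = sum_of_squares(data, mapper=pool.map)
    print(sequential, concurrent)
```

The design point is that nothing in `sum_of_squares` had to change to go from sequential to concurrent execution; only the mapping strategy was swapped.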