Implementing a Twitter Firehose in CDH5

While trying to implement the tutorial from the series How-to: Analyze Twitter Data with Apache Hadoop, I stumbled upon two issues. CDH was installed using parcels, which was the recommended method, whereas the tutorial assumes that the installation was performed using packages. As a consequence, most of the libraries and programs are installed differently. Because of CDH5 […]

Experimenting Hadoop with Real Datasets

Introduction: I now have an operational Cloudera Hadoop cluster with 4 nodes, as described in my previous post. My objective is to use interesting data to experiment with MapReduce algorithms. This article presents how I selected the datasets, imported them into Hadoop, developed and ran the MapReduce Java programs, and graphed and analyzed the results using R. […]