Implementing a Twitter Firehose in CDH5

While trying to implement the tutorial from the series "How-to: Analyze Twitter Data with Apache Hadoop", I ran into two issues. First, I had installed CDH using parcels, which was the recommended method, whereas the tutorial assumes an installation from packages. As a consequence, most of the libraries and programs are installed differently. Because of CDH5 […]


Today I went to a conference in Geneva called ExpertDAY’14 (Twitter: #ed14da). The main topics were Big Data, Mobility and Cloud. I attended the Big Data and Mobility tracks and learned some very interesting things. I have the feeling that commercial tools and service offerings that integrate various features of Big Data are now available. There is […]

Cloudera Beta 5 Installation

In a previous article I explained how to create a simple 4-node Hadoop cluster using Cloudera 4. Cloudera has released a beta of version 5, so I decided to give it a try! Installation: The procedure remains unchanged, apart from the installer binary path. The following binary installer should be used: The command […]

How To Fix HBase Browser “Localhost:9090” Error

In Hue 2.5, on Cloudera Manager 4.8, the HBase Browser does not work out of the box. At first you only receive an error message: It took me some time, but I finally found in the documentation how to enable Thrift for the HBase Browser: Extract from Cloudera documentation: A Hue Service Enabling […]

Experimenting Hadoop with Real Datasets

Introduction: I now have an operational Cloudera Hadoop cluster with 4 nodes, as described in my previous post. My objective is to use interesting data to experiment with MapReduce algorithms. This article presents how I selected the datasets, imported them into Hadoop, developed and ran the MapReduce Java programs, and graphed and analyzed the results using R. […]

Creating A Simple Hadoop Cluster With VirtualBox

I wanted to get familiar with the big data world, and decided to test Hadoop. Initially I tried the Cloudera QuickStart VM, Cloudera’s pre-built virtual machine with their full Hadoop suite pre-configured. It was a really interesting and informative experience. The QuickStart VM is fully functional and you can test […]