Implementing a Twitter Firehose in CDH5

While trying to implement the tutorial from the series How-to: Analyze Twitter Data with Apache Hadoop, I stumbled upon two issues: CDH was installed using parcels, which was the recommended method. The tutorial assumes that the installation was performed using packages. As a consequence, most of the libraries and programs are installed differently. Because of CDH5 […]

Cloudera Beta 5 Installation

In a previous article I explained how to create a simple 4-node Hadoop cluster using Cloudera 4. Cloudera has released a beta of version 5, so I decided to give it a try! Installation: The procedure remains unchanged, apart from the installer binary path. The following binary installer should be used: The command […]

Experimenting Hadoop with Real Datasets

Introduction: I now have an operational Cloudera Hadoop cluster with 4 nodes, as described in my previous post. My objective is to use interesting data to experiment with MapReduce algorithms. This article presents how I selected the datasets, imported them into Hadoop, developed and ran the MapReduce Java programs, and graphed and analyzed the results using R. […]

Creating A Simple Hadoop Cluster With VirtualBox

I wanted to get familiar with the big data world, and decided to test Hadoop. Initially I used Cloudera’s pre-built virtual machine with their full Hadoop suite pre-configured (called Cloudera QuickStart VM), and gave it a try. It was a really interesting and informative experience. The QuickStart VM is fully functional and you can test […]