Implementing a Twitter Firehose in CDH5

Twitter Firehose

While trying to implement the tutorial from the series How-to: Analyze Twitter Data with Apache Hadoop, I ran into two issues. First, CDH was installed using parcels, which was the recommended method, whereas the tutorial assumes that the installation was performed using packages. As a consequence, most of the libraries and programs are installed differently. Because of CDH5 […]

ExpertDAY’14

All the speakers of ExpertDAY’14 on stage.

Today I went to a conference in Geneva called ExpertDAY’14 (Twitter #ed14da). The main topics were Big Data, Mobility and Cloud. I attended the Big Data and Mobility tracks and learned some very interesting things. I have the feeling that commercial tools and service offerings that integrate various features of Big Data are now available. There is […]

Cloudera Beta 5 Installation

The Cloudera Manager 5 main screen is very similar to that of version 4.5.

In a previous article I explained how to create a simple 4-node Hadoop cluster using Cloudera 4. Cloudera has released a beta of version 5, so I decided to give it a try! Installation: The procedure remains unchanged, apart from the installer binary path. The following binary installer should be used: The command […]

Experimenting Hadoop with Real Datasets

This map shows the data produced by the Hadoop MapReduce job: the average temperature measured by weather stations located in the US over a 30-year period.

Introduction: I now have an operational Cloudera Hadoop cluster with 4 nodes, as described in my previous post. My objective is to use interesting data to experiment with MapReduce algorithms. This article presents how I selected the datasets, imported them into Hadoop, developed and ran the MapReduce Java programs, and graphed and analyzed the results using R. […]

Creating A Simple Hadoop Cluster With VirtualBox

Four virtual machines running on VirtualBox, ready to be set up in the Cloudera cluster.

I wanted to get familiar with the big data world, and decided to test Hadoop. Initially I tried Cloudera’s pre-built virtual machine, which comes with their full Hadoop suite pre-configured (called Cloudera QuickStart VM). It was a really interesting and informative experience. The QuickStart VM is fully functional and you can test […]