Implementing a Twitter Firehose in CDH5

Twitter Firehose

While trying to implement the tutorial from the series How-to: Analyze Twitter Data with Apache Hadoop, I stumbled upon two issues: CDH was installed using parcels, which was the recommended method. The tutorial assumes that the installation was performed using packages. As a consequence, most of the libraries and programs are installed differently. Because of CDH5 […]

ExpertDAY’14

All the speakers of ExpertDAY’14 on stage.

Today I went to a conference in Geneva called ExpertDAY’14 (Twitter #ed14da). The main topics were Big Data, Mobility and Cloud. I attended the Big Data and Mobility tracks and learned very interesting things. I have the feeling that commercial tools and service offerings that integrate various features of Big Data are now available. There is […]

Cloudera Beta 5 Installation

The Cloudera Manager 5 main screen is very similar to that of version 4.5.

In a previous article I explained how to create a simple 4-node Hadoop cluster using Cloudera 4. Cloudera has released a beta of version 5, so I decided to give it a try! Installation: The procedure remains unchanged, apart from the installer binary path. The following binary installer should be used: The command […]

How To Fix HBase Browser “Localhost:9090” Error

This message is not very informative, but it means that Thrift has not been configured for the HBase Browser.

In Hue 2.5, on Cloudera Manager 4.8, the HBase Browser is not configured to be operational out of the box. At first you only receive an error message: It took me some time, but I finally found in the documentation how to enable Thrift for the HBase Browser: Extract from Cloudera documentation: A Hue Service Enabling […]

Experimenting Hadoop with Real Datasets

This map shows the data produced by the Hadoop MapReduce job: the average temperature measured by weather stations located in the US over a 30-year period.

Introduction: I now have an operational Cloudera Hadoop cluster with 4 nodes, as described in my previous post. My objective is to use interesting data to experiment with MapReduce algorithms. This article presents how I selected the datasets, imported them into Hadoop, developed and ran the MapReduce Java programs, and graphed and analyzed the results using R. […]

Creating A Simple Hadoop Cluster With VirtualBox

Four virtual machines running on VirtualBox, ready to be set up in the Cloudera cluster.

I wanted to get familiar with the big data world, and decided to test Hadoop. Initially I used Cloudera’s pre-built virtual machine with their full Hadoop suite pre-configured (called Cloudera QuickStart VM), and gave it a try. It was a really interesting and informative experience. The QuickStart VM is fully functional and you can test […]