
Raspberry Flavoured Hadoop (Part 1)

12/11/2013

Hadoop isn't that easy for a beginner to learn. It's a relatively new environment and the instructions tend to assume the implementer is quite computer literate and has a fair number of Linux skills.
I'm a firm believer that the best way to learn things is by doing them, and especially by having to fix them when it all goes horribly wrong :-).

So when I started looking at Hadoop a while ago I decided that the best way to learn it was to build a Hadoop cluster. That presented a number of problems. The first was, of course, what to build it on.

To build a meaningful cluster you're going to need at least five or six machines. There are various ways you can do this.
  • You can do it using virtual machines, and in fact this is probably the easiest way to do it. If you look around, any number of people will offer you pre-built Hadoop VMs to play with. But that breaks the first rule of learning: you're not doing the install, so you're not going to learn anything about how you install Hadoop and its inner workings. You can certainly build your own VMs, but that divorces you from the hardware :-(
  • You can do it on a Cloud Service such as Amazon EC2 - but that can get expensive and it's still divorcing you from the hardware :-(
  • You can build it on a number of second-hand or scrounged PCs. This'll certainly work and you will definitely get your hands dirty with the hardware - probably very dirty as you clean out the several years' worth of grime that always infests older PCs. There are other disadvantages to this approach that may not be immediately obvious: the cost of running 5 or 6 PCs, the heat they generate, the amount of desk space they take up, and the objections from your better half about the jet-engine-like noise from the fans as you start them all up. A colleague of mine who followed this approach used to start his cluster up remotely for demo purposes but had to stop when his wife threatened to disassemble it if he wasn't present when it started.

So what's the alternative?

Meet the Raspberry Pi, a credit-card-sized computer that was launched about 18 months ago by the Raspberry Pi Foundation as an education tool. It's a complete computer with an ARM CPU, 512MB RAM, video, 10/100Mb Ethernet, USB ports and SD card storage on a single board the size of a credit card. And the killer bit - it costs $35 (about £25).

You see where I'm going with this :-D

Make no mistake about it - there are challenges to using the Raspberry Pi - it's very resource limited. The CPU is a 700MHz ARM processor, the RAM is only 512MB and the network is only 100Mb. But overcoming challenges helps you learn - though you may lose some hair in the process ;-)

There's a great quote from Meet The Robinsons - "From failure you learn; from success, not so much". Implementing a Hadoop cluster on Raspberry Pis certainly provided me with some failures :-)

To get started I built a single node setup - the good news is that the hardware only costs about £40: a Raspberry Pi Model B, a 16GB SD card, a PSU and a network cable.

TIP: Only buy quality SD cards and try to get a Class 10 card. I know a lot of people have problems with SD cards corrupting on Raspberry Pis. So far I haven't had this happen to me (touch wood).

And just to be really adventurous I decided that I'd install Hadoop 2.2, as it is the latest release version. Why is this adventurous? There are various blog entries around on how to install Hadoop 1.x, 2.0 & 2.1 (beta) on Raspberry Pis but nothing on 2.2, and 2.2 introduced some changes in Hadoop that affect the installation.

Of course I did not do this in isolation, and I used as my starting point many of the great blog posts from people round the world who have installed Hadoop, including earlier versions of Hadoop on Raspberry Pis. For example Michael G. Noll, Toby Myer, Rasesh Mori, Y12 Studio, raspberrypicloud, and Sarah Secret.jp. Thanks for sharing, guys!

Part 2 will cover the single node install, Part 3 will cover the multi-node hardware and Part 4 will cover the multi-node install.

    Author

    Jamie has been hooked on technology since building his first electronic circuit back in the 70s. It used transistors, not valves, despite what the rumours say ;)



© 2013 iDataSci - All materials on this site are copyright and if used elsewhere must be credited to source.
Note: The views expressed on this site are those of the individuals concerned and do not necessarily reflect those of their employers.