
Wednesday, November 4, 2020

Home built Hadoop analytics cluster: Part 5


Got MySQL installed and configured on my secondary node, and installed the driver on the primary node.  Set up a few users and a database, and tested the connections.
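
Roughly, the steps look like this (database name, user, and password below are placeholders; the driver in question is the MySQL JDBC connector that Ambari needs registered on the primary node):

    # on the secondary node: create a database and user for Ambari (names are placeholders)
    sudo mysql -u root -e "CREATE DATABASE ambari;"
    sudo mysql -u root -e "CREATE USER 'ambari'@'%' IDENTIFIED BY 'SuperSecretPassword';"
    sudo mysql -u root -e "GRANT ALL PRIVILEGES ON ambari.* TO 'ambari'@'%';"
    sudo mysql -u root -e "FLUSH PRIVILEGES;"

    # on the primary node: install the MySQL JDBC driver and register it with Ambari
    sudo apt install libmysql-java
    sudo ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar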

Now hopefully all goes well with the install!



Sunday, November 1, 2020

Home built blob storage server

In the hopes of creating blob storage similar to Amazon S3, I recently searched for open source blob storage and, to my pleasure, discovered MinIO.  MinIO lets me expose an S3-compatible service locally on my home network, so I can now work with large datasets in an S3-like fashion without the overhead of dealing with an Internet connection.

I can also set up MinIO as a gateway to Amazon S3 or even to my local Hadoop cluster.

I'm also able to point the AWS CLI at MinIO, or have the MinIO client (mc) talk to AWS S3.
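
As a rough sketch, pointing the AWS CLI at the local MinIO endpoint looks something like this (the host name, bucket, and file are placeholders; the keys are whatever the server is configured with):

    # tell the AWS CLI to use the MinIO keys, then target the local endpoint explicitly
    aws configure                      # enter the MinIO access key / secret key when prompted
    aws --endpoint-url http://blobstorage:9000 s3 ls
    aws --endpoint-url http://blobstorage:9000 s3 cp ./bigdataset.csv s3://datasets/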

While redundancy can be implemented with minio, I'll save that as a project for later.

I picked up a Raspberry Pi 4 8 GB model from Amazon ($150) and an 8 TB external USB drive from Costco ($120).  One can always step down to a lower model / smaller drive if needed - I just couldn't resist the savings on an 8 TB drive from Costco. :)

I downloaded the Raspbian Lite image and set up localization and my hostname.  I then attached the USB drive, created a brand-new ext3 partition on it (wiping out everything else), formatted it, mounted it, and made sure it came back up on reboots.
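
The drive prep was roughly the following - assuming the USB drive shows up as /dev/sda (check with lsblk before wiping anything):

    # create a single partition spanning the drive (destroys any existing data)
    sudo parted /dev/sda --script mklabel gpt mkpart primary ext3 0% 100%

    # format it as ext3 and mount it at /mnt/data
    sudo mkfs.ext3 /dev/sda1
    sudo mkdir -p /mnt/data
    sudo mount /dev/sda1 /mnt/data

    # add an fstab entry so the mount comes back after a reboot
    echo "/dev/sda1 /mnt/data ext3 defaults 0 2" | sudo tee -a /etc/fstab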


Then I downloaded the minio server and client binaries with wget, made them executable, and linked them into the path.

    wget https://dl.minio.io/server/minio/release/linux-arm/minio
    wget https://dl.minio.io/client/mc/release/linux-arm/mc
    chmod +x /home/pi/minio /home/pi/mc
    sudo ln -s /home/pi/minio /usr/bin/minio
    sudo ln -s /home/pi/mc /usr/bin/mc


Then I made a simple shell script (start.sh) to launch minio.

    #!/bin/bash

    # credentials the minio server (and any clients) will use
    export MINIO_ACCESS_KEY=SuperSecretAccessKey
    export MINIO_SECRET_KEY=SuperSecretSecretKey

    # virtual-host-style bucket domain; disable the disk usage crawler to go easy on the Pi
    export MINIO_DOMAIN=blobstorage
    export MINIO_DISK_USAGE_CRAWL=off

    /usr/bin/minio server /mnt/data


Then I created a systemd unit file at /etc/systemd/system/minio.service with the following contents:

    [Unit]
    Description=Minio Storage Service
    After=network-online.target mnt-data.mount

    [Service]
    ExecStart=/home/pi/start.sh
    WorkingDirectory=/mnt/data
    StandardOutput=inherit
    StandardError=inherit
    Restart=always
    User=pi
 
    [Install]
    WantedBy=multi-user.target


I then verified the service worked as intended:

    $ sudo systemctl start minio
    $ sudo systemctl status minio
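
To have the service come back automatically after a reboot, the unit also needs to be enabled:

    $ sudo systemctl enable minio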

I opened my browser, logged in, and verified I could reach the minio service; then I made an alias for the minio client (mc).
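
The alias itself is a one-liner along these lines (the alias name is arbitrary; older mc releases use `mc config host add` instead of `mc alias set`):

    # register the local server with mc using the keys from start.sh, then list buckets to verify
    mc alias set homeblob http://blobstorage:9000 SuperSecretAccessKey SuperSecretSecretKey
    mc ls homeblob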



Wednesday, October 28, 2020

Home built Hadoop analytics cluster: Part 4


So yay!  As mentioned in the goals from my previous post, I finally got the remaining two boxes built out and added to my home network.  I opted to put the Hadoop cluster on its own subnet with a dedicated unmanaged switch used only by the cluster (primary and worker nodes).

I added the Ambari agent and metrics packages to all of the nodes and rebooted the servers.


Then I followed the instructions to set up the cluster, naming it "ds730" after the class that I'm currently taking - DS730: Big Data - High Performance Computing.

I also made sure DNS was set up correctly by modifying /etc/systemd/resolved.conf, which fixed my name resolution issues.
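
For anyone hitting the same thing, the change was along these lines (the DNS server and search domain are examples - use your own):

    # edit /etc/systemd/resolved.conf and set, for example:
    #   [Resolve]
    #   DNS=192.168.1.1
    #   Domains=home.lan
    # then restart the resolver so the change takes effect
    sudo systemctl restart systemd-resolved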

Removed firewall rules.
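
On Ubuntu that typically amounts to something like the following on each node (the cluster sits on its own isolated switch, so the host firewall isn't doing much there anyway):

    # review what's currently allowed, then turn the host firewall off on the cluster nodes
    sudo ufw status numbered
    sudo ufw disable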

Disabled the built-in time sync so it doesn't conflict with ntpd: sudo timedatectl set-ntp no

Then installed ntp: sudo apt install ntp

Now I need to look at installing some database drivers, however I think I'm going to call it a night.

 



Sunday, October 18, 2020

Home built Hadoop analytics cluster: Part 3


In my previous post I covered the Bill of Materials (BOM), hardware assembly, and installing Linux (Ubuntu).  In this post I will cover how I installed Ambari.

Installing Ambari
Rather than build from source, I opted to use the distribution from Cloudera (formerly Hortonworks). Ambari 2.7.5 requires a support agreement with Cloudera to download, so I went down to 2.7.3, which doesn't.

Install some prerequisites

    sudo apt install python-dev
    sudo apt install gcc

Add Cloudera as a distribution

    sudo wget -O /etc/apt/sources.list.d/ambari.list http://public-repo-1.hortonworks.com/ambari/ubuntu18/2.x/updates/2.7.3.0/ambari.list
    sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
    sudo apt-get update

Verify the packages show up

    apt-cache showpkg ambari-server
    apt-cache showpkg ambari-agent
    apt-cache showpkg ambari-metrics-assembly

Install and set up the server on the primary node

    sudo apt-get install ambari-server

Install and set up the agent / metrics on all nodes

    sudo apt-get install ambari-agent
    sudo apt-get install ambari-metrics-assembly

Cloudera also provides instructions on configuring the Ambari server once it's installed, which I followed.
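
The core of it is the interactive setup followed by starting the server - roughly:

    # interactive setup: JDK, database, and service account choices
    sudo ambari-server setup

    # start the server, then browse to http://hadoop-primary:8080 (default login is admin/admin)
    sudo ambari-server start
    sudo ambari-server status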

Next up will be building the remaining boxes of the cluster and installing the agent on those.


Setbacks

For some reason, a recent batch of Ubuntu updates caused my monitor's resolution to drop.

I tried fixing it with further updates and, lo and behold, nothing worked.

Then on top of that, my virtual machine that I was using for my DS730 class decided to go belly up and didn't come back after 2 reboots.

Not having a good day.

So now I'm going to document the installation of my workstation in a blog post, so I have something to fall back on the next time things go haywire.  It looks like I'm going to reinstall Ubuntu again.

Then I need to hope that rebooting the virtual machine one last time will work.



Tuesday, October 13, 2020

Home built Hadoop analytics cluster: Part 2


In my previous post, I went through my overall plan that I will be following, along with goals and topics that I will be covering.  In this post, I will cover the initial building out of the cluster.

  • Bill of Materials - BOM
  • Hardware assembly
  • Installing and configuring Linux


Bill of Materials - BOM

    Item                   Price     Link
    CPU                    $139.99   Amazon
    Motherboard            $71.99    Amazon
    Memory (32 GB)         $109.99   Amazon
    Storage (500 GB)       $57.99    Amazon
    Power Supply (600W)    $62.99    Amazon
    Case                   $68.99    Amazon
    Total                  $511.94

*** Total estimated price as of 10/12/2020; does not include shipping/taxes.

Obviously, you can swap out components as you see fit.  I did not want to build a high-end workstation with a GPU, opting instead for a CPU with integrated graphics.  I did opt for 32 GB of memory and 500 GB of storage - I could have gone down to 16 GB and 250 GB, but memory and storage are things I always seem to want more of.

Hardware assembly

I found the hardware assembly to be very straightforward.  Everything was compatible and I didn't have to upgrade the BIOS or do anything fancy.  The case worked well with the motherboard.  The only thing I wished for was another fan header on the motherboard, as it only has one.  But overall, I'm pleased with the ease of assembly - it took less than 3 hours to put everything together and install Linux.  I now have a working process for building the secondary nodes of the Hadoop cluster.  Time to go back to Amazon and order some more parts! :)




Installing and configuring Linux

I made the decision to use Ubuntu 18.04.5 LTS (Bionic Beaver).  While there are many flavors of Linux, Ubuntu is the distribution that seems to work well in the data science world, and it's the one I've used most over the years and am most comfortable with.

I opted for the server installation ISO, as I don't want the overhead of running a GUI on these boxes, and downloaded the ISO from here.

Once I did that, I followed the instructions to make a bootable USB stick.  They have instructions for Ubuntu, macOS, and Windows.

I have a Firewalla Gold firewall on my home network, so I am thinking of creating a separate subnet for the Hadoop cluster.  I will most likely pick up a switch and some network cable to keep these machines separate from the rest of the home network.

I did install sshd so that I can run the box "headless" - that is, without a monitor, keyboard, and mouse - and log into the server remotely.  I configured the server with the hostname "hadoop-primary".  Make sure to update the firewall rules as well, e.g. sudo ufw allow ssh.
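
Those steps boil down to something like:

    # install and enable the SSH server, set the hostname, and open the firewall for SSH
    sudo apt install openssh-server
    sudo systemctl enable --now ssh
    sudo hostnamectl set-hostname hadoop-primary
    sudo ufw allow ssh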

I held off on doing any upgrades to the server distribution, as I want to make sure that I don't upgrade and conflict with the Hadoop requirements.

I was able to ssh into the box from my Mac desktop.  The next step will be to install Ambari and Hadoop, which I'll cover in my next post.


Saturday, October 10, 2020

Hadoop Reading Material


I'm starting to really get into my DS730 - Big Data: High Performance Computing class. I wanted to go beyond the instructor's material, so I picked up some additional reading.

Hoping this will help me be successful in the weeks to come.

GTC 2020: Jetson Nano

I attended GTC 2020 online, and a set of sessions covered the Jetson Nano.  I realized I had let my projects fall by the wayside and wanted to get back into the fold, so I picked up a Jetson Nano kit from Sparkfun to go through V2 of the course.  (I had already completed V1 earlier.)

I managed to get some cool swag from GTC 2020 - mug, t-shirt and some Lego blocks.


Sparkfun was prompt in getting the order into my hands and I was able to unbox it earlier this evening.

NVIDIA has a great getting started with the Jetson Nano page, so I followed that to get the system up and running.

I'm going to document my experience going through the V2 course in my next post.



Friday, October 9, 2020

Home built Hadoop analytics cluster: Part 1




Background
I am pursuing the Master of Science in Data Science degree through the University of Wisconsin System / UW Extended Campus; my home school is UW-La Crosse.  For the Fall 2020 semester, I am taking DS730 - Big Data: High-Performance Computing.

I have been learning how to implement algorithms that allow for the distributed processing of large data sets across computing clusters; creating parallel algorithms that can process large data sets; and using tools and software such as Hadoop, Pig, Hive, and Python to compare large data-processing tasks using cloud-computing services e.g. AWS.

One of the things that struck me so far is how everything was "set up" for me in this course.  While I understand the intent of the course is to focus on the programming skills needed to handle big data within data science, part of me wants to focus on how the application gets set up in the first place.  To me, part of the learning exercise is putting together the actual infrastructure and software in addition to the programming.  That way, when issues occur in the infrastructure or software, I'll have some foundational skills to do basic troubleshooting.

In class, we were given a Docker container and deployed it to a single-node Hadoop cluster on a t3.xlarge instance.  What I'd like to do is expand beyond that and build out an actual 3-4 node cluster.  I will dedicate the servers that I build to Hadoop, rather than build and deploy a container.  One thing the instructors of my class did well was to show how Ambari can be used to manage a Hadoop cluster, so I will be leveraging Ambari in my home-built solution to help deploy and administer the cluster.

Project Summary
So what are the goals here?  The objectives of this project are to:
  • Build a Hadoop cluster from the "ground up"
  • Learn Hadoop Administration
  • Learn how to use Python for Map Reduce problems
  • Practice writing Pig commands
  • Start poking around with larger data sets outside of my DS730 class
Overall it is my intent to have blog posts that cover the following:
  • Building out the cluster
  • Installation of the software stack
  • Client installation that covers connectivity to the cluster
  • Explore R and Python options for using the cluster
  • Data import and cleansing
  • Plotting and analytics projects making use of data from the cluster

Building the cluster
I went through the exercise of building a single-node cluster, as outlined in Data Analytics with Hadoop: An Introduction for Data Scientists, on my local Linux system.  My initial thought was to build a 4-node cluster to model a realistic cluster layout, but due to budgetary constraints it looks like I will start with 2, maybe 3 nodes.  I set a target price of $400 per node for a system with 4 cores, 32 GB of memory, and 250 GB of SSD storage.

After pricing out some builds online, I realized that a $400 target was too low.  My memory, CPU, and storage came from Newegg.com for $286.92; my case, power supply, and motherboard came from Amazon for $216.23.  Total out-the-door price with shipping was $503.15.

In my next post, I will list out the BOM, cover the actual building of the hardware and setting up Linux.

Tuesday, October 6, 2020

NVIDIA GTC

Over the next few days I am planning on attending the GTC (GPU Technology Conference) being hosted by NVIDIA.

https://www.nvidia.com/en-us/gtc/

Looking forward to some useful sessions to help with the Jetson line.

One of the skills I'd like to acquire is training a model and then optimizing it for the Jetson device.

More to come!



Saturday, October 3, 2020

Self Driving Car: Learning Steps

Linux Version

lsb_release -a

I determined I'm running Ubuntu 18.04.

Docker

I took the time to download Docker for Ubuntu and followed the instructions here: https://docs.docker.com/engine/install/ubuntu/
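
For a lab box, the quickest route on that page is the convenience script - roughly:

    # download and run Docker's convenience install script
    curl -fsSL https://get.docker.com -o get-docker.sh
    sudo sh get-docker.sh

    # let my user run docker without sudo (takes effect after logging out and back in)
    sudo usermod -aG docker $USER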

ROS Download / Setup

Since I'm running Ubuntu 18.04, I will be using the Melodic version of ROS:

http://wiki.ros.org/melodic/Installation

I managed to get it working under Python 2.7.  I want to stay compatible going forward, though, so getting this working under Python 3.x is on the list - but that will come later.
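
For my own notes, the core Melodic install steps from the wiki look roughly like this (check the wiki for the current repository key and package variant):

    # add the ROS package repository and key
    sudo sh -c 'echo "deb http://packages.ros.org/ros/ubuntu $(lsb_release -sc) main" > /etc/apt/sources.list.d/ros-latest.list'
    sudo apt-key adv --keyserver 'hkp://keyserver.ubuntu.com:80' --recv-key C1CF6E31E6BADE8868B172B4F42ED6FBAB17C654

    # install ROS Melodic (desktop-full includes rviz, gazebo, etc.)
    sudo apt update
    sudo apt install ros-melodic-desktop-full

    # source the environment in every new shell
    echo "source /opt/ros/melodic/setup.bash" >> ~/.bashrc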

 



GPU Fun

 I've been meaning to install my Titan RTX for a while now...

Finally got around to it!

    brianchilders@education:~$ nvidia-smi --query-gpu=name --format=csv,noheader
    TITAN RTX
    GeForce RTX 2080 Ti

Will be putting the card through some TensorFlow exercises in the near future.


Friday, October 2, 2020

Self Driving Car: Revisiting again...

Well, life has been busy with COVID-19 and learning how to adjust to a new way of living.  I am starting to get back onto the self-driving car project and revisiting it.

I have about 3-4 different hardware solutions for self-driving cars - all in various states of assembly.  I think I just need to pick one platform and stick with it.

So to that effect I think I'm going to start re-reading about the ROS Robotics platform.

AWS CDK

So I found myself on Sunday night thinking that I need to brush up on some AWS skills.  At AWS re:Invent 2019 I attended a workshop on the CDK put on by Paul Vincent from AWS, so I decided to read up on "Getting Started with the AWS CDK" and set up a VPC to use for some other networking tasks.
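
Getting a first CDK app going is only a handful of commands - something like the following (the project name and language are just examples):

    # install the CDK toolkit and scaffold a new TypeScript app
    npm install -g aws-cdk
    mkdir cdk-vpc && cd cdk-vpc
    cdk init app --language typescript

    # provision the bootstrap resources, then deploy the stack (the VPC construct itself goes in lib/)
    cdk bootstrap
    cdk deploy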




Sunday, February 2, 2020

Self Driving Car: Setbacks and progress

Setback: OK - so I started assembling the PiCar that will use the Google Coral, and I realized that I put the heat sinks on when I shouldn't have, as the HAT board doesn't leave enough clearance.

However, I managed to successfully flash the Jetson Nano.

I assembled most of the hardware.  Unfortunately the needed batteries didn't come with my package, so I'm off to the Internet to find some.

Here are some pictures of my build of the Jetbot so far.














Self-driving cars

So I have several projects going on at the moment...

The first project has not just one, but two self-driving cars.


One will be powered by a Raspberry Pi 3B+ with a Google Coral, the other by an NVIDIA Jetson Nano.

More to come!

-brian

Wednesday, January 15, 2020

Getting started with Serverless and Alexa development



(Image Credit/Source: Amazon Blog)

More and more I see myself tinkering around with serverless development.

To get started, I had to decide what I was going to use for development, so I started with installing node.js.

Node.js is available for multiple operating systems and can be downloaded here.

Once I had Node.js installed, I used npm to install Serverless, e.g.:

npm install -g serverless

I set up my credentials using the following command:

serverless config credentials --provider aws --key AKexampleKey --secret wJalrexampleSecret

You can also install the ask (Alexa Skills Kit) cli using npm:

npm install -g ask-cli

As I plan on doing Alexa skill development, I need to associate my AWS profile with my Alexa development profile.  Excellent documentation provided by AWS can be found here.

I initialized the ask cli with the AWS account (using a key and secret set up previously):

ask init --aws-setup

I then associated the Amazon Developer account with the ask cli:

ask init

We can now start to create our Alexa skill and any associated Lambda functions that would be called as part of the skill, e.g.: ask new and ask deploy.
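
In practice that flow looks something like this (the project directory name is whatever ask new generates from your skill name):

    # scaffold a new skill project (prompts for a runtime, template, and skill name)
    ask new

    # from inside the generated project, deploy the skill manifest, interaction model, and Lambda code
    cd my-alexa-skill
    ask deploy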



Wednesday, January 1, 2020

re:invent 2019 - IoTanium by Onica: Connecting

If you've been following along in the IoTanium journey, I was able to unbox and assemble my IoTanium Dev Kit.  I will be following the additional setup instructions and verifying that everything works in the online IoTanium documentation.

So the first thing I did was power up the dev kit by connecting it to a USB power supply.  The USB port on my Mac keyboard was able to power the device successfully, so I just went with that.


I then cloned their preview repository by executing: git clone -b preview https://github.com/onicagroup/iotanium.git

Now that I have a copy of the repo on my hard drive, I first connected to the wifi of the IoTanium dev kit and then opened up iotanium/webrepl/webrepl.html from the downloaded repository.

There's a password for both the wifi and establishing a connection to it.  Please review the online IoTanium documentation for what that password is (link above).


The IoTanium device actually runs MicroPython, which is also used in other IoT devices like the BBC micro:bit.  While the micro:bit is marketed to students, I believe it could be useful for anyone wanting to learn Python programming for microcontrollers.

At any rate, I needed to configure the IoTanium Dev Kit to connect to my home network, which is done via the iotanium_cfg.json file (located in the iotanium/ folder).  A quick "vi" to update the config, an upload of the file to the dev kit, a click of the Disconnect / Connect button, and a reconnect of my laptop to my home network, and I was successfully talking to the dev kit over my home network instead of over the IoTanium wifi.  This now allows the IoTanium Dev Kit to get "online" - e.g. to AWS and other services.




Now that the dev kit can reach the Internet, it's time to configure a connection to AWS.  Heading over to the AWS console, I navigate to IoT Core.


Once at AWS IoT, I click on Secure then Certificates to generate new certificates.


I click on "Create Certificates" using the "One-click certificate creation".  Advanced IoT shops may manage their own certificates and simply upload a CSR.


You will get a notification that the certificates are created.  Download "A certificate for this thing" and "A private key".  You can also download "A public key" and the Amazon Root CA if your implementation needs them.  (I did not, as I'm just following the instructions and testing the dev kit.)


You will need to create a policy - that is, define what a device with this certificate can do - so click on the "Create new policy" button.


Add authorization for the iot:Publish and iot:Connect actions.




Don't forget to attach the newly created policy to the certificate you created previously, if AWS doesn't attach it automatically as part of the certificate creation process.  This is a step I commonly see people miss in IoT lab exercises.
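
For reference, the same attachment can be done from the CLI (policy name and certificate ARN below are placeholders):

    # attach the IoT policy to the device certificate
    aws iot attach-policy \
        --policy-name iotanium-policy \
        --target "arn:aws:iot:us-east-1:123456789012:cert/EXAMPLE_CERT_ID"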

I also made a note of the endpoint that I'll be sending to.
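
The endpoint can also be pulled from the CLI instead of hunting for it in the console:

    # returns the data endpoint the device should publish to
    aws iot describe-endpoint --endpoint-type iot:Data-ATS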



I modified the hello_world.py file (located in the /iotanium folder) with my endpoint, uploaded the Python file and the certificate / key, and then ran the program.


Ouch.  Failure.  The program was sending a message every 5 seconds, and I had already noticed that the ESP32 microcontroller is a bit slow (e.g. when I type something into the web terminal page, it sometimes lags).  So I adjusted the message send rate in hello_world.py from 5 seconds to 15 seconds - and success!


Over in the AWS console, I subscribed to the topic 'iotanium'.


The messages successfully showed up in the Test console.
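
As a sanity check outside the console, any MQTT client with the same certificate and key can subscribe to the topic - for example with mosquitto_sub (file names and endpoint are placeholders):

    mosquitto_sub \
        --cafile AmazonRootCA1.pem \
        --cert device-certificate.pem.crt \
        --key private.pem.key \
        -h example-ats.iot.us-east-1.amazonaws.com -p 8883 \
        -t iotanium -d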


Stay tuned for my next blog post where I take a closer look at the dev board and start connecting some electronics to it.

-brian

Note: I am not being compensated by Onica / Rackspace for this blog post, nor does this post constitute an endorsement of this product by me or my employer.  I share my experiences in the hopes of educating and informing others about Internet of Things (IoT).

re:invent 2019 - IoTanium by Onica: Unboxing and Assembly

I attended re:Invent 2019 in Las Vegas, NV, and one of the vendors, Onica, offered a chance to win their IoTanium Dev Kit by (1) making motions and capturing them; (2) uploading the captured motion data to an ML algorithm; (3) generating a model that associates the captured motion with movement and action; and then (4) playing a game, making motions that move an avatar in the game.

It took me 3 tries to finally complete the exercise - I was most likely not consistent in my motions, making it hard for the model to understand what I was trying to accomplish.  But I did it! :)

As a result, I received a t-shirt and their IoTanium Dev Kit.  I thought in this post I'd capture the unboxing of the Dev Kit and then in subsequent blog posts, show some projects that can be completed with this Dev Kit.

The Dev Kit comes in a basic black box, filled with parts and electronic bits for assembly.


Upon opening the box, I am greeted immediately with instructions for construction.  You will need a Phillips screwdriver to complete the assembly.


Two main pouches are in the box, one with the microcontroller dev board and the breakout board, the other with a breadboard, screws, risers and some electronics.


Here we have the microcontroller (ESP32) and the breakout board.  Onica's goal in providing this dev kit is to make it easy to connect sensors and other external devices to the ESP32, and the breakout board enables simple connectivity to a breadboard (shown later).


From the 2nd parts bag I removed the risers and the screws that will be used to assemble the two boards.


Always check that you have a complete inventory when assembling something - it saves frustration later.  (In this case, all the pieces were there.)



I used the female riser along with a screw and attached it to the dev board.  Note that the pins will need to be on the bottom (for connectivity to the breakout board).


Completed assembly of the female risers and the associated screws.


Completed assembly of the dev board (flipped over) to show how the pins will be inserted into the breakout board.


Connecting the dev board to the breakout board.  It takes a firm push to have the pins seated.


Then I took the remaining 4 male risers and screwed the boards together.


Completed assembly of the dev board and breakout board.


Take the time to make sure the pins are lined up with the sockets in the breadboard.  I had one pin that was slightly bent, so I had to move it to the correct spot; then I was able to successfully push the breakout board pins into the breadboard.



Completed assembly and then I opened the electronics bag to see what I would be working with from a project standpoint.


This completes the unboxing and assembly of the Onica IoTanium Dev Kit.

Stay tuned for my next blog post where I connect the dev board to AWS IoT.  Future posts will include a walkthrough of the features of the dev board, connecting electronics to the board and discussing potential applications.

-brian

Note: I am not being compensated by Onica / Rackspace for this blog post, nor does this post constitute an endorsement of this product by me or my employer.  I share my experiences in the hopes of educating and informing others about Internet of Things (IoT).