Showing posts from October, 2020

Home built Hadoop analytics cluster: Part 4

So yay! As mentioned under next goals in my previous post, I finally got the remaining two boxes built out and added to my home network. I opted to put the Hadoop cluster on its own subnet, with a dedicated unmanaged switch used only by the cluster (primary and nodes).

I added the agent and metrics to all of the nodes and rebooted the servers. Then I followed the instructions to set up the cluster, naming it "ds730" after the class that I'm currently taking - DS730: Big Data - High Performance Computing.

I also made sure DNS was set up correctly by modifying /etc/systemd/resolved.conf, which fixed my name resolution issues. I removed the firewall rules.

I turned off the NTP synchronization managed through timedatectl:

    sudo timedatectl set-ntp no

Then I installed ntp:

    sudo apt install ntp

Now I need to look at installing some database drivers, but I think I'm going to call it a night.
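For reference, resolved.conf keeps its DNS servers on a DNS= line in the [Resolve] section. Here is a minimal sketch of how that setting could be printed so it can be compared across nodes. The function name resolved_dns_servers is hypothetical and not part of the original setup; it only reads the file and changes nothing.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original setup): print the value of
# every uncommented DNS= line in a resolved.conf-style file.
# Defaults to /etc/systemd/resolved.conf when no file is given.
resolved_dns_servers() {
    conf="${1:-/etc/systemd/resolved.conf}"
    # The pattern is anchored to the start of the line, so commented-out
    # entries ("#DNS=") and other keys ("FallbackDNS=") are skipped.
    sed -n 's/^[[:space:]]*DNS=//p' "$conf"
}

# Usage on a cluster node (prints whatever follows DNS=, if anything):
#   resolved_dns_servers
#   resolved_dns_servers /etc/systemd/resolved.conf
```

Running the same function on the primary and on each node is a quick way to spot a box that was missed when editing the file.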

Home built Hadoop analytics cluster: Part 3

In my previous post I covered the Bill of Materials (BOM), hardware assembly, and installing Linux (Ubuntu). In this post I will cover how I installed Ambari.

Installing Ambari

Rather than build from source, I opted to use the distribution from Cloudera (formerly Hortonworks). Ambari 2.7.5 requires official support from Cloudera, so I went down to 2.7.3, which doesn't require a support agreement with Cloudera.

Install some prerequisites

    sudo apt install python-dev
    sudo apt install gcc

Add Cloudera as a distribution

    sudo wget -O /etc/apt/sources.list.d/ambari.list
    sudo apt-key adv --recv-keys --keyserver B9733A7A07513CAD
    sudo apt-get update

Verify packages show up

    apt-cache showpkg ambari-server
    apt-cache showpkg ambari-agent
    apt-cache showpkg ambari-metrics-assembly

Install and set up the server on primary

    sudo apt-get install ambari-se


For some reason, recent Ubuntu updates dropped the resolution of my monitor. I tried to fix it by applying more updates and, lo and behold, nothing worked. On top of that, the virtual machine I was using for my DS730 class went belly up and didn't come back after two reboots. Not having a good day. So now I'm going to document the installation of my workstation in a blog post, so I have something to fall back on when things go haywire again. It looks like I'm going to reinstall Ubuntu. Then I need to hope that rebooting the virtual machine one last time will work.

Home built Hadoop analytics cluster: Part 2

In my previous post, I went through the overall plan that I will be following, along with the goals and topics that I will be covering. In this post, I will cover the initial build-out of the cluster.

[ Bill of Materials - BOM ] [ Hardware assembly ] [ Installing and configuring Linux ]

Bill of Materials - BOM

Item                  Price      Link
CPU                   $139.99    Amazon
Motherboard           $71.99     Amazon
Memory (32 GB)        $109.99    Amazon
Storage (500 GB)      $57.99     Amazon
Power Supply (600W)   $62.99     Amazon
Case                  $68.99     Amazon
Total                 $511.94 ***

*** Total estimated price as of 10/12/2020. Does not include shipping/taxes.

Obviously, you can swap out components to fit your needs. I did not want to build a high-end workstation with a GPU, opting instead for a CPU that has graphics built in. I did opt for 32 GB of memory and 500 GB of storage - I could have gone down to 16 GB for memory and 250 GB for storage, but I feel that memory and storage is something that I always seem

Hadoop Reading Material

I'm starting to really get into my DS730 - Big Data: High Performance Computing class. I wanted to go beyond the instructor's material, so I picked up some additional reading material. I'm hoping this will help me be successful in the weeks to come.

GTC 2020: Jetson Nano

I attended GTC 2020 online, and they had a set of sessions about the Jetson Nano. I realized I had let my projects fall by the wayside and I wanted to get back into the fold. I picked up a Jetson Nano kit from Sparkfun because I wanted to go through V2 of the course. (I had already completed V1 of the course.) I managed to get some cool swag from GTC 2020 - a mug, a t-shirt and some Lego blocks. Sparkfun was prompt in getting the order into my hands and I was able to unbox it earlier this evening. NVIDIA has a great getting started with the Jetson Nano page, so I followed that to get the system up and running. I'm going to document my experience going through the V2 course in my next post.

Home built Hadoop analytics cluster: Part 1

[ Background ] [ Project Summary ] [ Building the cluster ]

Background

I am pursuing the UW Master of Science in Data Science degree from UW-Wisconsin - University of Wisconsin System / UW Extended Campus. My home school is UW La Crosse. For the Fall 2020 semester, I am taking DS730 - Big Data: High-Performance Computing. I have been learning how to implement algorithms that allow for the distributed processing of large data sets across computing clusters; creating parallel algorithms that can process large data sets; and using tools and software such as Hadoop, Pig, Hive, and Python to compare large data-processing tasks using cloud-computing services, e.g. AWS.

One of the things that has struck me so far is how everything was "set up" for me in this course. While I understand the intent of the course is to focus on the programming skills to handle big data within data science, part of me wants to focus on ho


Over the next few days I am planning on attending GTC (GPU Technology Conference), hosted by NVIDIA. I'm looking forward to some useful sessions to help with the Jetson line. One of the skills I'd like to acquire is training a model and then optimizing it for the Jetson device. More to come!

Self Driving Car: Learning Steps

Linux Version

    lsb_release -a

I determined I'm running Ubuntu 18.04.

Docker

I took the time to download Docker for Ubuntu and followed the instructions here:

ROS Download / Setup

Since I'm running Ubuntu 18.04, I will be using the Melodic version of ROS. I managed to get it working under Python 2.7. However, I really want to make sure I stay compatible in the future, so I want to get this working under Python 3.x, but that will come later.
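The Ubuntu-to-ROS pairing above can be written down as a small lookup. This is only a sketch: the function name ros_distro_for_ubuntu is hypothetical, and the table covers just the three ROS 1 releases I am sure about (Kinetic on 16.04, Melodic on 18.04, Noetic on 20.04).

```shell
#!/bin/sh
# Hypothetical helper: map an Ubuntu release number to the ROS 1
# distribution that targets it. Anything else is reported as unknown.
ros_distro_for_ubuntu() {
    case "$1" in
        16.04) echo "kinetic" ;;
        18.04) echo "melodic" ;;
        20.04) echo "noetic" ;;
        *)     echo "unknown"; return 1 ;;
    esac
}

# On a real machine, feed it the short release string from lsb_release:
#   ros_distro_for_ubuntu "$(lsb_release -rs)"
```

Noetic is the ROS 1 release that targets Python 3, so the Python 3.x goal probably lines up with a later move to Ubuntu 20.04.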


I've been meaning to install my Titan RTX for a while now... Finally got around to it!

    brianchilders@education:~$ nvidia-smi --query-gpu=name --format=csv,noheader
    TITAN RTX
    GeForce RTX 2080 Ti

Will be putting the card through some TensorFlow exercises in the near future.
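The same nvidia-smi query can be turned into a quick pass/fail check, for example before kicking off a training run. A minimal sketch, assuming the GPU names arrive one per line on standard input; the function name gpu_listed is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if any line on stdin contains the given
# GPU name (fixed-string, substring match), fail otherwise.
gpu_listed() {
    grep -qF -- "$1"
}

# Usage on a machine with the NVIDIA driver installed:
#   nvidia-smi --query-gpu=name --format=csv,noheader | gpu_listed "TITAN RTX" \
#       && echo "Titan RTX detected"
```

Reading from standard input keeps the check independent of nvidia-smi itself, so it can be tried out with any saved list of names.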

Self Driving Car: Revisiting again...

Well, it's been a busy life with COVID-19 and learning how to adjust to a new way of life. I am starting to get back to the self-driving car project and revisit it. I have about 3-4 different hardware solutions for self-driving cars, all in various states of assembly. I think I just need to pick one platform and stick with it. To that end, I think I'm going to start re-reading about the ROS Robotics platform.


So I found myself on Sunday night thinking that I need to brush up on some AWS skills. I attended a workshop at AWS re:Invent 2019 put on by Paul Vincent from AWS about CDK. I decided to read up on "Getting Started with the AWS CDK" and set up a VPC to do some other cool networking tasks.