Posts

Showing posts with the label pig

Home built Hadoop analytics cluster: Part 5

Image
Home built Hadoop analytics cluster: Part 5 Got the mysql database installed and configured on my secondary node.  Installed the driver on the primary node.  Set up a few users and a database.  Tested the connections. Now hopefully all goes well with the install!

Home built Hadoop analytics cluster: Part 4

Image
Home built Hadoop analytics cluster: Part 4 So yay!  As mentioned for my next goals in my previous post , I finally got the remaining two boxes built out and added into my home network.  I opted to put the Hadoop cluster on it's own subnet with a dedicated unmanaged switch only for the cluster (primary and nodes). I added the agent and metrics to all of the nodes and rebooted the servers. Then I followed the instructions to set up the cluster, naming it "ds730" after the class that I'm currently taking - DS730: Big Data - High Performance Computing. I also made sure I had DNS setup correctly by modifying /etc/systemd/resolved.conf and fixed my name resolution issues. Removed firewall rules. Removed timedatectl by doing: sudo timedatectl set-ntp no Then installed ntp: sudo apt install ntp Now I need to look at installing some database drivers, however I think I'm going to call it a night.  

Home built Hadoop analytics cluster: Part 2

Image
Home built Hadoop analytics cluster: Part 2 In my previous post , I went through my overall plan that I will be following, along with goals and topics that I will be covering.  In this post, I will cover the initial building out of the cluster. [ Bill of Materials - BOM ] [ Hardware assembly ] [ Installing and configuring Linux ] Bill of Materials - BOM Item Price Link CPU $139.99 Amazon Motherboard $71.99 Amazon Memory (32 GB) $109.99 Amazon Storage (500 GB) $57.99 Amazon Power Supply (600W) $62.99 Amazon Case $68.99 Amazon Total $511.94 *** Total estimated price as of 10/12/2020 Does not include shipping/taxes Obviously, you can swap out components as you see your needs fit.  I did not want to make a high end workstation with GPU, opting to use a CPU that had graphics built in.  I did opt to get 32 GB memory and 500 GB storage - I could have gone down to 16 GB for memory and 250 GB for storage, but I feel that memory and storage is something that ...

Hadoop Reading Material

Image
Hadoop Reading Material I'm starting to really get into my DS730 - Big Data: High Performance Computing class. I wanted to go beyond the instructors material and picked up some additional reading material. Hoping this will help me be successful in the weeks to come.

Home built Hadoop analytics cluster: Part 1

Home built Hadoop analytics cluster: Part 1 [ Background ] [ Project Summary ] [ Building the cluster ] Background I am pursuing the UW Master of Science in Data Science degree from UW-Wisconsin - University of Wisconsin System / UW Extended Campus.  My home school is at UW La Crosse .  For the Fall 2020 semester, I am taking DS730 - Big Data: High-Performance Computing . I have been learning how to implement algorithms that allow for the distributed processing of large data sets across computing clusters; creating parallel algorithms that can process large data sets; and using tools and software such as Hadoop , Pig , Hive , and Python to compare large data-processing tasks using cloud-computing services e.g. AWS . One of the things that struck me so far in this course is how everything was "setup" for me in this course.  While I understand the intent of the course is to focus on the programming skills to handle big data within data science, part of me wants to fo...