So I decided to go an easier route - I've now installed Anaconda and will leverage Jupyter notebooks for my learning exercises.
The software was easy to install and run - more later on what I accomplish...
Tuesday, April 30, 2019
Monday, April 29, 2019
Some code I'm playing around with
I'm getting some coding done and exploring Python. Here is some sample code that I'm fiddling around with:
from __future__ import division, print_function

# list of users
users = [
    {"id": 0, "name": "Alex"},
    {"id": 1, "name": "Brian"},
    {"id": 2, "name": "Cathy"},
    {"id": 3, "name": "David"},
    {"id": 4, "name": "Erica"},
    {"id": 5, "name": "Frank"},
    {"id": 6, "name": "Gary"},
    {"id": 7, "name": "Hank"},
    {"id": 8, "name": "Indigo"},
    {"id": 9, "name": "Jason"},
]

# friendship relationships between users, as pairs of user ids
friendships = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
               (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]

# give every user an empty list of friends
for user in users:
    user["friends"] = []

# establish the friendships for each user
for i, j in friendships:
    # this works because users[i] is the user whose id is i
    users[i]["friends"].append(users[j])
    users[j]["friends"].append(users[i])


# this function simply returns the length of the friends list
def number_of_friends(user):
    """How many friends does _user_ have?"""
    return len(user["friends"])


# this sums up the total number of connections
total_connections = sum(number_of_friends(user) for user in users)
print("Total connections are:", total_connections)

# let's find the average
# (true division comes from the __future__ import at the top)
num_users = len(users)
avg_connections = total_connections / num_users
print("Average number of connections is:", avg_connections)

# find the most connected people - largest number of friends
# create a list of (user_id, number_of_friends) pairs
num_friends_by_id = [(user["id"], number_of_friends(user)) for user in users]
print("Unsorted list:", num_friends_by_id)

# sort by the second element of each pair (the friend count), largest first;
# indexing the pair instead of unpacking it in the lambda also runs on Python 3
print("Sorted list:", sorted(num_friends_by_id,
                             key=lambda pair: pair[1],
                             reverse=True))
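As a side sketch (not part of the exercise above), the same totals can be worked out from the friendship pairs alone, without attaching friend lists to each user. This assumes the standard library's collections.Counter; the name friend_counts is my own choice for illustration.

```python
from collections import Counter

# Same friendship pairs as in the code above.
friendships = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
               (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]

# Each pair (i, j) gives one friend to user i and one to user j.
friend_counts = Counter()
for i, j in friendships:
    friend_counts[i] += 1
    friend_counts[j] += 1

# Every friendship is counted twice, once from each side.
total_connections = sum(friend_counts.values())

# float() keeps this true division on Python 2.7 as well as Python 3.
# Note: this divides by the users that appear in at least one pair.
avg_connections = total_connections / float(len(friend_counts))

print("Total connections are: %d" % total_connections)
print("Average number of connections is: %.1f" % avg_connections)
print("Three most connected (id, count): %s" % friend_counts.most_common(3))
```

One caveat with this approach: a user with no friendships at all would never appear in friend_counts, so the average would leave them out. In the data above every user has at least one friend, so both versions agree.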
Linux Updated
I managed to get Oracle VirtualBox installed successfully, along with the Debian Linux distribution. I had to do some additional work, like installing pip, sudo and git and updating my user id, but the installation was relatively smooth.
Based on what I've read, it seems like the Python libraries that data scientists use tend to work with Python 2.7. So while the inclination might be to go with the latest and greatest, do stick with Python 2.7. That may change in the future, but for now this seems like the version to go with.
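Since the interpreter version matters here, it helps to confirm which one a script is actually running under. A minimal sketch using sys.version_info from the standard library; describe_interpreter is a hypothetical helper name of my own, not something a library provides.

```python
import sys


def describe_interpreter(version_info=None):
    """Return a short label such as 'Python 2.7' for the given version tuple.

    Defaults to the interpreter running this script.
    """
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return "Python {0}.{1}".format(major, minor)


# Single-string print() calls behave the same on Python 2.7 and Python 3.
print("Running under: " + describe_interpreter())
```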
Next up - going to start organizing my projects and start using git.
Entering the world of data science
I am hoping to learn more about the world of data science. I'm going to focus on three different key areas: (1) statistics (2) programming and (3) subject matter expertise.
While I haven't decided which area I want to focus on subject matter wise, I do want to pick up the statistics that are needed to do the analysis and then the corresponding programming techniques that will help me apply the statistics.
I thought I'd start by downloading Oracle VM VirtualBox, as this will let me run Linux within my Windows workstation. I also decided on using Debian as my Linux distribution.