Thursday, February 23, 2012


Network graphs were first studied in the 18th century, when Leonhard Euler took on a problem about the 7 bridges of a town called Königsberg. The problem was to find a walking route that crossed every bridge exactly once. Euler pictured the town as a network of 4 nodes (the land masses) joined by 7 links (the bridges), and showed that no such route exists. This link has more details regarding the initial discovery of networks:
http://www.jcu.edu/math/vignettes/bridges.htm

The original network problem looked similar to this:
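There are a lot of interesting aspects to a network graph, including odd and even vertices. Vertices (nodes) are the points in a graph, and edges (links) connect pairs of nodes together. An odd vertex has an odd number of edges, and an even vertex has an even number of edges. This is the key to Euler's answer: a route that crosses every edge exactly once is only possible when zero or two vertices are odd, and in Königsberg all four land masses are odd. There is a lot of mathematics behind network graphs, built on these edge and vertex concepts.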
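To make the odd/even vertex idea concrete, here is a small sketch that counts the bridges touching each land mass and applies Euler's rule. The labels A to D are my own (A is the central island, which has 5 bridges), and the check assumes the graph is connected, which Königsberg's is.

```python
from collections import Counter

# The 7 bridges of Königsberg as edges between 4 land masses.
# Labels are illustrative: A = central island, B and C = river banks, D = eastern land mass.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

def vertex_degrees(edges):
    """Count how many edges (bridges) touch each vertex (land mass)."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees

def has_euler_path(edges):
    """For a connected graph, a walk using every edge exactly once
    exists if and only if zero or two vertices have odd degree."""
    odd = [node for node, d in vertex_degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)

degrees = vertex_degrees(BRIDGES)
print(dict(degrees))            # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(has_euler_path(BRIDGES))  # False: all four vertices are odd
```

Every vertex comes out odd, which is exactly why no route over each bridge exactly once can exist.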

Network graphs can also be used to understand how people are linked through chains of connections. One interesting idea that we learned about was 6 degrees of separation, where each degree is one step between two people who know each other. I was skeptical at first when I heard this theory. The theory basically states that you can connect to anyone in the world through an average of about 6 degrees of separation. An interesting article describes a Microsoft study that supports the idea that six degrees of separation is valid. Source: http://www.guardian.co.uk/technology/2008/aug/03/internet.email
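Counting degrees of separation between two people is a shortest-path question, and breadth-first search is the standard way to answer it on an unweighted network. The sketch below uses a tiny made-up friendship network (the names are hypothetical); a study like Microsoft's would average this number over a huge number of pairs rather than compute it for just one.

```python
from collections import deque

# A tiny, hypothetical friendship network: person -> list of friends.
FRIENDS = {
    "Ann": ["Bob", "Cara"],
    "Bob": ["Ann", "Dev"],
    "Cara": ["Ann", "Dev"],
    "Dev": ["Bob", "Cara", "Eve"],
    "Eve": ["Dev", "Finn"],
    "Finn": ["Eve"],
}

def degrees_of_separation(graph, start, target):
    """Breadth-first search: return the number of hops on the shortest
    path from start to target, or None if they are not connected."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for friend in graph.get(person, []):
            if friend == target:
                return hops + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None

print(degrees_of_separation(FRIENDS, "Ann", "Finn"))  # 4 (Ann -> Bob -> Dev -> Eve -> Finn)
```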

Microsoft tested this using 30 billion electronic messages. It seems that social media networks could also be used to test this idea. We are continuously becoming more interconnected with each other regardless of physical location. Facebook, Twitter, and LinkedIn allow us to maintain connections that we would not have otherwise. Being able to visualize these networks makes it much easier to make sense of them. Below is an example of a LinkedIn network graph:

This image helps to paint a picture of how one person is connected to all of their connections. LinkedIn is able to do this by using the data that is recorded whenever members make connections with other people. I think it is a very useful tool.

Saturday, February 18, 2012

This week in Business Intelligence class, we learned more about conversion rates, which we should track in our Google AdWords challenge. It is important to increase pageviews during the challenge, but one of the most important things is getting conversions. One approach the professor suggested was to count the pageviews of the page that appears after registration, which is usually a thank-you page. This is a good way to see whether the pageviews were worthwhile; if visitors rarely reach the thank-you page, the advertising was not worth its cost.
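Here is a rough sketch of the arithmetic behind the thank-you page idea. The visit count, thank-you pageviews, and ad spend are made-up numbers, and it assumes each thank-you pageview is one registration (in practice, refreshes of that page could double count).

```python
def conversion_metrics(visits, thank_you_pageviews, ad_cost):
    """Treat each thank-you pageview as one conversion (a completed registration).
    Returns (conversion rate, cost per conversion)."""
    conversion_rate = thank_you_pageviews / visits if visits else 0.0
    cost_per_conversion = (ad_cost / thank_you_pageviews
                           if thank_you_pageviews else float("inf"))
    return conversion_rate, cost_per_conversion

# Hypothetical campaign numbers.
rate, cpa = conversion_metrics(visits=400, thank_you_pageviews=12, ad_cost=60.0)
print(f"conversion rate: {rate:.1%}, cost per conversion: ${cpa:.2f}")
# conversion rate: 3.0%, cost per conversion: $5.00
```

Comparing cost per conversion, rather than cost per pageview, is what tells you whether the ad money was well spent.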

The other lecture this week involved a guest lecturer from an up-and-coming company called SocialFlow. More information about this company can be found here:

http://www.socialflow.com/

Gilad showed us how the company uses data from Twitter, together with tools such as Gephi and Python code, to visualize this information. An example of a Gephi graph is shown below:



Source: https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2BSQkCU0cfmh4MxrHSuTuvkBTf9nKnNFvPlcuGPdSSjPtc0TeCzsfx5QMFulu6jWRb5w6R5o8H1qbNJMFPfF4NTgFhb0Kf51zm_Q6F75AK7zw-bIASyDtBwyM0p26Wv9L3VIjSEMk0PH6/s1600/gephi+sna.PNG
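As a guess at how Python code and Gephi might fit together, here is a sketch that turns retweet records into a weighted edge table. It assumes Gephi's spreadsheet import, which reads an edge list whose columns are named Source and Target (plus an optional Weight). The retweet data is hypothetical, and I chose Source = the person retweeting and Target = the account being retweeted.

```python
import csv
import io
from collections import Counter

# Hypothetical retweet records: (user who retweeted, user who was retweeted).
RETWEETS = [
    ("alice", "brand"), ("bob", "brand"), ("carol", "alice"),
    ("dave", "alice"), ("bob", "brand"),
]

def retweet_edge_table(retweets):
    """Collapse repeated retweets into weighted directed edges and
    return them as CSV text with Source, Target, Weight columns."""
    weights = Counter(retweets)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return out.getvalue()

print(retweet_edge_table(RETWEETS))
```

To use it, the returned text would be written to a `.csv` file and imported into Gephi as an edges table.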

Graphs like these can be very useful for organizations that are trying to find out what types of people are talking about their company. They can also help to show where a tweet originated and how it spread from there. Gilad showed us an example of a tweet as it moved from the original poster until it went viral. This is really cool, since it can help to explain where a tweet came from and what made it take off. For example, Gilad said that the tweets continued at a slow pace until a user with a large following tweeted with the hashtag, and then it started to take off. In that example, it took a celebrity or other account followed by many people to start the viral process.
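A simple way to spot the point where a hashtag took off is to track the running total of followers who could have seen it. This is only a sketch: the timeline is hypothetical, and "potential reach" here just adds follower counts, so it overstates the audience when followers overlap.

```python
# Hypothetical retweet timeline for one hashtag: (hour, user, follower_count).
TIMELINE = [
    (1, "early_fan", 120),
    (2, "friend_a", 300),
    (3, "friend_b", 85),
    (5, "celebrity", 2_000_000),
    (6, "news_site", 450_000),
]

def potential_reach(timeline):
    """Running total of followers who could have seen the hashtag,
    in time order (ignores overlap between follower lists)."""
    total = 0
    reach = []
    for hour, user, followers in sorted(timeline):
        total += followers
        reach.append((hour, user, total))
    return reach

def takeoff_point(timeline):
    """The user whose tweet added the most potential audience at once."""
    return max(timeline, key=lambda entry: entry[2])[1]

for hour, user, total in potential_reach(TIMELINE):
    print(hour, user, total)
print("took off with:", takeoff_point(TIMELINE))
```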

In this class, we will need to create similar graphs, which should be interesting. Sudha and Gilad told us that data cleansing is one of the most time-consuming parts of the process, but it is essential for deriving useful information from the data. I would like to learn more about how the information is retrieved and stored.
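To get a feel for what data cleansing involves, here is a small sketch on hypothetical tweet rows: it normalizes user handles, drops incomplete records, and removes duplicates that only appear once the handles are normalized. The field names `user` and `text` are my own assumption, not any particular export format.

```python
# Hypothetical raw rows, as they might come out of a scrape or export.
RAW_ROWS = [
    {"user": "@Alice ", "text": "Loving the new release"},
    {"user": "alice", "text": "Loving the new release"},
    {"user": "Bob", "text": ""},
    {"user": None, "text": "no author"},
    {"user": "carol", "text": "Great talk today"},
]

def clean_rows(rows):
    """Normalize handles, drop incomplete rows, and remove duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        # "@Alice " and "alice" should count as the same user.
        user = (row.get("user") or "").strip().lstrip("@").lower()
        text = (row.get("text") or "").strip()
        if not user or not text:
            continue  # incomplete record
        key = (user, text)
        if key in seen:
            continue  # duplicate after normalization
        seen.add(key)
        cleaned.append({"user": user, "text": text})
    return cleaned

print(clean_rows(RAW_ROWS))
# [{'user': 'alice', 'text': 'Loving the new release'}, {'user': 'carol', 'text': 'Great talk today'}]
```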

Friday, February 10, 2012

This week in Business Intelligence class, we learned how Google decides which pages to display first. This is a very interesting topic, since few people continue past the first page of results when they search for something. Because of this, if you are a website owner, it is very important to have your page appear on the first page of results.

According to Google, PageRank is defined as follows:

"PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page’s value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at considerably more than the sheer volume of votes, or links a page receives; for example, it also analyzes the page that casts the vote. Votes cast by pages that are themselves “important” weigh more heavily and help to make other pages “important.” Using these and other factors, Google provides its views on pages’ relative importance."
Source: http://searchengineland.com/what-is-google-pagerank-a-guide-for-searchers-webmasters-11068
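The "votes weighted by importance" idea in the quote can be sketched with the textbook power-iteration form of PageRank. This is the classic published formulation with a damping factor of 0.85, not Google's actual production ranking, which (as the quote says) uses many other factors too. The four-page link structure is made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {page: [pages it links to]}.
    Each page splits its current rank evenly among the pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share, then receives "votes".
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Page with no outgoing links: spread its vote over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical site: A, B, and D all link to C, so C collects the most votes.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Notice that A ranks second even though only one page links to it: that one vote comes from C, the most "important" page.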


Google has a PageRank website that will rank any website that you enter. Since PageRank is very important for websites, website owners will try to increase their PageRank in any way possible. For example, websites may hide keywords in white text on a white background to try to rank higher in Google when those keywords are searched. There is an ongoing battle among websites to remain on the first page of Google search results.


In addition to PageRank, Google uses Ad Rank to decide which advertisements are shown for a search, and in what order. The Ad Rank score is calculated from two factors: the cost per click (CPC) that the advertiser bids, and the Quality Score. Because the two factors are multiplied together, a strong Quality Score lets an advertiser reach a good position without paying a high cost per click, which matters to advertisers who want to earn a profit. The Quality Score is measured based on the ad's relevance to the search topic.


Source: http://support.google.com/adwords/bin/answer.py?hl=en&answer=1722122&from=6111&rd=1
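Here is a sketch of why Quality Score can matter more than the bid. It assumes the simplified formula described for AdWords at the time, Ad Rank = maximum CPC bid × Quality Score (on a 1 to 10 scale); the three advertisers and their numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    advertiser: str
    max_cpc: float      # maximum cost-per-click bid, in dollars
    quality_score: int  # 1 to 10

    @property
    def ad_rank(self) -> float:
        # Simplified formula (assumption): Ad Rank = max CPC bid x Quality Score.
        return self.max_cpc * self.quality_score

ads = [
    Ad("High bid, low quality", max_cpc=2.00, quality_score=3),
    Ad("Low bid, high quality", max_cpc=1.00, quality_score=8),
    Ad("Middle", max_cpc=1.50, quality_score=5),
]

ranking = sorted(ads, key=lambda a: a.ad_rank, reverse=True)
for position, ad in enumerate(ranking, start=1):
    print(position, ad.advertiser, ad.ad_rank)
```

The advertiser bidding half as much wins the top spot (8.0 versus 6.0) purely on Quality Score.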


Here is a video that describes the Quality Score in depth:





Friday, February 3, 2012

Google Analytics

The first and second homework assignments gave us an in-depth understanding of how Google Analytics (GA) can be used for Business Intelligence. The analytical features built into GA allow us to create recommendations that are meaningful and useful.

In addition, Google Analytics can be used to help increase sales. Additional information can be found here:

http://www.comparebi.com/increasing-sales-with-google-analytics/

Different parts of Google Analytics, such as traffic sources, keywords, demographics, and page sequences, can help to increase sales. For example, once you know which keywords bring visitors to your website, you can target those keywords to get more people to land on it. This can help people learn more about your website and increase overall traffic. In addition, you can target specific sources to increase traffic to your website. Once you know where the top traffic is coming from, you can continue to develop these sources and/or develop other sources to increase overall traffic.
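Finding the top sources and keywords is essentially counting and sorting. This sketch works on a hypothetical list of visits, one (source, keyword) pair per visit; it does not use the Google Analytics API, just plain Python on data you might export from a report.

```python
from collections import Counter

# Hypothetical visit records: (traffic source, search keyword or None).
VISITS = [
    ("google", "business intelligence blog"),
    ("google", "gephi tutorial"),
    ("facebook.com", None),
    ("google", "gephi tutorial"),
    ("(direct)", None),
    ("twitter.com", None),
    ("facebook.com", None),
]

def top_sources(visits, n=3):
    """The n sources that sent the most visits."""
    return Counter(source for source, _ in visits).most_common(n)

def top_keywords(visits, n=3):
    """The n keywords that brought the most visits (ignoring visits without one)."""
    return Counter(kw for _, kw in visits if kw).most_common(n)

print(top_sources(VISITS))
print(top_keywords(VISITS))
```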

Another important feature of Google Analytics is the ability to find the top content and the most-viewed pages. This can be very useful if you want to redesign your website: you want people to reach certain pages, so the navigation should make those pages easy to get to.

Bounce rate is also an important statistic that Google Analytics offers. It is the percentage of visits in which someone views only one page and then leaves, so it shows the pages where people are leaving your website right after arriving. Through analysis, you can form a hypothesis about why people are leaving the website at that point. If you can fix these issues, visitors will stay longer and you will get more pageviews.
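Bounce rate per landing page can be worked out from session data like this. The sessions below are hypothetical, each one the ordered list of pages viewed during a single visit; a bounce is a session with only one pageview.

```python
from collections import defaultdict

# Hypothetical sessions: the ordered list of pages viewed in each visit.
SESSIONS = [
    ["/home"],
    ["/home", "/products", "/register", "/thank-you"],
    ["/blog/post-1"],
    ["/blog/post-1"],
    ["/blog/post-1", "/home"],
    ["/products", "/register"],
]

def bounce_rate_by_landing_page(sessions):
    """Share of sessions starting on each page that viewed only that page."""
    entrances = defaultdict(int)
    bounces = defaultdict(int)
    for pages in sessions:
        landing = pages[0]
        entrances[landing] += 1
        if len(pages) == 1:
            bounces[landing] += 1
    return {page: bounces[page] / entrances[page] for page in entrances}

for page, rate in bounce_rate_by_landing_page(SESSIONS).items():
    print(f"{page}: {rate:.0%}")
```

Here /blog/post-1 would be the first page to investigate, since two of its three visitors left without viewing anything else.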


This week in Business Intelligence class, we learned about the use of network graphs and how they play a vital role in Business Intelligence. The LinkedIn example is a good way to show how people are related to you.

One tool I have used to generate network graphs is NodeXL. More information about NodeXL can be found here:

http://cwebbbi.wordpress.com/2010/02/22/nodexl-network-graphs-and-the-eurovision-song-contest/

According to Chris Webb, NodeXL is: "Basically a tool for displaying and analysing network graphs. That sounds a lot more complicated than it actually is – really all it means is that you can use it for analysing the relationships between things."

Network graphs can be useful for understanding how to divide different people into groups. They are very useful when you want to understand how things are related to one another and which things cluster together. Here is a picture of a typical network graph:



When I have used NodeXL, it has mainly been to understand which social media users or sources are talking about a specific topic. This is useful for businesses that want to know where they are being talked about on the Internet, and whether those conversations are related to each other.
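The simplest form of grouping in a network graph is splitting it into connected components: people who are linked to each other, directly or through others, end up in the same group. This is only a sketch on hypothetical "who mentioned whom" links; tools like NodeXL typically offer more refined clustering than this.

```python
from collections import defaultdict

# Hypothetical "who mentioned whom" links collected for one topic.
MENTIONS = [("ann", "bob"), ("bob", "cara"), ("dev", "eve"), ("finn", "finn")]

def groups(edges):
    """Split an undirected graph into connected components (depth-first search)."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components

for group in groups(MENTIONS):
    print(sorted(group))
```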


Social media networks seem to be the most interesting place to see how things are interrelated. Facebook and LinkedIn both have nice tools to map this for you.