Aug 28, 2009

Iran election in Twitter

One of the most exciting events in social media has to be how the use of Twitter led to some of the rallies and protests in Iran. Our team at MPI-SWS started looking at Twitter to study the patterns of information propagation. After many days of data collection and parsing, we are finally ready to investigate how tens of millions of users communicated with each other. Here's a sneak peek at our ongoing research. (Disclaimer: this is an exciting yet preliminary result and could change later on.)

We've looked at whether users who posted tweets about the Iran election are connected in the social graph. Imagine the entire social graph of Twitter. Mark all nodes (= users) who wrote at least one tweet about the Iran election, and remove all other nodes from the graph. Now focus on the remaining nodes in the network. What do they look like?


The plot above shows the size and the number of connected components, where each connected component represents a set of users who are connected by friendship. There were 200,000 users who talked about the Iran election in our (sampled) dataset. Surprisingly, 85% of the users belonged to a single large component and 2% of the users to smaller components. 15% of the users were singletons; they were not connected to any other user who talked about the Iran election.
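The procedure described above (keep only the marked users, then group them into connected components) can be sketched in a few lines of Python. This is only an illustration on a toy graph, not our actual pipeline: the function name `component_sizes`, the edge-list input format, and the choice to treat friendship links as undirected are all assumptions made for the example.

```python
from collections import Counter, defaultdict, deque

def component_sizes(edges, marked):
    """Return the sizes of the connected components of the subgraph
    induced by `marked`, largest first. Edges are treated as undirected
    (an assumption made for this sketch)."""
    marked = set(marked)
    adj = defaultdict(set)
    for u, v in edges:
        # Keep an edge only if both endpoints tweeted about the topic.
        if u in marked and v in marked and u != v:
            adj[u].add(v)
            adj[v].add(u)

    seen = set()
    sizes = []
    for start in marked:
        if start in seen:
            continue
        # Breadth-first search collects one whole component.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adj.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        sizes.append(size)
    return sorted(sizes, reverse=True)

# Toy data: user "x" never tweeted about the topic, so removing "x"
# cuts "f" off from everyone else and leaves "f" as a singleton.
edges = [("a", "b"), ("b", "c"), ("c", "x"), ("x", "f"), ("d", "e")]
marked = {"a", "b", "c", "d", "e", "f"}

sizes = component_sizes(edges, marked)
size_histogram = Counter(sizes)  # component size -> number of components
print(sizes)
print(size_histogram)
```

The histogram of component size against number of components is the quantity shown in the plot; on a graph with tens of millions of nodes a union-find structure or a graph library would be a more practical choice than this plain breadth-first search.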

We initially expected to see a power-law distribution, whose characteristic pattern is a straight line on a log-log plot. This means that the number of connected components of size x should be proportional to x^a (a is called the power-law exponent). But we see two different exponents in the plot.
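To make the "straight line" concrete: if the count is proportional to x^a, then log(count) = a * log(x) + constant, so the slope of the line on log-log axes is the exponent a. The sketch below estimates that slope with an ordinary least-squares fit on synthetic data. The function name `loglog_slope` and the data are made up for illustration, and a least-squares fit on log-log axes is only a rough estimator, not necessarily what a careful analysis would use.

```python
import math

def loglog_slope(points):
    """Least-squares slope of log(y) against log(x) for (x, y) pairs
    with x > 0 and y > 0. For y proportional to x**a this recovers a."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic (size, count) pairs following count = 1000 * size**-2.
points = [(x, 1000 * x ** -2.0) for x in (1, 2, 4, 8, 16)]
slope = loglog_slope(points)
print(round(slope, 6))
```

Fitting the small-size and large-size ranges separately and comparing the two slopes is one simple way to check whether a plot shows two exponents rather than one.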

So why don't we see a straight line in the size distribution of connected components? I have several hypotheses for why we might see a multi-scaling trend, such as the language barrier and the effect of mass media. I like these moments when I encounter unusual patterns. This is what makes research all the more challenging and fun.


Focusing on the largest connected component, users in this group do show a power-law distribution in their connectivity. Some users potentially influenced the tweets of more than 1,000 others (meaning that these users had more than 1,000 fans who also wrote about the Iran election); likewise, a user can be influenced by more than 1,000 others in his or her subscription list. Both the indegree and outdegree distributions follow a power-law trend, but interestingly the two quantities turn out to be only weakly related (correlation coefficient of 0.3066).
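For readers who want to try this kind of degree comparison on their own data, here is a minimal sketch. It rests on assumptions that are not part of the analysis above: edges are given as (follower, followee) pairs, indegree counts a user's followers and outdegree counts the accounts the user subscribes to, and the coefficient is Pearson's. The names `degrees` and `pearson` and the toy edge list are hypothetical.

```python
import math
from collections import Counter

def degrees(follow_edges):
    """Return (users, indegrees, outdegrees) for (follower, followee) pairs.
    Assumed convention: indegree = followers, outdegree = subscriptions."""
    indeg, outdeg = Counter(), Counter()
    for follower, followee in follow_edges:
        outdeg[follower] += 1
        indeg[followee] += 1
    users = sorted(set(indeg) | set(outdeg))
    return users, [indeg[u] for u in users], [outdeg[u] for u in users]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Toy follow graph: "a" and "b" have followers, "c" and "d" have none.
follow_edges = [("a", "b"), ("c", "b"), ("b", "a"), ("c", "a"), ("d", "a")]
users, indegrees, outdegrees = degrees(follow_edges)
r = pearson(indegrees, outdegrees)
print(users, indegrees, outdegrees, round(r, 4))
```

Because degree distributions of this kind are heavy-tailed, a handful of very large accounts can dominate a Pearson coefficient; a rank-based coefficient such as Spearman's is a common alternative to check alongside it.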

There is a lot that remains to be done, and I'm excited to investigate how social media like Twitter have changed the way we encounter new information and collaboratively propagate messages among users.

Aug 12, 2009

The 1st International Workshop on Mining Social Media

If you're working on social networks and data mining, here's a perfect workshop to consider, held in the wonderful south of Spain:

Mining Social Media (MSM'09)
Paper submission deadline: September 6th
Date and venue: November 9th, 2009, Seville, Spain