We've looked at whether users who posted tweets about the Iran election are connected in the social graph. Imagine the entire social graph of Twitter. Mark all nodes (users) who wrote at least one tweet about the Iran election, and remove all other nodes from the graph. Now focus on the remaining nodes in the network. What do they look like?
The plot above shows the size and the number of connected components, where each connected component represents a set of users who are connected by friendship. There were 200,000 users who talked about the Iran election in our (sampled) dataset. Surprisingly, 85% of the users belonged to a single large component and 2% of the users to smaller components. 15% of the users were singletons; they were not connected to any other users who talked about the Iran election.
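The procedure described above (keep only the marked users, then group them into connected components) can be sketched in a few lines. This is only a minimal illustration, not the code used for the plot: the function names `component_sizes` and `size_distribution` and the toy data are made up, and it assumes that a follow link in either direction counts as a "friendship" connection.

```python
from collections import Counter, defaultdict, deque


def component_sizes(edges, marked):
    """Sizes of the connected components among `marked` users.

    `edges` is an iterable of (user, user) follow links. Direction is
    ignored (an assumption), and links touching an unmarked user are dropped.
    """
    marked = set(marked)
    adj = defaultdict(set)
    for u, v in edges:
        if u in marked and v in marked and u != v:
            adj[u].add(v)
            adj[v].add(u)

    seen = set()
    sizes = []
    for start in marked:
        if start in seen:
            continue
        # Breadth-first search over one component.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adj.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def size_distribution(sizes):
    """Map component size -> number of components of that size."""
    return dict(sorted(Counter(sizes).items()))


# Toy example: "x" never tweeted about the topic, so it is removed,
# which leaves "f" as a singleton.
edges = [("a", "b"), ("b", "c"), ("c", "x"), ("x", "f"), ("d", "e")]
marked = {"a", "b", "c", "d", "e", "f"}
sizes = component_sizes(edges, marked)
print(sizes)                     # [3, 2, 1]
print(size_distribution(sizes))  # {1: 1, 2: 1, 3: 1}
```

The share of users in the largest component is then `sizes[0] / len(marked)`, and the singletons are the components of size 1.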
We initially expected to see a power-law distribution, whose characteristic pattern is a straight line on a log-log plot. This means that the number of connected components of size x should be proportional to x^a (a is called the power-law exponent). But we see two different exponents in the plot.
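To make the "straight line" idea concrete: if count = C * x^a, then log(count) = log(C) + a * log(x), so the slope of a line fitted to the log-log points estimates the exponent a. The sketch below does this with ordinary least squares. It is a rough illustration only; the function name `loglog_slope` is hypothetical, and a least-squares fit on log-log data is known to be a crude estimator compared with maximum-likelihood methods.

```python
import math


def loglog_slope(size_to_count):
    """Least-squares line through (log size, log count).

    Returns (slope, intercept); the slope estimates the exponent a
    in count ~ C * size**a, and exp(intercept) estimates C.
    """
    points = [
        (math.log(size), math.log(count))
        for size, count in size_to_count.items()
        if size > 0 and count > 0
    ]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all sizes are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic data that follows count = 1000 * size**-2 exactly.
synthetic = {size: 1000.0 * size ** -2 for size in (1, 2, 4, 8, 16)}
slope, intercept = loglog_slope(synthetic)
print(slope, math.exp(intercept))
```

When a plot shows two regimes, one way to quantify them is to run the same fit separately on the sizes below and above the apparent break and compare the two slopes.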
So why don't we see a straight line in the size distribution of connected components? I have several hypotheses for why we might see a multi-scaling trend, such as the language barrier and the effect of mass media. I like these moments when I encounter unusual patterns. This is what makes research all the more challenging and fun.
Focusing on the largest connected component, users in this group do show a power-law distribution in their connectivity. Some users potentially influenced the tweets of more than 1,000 others (meaning that these users had more than 1,000 fans who also wrote about the Iran election); likewise, a user can be influenced by more than 1,000 others in their subscription list. Both the indegree and outdegree distributions follow a power-law trend, but interestingly the two quantities turn out to be only weakly related (correlation coefficient of 0.3066).
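For readers who want to try the indegree/outdegree comparison on their own data, here is a minimal sketch. It makes two assumptions that the post does not spell out: each edge is a (follower, followee) pair, and the correlation coefficient is Pearson's. The names `pearson` and `degree_correlation` are made up for the example.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def degree_correlation(follow_edges):
    """Correlation between indegree (fans) and outdegree (subscriptions).

    Each edge is assumed to be a (follower, followee) pair.
    """
    indegree = Counter()
    outdegree = Counter()
    for follower, followee in follow_edges:
        outdegree[follower] += 1
        indegree[followee] += 1
    users = list(set(indegree) | set(outdegree))
    # Counter returns 0 for users with no edges in a given direction.
    fans = [indegree[u] for u in users]
    subscriptions = [outdegree[u] for u in users]
    return pearson(fans, subscriptions)


# Toy star network: three users follow "a", and "a" follows nobody.
star = [("b", "a"), ("c", "a"), ("d", "a")]
print(degree_correlation(star))
```

In the star example the only user with fans has no subscriptions, so the two degrees are perfectly anti-correlated; a value near zero would instead mean that knowing how many fans a user has says little about how many users they follow.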
There is a lot that still needs to be done, and I'm fascinated to investigate how social media like Twitter have changed the way we encounter new information and collaboratively propagate messages among users.
3 comments:
It's the percolation transition. If the probability that a Twitter user mentioned the Iranian election is below the percolation threshold of the Twitter social network, there would be no giant component. But if the probability exceeds the percolation threshold, the giant cluster emerges. It's natural.
Hi YY, thanks for your reply. I have more questions! We definitely see a giant component. But I also read a paper [Leskovec's "The Dynamics of Viral Marketing"] which says the size distribution of components also follows a power-law distribution. Do you know of other work on this?
I should look up some literature. Percolation theory is a pretty old (but still very important) topic. It is related to error and attack tolerance, epidemic spreading...
Anyway, if the probability is very close to the percolation critical point, the size distribution of all clusters follows a power-law distribution. Above the threshold, a giant component appears, and the sizes of the small clusters may follow a power-law distribution near the threshold. The same holds below the threshold, only without the giant component.