Aug 28, 2009

Iran election in Twitter

One of the most exciting events in social media is surely how the use of Twitter led to some of the rallies and protests in Iran. Our team at MPI-SWS started looking at Twitter to study the patterns of information propagation. After many days of data collection and parsing, we are finally ready to investigate how tens of millions of users communicated with each other. Here's a sneak peek of our ongoing research. (Disclaimer: this is an exciting yet preliminary result and could change later on.)

We've looked at whether users who posted tweets on the Iran election are connected in the social graph. Imagine the entire social graph of Twitter. Then mark all nodes (=users) who wrote at least one tweet about the Iran election. Remove all other nodes from the social graph. Now focus on the remaining nodes in the network. What do they look like?


The plot above shows the size and the number of connected components, where each connected component represents a set of users who are connected by friendship. There were 200,000 users who talked about the Iran election in our (sampled) dataset. Surprisingly, 85% of the users belonged to a single large component and 2% of the users to smaller components. 15% of the users were singletons; they were not connected to any other users who talked about the Iran election.
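The procedure described above (keep only the users who tweeted on the topic, then group them by connectivity) can be sketched in a few lines of Python. This is only an illustration on made-up toy data, not our actual pipeline: the function name `component_sizes` is hypothetical, and it assumes friendship links are treated as undirected.

```python
from collections import Counter, deque

def component_sizes(edges, marked):
    """Sizes of connected components among `marked` users only.

    edges:  iterable of (u, v) friendship pairs, treated as undirected (assumption)
    marked: set of users who posted at least one tweet on the topic
    """
    # Keep only links whose endpoints are both marked (the induced subgraph).
    neighbors = {u: set() for u in marked}
    for u, v in edges:
        if u in marked and v in marked and u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)

    seen, sizes = set(), []
    for start in marked:
        if start in seen:
            continue
        # Breadth-first search collects one whole component.
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        sizes.append(size)
    return sizes

# Toy data (made up): users a-f tweeted on the topic, x did not.
edges = [("a", "b"), ("b", "c"), ("c", "x"), ("x", "d"), ("e", "f")]
marked = {"a", "b", "c", "d", "e", "f"}

sizes = component_sizes(edges, marked)
size_histogram = Counter(sizes)  # component size -> number of components
print(sorted(sizes, reverse=True))
print(size_histogram[1], "singleton(s)")
```

In the toy data, user d becomes a singleton because its only path to the others runs through x, who never tweeted on the topic and is therefore removed.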

We initially expected to see a power-law distribution, whose characteristic pattern is a straight line on a log-log plot. This means that the number of connected components of size x should be proportional to x^a (a is called the power-law exponent). But we see two different exponents in the plot.
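To make the "straight line" concrete: if y = C * x^a, then log y = log C + a * log x, so the slope on log-log axes is the exponent a. The sketch below estimates that slope with a plain least-squares fit on synthetic data; the helper name `loglog_slope`, the data, and the breakpoint between the two regimes are all made up for illustration. A least-squares fit on log-log axes is only a rough estimate of a power-law exponent.

```python
import math

def loglog_slope(points):
    """Least-squares slope of log(y) against log(x) for (x, y) pairs."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# A single power law: y = 1000 * x^-2, one straight line on log-log axes.
single = [(x, 1000.0 * x ** -2) for x in (1, 2, 4, 8, 16, 32)]
single_slope = loglog_slope(single)

# Two regimes (synthetic): exponent -1 up to x = 8, exponent -3 beyond it.
head = [(x, 1000.0 * x ** -1) for x in (1, 2, 4, 8)]
tail = [(x, 64000.0 * x ** -3) for x in (16, 32, 64)]
head_slope = loglog_slope(head)
tail_slope = loglog_slope(tail)

print(round(single_slope, 3), round(head_slope, 3), round(tail_slope, 3))
```

Fitting the head and the tail separately is one simple way to see two different exponents in the same distribution.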

So why don't we see a straight line in the size distribution of connected components? I have several hypotheses for why we might see a multi-scaling trend, such as the language barrier and the effect of mass media. I like these moments when I encounter unusual patterns. This is what makes research all the more challenging and fun.


Focusing on the largest connected component, users in this group do show a power-law distribution in their connectivity. Some users potentially influenced the tweets of more than 1,000 others (meaning that these users had more than 1,000 fans who also wrote about the Iran election); likewise, a user can be influenced by more than 1,000 others in their subscription list. Both indegree and outdegree distributions follow a power-law trend, but interestingly these two quantities turn out to be only weakly related (correlation coefficient of 0.3066).
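For readers who want to reproduce this kind of check on their own data, here is a minimal sketch of computing indegree, outdegree, and their Pearson correlation. The edge list is made up, the function names are hypothetical, and it assumes each edge is a (follower, followee) pair, so indegree counts fans and outdegree counts subscriptions.

```python
import math
from collections import Counter

def degree_lists(follow_edges):
    """Return (users, indegrees, outdegrees) from (follower, followee) pairs."""
    indeg, outdeg = Counter(), Counter()
    for follower, followee in follow_edges:
        outdeg[follower] += 1   # follower subscribes to one more user
        indeg[followee] += 1    # followee gains one more fan
    users = sorted(set(indeg) | set(outdeg))
    return users, [indeg[u] for u in users], [outdeg[u] for u in users]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Toy follow graph (made up).
follow_edges = [("a", "hub"), ("b", "hub"), ("c", "hub"),
                ("hub", "a"), ("a", "b"), ("c", "a")]
users, indegrees, outdegrees = degree_lists(follow_edges)
r = pearson(indegrees, outdegrees)
print(users, indegrees, outdegrees, round(r, 4))
```

A value near 0 means that having many fans says little about how many users someone subscribes to, and vice versa.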

There is a lot that needs to be done, and I'm eager to investigate how social media like Twitter have changed the way we encounter new information and collaboratively propagate messages among users.

Aug 12, 2009

The 1st International Workshop on Mining Social Media

If you're working on social networks and data mining, here's a perfect workshop to consider, held in the wonderful south of Spain:

Mining Social Media (MSM'09)
Paper submission deadline: September 6th
Venue: November 9th, 2009 Seville, Spain

Jun 26, 2009

Timely research

I'm reading a Nature paper that was published yesterday, Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic.

This is a 4-page letter paper. According to the Nature authors' guide, this means the paper provides an outstanding finding whose importance means that it will be of interest to scientists in other fields. Regular articles in Nature are 5 pages long and need to make a substantial advance in the understanding of an important problem.

As the title says, the paper is about the recent 2009 swine flu. I'm amazed that scientists put together good work in such a short period of time. Informally, I've heard of a couple of swine flu submissions to Nature that were rejected immediately. This tells me there are *lots* of scientists who are quick and good.

What can social network researchers do with the abundance of data and the recent Iran election? This could turn into another great Nature letter paper in a few months.



Jun 17, 2009

Twitter FAQ


So it happened that I finally came across something to ReTweet about. My very first use of "RT". For those of you who are not familiar with Twitter codes, here is a brief and useful tutorial on Twitter FAQ.

RT means ReTweet or Repeat.
Copy the message you want to retweet and start with "RT @UserName"

OH means OverHeard
When you hear something funny or insightful with your ears (as opposed to reading it on Twitter) and you want to repeat it, prefix it with OH. Generally, this is used anonymously, not for quoting people.

HT means HeardThrough
This is similar to OH in that you use it to repeat things you heard with your ears. A difference is that you can quote the person's name.

Starting with @ sign means Reply
When you want to reply to someone, start your tweet with @UserName.

Hash Tags (#) help to designate topics that people might search for
When you head to conferences, look for hash tags.

#FollowFriday means recommended followers
This is like book recommendations from a friend. You can list your top followers in the form of @UserName after this tag.




May 20, 2009

ICWSM'09 note - day three

Leveraging Diversity

- Ideas embracing diversity in opinions getting popular: Sidelines, Google moderator

- Goal is to project an accurate proportion of users supporting different opinions. let users get an exposure to challenges or new ideas.

- Quick Q: do people really like diversity?

- Similar work on news media bias: "NewsCube" (CHI'09). This work looks at content to aggregate similar news and present the different themes of news articles. The Sidelines paper simply looks at voting counts.



- "Diversity in user activity and content quality in online communities" by Tad Hogg. How many activities does a user do per day?

- Users with very little online time or little activity harder to model (i.e., difficulty of modeling in a heavy-tail distribution).

- Is visibility (or exposure) the key mechanism by which information spreads (whether exposed by friends or by serendipitous browsing)? Visibility and interest are different. (see paper)



- Check out Lada Adamic's write-up on social networks at HP labs.

"Unlike viruses, which spread indiscriminately from host to host, pieces of information are propagated by people who find them interesting and who pass the information to others who they think may be interested. Since people are most similar to their immediate contacts, and this similarity decays as the distance in the social network between individuals increases, information becomes less relevant further away from the source and is unlikely to spread throughout the network. This holds true even in networks with power-law connectivity distributions where highly connected individuals, known as hubs, have the opportunity to potentially spread information to a large number of people. " (See paper)



- Spectrum: retrieving different points of view from the blogosphere.

- Meta search engine for blogs. Would be nice to see memes in the search results. Predicting bloggers' interests in realtime difficult.

- Blog directory (blog category, blogging fusion, yahoo! directory) - Do these sites really work? Blogs are ephemeral.

May 19, 2009

Duncan Watts keynote speech at ICWSM


(1) evolution of social network structure over time

- Work with Gueorgi Kossinets

- Social network changes over time. Averages about network structure (e.g., path length, clustering coefficient, node degree) stay rather steady over time, but individual quantities (e.g., rank) change rapidly.

- What does this mean in terms of social applications?



(2) macro-sociological experiment on social influence

- Music lab experiment -- very cute idea.

- People do get influenced by others. However, the top favorites in the social influence world also did well in the independent world. (Same trend for the bottom ones in the popularity distribution. Lots of noise in medium hot content)



(3) network survey on facebook

- Launched "Friend Sense" app on Facebook and asked political preferences of users and their friends.



(4) influence of financial rewards on performance

- Crowd sourcing site Amazon's Mechanical Turk (AMT) is a fantastic place to launch quick and inexpensive social science studies.

- Increased pay resulted in more work, but not in more accurate work. People always feel they are underpaid (the anchoring effect in psychology).



(5) final remarks

- We moved from having too little data to having too much data.

- Facebook's 200+M users don't fit in a single machine's memory for us to analyze!

- Having one more zero in your dataset (huge data) doesn't mean that you're asking big questions. It's more important to ask important questions and set up experiments that can answer them.

- Lada Adamic's recent work on the diffusion of gestures in Second Life

- Duncan said, "You might ask why are we even asking this obvious question. After many years of being a sociologist, nothing is obvious!"


Tag cloud on Sam Gosling's tutorial #icwsm

Based on my summary below, Wordle says

ICWSM'09 note - day two


Panel Discussion

- Cameron Marlow@Facebook: Look at data to do informed design

- Q: How do users find information on site? Are social links actively used? What about search?

- Slideshare A: We do see clicks from social networking sites, but most requests come from Google. So we put a lot of effort into making sure our slides are easily searchable in Google. The front page is also important, but that's more about setting a tone for the site (identity). We do lots of editorial work on the front page.

- Facebook A: news feed is definitely the most important. This is a huge optimization problem. Facebook provides two feeds: live feed, highlights.



Modeling Social Dynamics

- Interesting paper "Stochastic models of user-contributory web sites" does modeling in social sites, similar to the famous Huberman et al. Science paper from 1998.

- One thought. Modeling is based on visible data. How would a model of user behavior change if we were to consider both visible and invisible activity? (e.g., browsing takes up most of our online time!)

- Prediction of content popularity based on early data is possible; we've seen the same trend in YouTube. Is this a general trend in any massive content system?


- Flickr and YouTube are not about social relationship, but ultimately about information sharing. See evidence: paper "Personal Information Management vs. Resource Sharing"

- Social dimension (Shneiderman 2002): People search for other people's content on Flickr and YouTube, whereas they search for their own content on Delicious and Connotea. The purpose of tagging differs in the same way.


- Social computing & sustainability of sites: it's important to understand what, where, and why people share.

- Interesting paper by Nov et al.

- Motivation for sharing, two axes: self or others, intrinsic or extrinsic. All four combinations have positive loop to user behavior.

- This reminds me of a comment from a director at Big Brothers. He said people post videos on YouTube because they want to be the Steven Spielberg of the web.


May 18, 2009

ICWSM'09 note - day one

Some of the interesting comments and conversations I heard:

Keynote speech
- Lillian Lee's new book on Opinion mining and sentiment analysis available for free.
- In opinion mining, simply using bag of words is not enough. Structure of the text is also important, e.g., look for "still" and "however".
- ? and ! are negative feedback cues

Community
- Over a billion dollars spent on influential viral marketing.
- My social cascade work got some attention.
- People's exposure to feed is the most critical factor in social cascade. Number of friends didn't play any significant role (i.e., accidental influential users?).
- Search "Facebook data" in Facebook. Join the Page!
- In a system like Yahoo! Answers, people reward answerers the most on topics like Music, Computer, and Medicine. However, Science questions received low rewards.

Psychology and Users
- sociogeek paper is based on 150,000 online surveys
- Like Stanley Milgram's familiar strangers in the offline world, we can find familiar strangers in the online counterpart. One key motivation for doing this is that it will benefit a lot of people. (Social networks have a power-law degree distribution, where many people have few contacts, so identifying familiar strangers helps the many people in the tail.)
- Download geotag data at DBpedia

Ranking
- CourseRank program in Stanford by Georgia Koutrika, popularly used by students to rank courses and plan course scheduling.
- Some cute findings: (a) the larger the department size, the smaller individual contribution is. (b) students who got better grades gave higher ranking to the course.
- High median node degree in social networks can reflect spurious relationships? Search "top friends" application on Facebook.
- Similar to "Predicting Tie Strength With Social Media" from CHI2009, paper "Using Transactional Information to Predict Link Strength in Online Social Networks" looked at wall and photo posts to guess tie strength. They used Top Friends app results as the ground truth.


Some thoughts on tie strength and influential users
- After listening to some of the talks, I had the following thought. Network-activity based studies on user interaction mostly focus on visible interactions, e.g., wall posts and photo tagging. This might be a good predictor when we want to measure the strength of ties between individuals (or simply ask who are my best friends?). Then, what does a tie strength mean in terms of viral marketing? Are friends of stronger ties a good indicator of influential people?
- I get a feeling that this is not the case. Based on the Facebook paper "Gesundheit! Modeling Contagion Through Facebook News Feed", we also learned that adoption of information has to do with simple exposure (and not with how popular a friend is). Indeed, exposure is an important and dominant factor in social networks. My recent work using social network clickstream data showed that more than 80% of user activity had to do with "silent" browsing, which a crawl-based study cannot capture. Which friends are important then in viral marketing? Best friends or those whom I get exposed to?
- Obviously there will be lots of overlap between the two groups. But my conclusion today is that influential users are the ones whom I "actively" get exposed to. This means friends whose updates I follow by visiting their pages. Would accidental influentials then be those whom I got opportunistically exposed to (e.g., through Facebook's news feed) and whose cascades I joined?

Tutorial: Psychology of Social Media

This was a 2-hour tutorial at ICWSM'09, delivered by Sam Gosling (UT Austin) and Kate Niederhoffer (Nielsen Online). It was refreshing to listen to people in different fields who look at the very same problems (the audience spanned psychologists, sociologists, computer scientists, and Wall Street analysts). Sharing my notes below.



1. Social media (like Facebook, Twitter) is serving some psychological needs. What?

- Everybody spends their day by doing something. What do they do and what does that tell us about the person?

- Psychologist Maslow says everyone has their own "hierarchy of human needs". This means that we all have our own set of priorities among the actions we can take.



2. Fundamental social needs of people: People want to (1) get along and (2) get ahead.

- Get along meaning, socialize. Get ahead meaning, step up in the social hierarchy level. How could this be projected in social media?

- What do our friends tell us about us? In a lab test, people with low self-esteem wanted to hang out with those who gave negative feedback rather than with those who gave positive feedback. Homophily?



3. People want to be seen accurately, rather than being projected more positively than they actually are.

- Sam looked at webpages of people, contacted the webpage owners, interviewed them and asked what person they'd like to be, asked their friends what the person is like, and found the projection is rather accurate.

- Would this hold in Facebook too?



4. What are identity claims?

- This means the things about you that are deliberately chosen for other people to see. Examples are the t-shirts you wear, bumper stickers, your webpage, pictures you hang in your room, etc.

- Music is not an identity claim. It's a tone-setter. It is typically chosen to set our mood in a particular way. You listen to upbeat music when you head out clubbing, whereas at home you may listen to different music.

- Books are inadvertent identity claims. This is not so deliberate, but is a behavioral residue over a longer time period. Just by looking at the variety, topics, and organization of books at home, we can tell so much about the person. For example, does the person read a wide variety of books (openness), are the books actually read, are there notes, how are they ordered -- neatly or messy, are cheesy books hidden behind, etc.



5. Language is the backbone of our expression.

- There are a lot of approaches to understanding human behavior based on linguistic style. The use of pronouns (which we think of as garbage words) can already reveal the sex and age of a person.

- 140 characters in Twitter can tell so much.

- Usage of the word "I": more common among females and among people with low social status. Usage of the word "We": signals group action, but can also foreshadow trouble (meaning, it is followed by "but").



6. What do we need to know to know a person?

(a) big five traits (openness = creativity + interest + opinions, conscientiousness = daily life, extraversion = documenting life, agreeableness = polite topic covered, neuroticism = cathartic or auto-therapeutic purpose - standard way, look Goldberg 1992, Costa and McCrae 1992)

(b) personal concerns (visions and goals)

(c) identity (personal myth - very difficult to measure)





May 15, 2009

John Wilkes

John (who is my Facebook friend!) visited MPI-SWS and gave a talk about data center storage design. His theme was the design of a "lights-out data center": can we design a data center that will continue to work when all the lights go out (that is, without human intervention)? This means automated control.

He also shared one very interesting thought. SLAs are ad hoc and arbitrary (e.g., service availability should be at least three nines, or 0.999). Sometimes meeting this arbitrary number means investing arbitrary money and effort. So why not have a flexible SLA and see what the losses are when we fail to meet the needs? What a refreshing idea. John further said that the cost of not meeting the needs at Google may mean losing customers. He said "users are one click away from leaving to other services", and the cause of leaving may depend on many axes, including the particular shade of blue used in the logo.
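To get a feel for what a "three nines" target implies, the arithmetic is simply allowed downtime = (1 - availability) x period. The short sketch below is my own illustration, not something from John's talk; the function name is hypothetical and it assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)

def allowed_downtime_hours(availability, period_hours=HOURS_PER_YEAR):
    """Downtime budget implied by an availability target over a period."""
    return (1.0 - availability) * period_hours

# Each extra nine shrinks the downtime budget by a factor of ten.
for nines, availability in [(2, 0.99), (3, 0.999), (4, 0.9999)]:
    hours = allowed_downtime_hours(availability)
    print(nines, "nines:", round(hours, 2), "hours of downtime per year")

three_nines_hours = allowed_downtime_hours(0.999)
```

Because every additional nine cuts the budget tenfold, the effort to meet a target can grow quickly even though the number itself was chosen somewhat arbitrarily.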



May 9, 2009

Chocolate pie chart

I have been quite busy with a paper deadline, an advisory board visit, etc. While looking for a nice graphics tool to plot a pie chart, I came across this irresistible chart made of 70% milk + 20% dark + 10% white chocolate. I'd love to write a program to generate this.
If you are tempted to put it in your paper, consult the Mary and Matt store.

Apr 18, 2009

Social Mobile Web 2009

I am looking forward to reviewing interesting papers for the Social Mobile Web 2009 workshop. We have very cool Program Committee members and I am happy to be part of the group. Mobile devices are everywhere and I would love to see innovations coming out in this area.

Call for Papers:
The mobile space is evolving at an astonishing rate. At present there are over 3.5 billion mobile subscribers worldwide and with continued advances in devices, services and billing models, the mobile web looks set to inspire a new age of anytime, anywhere information access. The inherent characteristics of mobile phones enable new types of interactions, e.g. mobile phones are personal to the individual, they are always on and always connected. And as such we are seeing a shift towards mobile devices for social mediated tasks. The world is also witnessing an explosion in social web services. Online social networking sites such as Facebook and MySpace continue to experience huge increases in usage, with more and more users seeking novel ways of interacting with their friends and family.

Topics:
Novel social interactions on mobile devices.
Social mobile content sharing and distribution services.
Context aware mobile services - beyond location based services.
Social mobile search and social mobile browsing.
User evaluations of social mobile services.
Mobile user interfaces that incorporate social elements.
Mobility and social networks.
Models of mobile social behavior and mobile traces.
Urban gaming, mobile mixed reality, etc.
Innovative social mobile applications.

Deadline: May 11th 2009

Apr 9, 2009

Plotting Venn diagrams

I'm having fun with this marvelous program, Twitter Venn: a program that takes in search keywords and plots a Venn diagram that represents the overlap in the use of keywords in Twitter messages. The making of this application is described in Jeff Clark's blog.
Twitter makes the search possible online (search.twitter.com), which allowed this to happen. Very nice. I want to make a more sophisticated Venn diagram: similar to Google's keyword trends, I'd like to see how the relationship evolves over time. Or, for a given keyword, I'd like to see the hottest matching set of keywords.

BTW, what happened with such a small overlap between cat and sleep? Am I the only one with a cat that sleeps all day long?
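The counts behind a two-keyword Venn diagram are just set sizes: tweets with only the first keyword, only the second, or both. Below is a minimal sketch of that counting step on made-up tweets. It is not how Twitter Venn is implemented; the function names are hypothetical, and it assumes exact whole-word matching.

```python
import re

def keyword_sets(tweets, keywords):
    """Map each keyword to the set of tweet indices that mention it."""
    hits = {k: set() for k in keywords}
    for i, text in enumerate(tweets):
        # Exact whole-word matching (assumption): "asleep" does not match "sleep".
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        for k in keywords:
            if k in words:
                hits[k].add(i)
    return hits

def venn2(a, b):
    """Sizes of the three regions of a two-set Venn diagram."""
    return {"only_a": len(a - b), "both": len(a & b), "only_b": len(b - a)}

# Made-up tweets for illustration.
tweets = [
    "my cat is asleep again",
    "cat videos all day",
    "need more sleep",
    "the cat will sleep anywhere",
]
hits = keyword_sets(tweets, ["cat", "sleep"])
regions = venn2(hits["cat"], hits["sleep"])
print(regions)
```

Running the same counting over successive time windows would give the evolving picture I'd like to see.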

Apr 7, 2009

Crawling YouTube


I started crawling the YouTube site (again!) to get video comments. This time, I'm using the Google Data API and approaching the site properly. The API makes the code very short and I like that it runs fast. Here is sample code for getting comments.

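import gdata.youtube.service  # Google Data (gdata) Python client
yt_service = gdata.youtube.service.YouTubeService()
# video_id is the ID string of the video whose comments we want
comment_feed = yt_service.GetYouTubeVideoCommentFeed(video_id=video_id)
for comment_entry in comment_feed.entry:
    print(comment_entry.ToString())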

Strangely, the code gives me a subset of video comments (say 100), even when there are thousands of comments I can see in YouTube. I'll have to go through the documentation or switch back to wget and urlopen. 

PS: Crawling YouTube is rather distracting.  I ended up watching 20 cat videos and participated in viral spreading of those videos (i.e., sending spam video links to friends).  My favorite of the day: cat massage.

Apr 6, 2009

Summary of the ACM SNS workshop


Here is my summary of the recent ACM Social Network Systems 2009


* Security at a Large Social Network, Tao Stein (Facebook) 
Tao's talk started with the "The Road to 200 Million" article from the NYT. Facebook has three data centers, each in charge of a major continent: VA (Asia), SC (Europe), SF (US). Data consistency is hard to achieve, so Facebook uses a single server for writes and the other servers for read-only operations. Servers use 25TB of RAM for MySQL.

There are obviously lots of attacks on Facebook. Their long-term goal is for one identity in the system to correspond to one real identity. In every security policy, the trade-off is between site integrity and user experience: throwing in more CAPTCHAs will increase security, but then the user experience will degrade.

The #1 problem is account takeovers. Here are a few example attacks:
(a) photo/video scam (e.g., "This applet will show you which friends viewed your photo")
(b) 419 attack by Nigerian spammers (e.g., "I am lost in London, please send me $1000 via Western Union")
(c) koobface (a botnet that sends spam URLs)
(d) fake chain letter (e.g., "Facebook is overpopulated")

Often users use the same login credentials across multiple sites. (Yes, I do too!) So if one site gets compromised, then all are compromised. Because most sites force users to use complex passwords, users end up using a common password across sites. Facebook tries hard to educate users and offers sophisticated privacy settings.
  
How is the network security different in online social network? (a) education and (b) coefficient (= strength of ties) in the social graph. 


* Botnets vs. Social Networks, Elie Bursztein (Stanford)
The second talk was by Elie, who is a post-doc at Stanford. Elie gave a brief overview of his research: how to turn online social networks into a botnet. Elie found that a number of existing systems (e.g., MSN Messenger) have vulnerabilities: a malicious user can send code to turn his friends' (and their friends') host machines into a botnet.


* Eight Friends Are Enough: Social Graph Approximation via Public Listings, Joseph Bonneau (University of Cambridge)
I greatly enjoyed this talk. The talk demonstrated how revealing limited information about a social network (e.g., Facebook's public listing, which shows 8 random friends of a user) can say so much about the entire social graph structure. 

I learned a new term, "social graph privacy". It means preventing data aggregators from reconstructing large portions of the social graph, composed of users and their friendship links. Joseph said protecting the social graph is more difficult than protecting personal data, because personal data can be managed individually by users, while information about a user's place in the social graph can be revealed by any of the user's friends. This work got popular in the media. I also saw a BBC interview with the authors.
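To get an intuition for why small public listings reveal so much, here is a toy sketch of my own. It is not the paper's analysis: the graph is made up, the function names are hypothetical, and it assumes every user publicly shows one fixed random sample of up to 8 friends.

```python
import random

def true_edges(friends):
    """All undirected friendship links as frozenset pairs."""
    return {frozenset((u, v)) for u in friends for v in friends[u]}

def public_listing_edges(friends, k=8, seed=0):
    """Links visible when every user publicly shows up to k random friends."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = set()
    for user in sorted(friends):
        friend_list = sorted(friends[user])
        shown = rng.sample(friend_list, min(k, len(friend_list)))
        observed.update(frozenset((user, f)) for f in shown)
    return observed

# Made-up graph: 30 users on a ring, each friends with the 6 nearest on both sides.
n = 30
friends = {i: {(i + d) % n for d in range(-6, 7) if d != 0} for i in range(n)}

all_links = true_edges(friends)
seen_links = public_listing_edges(friends, k=8)
coverage = len(seen_links) / len(all_links)
print(len(all_links), "links in total;", len(seen_links), "recovered")
```

A link is recovered as soon as either endpoint lists the other, so the union of many small listings covers far more of the graph than any single listing suggests. This also illustrates Joseph's point that a user's place in the graph can be revealed by the user's friends.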

Mar 30, 2009

Geographic distance of social ties

Barry Wellman published an interesting article on geographic distance of social ties. Below is from his publication webpage.  I'm all intrigued.  I'll be soon posting stories on the Flickr counterpart.

"Does Distance Still Matter in the Age of the Internet?" (Diana Mok, Juan-Antonio Carrasco and Barry Wellman, Urban Studies, 2009).  Our study is part of the broad debate about the role of distance and technology for interpersonal contact. To the best of our knowledge, this is the first study that systematically and explicitly compares the role of distance in social networks pre- and post-Internet. We analyze the effect of distance on the frequency of email, phone, face-to-face and overall contact in personal networks, and we compare the findings with its pre-Internet counterpart whose data were collected in 1978 in the same East York, Toronto locality. We use multilevel models with spline specification to examine the nonlinear effects of distance on the frequency of contact. We compare these effects for both very close and somewhat close ties, and for different role relationships: immediate kin, extended kin, friends and neighbours. The results show that email contact is generally insensitive to distance, but tends to increase for transoceanic relationships greater than 3,000 miles apart. Face-to-face contact remains strongly related to short distances (within five miles), while distance has little impact on how often people phone each other at the regional level (within 100 miles). The study concludes that email has only somewhat altered the way people maintain their relationships. The frequency of face-to-face contact among socially-close friends and relatives has hardly changed between the 1970s and the 2000s, although the frequency of phone contact has slightly increased. Moreover, the sensitivity of these relationships to distance has remained similar, despite the communication affordances of the Internet and low-cost telephony.

Mar 27, 2009

ACM Social Network Systems 2009 Workshop

Tao Stein (@Facebook) and I are organizing a wonderful workshop in Nuremberg next Tuesday. We will have 8 paper presentations and 2 invited talks on security -- all look very interesting.
ACM Social Network Systems 2009

Program
0900 - 1000: Security at a Large Social Network, Tao Stein (Facebook)
1000 - 1030: Botnets vs. Social Networks, Elie Bursztein (Stanford)
1030 - 1100: Break

1100 - 1230: Privacy and Security
Eight Friends Are Enough: Social Graph Approximation via Public Listings
Joseph Bonneau, Jonathan Anderson, Ross Anderson, Frank Stajano (University of Cambridge)
Anonymous Opinion Exchange over Untrusted Social Networks
Mouna Kacimi (Max Planck Institute for Informatics), Stefano Ortolani (Vrije Universiteit), Bruno Crispo (University of Trento)
PeerSoN: P2P Social Networking - Early Experiences and Insights
Sonja Buchegger, Doris Schiöberg (TU Berlin, Deutsche Telekom Laboratories), Le Hung Vu (EPFL), Anwitaman Datta (NTU Singapore)
1230 - 1330: Lunch

1330 - 1500: The Ties that Bind
On the Strength of Weak Ties in Mobile Social Networks
Stratis Ioannidis, Augustin Chaintreau (Thomson)
Centralities: Capturing the Fuzzy Notion of Importance in Social Graphs
Erwan Le Merrer (INRIA), Gilles Tredan (University of Rennes 1)
Buzztraq: Predicting Geographical Access Patterns of Social Cascades using Social Networks
Nishanth Sastry, Eiko Yoneki, Jon Crowcroft (University of Cambridge)
1500 - 1530: Break

1530 - 1630: Personalizing Search
Towards Personalized Peer-to-Peer Top-K Processing
Xiao Bai, Marin Bertier (INSA de Rennes), Rachid Guerraoui (EPFL, Switzerland), Anne-Marie Kermarrec (INRIA Rennes, France)
Toward Personalized Query Expansion
Marin Bertier (INSA de Rennes, France), Rachid Guerraoui (EPFL, Switzerland), Anne-Marie Kermarrec, Vincent Leroy (INSA de Rennes, France)

Flash Floods and Ripples

My recent paper investigates the role of the blogosphere as social media.
Flash Floods and Ripples: The Spread of Media Content through the Blogosphere. 
Meeyoung Cha, Juan Antonio Navarro Perez, and Hamed Haddadi
In Proc. of the AAAI Conference on Weblogs and Social Media (ICWSM) Data Challenge Workshop, San Jose, May 2009

We tracked down the occurrences of YouTube videos in blog posts and named the two key patterns we found: flash floods and ripples. Flash floods represent rapid cascade events, which we see in the spread of political videos. Ripples represent a slow propagation, which we see for old music videos. So just how rapid are flash floods? The graph below shows the time it took to propagate YouTube videos, based on their topics. News videos propagate by the hour and stop spreading after a week. Music videos continue to spread after several months.
The plot below shows the propagation pattern of one of the popular YouTube videos in the blogosphere. The video was an advertisement made by the Republican party for the 2008 U.S. Presidential Election. Like other news videos, it spread quickly in the network and was blogged about 79 times within a week!

Mar 26, 2009

Tag cloud on flickr research

Abstract of my recent submission on information propagation.

Wordle: paper abstract 2

ps: loving wordle!

Golden questions of the day

At systems seminar today, we had a discussion about this Eurosys 2009 paper: User Interactions in Social Networks and their Implications
Christo Wilson (UCSB), Bryce Boe (UCSB), Alessandra Sala (UCSB), Krishna Puttaswamy (UCSB), Ben Y. Zhao (UCSB)


Some thoughts and questions:

#1 Can we extract something useful from Facebook wall posts? Data mining? What do people talk about? See Google's flu paper (Nature).

#2 Why study user interactions at all? To look for invariant trends like Dunbar's number? Why is this important in systems research?

#3 There seems to be a clear distinction between "knowledge" and "interaction" when it comes to measuring tie strength, e.g., "I know him well, but haven't talked for a while". The knowledge part is not pronounced in OSNs. Can we infer tie strength or the level of trust without doing user studies?

#4 Why does SybilGuard perform poorly in interaction graphs? It is not intuitive why SybilGuard should work better in the interaction graph. In SybilGuard, information about the social graph might be enough, because it only matters that you know the other friend is a real user, not a fake user. The fast-mixing property that SybilGuard exploits does not align well with the community structure embedded in OSNs.

#5 Links need to be tagged with a purpose. Simply knowing the social graph or the interaction graph may not be enough for many of the socially-enhanced applications.

#6 Low rate conversation made in OSNs could sum up to be quite valuable. See Google's flu paper.

#7 Need to be careful in drawing conclusions. Results could be extrapolated. Extra care needed in methodology.

#8 Related to "trust" in social ties.
Here is a link to the CHI2009 paper on measuring "tie strength" based on information available in online social networks (e.g., number of messages exchanged, word count, education level, mutual friends). Authors do surveys to get ground truth data and some of the survey questions are highly related to trust (e.g., would you lend this friend $100?).