Aug 28, 2009
Iran election in Twitter
Aug 12, 2009
The 1st International Workshop on Mining Social Media
Mining Social Media (MSM'09)
Paper submission deadline: September 6th
Venue: November 9th, 2009 Seville, Spain
Jun 26, 2009
Timely research
Jun 17, 2009
Twitter FAQ
May 20, 2009
ICWSM'09 note - day three
Leveraging Diversity
- Ideas embracing diversity in opinions getting popular: Sidelines, Google moderator
- Goal is to project an accurate proportion of users supporting different opinions, and to let users get exposure to challenges or new ideas.
- Quick Q: do people really like diversity?
- Similar work on news media bias: "NewsCube" (CHI'09). That work looks at content to aggregate similar news and present the different themes across news articles. The Sidelines paper simply looks at voting counts.
- "Diversity in user activity and content quality in online communities" by Tad Hogg. How many activities does a user do per day?
- Users with very little online time or little activity are harder to model (i.e., the difficulty of modeling a heavy-tailed distribution).
- Visibility (or exposure) is the key mechanism by which information spreads? (whether exposed by friends or by serendipitous browsing). Visibility and interest are different. (see paper)
- Check out Lada Adamic's write-up on social networks at HP labs.
"Unlike viruses, which spread indiscriminately from host to host, pieces of information are propagated by people who find them interesting and who pass the information to others who they think may be interested. Since people are most similar to their immediate contacts, and this similarity decays as the distance in the social network between individuals increases, information becomes less relevant further away from the source and is unlikely to spread throughout the network. This holds true even in networks with power-law connectivity distributions where highly connected individuals, known as hubs, have the opportunity to potentially spread information to a large number of people. " (See paper)
- Spectrum: retrieving different points of view from the blogosphere.
- Meta search engine for blogs. Would be nice to see memes in the search results. Predicting bloggers' interests in realtime difficult.
- Blog directory (blog category, blogging fusion, yahoo! directory) - Do these sites really work? Blogs are ephemeral.
May 19, 2009
Duncan Watts keynote speech at ICWSM
- Work with Gueorgi Kossinets
- Social networks change over time. Network-wide averages (e.g., path length, clustering coefficient, node degree) stay rather steady over time, but individual quantities (e.g., rank) change rapidly.
- What does this mean in terms of social applications?
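A minimal sketch of the "steady averages, changing ranks" observation, using two made-up snapshots of a tiny network. The graphs and all helper names (`make_graph`, `average_degree`, `average_clustering`, `degree_rank`) are hypothetical and only illustrate the idea; they are not the data or method from the keynote.

```python
from itertools import combinations


def make_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def average_degree(adj):
    """Mean number of neighbours per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = adj[node]
    if len(nbrs) < 2:
        return 0.0
    pairs = list(combinations(nbrs, 2))
    linked = sum(1 for a, b in pairs if b in adj[a])
    return linked / len(pairs)


def average_clustering(adj):
    """Mean of the per-node clustering coefficients."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)


def degree_rank(adj):
    """Nodes ordered from highest to lowest degree (ties broken by name)."""
    return sorted(adj, key=lambda n: (-len(adj[n]), n))


# Two toy snapshots (assumed data): same shape, but the hub moves from "a" to "d".
snapshot_1 = make_graph([("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")])
snapshot_2 = make_graph([("d", "a"), ("d", "b"), ("d", "c"), ("b", "c")])

for name, snap in (("t1", snapshot_1), ("t2", snapshot_2)):
    print(name, average_degree(snap), round(average_clustering(snap), 3), degree_rank(snap))
```

In this toy case the network-wide averages are identical across the two snapshots, while the node at the top of the degree ranking is different.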
(2) macro-sociological experiment on social influence
- Music lab experiment -- very cute idea.
- People do get influenced by others. However, the top favorites in the social influence world also did well in the independent world. (Same trend for the bottom ones in the popularity distribution. Lots of noise in medium hot content)
(3) network survey on facebook
- Launched "Friend Sense" app on Facebook and asked political preferences of users and their friends.
(4) influence of financial rewards on performance
- Crowd sourcing site Amazon's Mechanical Turk (AMT) is a fantastic place to launch quick and inexpensive social science studies.
- Increased pay resulted in more work, but not in more accurate work. People always feel they are underpaid (the anchoring effect in psychology).
(5) final remarks
- We moved from having too little data to having too much data.
- Facebook's 200+M users don't fit in a single machine's memory for us to analyze!
- Having one more zero in your dataset (huge data) doesn't mean that you're asking big questions. It's more important to ask important questions and set up experiments that can answer them.
- Lada Adamic's recent work on the diffusion of gestures in Second Life
- Duncan said, "You might ask why are we even asking this obvious question. After many years of being a sociologist, nothing is obvious!"
ICWSM'09 note - day two
- Cameron Marlow@Facebook: Look at data to do informed design
- Q: How do users find information on site? Are social links actively used? What about search?
- Slideshare A: We do see clicks from social networking sites, but most requests come from Google. So we put a lot of effort into making sure our slides are easily searchable in Google. The front page is also important, but that's more about setting a tone for the site (identity). We do a lot of editorial work on the front page.
- Facebook A: news feed is definitely the most important. This is a huge optimization problem. Facebook provides two feeds: live feed, highlights.
Modeling Social Dynamics
- Interesting paper "Stochastic models of user-contributory web sites" does modeling in social sites, similar to the famous Huberman et al. Science paper from 1998.
- One thought. Modeling is based on visible data. How would a model of user behavior change if we were to consider both visible and invisible activity? (e.g., browsing takes up most of our online time!)
- Prediction of content popularity based on early data possible--we've seen the same trend in YouTube. Is this a general trend in any massive content system?
- Flickr and YouTube are not about social relationships, but ultimately about information sharing. See evidence: paper "Personal Information Management vs. Resource Sharing"
- Social dimension (Shneiderman 2002): People search for other people's content on Flickr and YouTube, whereas people search for their own content on Delicious and Connotea. The purpose of tagging differs in the same way.
- Social computing & sustainability of sites: it's important to understand what, where, and why people share.
- Interesting paper by Nov et al.
- Motivation for sharing, two axes: self or others, intrinsic or extrinsic. All four combinations have positive loop to user behavior.
- This reminds me of a comment from a director at Big Brothers. He said people post videos on YouTube because they want to be the Steven Spielberg of the web.
May 18, 2009
ICWSM'09 note - day one
Tutorial: Psychology of Social Media
This was a 2-hour tutorial at ICWSM'09, delivered by Sam Gosling (UT Austin) and Kate Niederhoffer (Nielsen Online). It was refreshing to listen to people in different fields who look at the very same problems (the audience spanned psychologists, sociologists, computer scientists, and Wall Street analysts). Sharing my notes below.
1. Social media (like Facebook, Twitter) is serving some psychological needs. What?
- Everybody spends their day by doing something. What do they do and what does that tell us about the person?
- Psychologist Maslow says everyone has his own "hierarchy of human needs". This means that we each have our own priorities among the actions available to us.
2. Fundamental social needs of people: People want to (1) get along and (2) get ahead.
- Getting along means socializing. Getting ahead means stepping up in the social hierarchy. How could this be projected in social media?
- What do our friends tell us about us? In a lab test, people with low self-esteem wanted to hang out with those who gave negative feedback rather than those who gave positive feedback. Homophily?
3. People want to be seen accurately rather than projected more positively than they actually are.
- Sam looked at webpages of people, contacted the webpage owners, interviewed them and asked what person they'd like to be, asked their friends what the person is like, and found the projection is rather accurate.
- Would this hold in Facebook too?
4. What are identity claims?
- These are the things about you that are deliberately chosen for other people to see. Examples are the t-shirts you wear, bumper stickers, your webpage, pictures you hang in your room, etc.
- Music is not an identity claim. It's a tone-setter. It is typically chosen to set our mood in a particular way. You listen to upbeat music when you head out clubbing, whereas at home you may listen to different music.
- Books are inadvertent identity claims. This is not so deliberate, but is a behavioral residue over a longer time period. Just by looking at the variety, topics, and organization of books at home, we can tell so much about the person. For example, does the person read a wide variety of books (openness), are the books actually read, are there notes, how are they ordered -- neatly or messy, are cheesy books hidden behind, etc.
5. Language is the backbone of our expression.
- There are many approaches to understanding human behavior based on linguistic style. The use of pronouns (which we think of as garbage words) can already reveal the sex and age of a person.
- 140 characters in Twitter can tell so much.
- Usage of the word "I": more common among females and among people with low social status. Usage of the word "We": signals group action, but can also reflect future trouble (meaning it is followed by "but").
6. What do we need to know to know a person?
(a) big five traits (openness = creativity + interest + opinions, conscientiousness = daily life, extraversion = documenting life, agreeableness = polite topic covered, neuroticism = cathartic or auto-therapeutic purpose; for the standard approach, see Goldberg 1992, Costa and McCrae 1992)
(b) personal concerns (visions and goals)
(c) identity (personal myth - very difficult to measure)
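The pronoun observation in point 5 can be turned into a simple measurement. Below is a minimal sketch that counts first-person singular and plural pronouns in a short text. The word lists and the function name `pronoun_rates` are my own assumptions for illustration, not the tools used in the tutorial, and contractions such as "I'm" are not split.

```python
import re
from collections import Counter

# Assumed word lists; real linguistic-style tools use much richer dictionaries.
FIRST_SINGULAR = {"i", "me", "my", "mine", "myself"}
FIRST_PLURAL = {"we", "us", "our", "ours", "ourselves"}


def pronoun_rates(text):
    """Return the share of words that are first-person singular / plural pronouns."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    singular = sum(counts[w] for w in FIRST_SINGULAR)
    plural = sum(counts[w] for w in FIRST_PLURAL)
    return {"first_singular": singular / total, "first_plural": plural / total}


# A made-up, tweet-length example.
rates = pronoun_rates("I think we should go, but I am not sure.")
print(rates)
```

Rates rather than raw counts are used so that texts of different lengths (a tweet versus a blog post) can be compared.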
May 15, 2009
John Wilkes
May 9, 2009
Chocolate pie chart
Apr 18, 2009
Social Mobile Web 2009
Topics:
Novel social interactions on mobile devices.
Social mobile content sharing and distribution services.
Context aware mobile services - beyond location based services.
Social mobile search and social mobile browsing.
User evaluations of social mobile services.
Mobile user interfaces that incorporate social elements.
Mobility and social networks.
Models of mobile social behavior and mobile traces.
Urban gaming, mobile mixed reality, etc.
Innovative social mobile applications.
Apr 9, 2009
Plotting Venn diagrams
Apr 7, 2009
Crawling YouTube
# comment_feed is assumed to be a YouTube comment feed fetched earlier
for comment_entry in comment_feed.entry:
    print(comment_entry.ToString())
Apr 6, 2009
Summary of the ACM SNS workshop
There are obviously lots of attacks on Facebook. Their long-term goal is to ensure that one identity in the system corresponds to one real identity. In every security policy, the trade-off is between site integrity and user experience: throwing in more CAPTCHAs will increase security, but then the user experience will degrade.
(b) 419 attack by Nigerian spammers (e.g., "I am lost in London, please send me $1000 via Western Union")
(c) koobface (a botnet that sends spam URLs)
(d) fake chain letter (e.g., "Facebook is overpopulated")
Users often use the same login credentials across multiple sites. (Yes, I do too!) So if one site gets compromised, then all are compromised. Because most sites force users to use complex passwords, users end up using a common password across sites. Facebook tries hard to educate users about its sophisticated privacy settings.
The second talk was by Elie, who is a post-doc at Stanford. Elie gave a brief overview of his research: how to turn online social networks into a botnet. Elie found that a number of existing systems (e.g., MSN messenger) have vulnerabilities: a malicious user can send code that turns his friends' (and their friends') host machines into a botnet.
* Eight Friends Are Enough: Social Graph Approximation via Public Listings, Joseph Bonneau (University of Cambridge)
I greatly enjoyed this talk. The talk demonstrated how revealing limited information about a social network (e.g., Facebook's public listing, which shows 8 random friends of a user) can say so much about the entire social graph structure.
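To get a feel for how much a public listing leaks, here is a toy simulation, not the method from the paper. It assumes a synthetic random friendship graph and that every user's listing shows up to `k=8` friends chosen uniformly at random. All names (`random_graph`, `public_listing`, `observed_graph`, `edge_count`) and parameters are hypothetical.

```python
import random


def random_graph(n=200, p=0.1, seed=1):
    """Synthetic friendship graph: each pair is friends with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def public_listing(adj, k=8, seed=0):
    """For each user, reveal up to k friends picked uniformly at random."""
    rng = random.Random(seed)
    listing = {}
    for user, friends in adj.items():
        friends = sorted(friends)
        listing[user] = set(rng.sample(friends, min(k, len(friends))))
    return listing


def observed_graph(listing):
    """Undirected graph an outside observer can rebuild from all listings."""
    seen = {user: set() for user in listing}
    for user, shown in listing.items():
        for friend in shown:
            seen[user].add(friend)
            seen.setdefault(friend, set()).add(user)
    return seen


def edge_count(adj):
    """Number of undirected edges in an adjacency map."""
    return sum(len(friends) for friends in adj.values()) // 2


full = random_graph()
observed = observed_graph(public_listing(full))
coverage = edge_count(observed) / edge_count(full)
print("fraction of friendships visible from public listings:", round(coverage, 2))
```

An edge becomes visible if either endpoint happens to list it, which is why the recovered graph can contain many more than 8 friends per user.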
Mar 30, 2009
Geographic distance of social ties
"Does Distance Still Matter in the Age of the Internet?" (Diana Mok, Juan-Antonio Carrasco and Barry Wellman, Urban Studies, 2009). Our study is part of the broad debate about the role of distance and technology for interpersonal contact. To the best of our knowledge, this is the first study that systematically and explicitly compares the role of distance in social networks pre- and post-Internet. We analyze the effect of distance on the frequency of email, phone, face-to-face and overall contact in personal networks, and we compare the findings with its pre-Internet counterpart whose data were collected in 1978 in the same East York, Toronto locality. We use multilevel models with spline specification to examine the nonlinear effects of distance on the frequency of contact. We compare these effects for both very close and somewhat close ties, and for different role relationships: immediate kin, extended kin, friends and neighbours. The results show that email contact is generally insensitive to distance, but tends to increase for transoceanic relationships greater than 3,000 miles apart. Face-to-face contact remains strongly related to short distances (within five miles), while distance has little impact on how often people phone each other at the regional level (within 100 miles). The study concludes that email has only somewhat altered the way people maintain their relationships. The frequency of face-to-face contact among socially-close friends and relatives has hardly changed between the 1970s and the 2000s, although the frequency of phone contact has slightly increased. Moreover, the sensitivity of these relationships to distance has remained similar, despite the communication affordances of the Internet and low-cost telephony.
Mar 27, 2009
ACM Social Network Systems 2009 Workshop
1000 - 1030: Botnets vs. Social Networks, Elie Bursztein (Stanford)
1030 - 1100: Break
1100 - 1230: Privacy and Security
Eight Friends Are Enough: Social Graph Approximation via Public Listings
Joseph Bonneau, Jonathan Anderson, Ross Anderson, Frank Stajano (University of Cambridge)
Mouna Kacimi (Max Planck Institute for Informatics), Stefano Ortolani (Vrije Universiteit), Bruno Crispo (University of Trento)
PeerSoN: P2P Social Networking - Early Experiences and Insights
Sonja Buchegger, Doris Schiöberg (TU Berlin, Deutsche Telekom Laboratories), Le Hung Vu (EPFL), Anwitaman Datta (NTU Singapore)
1330 - 1500: The Ties that Bind
On the Strength of Weak Ties in Mobile Social Networks
Stratis Ioannidis, Augustin Chaintreau (Thomson)
Centralities: Capturing the Fuzzy Notion of Importance in Social Graphs
Erwan Le Merrer (INRIA), Gilles Tredan (University of Rennes 1)
Buzztraq: Predicting Geographical Access Patterns of Social Cascades using Social Networks
Nishanth Sastry, Eiko Yoneki, Jon Crowcroft (University of Cambridge)
1500 - 1530: Break
1530 - 1630: Personalizing Search
Towards Personalized Peer-to-Peer Top-K Processing
Xiao Bai, Marin Bertier (INSA de Rennes), Rachid Guerraoui (EPFL, Switzerland), Anne-Marie Kermarrec (INRIA Rennes, France)
Toward Personalized Query Expansion
Marin Bertier (INSA de Rennes, France), Rachid Guerraoui (EPFL, Switzerland), Anne-Marie Kermarrec, Vincent Leroy (INSA de Rennes, France)
Flash Floods and Ripples
In Proc. of the AAAI Conference on Weblogs and Social Media (ICWSM) Data Challenge Workshop, San Jose, May 2009
We tracked down the occurrences of YouTube videos in blog posts and named the two key patterns we found: flash floods and ripples. Flash floods represent rapid cascade events, which we see in the spread of political videos. Ripples represent a slow propagation, which we see for old music videos. So just how rapid are flash floods? The graph below shows the time it took to propagate YouTube videos, based on their topics. News videos propagate by the hour and stop spreading after a week. Music videos continue to spread after several months.
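As a rough illustration of the two patterns, the sketch below measures how long a video keeps appearing in blog posts and labels it accordingly. The one-week cutoff, the function names, and the timestamps are assumptions made up for this example; the paper's actual analysis is not reproduced here.

```python
from datetime import datetime, timedelta


def spread_duration(post_times):
    """Time between the first and last blog post that embeds a video."""
    return max(post_times) - min(post_times)


def label_pattern(post_times, cutoff=timedelta(days=7)):
    """Label a video by how long it keeps spreading.

    The one-week cutoff is a hypothetical threshold for illustration only.
    """
    return "flash flood" if spread_duration(post_times) <= cutoff else "ripple"


# Made-up timestamps of blog posts linking to two videos.
news_posts = [
    datetime(2009, 3, 1, 9, 0),
    datetime(2009, 3, 1, 10, 0),
    datetime(2009, 3, 3, 18, 0),
]
music_posts = [
    datetime(2009, 1, 5),
    datetime(2009, 3, 20),
    datetime(2009, 6, 1),
]

print(label_pattern(news_posts), spread_duration(news_posts))
print(label_pattern(music_posts), spread_duration(music_posts))
```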
Mar 26, 2009
Golden questions of the day
Christo Wilson (UCSB), Bryce Boe (UCSB), Alessandra Sala (UCSB), Krishna Puttaswamy (UCSB), Ben Y. Zhao (UCSB)
Some thoughts and questions:
#1 Can we extract something useful from Facebook wall posts? Data mining? What do people talk about? See Google's flu paper (Nature).
#2 Why study user interactions at all? To look for invariant trends like Dunbar's number? Why is this important in systems research?
#3 There seems to be a clear distinction between "knowledge" and "interaction" when it comes to measuring tie strength, e.g., "I know him well, but haven't talked for a while". The knowledge part is not pronounced in OSNs. Can we infer tie strength or the level of trust without doing user studies?
#4 Why does SybilGuard perform poorly in interaction graphs? It is not intuitive why SybilGuard should work better in an interaction graph. In SybilGuard, information about the social graph might be enough, because it only matters that you know the other friend is a real user, not a fake user. The fast-mixing properties SybilGuard exploits do not align well with the community structure that is embedded in OSNs.
#5 Links need to be tagged with a purpose. Simply knowing the social graph or the interaction graph may not be enough for many of the socially-enhanced applications.
#6 Low-rate conversations in OSNs could add up to be quite valuable. See Google's flu paper.
#7 Need to be careful in drawing conclusions. Results could be extrapolated. Extra care needed in methodology.
#8 Related to "trust" in social ties.
Here is a link to the CHI2009 paper on measuring "tie strength" based on information available in online social networks (e.g., number of messages exchanged, word count, education level, mutual friends). Authors do surveys to get ground truth data and some of the survey questions are highly related to trust (e.g., would you lend this friend $100?).