May 20, 2009

ICWSM'09 note - day three

Leveraging Diversity

- Ideas that embrace diversity of opinion are getting popular: Sidelines, Google Moderator.

- The goal is to show an accurate proportion of users supporting each opinion and to expose users to challenges or new ideas.

- Quick Q: do people really like diversity?

- Similar work on news media bias: "NewsCube" (CHI'09). That work looks at content to group similar news and present the different themes across news articles. The Sidelines paper simply looks at vote counts.



- "Diversity in user activity and content quality in online communities" by Tad Hogg. How many activities does a user do per day?

- Users with very little online time or activity are harder to model (i.e., the difficulty of modeling in a heavy-tailed distribution).

- Is visibility (or exposure) the key mechanism by which information spreads (whether exposed by friends or by serendipitous browsing)? Visibility and interest are different. (See the paper.)



- Check out Lada Adamic's write-up on social networks at HP labs.

"Unlike viruses, which spread indiscriminately from host to host, pieces of information are propagated by people who find them interesting and who pass the information to others who they think may be interested. Since people are most similar to their immediate contacts, and this similarity decays as the distance in the social network between individuals increases, information becomes less relevant further away from the source and is unlikely to spread throughout the network. This holds true even in networks with power-law connectivity distributions where highly connected individuals, known as hubs, have the opportunity to potentially spread information to a large number of people. " (See paper)



- Spectrum: retrieving different points of view from the blogosphere.

- Meta search engine for blogs. It would be nice to see memes in the search results. Predicting bloggers' interests in real time is difficult.

- Blog directory (blog category, blogging fusion, yahoo! directory) - Do these sites really work? Blogs are ephemeral.

May 19, 2009

Duncan Watts keynote speech at ICWSM


(1) evolution of social network structure over time

- Work with Gueorgi Kossinets

- Social networks change over time. Averages of network structure (e.g., path length, clustering coefficient, node degree) stay rather steady over time, but individual quantities (e.g., rank) change rapidly.

- What does this mean in terms of social applications?
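A toy sketch of the "steady averages, shifting ranks" point, written for this note and not taken from the talk. The two four-node snapshots are made up for illustration: they have the same average degree and the same average clustering coefficient, yet the best-connected node differs.

```python
from itertools import combinations

def adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def average_degree(adj):
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

def top_node(adj):
    """Highest-degree node; ties are broken by node name."""
    return max(adj, key=lambda n: (len(adj[n]), n))

# Hypothetical snapshots of the same four people at two points in time.
snapshot_t1 = adjacency([("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")])
snapshot_t2 = adjacency([("d", "a"), ("d", "b"), ("d", "c"), ("b", "c")])

for name, snap in (("t1", snapshot_t1), ("t2", snapshot_t2)):
    print(name, average_degree(snap), round(average_clustering(snap), 3), top_node(snap))
```

Real snapshots would only match approximately; the toy graphs are deliberately mirror images of each other so that the averages agree exactly.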



(2) macro-sociological experiment on social influence

- Music lab experiment -- very cute idea.

- People do get influenced by others. However, the top favorites in the social influence world also did well in the independent world. (Same trend for the bottom ones in the popularity distribution. Lots of noise in medium hot content)



(3) network survey on facebook

- Launched the "Friend Sense" app on Facebook and asked about the political preferences of users and their friends.



(4) influence of financial rewards on performance

- The crowdsourcing site Amazon Mechanical Turk (AMT) is a fantastic place to launch quick and inexpensive social science studies.

- Increased pay resulted in more work, but not in more accurate work. People always feel they are underpaid: the anchoring effect in psychology.



(5) final remarks

- We moved from having too little data to having too much data.

- Facebook's 200+M users don't fit in a single machine's memory for us to analyze!

- Having one more zero in your dataset (huge data) doesn't mean that you're asking big questions. It's more important to ask important questions and set up experiments that can answer them.

- Lada Adamic's recent work on the diffusion of gestures in Second Life

- Duncan said, "You might ask why are we even asking this obvious question. After many years of being a sociologist, nothing is obvious!"


Tag cloud on Sam Gosling's tutorial #icwsm

Based on my summary below, Wordle says

ICWSM'09 note - day two


Panel Discussion

- Cameron Marlow@Facebook: Look at data to do informed design

- Q: How do users find information on site? Are social links actively used? What about search?

- Slideshare A: We do see clicks from social networking sites, but most requests come from Google. So we put a lot of effort into making sure our slides are easily searchable in Google. The front page is also important, but that's more about setting the tone of the site (identity). We do a lot of editorial work on the front page.

- Facebook A: news feed is definitely the most important. This is a huge optimization problem. Facebook provides two feeds: live feed, highlights.



Modeling Social Dynamics

- Interesting paper "Stochastic models of user-contributory web sites" does modeling in social sites, similar to the famous Huberman et al. Science paper from 1998.

- One thought. Modeling is based on visible data. How would a model of user behavior change if we were to consider both visible and invisible activity? (e.g., browsing takes up most of our online time!)

- Predicting content popularity from early data is possible; we've seen the same trend in YouTube. Is this a general trend in any massive content system?


- Flickr and YouTube are not about social relationships, but ultimately about information sharing. See evidence: paper "Personal Information Management vs. Resource Sharing"

- Social dimension (Shneiderman 2002): People search for other people's content on Flickr and YouTube, whereas people search for their own content on Delicious and Connotea. The purpose of tagging differs in the same way.


- Social computing & sustainability of sites: it's important to understand what, where, and why people share.

- Interesting paper by Nov et al.

- Motivation for sharing has two axes: self or others, intrinsic or extrinsic. All four combinations form a positive feedback loop with user behavior.

- This reminds me of a comment from a director at Big Brothers. He said people post videos on YouTube because they want to be the Steven Spielberg of the web.


May 18, 2009

ICWSM'09 note - day one

Some of the interesting comments and conversations I heard:

Keynote speech
- Lillian Lee's new book on Opinion mining and sentiment analysis available for free.
- In opinion mining, simply using bag of words is not enough. Structure of the text is also important, e.g., look for "still" and "however".
- ? and ! are negative feedback cues

Community
- Over a billion dollars spent on influential viral marketing.
- My social cascade work got some attention.
- People's exposure to feed is the most critical factor in social cascade. Number of friends didn't play any significant role (i.e., accidental influential users?).
- Search "Facebook data" in Facebook. Join the Page!
- In a system like Yahoo! Answers, people reward answerers the most on topics like Music, Computer, and Medicine. Answers to Science questions, however, were rewarded little.

Psychology and Users
- sociogeek paper is based on 150,000 online surveys
- Like Stanley Milgram's familiar strangers in the offline world, we can find familiar strangers in the online counterpart. One key insight is that doing this will benefit a lot of people. (Social networks have a power-law degree distribution, where many people have few contacts. Knowing their familiar strangers would help the people in the tail.)
- Download geotag data at DBpedia

Ranking
- CourseRank program at Stanford by Georgia Koutrika, popular among students for ranking courses and planning course schedules.
- Some cute findings: (a) the larger the department size, the smaller individual contribution is. (b) students who got better grades gave higher ranking to the course.
- High median node degree in social networks can reflect spurious relationships? Search "top friends" application on Facebook.
- Similar to "Predicting Tie Strength With Social Media" from CHI2009, paper "Using Transactional Information to Predict Link Strength in Online Social Networks" looked at wall and photo posts to guess tie strength. They used Top Friends app results as the ground truth.


Some thoughts on tie strength and influential users
- After listening to some of the talks, I had the following thought. Network-activity based studies on user interaction mostly focus on visible interactions, e.g., wall posts and photo tagging. This might be a good predictor when we want to measure the strength of ties between individuals (or simply ask who are my best friends?). Then, what does a tie strength mean in terms of viral marketing? Are friends of stronger ties a good indicator of influential people?
- I get a feeling that this is not the case. Based on the Facebook paper "Gesundheit! Modeling Contagion Through Facebook News Feed", we also learned that adoption of information has to do with simple exposure (and not with how popular a friend is). Indeed, exposure is an important and dominant factor in social networks. My recent work using social network clickstream data showed that more than 80% of user activity had to do with "silent" browsing, which a crawl-based study cannot capture. Which friends are important then in viral marketing? Best friends or those whom I get exposed to?
- Obviously there will be lots of overlap between the two groups. But my conclusion today is that influential users are the ones whom I "actively" get exposed to. This means friends whose updates I follow by visiting their pages. Would accidental influentials be those whom I got opportunistically exposed to (e.g., through Facebook's news feed) and joined cascades?

Tutorial: Psychology of Social Media

This was a 2-hour tutorial at ICWSM'09, delivered by Sam Gosling (UT Austin) and Kate Niederhoffer (Nielsen Online). It was refreshing to listen to people in different fields who look at the very same problems (the audience spanned psychologists, sociologists, computer scientists, and Wall Street analysts). Sharing my notes below.



1. Social media (like Facebook, Twitter) is serving some psychological needs. What?

- Everybody spends their day by doing something. What do they do and what does that tell us about the person?

- Psychologist Maslow says everyone has his own "hierarchy of human needs". This means that we all have our own set of priorities among the things we could do.



2. Fundamental social needs of people: People want to (1) get along and (2) get ahead.

- Get along means socialize. Get ahead means step up in the social hierarchy. How could this be projected in social media?

- What do our friends tell us about us? In a lab test, people with low self-esteem wanted to hang out with those who gave negative feedback rather than with those who gave positive feedback. Homophily?



3. People want to be seen accurately, rather than being projected more positively than they actually are.

- Sam looked at webpages of people, contacted the webpage owners, interviewed them and asked what person they'd like to be, asked their friends what the person is like, and found the projection is rather accurate.

- Would this hold in Facebook too?



4. What are identity claims?

- These are the things about you that are deliberately chosen for other people to see. Examples are the t-shirts you wear, bumper stickers, your webpage, pictures you hang in your room, etc.

- Music is not an identity claim. It's a tone-setter. It is typically chosen to set our mood in a particular way. You listen to upbeat music when you head out clubbing, whereas at home you may listen to different music.

- Books are inadvertent identity claims. This is not so deliberate, but is a behavioral residue over a longer time period. Just by looking at the variety, topics, and organization of books at home, we can tell so much about the person. For example, does the person read a wide variety of books (openness), are the books actually read, are there notes, how are they ordered -- neatly or messy, are cheesy books hidden behind, etc.



5. Language is the backbone of our expression.

- There are many approaches to understanding human behavior based on linguistic style. The use of pronouns (which we think of as garbage words) can already reveal the sex and age of a person.

- 140 characters in Twitter can tell so much.

- Usage of the word "I": more common among females and among people with low social status. Usage of the word "we": signals group action, but can also reflect future trouble (meaning it is followed by "but").
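To make the pronoun idea concrete, here is a minimal sketch of my own (not the tutorial's method) that measures how often "I" and "we" appear as a fraction of all words in a text. The function name and the simple tokenizer are assumptions for illustration; real linguistic-style tools use far richer word categories.

```python
import re
from collections import Counter

def pronoun_rates(text, pronouns=("i", "we")):
    """Return each pronoun's share of all words in the text (case-insensitive)."""
    # Crude tokenizer: runs of letters and apostrophes count as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {p: (counts[p] / total if total else 0.0) for p in pronouns}

sample = "I think we should go, but I am tired"
print(pronoun_rates(sample))
```

Rates like these say nothing on their own; they only become useful when compared across many authors or against a baseline.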



6. What do we need to know to know a person?

(a) big five traits (openness = creativity + interest + opinions, conscientiousness = daily life, extraversion = documenting life, agreeableness = polite topics covered, neuroticism = cathartic or auto-therapeutic purpose; this is the standard approach, see Goldberg 1992, Costa and McCrae 1992)

(b) personal concerns (visions and goals)

(c) identity (personal myth - very difficult to measure)





May 15, 2009

John Wilkes

John (who is my Facebook friend!) visited MPI-SWS and gave a talk about data center storage design. His theme was the design of a "lights-out data center": can we design a data center that will continue to work when all the lights go out (i.e., without human intervention)? This means automated control.

He also shared one very interesting thought. SLAs are ad hoc and arbitrary (e.g., service availability should be three nines, or 0.999). Sometimes meeting this arbitrary number means investing arbitrary money and effort. So why not have a flexible SLA and see what the losses are when we fail to meet the needs? What a refreshing idea. John further said the cost of not meeting the needs at Google may mean losing customers. He said "users are one click away from leaving to other services" and the cause of leaving may depend on many axes, including a particular shade of blue they use in the logo.
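To put the "three nines" number in perspective, here is a small piece of arithmetic I added; it was not part of the talk. It assumes a 365-day year and treats availability as simple uptime divided by total time. The function names are mine.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (an assumption)

def availability_from_nines(nines):
    """Three nines -> 0.999, four nines -> 0.9999, and so on."""
    return 1 - 10 ** (-nines)

def allowed_downtime_hours(availability, period_hours=HOURS_PER_YEAR):
    """Hours of outage permitted per period at the given availability."""
    return (1 - availability) * period_hours

for n in (2, 3, 4):
    a = availability_from_nines(n)
    print(f"{n} nines ({a:.4%}): {allowed_downtime_hours(a):.2f} hours/year")
```

Three nines leaves about 8.76 hours of outage per year, and each extra nine cuts that budget tenfold, which is where the "arbitrary money and effort" comes in.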



May 9, 2009

Chocolate pie chart

I have been quite busy with a paper deadline, advisory board visit, etc. While looking for a nice graphics tool to plot a pie chart, I came across this irresistible chart made of 70% milk + 20% dark + 10% white chocolate. I'd love to make a program to generate this.
If you are tempted to put it in your paper, consult the Mary and Matt store.
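A rough sketch of the kind of program I have in mind: it draws a three-slice pie as an SVG string using only the standard library. The 70/20/10 split comes from the chart; the function names, colors, radius, and output file name are placeholders I made up, and it will not look as tasty as the real thing.

```python
import math

def _point(cx, cy, r, deg):
    """Point on the circle at `deg` degrees clockwise from 12 o'clock."""
    rad = math.radians(deg)
    return cx + r * math.sin(rad), cy - r * math.cos(rad)

def pie_svg(shares, colors, radius=100):
    """Return an SVG string with one wedge per share (in percent).

    Assumes at least two slices, each below 100%.
    """
    if abs(sum(shares) - 100) > 1e-9:
        raise ValueError("shares must sum to 100")
    cx = cy = radius
    paths = []
    start = 0.0
    for share, color in zip(shares, colors):
        end = start + share * 3.6  # 1% of the pie is 3.6 degrees
        x1, y1 = _point(cx, cy, radius, start)
        x2, y2 = _point(cx, cy, radius, end)
        large_arc = 1 if end - start > 180 else 0
        paths.append(
            f'<path d="M {cx} {cy} L {x1:.2f} {y1:.2f} '
            f'A {radius} {radius} 0 {large_arc} 1 {x2:.2f} {y2:.2f} Z" '
            f'fill="{color}"/>'
        )
        start = end
    size = 2 * radius
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
        + "".join(paths)
        + "</svg>"
    )

# Placeholder colors (an assumption), one per slice of the 70/20/10 chart.
svg = pie_svg([70, 20, 10], ["#7B3F00", "#3B1E08", "#F5E6C8"])

# To view it, write the string to a file and open it in a browser, e.g.:
# with open("chocolate_pie.svg", "w") as f:
#     f.write(svg)
```

SVG keeps the sketch dependency-free; a plotting library would do the same job in a couple of lines.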