Researcher examines crowdsourcing as a way to identify interesting content

One future application of artificial intelligence may involve predicting which web content most people will find interesting.

On Wednesday, Tad Hogg, a research fellow at the Institute for Molecular Manufacturing, presented his research about crowdsourcing as a means of identifying appealing content on the internet. The talk was part of the Association for the Advancement of Artificial Intelligence Conference on Human Computation and Crowdsourcing, held at the AT&T Center from Oct. 31 to Nov. 3. 

“We all have experience looking at lists of things on the internet,” Hogg said. “Often, there’s way too many items to show all of them, and you wouldn’t have the time or interest to go through them. Our goal is to figure out which one to show to users.”

According to Hogg, the idea is to create an accurate representation of the quality of the content by combining ratings from users. However, he said two aspects of user involvement make this task difficult: peer influence and the varying levels of attention users dedicate to websites. Hogg said peer influence occurs on websites that show votes or comments, where public feedback can shape how other users perceive the same content.

“One of the main problems is deciding which ratings to use when it’s about something subjective like this,” Hogg said. “There are people that just look at the title and a short description before voting, and then there are people who look at the detail content. So whose ratings do you use? Everyone’s or just the informed users’?”

To answer that question, Hogg conducted a two-part experiment. First, he created a website with 100 science stories and asked 3,000 users to “like” the content they found interesting. Each story had a title and a short description, as well as an optional URL that allowed the user to read more. Hogg then used that data to determine the most popular stories.

In the next part of the experiment, Hogg tried to determine which group was best at predicting the most popular stories: a large group of users or a group of only “informed users,” those who clicked the link to read more. As each of these groups rated the stories, Hogg used an artificial intelligence program to predict which stories would be rated as the most popular by each group. He then compared these predictions to the original measurements from the first part of the experiment.

“The informed users converged to the actual quality, but it takes a long time because there are so few of them,” Hogg said.

According to Hogg, focusing on the informed users’ votes would best predict the highest-rated stories, as long as there was a large number of them. Including the uninformed users generates results faster but less accurately. 
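
The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Hogg's program: the function names, the vote format and the toy data are all hypothetical, and simple "like" counts stand in for the artificial intelligence program mentioned in the talk.

```python
from collections import Counter


def top_stories(votes, k, informed_only=False):
    """Return the set of the k most-liked story IDs.

    votes: iterable of (story_id, clicked_url) pairs, one per "like"
           (a hypothetical format chosen for this sketch).
    informed_only: if True, count only likes from users who clicked the URL.
    """
    counts = Counter(
        story for story, clicked in votes if clicked or not informed_only
    )
    return {story for story, _ in counts.most_common(k)}


def overlap(predicted, reference):
    """Fraction of the reference top stories that the prediction recovered."""
    return len(predicted & reference) / len(reference)


# Toy data (invented for illustration): story "b" collects many likes from
# users who never clicked through, while story "a" is favored by readers.
votes = [
    ("a", True), ("a", True), ("a", False),
    ("b", False), ("b", False), ("b", False), ("b", False),
    ("c", True), ("c", False),
]
reference = {"a"}  # assumed "true" most popular story from the first part

all_users = top_stories(votes, k=1)
informed = top_stories(votes, k=1, informed_only=True)

print(overlap(all_users, reference))  # score using every vote
print(overlap(informed, reference))   # score using informed votes only
```

In this toy case the informed group recovers the reference story while the full group does not, but the informed ranking rests on far fewer votes, which mirrors the trade-off between speed and accuracy that Hogg described.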

During the second part of the experiment, there were 10 percent fewer URL clicks than votes, meaning people “liked” a story without reading the entire article, Hogg said. Therefore, the “uninformed” readers produced the vast majority of votes. Hogg said these results show that the average user does not devote much effort to content-rating sites such as Facebook or Reddit.

In the future, Hogg said he hopes to generalize his work and study real user behavior on existing websites rather than experimental behavior.

Adam Kalai, a member of Microsoft Research who works on crowdsourcing, attended the event.

“[The presentation] was very interesting and very high quality,” Kalai said. “I learned a lot about the latest and greatest techniques. And it did a good job of showing both sides — the limitations as well as the benefits. I thought it was excellent.”