Our Ph.D. student, Manasa Rath, has successfully defended her dissertation titled “Assessing the quality of user-generated content in the presence of automated quality scores”. The committee included Chirag Shah (University of Washington, Chair), Vivek Singh (Rutgers University), Kaitlin Costello (Rutgers University), and Sanda Erdelez (Simmons University).
Students turn to online crowdsourced sites to fulfill their academic course requirements
Manasa investigated the quality of user-generated content (whether it is correct, credible, clear, and complete) using a framework she developed. The framework was validated with multiple experts in the field. She then built an automated method to score content accordingly and conducted a user study with 45 undergraduate students to see how, and to what extent, users considered quality while completing the tasks provided.
With the proliferation of participatory web culture, individuals not only create but also consume content in crowdsourced environments such as blogs and question-answering systems. Studies have revealed that users often employ this content to make a range of choices, from everyday issues to important life decisions, without paying much attention to its quality. In recent years, studies have demonstrated K-12 students' overreliance on these crowdsourced sites to fulfill their academic course requirements. It is therefore important to evaluate how cognizant users are of quality when consuming content from these venues. But before identifying to what extent users consider quality while evaluating user-generated content, it is important to learn what constitutes quality. To address these issues, this dissertation proceeds in three steps. First, it develops a conceptual framework for evaluating the quality of user-generated content, built around the constructs correct, credible, clear, and complete. Second, it validates the framework with the help of twelve experts (librarians) and uses this validation to generate automated methods that score content for quality. Third, it delves deeper into users' reliance on the automated quality scores by conducting a user study: 45 undergraduate students were recruited to investigate their use of automated quality scores while completing tasks under three conditions, namely genuine quality scores, manipulated quality scores, and no quality scores (control).
As prior research has indicated that users' task completion depends on the type of task provided, the user study gave users different task types, such as ranking and synthesis, in the presence of quality scores. To further understand how users used the quality scores while completing the tasks, the author employed eye-tracking metrics such as the total number of gazes, gaze duration, and the number of gazes on the quality scores. Analyses were performed on the fixation data, users' responses to pre- and post-task questionnaires, the task scenarios, and interview transcripts. ANOVA and other statistical analyses found no statistically significant differences between users' use of quality scores and the type of task. The qualitative data showed that users primarily considered the constructs correct and credible from the charts, and that users relied on the quality scores mainly when they had little familiarity with the topic. The study provided insights into how and to what extent users considered quality while completing the tasks provided. The contribution of this human information behavior study is twofold: it examines users' reliance on an automated score provided by an intelligent tool, and it studies users' confidence in considering quality when evaluating user-generated content.
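To make the statistical comparison concrete, here is a minimal sketch of the one-way ANOVA F statistic in plain Python. The fixation counts below are hypothetical and for illustration only; they are not the study's data, and the function name is my own:

```python
# One-way ANOVA F statistic from scratch (standard library only).
def one_way_anova_f(*groups):
    k = len(groups)                               # number of conditions
    n = sum(len(g) for g in groups)               # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical gaze counts on the quality scores, one list per condition.
genuine     = [12, 15, 9, 14, 11, 13]
manipulated = [10, 14, 12, 9, 13, 11]
control     = [11, 10, 13, 12, 9, 14]
print(f"F = {one_way_anova_f(genuine, manipulated, control):.3f}")
```

A small F (and a p-value above the chosen significance level) would be consistent with the study's finding of no significant difference across conditions.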
Our Ph.D. student, Ruoyuan Gao, has successfully defended her dissertation titled “Toward a Fairer Information Retrieval System”. The committee included Chirag Shah (University of Washington, Chair), Yongfeng Zhang (Rutgers University), Gerard de Melo (Rutgers University), and Fernando Diaz (Microsoft).
Ruoyuan investigated the bias present in search engine results to understand the relationship between relevance and fairness in the results. She developed frameworks that could effectively identify the fairness and relevance in a data set. She also proposed an evaluation metric for ranking results that encodes fairness, diversity, novelty, and relevance. With this metric, she developed algorithms that optimize both diversity fairness and relevance for search results.
With the increasing popularity and social influence of information retrieval (IR) systems, various studies have raised concerns about the presence of bias in IR and the social responsibilities of IR systems. Techniques for addressing these issues can be classified into pre-processing, in-processing, and post-processing. Pre-processing reduces bias in the data that is fed into the machine learning models. In-processing encodes the fairness constraints as part of the objective function or learning process. Post-processing operates as a top layer over the trained model to reduce the presentation bias exposed to users. This dissertation explored ways to bring the pre-processing and post-processing approaches, together with fairness-aware evaluation metrics, into a unified framework in an attempt to break the vicious cycle of bias.
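As an illustration of the post-processing category, here is a minimal greedy re-ranking sketch: it takes an already-relevance-ranked list and promotes documents from groups that fall below a target share in each prefix of the ranking. The function, its inputs, and the target shares are my own illustrative assumptions, not the dissertation's algorithm:

```python
# Greedy post-processing re-ranker: keep every prefix of the top-k
# ranking close to target per-group shares, preferring relevance otherwise.
def fair_rerank(results, targets, k):
    """results: list of (doc_id, group, relevance), sorted by relevance desc.
    targets: dict mapping group -> desired share in every ranking prefix."""
    remaining = list(results)
    ranking = []
    while remaining and len(ranking) < k:
        counts = {}
        for _, g, _ in ranking:
            counts[g] = counts.get(g, 0) + 1
        # Groups currently below their target share for the next prefix.
        deficit = [g for g in targets
                   if counts.get(g, 0) < targets[g] * (len(ranking) + 1)]
        # Most relevant doc from a deficit group, else most relevant overall.
        pick = next((r for r in remaining if r[1] in deficit), remaining[0])
        remaining.remove(pick)
        ranking.append(pick)
    return ranking
```

Because it runs after the model has scored the documents, a re-ranker like this can be layered over any existing system, which is exactly what makes post-processing attractive despite its inability to fix bias in the underlying model.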
We first investigated the existing bias present in search engine results. Specifically, we focused on top-k fairness ranking in terms of the statistical parity and disparate impact fairness definitions. With Google search and a general-purpose text cluster as a lens, we explored several topical diversity fairness ranking strategies to understand the relationship between relevance and fairness in search results. Our experimental results show that different fairness ranking strategies result in distinct utility scores and may perform differently on distinct datasets. Second, to further investigate the relationship between data and fairness algorithms, we developed a statistical framework that facilitates various analyses and decision making. Our framework could effectively and efficiently estimate the domain of data and the solution space. We derived theoretical expressions to identify the fairness and relevance bounds for data of different distributions, and applied them to both synthetic and real-world datasets. We presented a series of use cases to demonstrate how our framework can be applied to associate data and provide insights into fairness optimization problems. Third, we proposed an evaluation metric for ranking results that encodes fairness, diversity, novelty, and relevance. This metric offers a new perspective on evaluating fairness-aware ranking results. Based on this metric, we developed effective ranking algorithms that optimize for diversity fairness and relevance at the same time. Our experiments showed that our algorithms were able to capture multiple aspects of the ranking and optimize the proposed fairness-aware metric.
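The two fairness definitions named above can be made concrete with a small, simplified sketch (my own illustration, not the dissertation's formalization): here, a group's selection rate is the fraction of its documents that make the top-k, and disparate impact is the ratio between the lowest and highest selection rates, as in the "80% rule":

```python
# Simplified top-k fairness measures over a ranked list of group labels.
def selection_rates(groups, k):
    """groups: group label of each document, in ranked order.
    Returns each group's fraction of documents that reach the top-k."""
    topk = groups[:k]
    return {g: topk.count(g) / groups.count(g) for g in set(groups)}

def disparate_impact(groups, k):
    """Ratio of the lowest to the highest selection rate (1.0 = parity)."""
    rates = selection_rates(groups, k)
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 0.0
```

For example, a ranking that alternates two equally sized groups in the top-k yields a disparate impact of 1.0, while a top-k drawn entirely from one group yields 0.0.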
FATE project – What is it about and why does it matter?
FATE is one of our InfoSeeking Lab's most active projects. It stands for Fairness, Accountability, Transparency, and Ethics. FATE aims to address bias found in search engines like Google and discover ways to de-bias information presented to the end user while maintaining a high degree of utility.
Why does it matter?
There are many past examples of search algorithms reinforcing biased assumptions about certain groups of people. Below are some examples of search bias related to the Black community.
Search engines suggested unpleasant words about Black women. The autocomplete algorithm recommended words like 'angry', 'loud', 'mean', or 'attractive'. These auto-completions reinforced biased assumptions about Black women.
Search results showed images of Black people's natural hair as unprofessional while showing images of white Americans' straight hair as professional hairstyles for work.
Search results for "three black teenagers" showed mug shots of Black teens, while results for "three white teenagers" showed images of smiling, happy white teenagers.
These issues had been around for many years until someone uncovered them, which then sparked changes to solve these problems.
At FATE, we aim to address these issues and find ways to bring fairness when seeking information.
If you want to learn more about what we do or get updates on our latest findings, check out our FATE website.