FATE Research Group: From Why to What and How

When the public first gained access to the internet, search engines quickly became part of daily life. Services such as Yahoo, AltaVista, and Google were used to satisfy people's curiosity. Although it was inconvenient to jump back and forth between different search engines, it seemed like magic that users could find so much information in so little time. Even then, people started using search engines without any prior training. Before search engines became popular, the public generally found information in libraries by reading the catalog or asking a librarian for help. In contrast, typing a few keywords is now enough to find answers on the internet. On top of that, search engines have continually refined their algorithms and added powerful features, such as knowledge bases that enrich search results with information gathered from many sources.

Soon enough, Google became the first choice for many people because of its accuracy and high-quality results, and other search engines were gradually eclipsed. However, while Google's results are high-quality, they are also biased. According to a recent study, the top web search results are typically skewed: some results land on the first page mainly to capture users' attention, and users tend to click mostly on results that appear on the first page. The study gives an example with an everyday topic, coffee and health: of the first 20 results, 17 discussed health benefits while only 3 mentioned harms.

This problem led our team at the InfoSeeking Lab to start a new project known as Fairness, Accountability, Transparency, Ethics (FATE). In this project, we have been exploring ways to counteract the inherent bias found in search engines and provide fairer representation while maintaining a high degree of utility.

We started this experiment with one big goal: improving fairness. To that end, we designed a system that shows two sets of results side by side, both presented in a layout very similar to Google's (as illustrated in the picture below). We collected 100 queries and the top 100 results per query from Google, covering general topics such as sports, food, and travel. One set is taken directly from Google; the other is generated by an algorithm that reduces bias. The experiment consists of 20 rounds, and in each round the user has 30 seconds to choose the set they prefer.
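
This post does not spell out the de-biasing algorithm, but to give a flavor of the idea, here is a minimal, hypothetical sketch of a fairness-aware re-ranker that greedily trades off a result's original relevance against how heavily its topic is already represented in the list. The topic labels, weights, and scores below are illustrative assumptions, not our actual system:

```python
# Hypothetical sketch of a fairness-aware re-ranker: greedily pick the next
# result that balances original relevance with topical balance so far.
from collections import Counter

def rerank(results, k=10, fairness_weight=0.5):
    """results: list of (doc_id, relevance, topic) tuples, highest relevance first."""
    selected, remaining = [], list(results)
    while remaining and len(selected) < k:
        counts = Counter(topic for _, _, topic in selected)
        def score(item):
            _, relevance, topic = item
            # Penalize topics already well represented in the ranking so far.
            penalty = counts[topic] / (len(selected) + 1)
            return (1 - fairness_weight) * relevance - fairness_weight * penalty
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Example: results on "coffee and health" skewed toward benefits.
docs = [("d1", 0.95, "benefit"), ("d2", 0.93, "benefit"),
        ("d3", 0.90, "harm"), ("d4", 0.88, "benefit"), ("d5", 0.85, "harm")]
print([d for d, _, _ in rerank(docs, k=4)])
```

With an even fairness weight, the sketch returns two "benefit" and two "harm" results instead of letting the majority topic dominate the top of the list.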

We recruited around 300 participants for this experiment. The goal is to see whether participants notice a difference between our algorithms and Google. Early results show that participants preferred our algorithms over Google's results; we will report the details once the analysis is complete. We are also in the process of writing a technical paper and an academic article.

We have also designed a game that looks very similar to our system. It tests your ability to spot poor results, gives you a score and some advice, and lets you challenge your friends and family members. To try the game, visit http://fate.infoseeking.org/googleornot.php

For many years, the InfoSeeking Lab has worked on issues related to information retrieval, information behavior, data science, social media, and human-computer interaction. Visit the InfoSeeking Lab website to learn more about our projects: https://www.infoseeking.org

For more information about the experiment, visit the FATE project website: http://fate.infoseeking.org

Jonathan Pulliza successfully defends his dissertation

Jonathan Pulliza, Ph.D. student

Our Ph.D. student, Jonathan Pulliza, has successfully defended his dissertation titled “Let the Robot Do It For Me: Assessing Voice As a Modality for Visual Analytics for Novice Users”. The committee included Chirag Shah (University of Washington, Chair), Nina Wacholder (Rutgers University), Mark Aakhus (Rutgers University), and Melanie Tory (Tableau).

Pulliza's study focuses on understanding how a voice system facilitates novice users in Visual Analytics (VA). He found that participants chose to use the voice system because of its convenience, the quick start it gave them on their work, and better access to some functions they could not find in the traditional screen interface. Participants who refrained from choosing voice did so because of their previous experiences: they felt the voice system would not give them full access to the more complicated VA system, and they often chose to struggle with the visual interface instead of using the voice system for assistance.

Abstract

The growth of Visual Analytics (VA) systems has been driven by the need to explore and understand large datasets across many domains. Applications such as Tableau were developed with the goal of better supporting novice users to generate data visualizations and complete their tasks. However, novice users still face many challenges in using VA systems, especially in complex tasks outside of simple trend identification, such as exploratory tasks. Many of the issues stem from the novice users’ inability to reconcile their questions or representations of the data with the visualizations presented using the interactions provided by the system.

With the improvement in natural language processing technology and the increased prevalence of voice interfaces, there is a renewed interest in developing voice interactions for VA systems. The goal is to enable users to ask questions directly to the system or to indicate specific actions using natural language, which may better facilitate access to functions available in the VA system. Previous approaches have tended to build systems in a screen-only environment in order to encourage interaction through voice. Though they did produce significant results and guidance for the technical challenges of voice in VA, it is important to understand how the use of a voice system would affect novice users within their most common context instead of moving them into new environments. It is also important to understand when a novice user would choose to use a voice modality when the traditional keyboard and mouse modality is also available.

This study is an attempt to understand the circumstances under which novice users of a VA system would choose to interact using their voice in a traditional desktop environment, and whether the voice system better facilitates access to available functionalities. Given that users choose the voice system, do they choose different functions than those using only a keyboard and a mouse? Using a Wizard of Oz setup in place of an automated voice system, we find that the participants chose to use the voice system because of its convenience, the quick start it gave them on their work, and in some situations because they could not find a specific function in the interface. Overall function choices were not found to be significantly different between those who had access to the voice system and those who did not, though there were a few cases where participants were able to access less common functions compared to a control group. Participants refrained from choosing voice because their previous experiences with voice systems had led them to believe all voice systems were incapable of addressing their task needs. They also felt that using the voice system was incongruent with gaining mastery of the underlying VA system, as its convenience could lead to its use as a crutch. Participants then often chose to struggle with the visual interface instead of using the voice system for assistance. In this way, they prioritized building a better mental model of the system over building a better sense of the data set and accomplishing the task.

Manasa Rath successfully defends her dissertation

Manasa Rath, Ph.D. student

Our Ph.D. student, Manasa Rath, has successfully defended her dissertation titled “Assessing the quality of user-generated content in the presence of automated quality scores”. The committee included Chirag Shah (University of Washington, Chair), Vivek Singh (Rutgers University), Kaitlin Costello (Rutgers University), and Sanda Erdelez (Simmons University).

Students often turn to online crowdsourced content to fulfill their academic course requirements

Manasa investigated the quality of such user-generated content, asking whether it is correct, credible, clear, and complete, using a framework she developed. The framework was validated with multiple experts in the field. She then built an automated method to score content against the framework and conducted a user study with 45 undergraduate students to see how, and to what extent, users consider quality while completing an assigned task.

Abstract

With the proliferation of participatory web culture, individuals not only create but also consume content in crowdsourced environments such as blogs and question-answering systems. Studies have revealed that users often rely on this content to make a range of choices, from everyday issues to important life decisions, without paying much attention to its quality. In recent years, studies have demonstrated K-12 students' over-reliance on these crowdsourced sites to fulfill their academic course requirements. It is therefore important to evaluate how cognizant users are of quality when they evaluate content in these venues. But before identifying to what extent users consider quality when evaluating user-generated content, it is important to establish what constitutes quality. To address these issues, this dissertation proceeds in three steps. First, it develops a conceptual framework for evaluating the quality of user-generated content based on the constructs correct, credible, clear, and complete. The second step validates the framework with the help of twelve experts (librarians) and uses this validation to generate automated methodologies that evaluate content quality and produce quality scores. The third step delves deeper into users' reliance on the automated quality scores through a user study. Forty-five undergraduate students were recruited to investigate their use of automated quality scores while completing tasks under three conditions: genuine quality scores, manipulated quality scores, and no quality scores (control). As prior research indicates that task completion depends on task type, the study provided users with different task types, such as ranking and synthesis, in the presence of quality scores. To further examine users' use of the quality scores while completing the tasks, the author used eye-tracking metrics such as the total number of gazes, gaze duration, and the number of gazes on the quality scores. Analyses were performed on the fixation data, users' responses to pre- and post-task questionnaires, task scenarios, and interview transcripts. ANOVA and other statistical analyses were conducted, and no statistically significant differences were found between users' use of quality scores and the type of task. The qualitative data also showed that users primarily considered the constructs correct and credible from the charts, and that users relied on the quality scores mainly when they had little familiarity with the topic. The study provides insights into how, and to what extent, users consider quality while completing an assigned task. The contribution of this human information behavior study is twofold: it examines users' reliance on an automated score provided by an intelligent tool, and it studies users' confidence in considering quality when evaluating user-generated content.
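
For readers unfamiliar with the analysis, a one-way ANOVA across the three score conditions could be run as in the minimal sketch below; the numbers are fabricated placeholders, not the study's data:

```python
# Minimal sketch: one-way ANOVA comparing a task outcome (hypothetical scores)
# across the three study conditions. The values are made up for illustration.
from scipy import stats

genuine     = [78, 85, 90, 72, 88]   # task scores with genuine quality scores
manipulated = [75, 80, 86, 70, 84]   # task scores with manipulated quality scores
control     = [74, 79, 83, 69, 82]   # task scores with no quality scores

f_stat, p_value = stats.f_oneway(genuine, manipulated, control)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```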

Ruoyuan Gao successfully defends her dissertation

Ruoyuan Gao, Ph.D. student

Our Ph.D. student, Ruoyuan Gao, has successfully defended her dissertation titled “Toward a Fairer Information Retrieval System”. The committee included Chirag Shah (University of Washington, Chair), Yongfeng Zhang (Rutgers University), Gerard de Melo (Rutgers University), and Fernando Diaz (Microsoft).

Ruoyuan investigated the existing bias in search engine results to understand the relationship between relevance and fairness. She developed frameworks that can effectively identify the fairness and relevance in a data set. She also proposed an evaluation metric for ranking results that encodes fairness, diversity, novelty, and relevance. With this metric, she developed algorithms that optimize both diversity fairness and relevance for search results.

Abstract

With the increasing popularity and social influence of information retrieval (IR) systems, various studies have raised concerns on the presence of bias in IR and the social responsibilities of IR systems. Techniques for addressing these issues can be classified into pre-processing, in-processing and post-processing. Pre-processing reduces bias in the data that is fed into the machine learning models. In-processing encodes the fairness constraints as a part of the objective function or learning process. Post-processing operates as a top layer over the trained model to reduce the presentation bias exposed to users. This dissertation explored ways to bring the pre-processing and post-processing approaches, together with the fairness-aware evaluation metrics, into a unified framework as an attempt to break the vicious cycle of bias.

We first investigated the existing bias presented in search engine results. Specifically, we focused on the top-k fairness ranking in terms of statistical parity fairness and disparate impact fairness definitions. With Google search and a general purpose text cluster as a lens, we explored several topical diversity fairness ranking strategies to understand the relationship between relevance and fairness in search results. Our experimental results show that different fairness ranking strategies result in distinct utility scores and may perform differently with distinct datasets. Second, to further investigate the relationship of data and fairness algorithms, we developed a statistical framework that was able to facilitate various analysis and decision making. Our framework could effectively and efficiently estimate the domain of data and solution space. We derived theoretical expressions to identify the fairness and relevance bounds for data of different distributions, and applied them to both synthetic datasets and real world datasets. We presented a series of use cases to demonstrate how our framework was applied to associate data and provide insights to fairness optimization problems. Third, we proposed an evaluation metric for the ranking results that encoded fairness, diversity, novelty and relevance. This metric offered a new perspective of evaluating fairness-aware ranking results. Based on this metric, we developed effective ranking algorithms that optimized for diversity fairness and relevance at the same time. Our experiments showed that our algorithms were able to capture multiple aspects of the ranking and optimize the proposed fairness-aware metric.
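
To make the fairness definitions concrete, here is a small, hypothetical sketch of how statistical parity and disparate impact might be checked for the top-k of a ranking over two document groups; the group labels, cutoff, and 0.8 threshold are illustrative assumptions rather than the dissertation's implementation:

```python
# Hypothetical sketch: measure group exposure in the top-k of a ranking.
def exposure(ranking, k, group):
    """Fraction of the top-k results that belong to `group`."""
    top_k = ranking[:k]
    return sum(1 for _, g in top_k if g == group) / k

# ranking: list of (doc_id, group) pairs in ranked order.
ranking = [("d1", "A"), ("d2", "A"), ("d3", "B"), ("d4", "A"), ("d5", "B")]

p_a = exposure(ranking, k=4, group="A")
p_b = exposure(ranking, k=4, group="B")

# Statistical parity asks the two exposures to be (roughly) equal;
# disparate impact compares their ratio against a threshold such as 0.8.
parity_gap = abs(p_a - p_b)
impact_ratio = min(p_a, p_b) / max(p_a, p_b)
print(f"parity gap = {parity_gap:.2f}, impact ratio = {impact_ratio:.2f}")
```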

FATE project – What is it about and why does it matter?

FATE is one of the InfoSeeking Lab's most active projects. It stands for Fairness, Accountability, Transparency, Ethics. FATE aims to address bias found in search engines like Google and to discover ways to de-bias the information presented to the end user while maintaining a high degree of utility.

Why does it matter?

There is ample evidence of search algorithms reinforcing biased assumptions about certain groups of people. Below are some past examples of search bias related to the Black community.

Search engines suggested unpleasant words in connection with Black women. The autocomplete algorithm recommended words like 'angry', 'loud', 'mean', or 'attractive'. These auto-completions reinforced biased assumptions about Black women.

Credit: Safiya Noble

Search results showed images of Black people's natural hair as unprofessional hairstyles for work, while showing images of white Americans' straight hair as professional.

Credit: Safiya Noble

Search results for "three black teenagers" were dominated by mug shots of Black teens, while results for "three white teenagers" showed smiling, happy white teenagers.

Credit: Safiya Noble

These issues persisted for many years until they were uncovered, which then sparked changes to address them.

At FATE, we aim to address such issues and find ways to bring fairness to information seeking.

If you want to learn more about what we do or get updates on our latest findings, check out our FATE website.

Yiwei Wang successfully defends her dissertation

Yiwei Wang, Ph.D. student

Our Ph.D. student, Yiwei Wang, has successfully defended her dissertation titled “Authentic vs. Synthetic: A Comparison of Different Methods for Studying Task-based Information Seeking”. The committee included Chirag Shah (University of Washington, Chair), Nick Belkin (Rutgers University), Kaitlin Costello (Rutgers University), and Diane Kelly (University of Tennessee Knoxville).

Abstract

In task-based information seeking research, researchers often collect data about users’ online behaviors to predict task characteristics and personalize information for users. User behavior may be directly influenced by the environment in which a study is conducted, and the tasks used. This dissertation investigates the impact of study setting and task authenticity on users’ searching behaviors, perceived task characteristics, and search experiences. Thirty-six undergraduate participants finished one lab session and one remote session in which they completed one authentic and one simulated task. The findings demonstrate that the synthetic lab setting and simulated tasks had significant influences mostly on behaviors related to content pages, such as page dwell time and number of pages visited per task. Meanwhile, first-query behaviors were less affected than whole-session behaviors, indicating the reliability of using first-query behaviors in task prediction. Subjective task characteristics—such as task motivation and importance—also varied in different settings and tasks. Qualitative interviews reveal why users were influenced. This dissertation addresses methodological limitations in existing research and provides new insights and implications for researchers who collect online user search behavioral data.

Souvick Ghosh successfully defends his dissertation

Souvick Ghosh, Ph.D. student

Our Ph.D. student, Souvick Ghosh, has successfully defended his dissertation titled “Exploring Intelligent Functionalities of Spoken Conversational Search Systems”. The committee included Chirag Shah (University of Washington, Chair), Nick Belkin (Rutgers University), Katya Ognyanova (Rutgers University), and Vanessa Murdock (Amazon).

Abstract

Conversational search systems often fail to recognize the information need of the user, especially for exploratory and complex tasks where the question is non-factoid in nature. In any conversational search environment, spoken dialogues by the user communicate the search intent and the information need of the user to the system. In response, the system performs specific, expected search actions. This is a domain-specific natural language understanding problem where the agent must understand the user’s utterances and act accordingly. Prior literature in intelligent systems suggests that in a conversational search environment, spoken dialogues communicate the search intent and the information need of the user. The meaning of these spoken utterances can be deciphered by accurately identifying the speech or dialogue acts associated with them. However, only a few studies in the information retrieval community have explored automatic classification of speech acts in conversational search systems, and this creates a research gap. Also, during spoken search, the user rarely has control over the search process as the actions of the system are hidden from the user. This eliminates the possibility of correcting the course of search (from the user’s perspectives) and raises concern about the quality of the search and the reliability of the results presented. Previous research in human-computer interaction suggests that the system should facilitate user-system communication by explaining its understanding of the user’s information problem and the search context (referred to as the system’s model of the user). Such explanations could include the system’s understanding of the search on an abstract level and the description of the search process undertaken (queries and information sources used) on a functional level. While these interactions could potentially help the user and the agent to understand each other better, it is essential to evaluate if explicit clarifications are necessary and desired by the user.

We have conducted a within-subjects Wizard-of-Oz user study to evaluate user satisfaction and preferences in systems with and without explicit clarifications. However, the results of the Wilcoxon Signed Rank Test showed that the use of explicit system-level clarifications produced no positive effect on the user's search experience. We have also built a simple but effective Multi-channel Deep Speech Classifier (MDSC) to predict speech acts and search actions in an information-seeking dialogue. The results highlight that the best performing model predicts speech acts with 90.2% and 73.2% accuracy for the CONVEX and SCS datasets respectively. For search actions, the highest reported accuracy was 63.7% and 63.3% for the CONVEX and SCS datasets respectively. Overall, for speech act prediction, MDSC outperforms all the traditional classification models by a large margin and shows improvements of 54.4% for CONVEX and 18.3% for SCS over the nearest baseline. For search actions, the improvements were 32.3% and 2.2% over the closest machine learning baselines. The results of the ablation analysis indicate that the best performance is achieved using all three channels for speech act prediction, and metadata features only when predicting search actions. Individually, metadata features were most important, followed by lexical and syntactic features.
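
As background for readers, a speech-act classifier in its simplest form maps an utterance to a dialogue label. The sketch below is a generic TF-IDF plus logistic regression baseline with made-up utterances and labels; it is not the MDSC architecture and does not use the CONVEX or SCS data:

```python
# Minimal baseline sketch for speech-act classification (not the MDSC model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

utterances = [
    "show me flights to Seattle next week",   # request
    "what does the second result say?",       # question
    "yes, that one looks good",               # confirmation
    "no, go back to the previous page",       # rejection
]
speech_acts = ["request", "question", "confirmation", "rejection"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(utterances, speech_acts)
print(clf.predict(["could you open the first link"]))
```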

In this dissertation, we provide insights on two intelligent functionalities which are expected of conversational search systems: (i) how to better understand the natural language utterances of the user, in an information-seeking conversation; and (ii) if explicit clarifications or explanations from the system will improve the user-agent interaction during the search session. The observations and recommendations from this study will inform the future design and development of spoken conversational systems.

Prof. Chirag Shah is receiving the KSJ Award 2019 and giving a keynote at ECIR 2020.

Our lab director, Prof. Chirag Shah, is receiving the Microsoft BCS/BCS IRSG Karen Spärck Jones Award (KSJ Award) 2019 and he is giving a keynote this Wednesday at the 42nd European Conference on Information Retrieval (ECIR 2020). 

About the KSJ Award

The KSJ Award was created by the British Computer Society Information Retrieval Specialist Group (BCS IRSG) in conjunction with the BCS, and has been presented since 2008. The award is also sponsored by Microsoft Research. See more details at https://irsg.bcs.org/ksjaward.php

About the keynote

“Task-Based Intelligent Retrieval and Recommendation”

While the act of looking for information happens within the context of a task on the user's side, most search and recommendation systems focus on user actions ('what'), ignoring the nature of the task that covers the process ('how') and user intent ('why'). Scholars have long argued that IR systems should help users accomplish their tasks and not just fulfill a search request. But just as keywords have been good enough approximators for information need, satisfying a set of search requests has been deemed good enough to address the task. However, with changing user behaviors and search modalities, specifically found in conversational interfaces, the challenge and opportunity to focus on the task have become critically important and central to IR. In this talk, I will discuss some of the key ideas and recent works — both theoretical and empirical — to study and support aspects of task. I will show how we can derive a user's search path, strategy, and intentions, and how they can be instrumental in not only creating more personalized search and recommendation solutions, but also solving problems not possible otherwise. Finally, I will extend this to the realm of intelligent assistants with our recent work in a new area called Information Fostering, where our knowledge of the user and the task can help us address another classical problem in IR — people don't know what they don't know.

See more detail and join the conference for free at https://ecir2020.org/

Jiqun Liu successfully defends his dissertation

Jiqun Liu, Ph.D. student

Our Ph.D. student, Jiqun Liu, has successfully defended his dissertation titled “A State-Based Approach to Supporting Users in Complex Search Tasks”. The committee included Chirag Shah (University of Washington, Chair), Nick Belkin (Rutgers University), Kaitlin Costello (Rutgers University), and Dan Russell (Google).

Liu's study focuses on understanding the multi-round search processes of complex search tasks using computational models of interactive IR, and on developing personalized recommendations to support task completion and search satisfaction. From the study, the team built a search recommendation model based on a Q-learning algorithm. The results demonstrated that simulated search episodes can improve search efficiency to varying extents.

Abstract

Previous work on task-based interactive information retrieval (IR) has mainly focused on what users found along the search process and the predefined, static aspects of complex search tasks (e.g., task goal, task product, cognitive task complexity), rather than how complex search tasks of different types can be better understood, examined, and disambiguated within the associated multi-round search processes. Also, it is believed that the knowledge about users’ cognitive variations in task-based search process can help tailor search paths and experiences to support task completion and search satisfaction. To adaptively support users engaging in complex search tasks, it is critical to connect theoretical, descriptive frameworks of search process with computational models of interactive IR and develop personalized recommendations for users according to their task states. Based on the data collected from two laboratory user studies, in this dissertation we sought to understand the states and state transition patterns in complex search tasks of different types and predict the identified task states using Machine Learning (ML) classifiers built upon observable search behavioral features. Moreover, through running Q-learning-based simulation of adaptive search recommendations, we also explored how the state-based framework could be applied in building computational models and supporting users with timely recommendations.

Based on the results from the dissertation study, we identified four intention-based task states and six problem-help-based task states, which depict the active, planned dimension and the situational, unanticipated dimension of search tasks respectively. We also found that 1) task state transition patterns, as features extracted from the interaction process, could be useful for disambiguating different types of search tasks; 2) the implicit task states can be inferred and predicted using behavioral-feature-based ML classifiers. With respect to application, we built a search recommendation model based on a Q-learning algorithm and the knowledge we learned about task states. We then applied the model in simulating search sessions consisting of potentially useful query segments with high rewards from different users. Our results demonstrated that the simulated search episodes can improve search efficiency to varying extents in different task scenarios. However, in many task contexts, this improvement often comes at the price of hurting the diversity and fairness of information coverage.
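
For readers unfamiliar with Q-learning, the heart of such a recommendation model is a value update over (task state, action) pairs. The sketch below is a generic tabular version with hypothetical states, actions, and rewards, not the dissertation's actual model:

```python
# Minimal tabular Q-learning sketch for recommending the next action given an
# inferred task state. States, actions, and rewards are made-up placeholders.
import random
from collections import defaultdict

states  = ["exploring", "struggling", "finalizing"]        # inferred task states
actions = ["suggest_query", "suggest_source", "no_action"] # possible recommendations

alpha, gamma, epsilon = 0.1, 0.9, 0.2
Q = defaultdict(float)  # Q[(state, action)] -> estimated long-term reward

def choose_action(state):
    if random.random() < epsilon:                      # explore occasionally
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])   # otherwise exploit

def update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# One simulated interaction step: the user was struggling, we suggested a query,
# and the (hypothetical) simulator returned a positive reward.
update("struggling", "suggest_query", reward=1.0, next_state="finalizing")
print(Q[("struggling", "suggest_query")], choose_action("struggling"))
```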

This dissertation presents a comprehensive study on state-based approach to understanding and supporting complex search tasks: from task state and state transition pattern identification, task state prediction, all the way to the application of computational state-based model in simulating dynamic search recommendations. Our process-oriented, state-based framework can be further extended with studies in a variety of contexts (e.g., multi-session search, collaborative search, conversational search) and deeper knowledge about users’ cognitive limits and search decision-making.

Hands-On Introduction to Data Science, Dr. Shah, our lab director’s new book

If you are looking to get started in data science, or are at an entry to intermediate level, this book is just the right fit for you. "Hands-On Introduction to Data Science," the newly published book by our lab director, Dr. Shah, is filled with hands-on examples, a wide range of exercises, and real-life applications that will help you develop a solid understanding of the subject. No prior technical background or computing knowledge is needed.

If you are an instructor looking for a good textbook for your class, the book also provides end-to-end support for teaching a data science course: curriculum suggestions, slides for each chapter, datasets, program scripts, and solutions to every exercise, as well as sample exams and projects.

Reviews & Endorsements
‘Dr. Shah has written a fabulous introduction to data science for a broad audience. His book offers many learning opportunities, including explanations of core principles, thought-provoking conceptual questions, and hands-on examples and exercises. It will help readers gain proficiency in this important area and quickly start deriving insights from data.’ Ryen W. White, Microsoft Research AI.

Now available at
https://www.amazon.com/Hands-Introduction-Data-Science/dp/1108472443/?pldnSite=1

Book Summary:
This book introduces the field of data science in a practical and accessible manner, using a hands-on approach that assumes no prior knowledge of the subject. The foundational ideas and techniques of data science are provided independently from technology, allowing students to easily develop a firm understanding of the subject without a strong technical background, as well as being presented with material that will have continual relevance even after tools and technologies change. Using popular data science tools such as Python and R, the book offers many examples of real-life applications, with practice ranging from small to big data. A suite of online material for both instructors and students provides a strong supplement to the book, including datasets, chapter slides, solutions, sample exams and curriculum suggestions. This entry-level textbook is ideally suited to readers from a range of disciplines wishing to build a practical, working knowledge of data science.

  • Almost everything in the book is accompanied by examples and practice, both in-chapter and end-of-chapter, so students stay engaged and can use hands-on experience to see how theory relates to solving practical problems
  • Assumes no prior technical background or computing knowledge, lowering the barrier to entering the field so that students from a range of disciplines can benefit from a more accessible introduction to data science
  • Supplemented by a generous set of materials for instructors, including curriculum suggestions and syllabi, slides for each chapter, datasets, program scripts, answers and solutions to each exercise, and sample exams and projects, giving instructors end-to-end support for teaching a data science course