Soumik Mandal successfully defends his dissertation.

Soumik Mandal, Ph.D. student

Our Ph.D. student, Soumik Mandal, has successfully defended his dissertation titled “Clarifying user’s information need in conversational information retrieval”. The committee included  Chirag Shah (University of Washington, Chair), Nick Belkin (Rutgers University), Katya Ognyanova (Rutgers University), and Michel Galley (Microsoft).


With traditional information retrieval systems, users are expected to express their information need adequately and accurately to get an appropriate response from the system. This setup generally works well for simple tasks; in complex task scenarios, however, users have difficulty expressing their information need as accurately as the system requires. The need to clarify the user’s information need therefore arises. In current search engines, support in such cases is provided in the form of query suggestions or query recommendations. In conversational information retrieval systems, however, the interaction between the user and the system takes the form of a dialogue, so the system can better support such cases by asking clarifying questions. Yet current research in both natural language processing and information retrieval does not adequately explain how to form such questions or at what stage of the dialogue they should be asked. To address this gap, this proposed research will investigate the nature of conversation between a user and an expert intermediary in order to model the functions the expert performs to address the user’s information need. More specifically, this study will explore how an intermediary can ask the user questions to clarify the user’s information need in complex task scenarios.

InfoSeeking’s 10th Birthday

Looking back

In the fall of 2010, we started as a reading group for people who would come together to read papers on topics of information seeking/retrieval/behavior every week. The group was called “Information seeking and behavior group”. Dr. Chirag Shah has been leading the group from the beginning. 

The reading group quickly became a research group as students and faculty started identifying projects that interested them and pooled resources to design studies and experiments.

In that same fall, as the group started gaining traction and attracting more students, resources, and funding, we became the InfoSeeking Lab.

In the beginning, the lab focused on issues of information seeking/retrieval and social media. As new members and interests were added, the lab explored many more areas, including wearable sensors, collaborative work, online communities, and conversational systems.

The methods for research also evolved from user studies to large-scale log analysis, and from ethnographic approaches to deep learning models.

Our achievements

We have been pushing forward the knowledge in information seeking/retrieval and other related topics in the Information and Data Sciences field for 10 years. 

Throughout those years, the lab has received more than 4 million dollars in grants and gifts from federal and state agencies as well as private organizations. 

So far, the lab has produced 13 excellent PhD students and countless undergraduate and master’s students who bring new ideas and innovations to the Data Sciences field. Our alumni have gone on to major universities around the world and reputable companies like Dropbox, eBay, Google, Sony, and TD Bank.

Some of the lab’s early work laid the foundation for collaborative and social work by people from all walks of life. One of the outcomes was a system called Coagmento, which was extensively tested and deployed in classrooms. When it was used in a NY-based high school, the teachers found that, for the first time, they could gain valuable insights into their students’ work and help them in ways that had not been possible before using our system.

We have been at the forefront of developing new methodologies, tools, and solutions. We were one of the first to use the escape room as a method to understand how people seek information and solve problems.

We have contributed to the community and will continue to do so. The lab worked closely with the United Nations Data Analytics group to address several of the UN’s Sustainable Development Goals (SDGs). As a result of the collaboration, the lab launched Science for Social Good (S4SG). All of our work feeds into the SDGs.

We have also worked with several private foundations and startups over the years to solve real-world problems. One example is our collaboration with Brainly, a startup from Poland that focused on educational Q&A. With them, we worked on problems of assessing the quality of the content as well as detecting users with certain characteristics, such as those exhibiting struggle. The solutions to these problems are extremely useful in education.

Looking back at the last 10 years and how glorious they have been, we are confident that the next decade will be even more amazing.

People can’t identify COVID-19 fake news

A recent study conducted by our lab, the InfoSeeking Lab at the University of Washington, Seattle, shows that people can’t spot COVID-19 fake news in search results.

In the study, people were asked to choose between two sets of top-10 search results. One came directly from Google; the other had been manipulated by inserting one or two fake news results into the list.

This study continues prior experiments from the lab that used a similar setting but manipulated random information in the top 10 search results. The outcomes all point in the same direction: people can’t tell which search results have been manipulated.

“This means that I am able to sell this whole package of recommendations with a couple of bad things in it without you ever even noticing. Those bad things can be misinformation or whatever hidden agenda that I have”, said Chirag Shah, InfoSeeking Lab Director and Associate Professor at the University of Washington. 

This raises very important problems that people don’t pay attention to. They believe that what they see is true because it comes from Google, Amazon, or some other system they use daily. This is especially so for prime positions like the first 10 results, as multiple studies show that more than 90% of searchers’ clicks are concentrated on the first page. This means any manipulated information that makes it onto Google’s first page of search results is perceived as true.

In the current situation, people are worried and uncertain. Many of us seek updates about the situation daily, and Google is the top search engine we turn to. People need trustworthy information; however, many are taking advantage of people’s fear and spreading misinformation to serve their own agendas. What would happen if the next piece of fake news claimed that the virus had mutated and now had an 80% fatality rate? What would it do to our community? Would people start fighting over food? Would people wearing a mask in public be attacked? Would you be able to spot the fake news? The lab is continuing to explore these critical issues of public importance through its research on FATE (Fairness, Accountability, Transparency, Ethics).

For this finding, InfoSeeking researchers analyzed more than 10,000 answers on both random and fake information manipulated in the list, involving more than 500 English-speaking people around the U.S.

FATE Research Group: From Why to What and How

When the public first gained access to the internet, search engines became part of daily life. Services such as Yahoo, AltaVista, and Google were used to satisfy people’s curiosity. Although using them was not always convenient, because users had to go back and forth between several search engines, it seemed like magic that users could get so much information in a very short time. Users started using search engines without any previous training. Before search engines became popular, the public generally found information in libraries by reading the library catalog or asking a librarian for help. In contrast, typing a few keywords is enough to find answers on the internet. Not only that, but search engines have continually developed their algorithms and added useful features, such as knowledge bases that enrich search results with information gathered from various sources.

Soon enough, Google became the first choice for many people due to its accuracy and high-quality results, and other search engines were overshadowed. However, while Google’s results are high-quality, they are also biased. According to a recent study, the top web search results from search engines are typically biased. Some of the results on the first page are there just to capture users’ attention. At the same time, users tend to click mostly on results that appear on the first page. The study gives an example of an everyday topic: coffee and health. Of the first 20 results, 17 were about the health benefits, while only 3 mentioned the harms.

This problem led our team at the InfoSeeking Lab to start a new project known as Fairness, Accountability, Transparency, Ethics (FATE). In this project, we have been exploring ways to balance the inherent bias found in search engines and fulfill a sense of fair representation while effectively maintaining a high degree of utility.

We started this experiment with one big goal: to improve fairness. For that, we designed a new system that shows two sets of results, both of which look very similar to Google’s dashboard (as illustrated in the picture below). We collected 100 queries and the top 100 results per query from Google on general topics such as sports, food, and travel. One of the sets is obtained from Google; the other is generated by an algorithm that reduces bias. The system has 20 rounds and gives the user 30 seconds in each round to choose the set they prefer.

For this experiment, we recruited around 300 participants. The goal is to see whether participants can notice a difference between our algorithms and Google. Early results show that participants preferred our algorithms over Google. We will discuss the results in more detail once we finish the analysis. We are also in the process of writing a technical paper and an academic article.

We have also designed a game that looks very similar to our system. The game tests your ability to notice bad results and gives you a score and some advice. Users can also challenge their friends or family members. To try this game, click here.

For many years, the InfoSeeking Lab has worked on issues related to information retrieval, information behavior, data science, social media, and human-computer interaction. Visit the InfoSeeking Lab website to learn more about our projects.

For more information about the experiment, visit the FATE project website.

Jonathan Pulliza successfully defends his dissertation

Jonathan Pulliza, Ph.D. student

Our Ph.D. student, Jonathan Pulliza, has successfully defended his dissertation titled “Let the Robot Do It For Me: Assessing Voice As a Modality for Visual Analytics for Novice Users”. The committee included Chirag Shah (University of Washington, Chair), Nina Wacholder (Rutgers University), Mark Aakhus (Rutgers University), and Melanie Tory (Tableau).

Pulliza’s study focuses on understanding how a voice system helps novice users in Visual Analytics (VA). He found that participants chose to use the voice system because of its convenience, its ability to give them a quick start on their work, and its better access to some functions that they could not find in the traditional screen interface. Participants refrained from choosing voice because of their previous experiences. They felt that using the voice system would not give them full access to the more complicated VA system. They then often chose to struggle with the visual interface instead of using the voice system for assistance.


The growth of Visual Analytics (VA) systems has been driven by the need to explore and understand large datasets across many domains. Applications such as Tableau were developed with the goal of better supporting novice users to generate data visualizations and complete their tasks. However, novice users still face many challenges in using VA systems, especially in complex tasks outside of simple trend identification, such as exploratory tasks. Many of the issues stem from the novice users’ inability to reconcile their questions or representations of the data with the visualizations presented using the interactions provided by the system.

With the improvement in natural language processing technology and the increased prevalence of voice interfaces, there is a renewed interest in developing voice interactions for VA systems. The goal is to enable users to ask questions directly to the system or to indicate specific actions using natural language, which may better facilitate access to functions available in the VA system. Previous approaches have tended to build systems in a screen-only environment in order to encourage interaction through voice. Though they did produce significant results and guidance for the technical challenges of voice in VA, it is important to understand how the use of a voice system would affect novice users within their most common context instead of moving them into new environments. It is also important to understand when a novice user would choose to use a voice modality when the traditional keyboard and mouse modality is also available.

This study is an attempt to understand the circumstances under which novice users of a VA system would choose to interact with it using their voice in a traditional desktop environment, and whether the voice system better facilitates access to available functionalities. Given that users choose the voice system, do they choose different functions than those with only a keyboard and a mouse? Using a Wizard of Oz setup in place of an automated voice system, we find that the participants chose to use the voice system because of its convenience, its ability to give them a quick start on their work, and in some situations where they could not find a specific function in the interface. Overall function choices were not found to be significantly different between those who had access to the voice system and those who did not, though there were a few cases where participants were able to access less common functions compared to a control group. Participants refrained from choosing voice because their previous experiences with voice systems had led them to believe that no voice system was capable of addressing their task needs. They also felt using the voice system was incongruent with gaining mastery of the underlying VA system, as the convenience of using the voice system could lead to its use as a crutch. Participants then often chose to struggle with the visual interface instead of using the voice system for assistance. In this way, they prioritized building a better mental model of the system over building a better sense of the data set and accomplishing the task.

Manasa Rath successfully defends her dissertation

Manasa Rath, Ph.D. student

Our Ph.D. student, Manasa Rath, has successfully defended her dissertation titled “Assessing the quality of user-generated content in the presence of automated quality scores”. The committee included  Chirag Shah (University of Washington, Chair), Vivek Singh (Rutgers University), Kaitlin Costello (Rutgers University), and Sanda Erdelez (Simmons University).

Students turn to online crowdsourced content to fulfill their academic course requirements.

Manasa investigated whether such user-generated content is correct, credible, clear, and complete, using a framework she developed. The framework has been validated with multiple experts in the field. She then built an automated method to score the content accordingly and conducted a user study with 45 undergraduate students to see how and to what extent users considered the role of quality while completing the task provided.


With the proliferation of participatory web culture, individuals not only create but also consume content in crowdsourced environments such as blogs and question-answering systems. Studies have revealed that users often rely on this content to make a range of choices, from everyday issues to important life decisions, without paying much attention to its quality. In recent years, studies have demonstrated K-12 students’ overreliance on these crowdsourced sites to fulfill their academic course requirements. Therefore, it is important to evaluate users’ cognizance of quality when they assess content in these venues. But before identifying the extent to which users consider quality while evaluating user-generated content, it is important to learn what constitutes quality. This dissertation addresses these issues in a three-step process. First, it develops a conceptual framework for evaluating the quality of user-generated content, consisting of constructs such as correct, credible, clear, and complete. The second step involves validating the framework with the help of twelve experts (librarians) and using this validation to develop automated methods that evaluate content and produce quality scores. The third step investigates users’ reliance on the automated quality scores through a user study. Forty-five undergraduate students were recruited to investigate their use of automated quality scores while completing tasks under three conditions: users provided with genuine quality scores, users provided with manipulated quality scores, and users provided with no quality scores (control).
As prior research has indicated that users’ task completion depends on the type of task, this user study provides users with different task types, such as ranking and synthesis, in the presence of quality scores. To further understand how users use quality scores while completing the tasks, the author uses eye-tracking metrics such as the total number of gazes, gaze duration, and the number of gazes on the quality scores. Analyses were performed using fixation data, users’ responses to pre- and post-task questionnaires, task scenarios, and interview transcripts. ANOVA and other statistical analyses were conducted, and no statistically significant differences were found between users’ use of quality scores and the type of task. The qualitative data also showed that users primarily considered the constructs correct and credible from the charts. Additionally, users made use of the quality scores primarily when they had little familiarity with the topic. The study provided insights into how and to what extent users considered the role of quality while completing the task provided. The contribution of this human information behavior study is twofold: it examines users’ reliance on an automated score provided by an intelligent tool, and it studies users’ confidence in considering quality when evaluating user-generated content.

Ruoyuan Gao successfully defends her dissertation

Ruoyuan Gao, Ph.D. student

Our Ph.D. student, Ruoyuan Gao, has successfully defended her dissertation titled “Toward a Fairer Information Retrieval System”. The committee included  Chirag Shah (University of Washington, Chair), Yongfeng Zhang (Rutgers University), Gerard de Melo (Rutgers University), and Fernando Diaz (Microsoft).

Ruoyuan investigated the existing bias present in search engine results to understand the relationship between relevance and fairness in the results. She developed frameworks that could effectively identify the fairness and relevance in a data set. She also proposed an evaluation metric for ranking results that encodes fairness, diversity, novelty, and relevance. With this metric, she developed algorithms that optimize both diversity fairness and relevance for search results.


With the increasing popularity and social influence of information retrieval (IR) systems, various studies have raised concerns about the presence of bias in IR and the social responsibilities of IR systems. Techniques for addressing these issues can be classified into pre-processing, in-processing, and post-processing. Pre-processing reduces bias in the data that is fed into the machine learning models. In-processing encodes the fairness constraints as a part of the objective function or learning process. Post-processing operates as a top layer over the trained model to reduce the presentation bias exposed to users. This dissertation explored ways to bring the pre-processing and post-processing approaches, together with the fairness-aware evaluation metrics, into a unified framework as an attempt to break the vicious cycle of bias.

We first investigated the existing bias present in search engine results. Specifically, we focused on top-k fairness ranking in terms of the statistical parity and disparate impact definitions of fairness. With Google search and a general-purpose text cluster as a lens, we explored several topical diversity fairness ranking strategies to understand the relationship between relevance and fairness in search results. Our experimental results show that different fairness ranking strategies result in distinct utility scores and may perform differently with distinct datasets. Second, to further investigate the relationship between data and fairness algorithms, we developed a statistical framework that was able to facilitate various analyses and decision making. Our framework could effectively and efficiently estimate the domain of data and solution space. We derived theoretical expressions to identify the fairness and relevance bounds for data of different distributions, and applied them to both synthetic datasets and real-world datasets. We presented a series of use cases to demonstrate how our framework was applied to associate data and provide insights into fairness optimization problems. Third, we proposed an evaluation metric for ranking results that encodes fairness, diversity, novelty, and relevance. This metric offered a new perspective on evaluating fairness-aware ranking results. Based on this metric, we developed effective ranking algorithms that optimized for diversity fairness and relevance at the same time. Our experiments showed that our algorithms were able to capture multiple aspects of the ranking and optimize the proposed fairness-aware metric.
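
To make the two fairness notions mentioned above more concrete, here is a minimal Python sketch. It uses common simplified definitions of statistical parity and disparate impact for a top-k list, which are not necessarily the exact formulations used in the dissertation. The function names and the example data are hypothetical: the top 20 follows the 17-to-3 split from the coffee and health example reported elsewhere on this page, while the remaining 20 results are invented so that both viewpoints are equally common overall.

```python
from collections import Counter


def top_k_share(ranked_labels, k, group):
    """Fraction of the top-k results that belong to `group`."""
    top = ranked_labels[:k]
    return sum(1 for g in top if g == group) / len(top)


def statistical_parity_gap(ranked_labels, k, group):
    """Share of `group` in the top-k minus its share in the whole list.

    A value near 0 means the top-k mirrors the overall pool;
    a negative value means the group is under-represented at the top.
    """
    overall = sum(1 for g in ranked_labels if g == group) / len(ranked_labels)
    return top_k_share(ranked_labels, k, group) - overall


def disparate_impact_ratio(ranked_labels, k, protected, reference):
    """Rate at which `protected` items reach the top-k, divided by the
    same rate for `reference` items. A value of 1.0 means equal rates."""
    top = Counter(ranked_labels[:k])
    full = Counter(ranked_labels)
    if full[protected] == 0 or top[reference] == 0:
        raise ValueError("protected group must exist and reference group must appear in the top-k")
    rate_protected = top[protected] / full[protected]
    rate_reference = top[reference] / full[reference]
    return rate_protected / rate_reference


# Hypothetical ranked list of 40 results labelled by viewpoint.
# Top 20: 17 "benefit" and 3 "harm"; the rest is invented for illustration.
ranked = ["benefit"] * 17 + ["harm"] * 3 + ["benefit"] * 3 + ["harm"] * 17

parity_gap = statistical_parity_gap(ranked, 20, "harm")
impact_ratio = disparate_impact_ratio(ranked, 20, "harm", "benefit")
print("statistical parity gap for 'harm':", round(parity_gap, 3))
print("disparate impact ratio (harm vs. benefit):", round(impact_ratio, 3))
```

In this toy data the "harm" viewpoint makes up half of the pool but only a small part of the top 20, so the gap is negative and the ratio is well below 1. A fairness-aware ranking strategy would aim to move both values closer to parity while keeping relevance high.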

Yiwei Wang successfully defends her dissertation

Yiwei Wang, Ph.D. student

Our Ph.D. student, Yiwei Wang, has successfully defended her dissertation titled “Authentic vs. Synthetic: A Comparison of Different Methods for Studying Task-based Information Seeking”. The committee included  Chirag Shah (University of Washington, Chair), Nick Belkin (Rutgers University), Kaitlin Costello (Rutgers University), and Diane Kelly (University of Tennessee Knoxville).


In task-based information seeking research, researchers often collect data about users’ online behaviors to predict task characteristics and personalize information for users. User behavior may be directly influenced by the environment in which a study is conducted, and the tasks used. This dissertation investigates the impact of study setting and task authenticity on users’ searching behaviors, perceived task characteristics, and search experiences. Thirty-six undergraduate participants finished one lab session and one remote session in which they completed one authentic and one simulated task. The findings demonstrate that the synthetic lab setting and simulated tasks had significant influences mostly on behaviors related to content pages, such as page dwell time and number of pages visited per task. Meanwhile, first-query behaviors were less affected than whole-session behaviors, indicating the reliability of using first-query behaviors in task prediction. Subjective task characteristics—such as task motivation and importance—also varied in different settings and tasks. Qualitative interviews reveal why users were influenced. This dissertation addresses methodological limitations in existing research and provides new insights and implications for researchers who collect online user search behavioral data.

Souvick Ghosh successfully defends his dissertation

Souvick Ghosh, Ph.D. student

Our Ph.D. student, Souvick Ghosh, has successfully defended his dissertation titled “Exploring Intelligent Functionalities of Spoken Conversational Search Systems”. The committee included Chirag Shah (University of Washington, Chair), Nick Belkin (Rutgers University), Katya Ognyanova (Rutgers University), and Vanessa Murdock (Amazon).


Conversational search systems often fail to recognize the information need of the user, especially for exploratory and complex tasks where the question is non-factoid in nature. In any conversational search environment, spoken dialogues by the user communicate the search intent and the information need of the user to the system. In response, the system performs specific, expected search actions. This is a domain-specific natural language understanding problem where the agent must understand the user’s utterances and act accordingly. Prior literature in intelligent systems suggests that in a conversational search environment, spoken dialogues communicate the search intent and the information need of the user. The meaning of these spoken utterances can be deciphered by accurately identifying the speech or dialogue acts associated with them. However, only a few studies in the information retrieval community have explored automatic classification of speech acts in conversational search systems, and this creates a research gap. Also, during spoken search, the user rarely has control over the search process as the actions of the system are hidden from the user. This eliminates the possibility of correcting the course of search (from the user’s perspectives) and raises concern about the quality of the search and the reliability of the results presented. Previous research in human-computer interaction suggests that the system should facilitate user-system communication by explaining its understanding of the user’s information problem and the search context (referred to as the system’s model of the user). Such explanations could include the system’s understanding of the search on an abstract level and the description of the search process undertaken (queries and information sources used) on a functional level. 
While these interactions could potentially help the user and the agent to understand each other better, it is essential to evaluate if explicit clarifications are necessary and desired by the user.

We have conducted a within-subjects Wizard-of-Oz user study to evaluate user satisfaction and preferences in systems with and without explicit clarifications. The results of the Wilcoxon Signed Rank Test showed that the use of explicit system-level clarifications produced no positive effect on the user’s search experience. We have also built a simple but effective Multi-channel Deep Speech Classifier (MDSC) to predict speech acts and search actions in an information-seeking dialogue. The results highlight that the best performing model predicts speech acts with 90.2% and 73.2% accuracy for the CONVEX and SCS datasets respectively. For search actions, the highest reported accuracy was 63.7% and 63.3% for the CONVEX and SCS datasets respectively. Overall, for speech act prediction, MDSC outperforms all the traditional classification models by a large margin and shows improvements of 54.4% for CONVEX and 18.3% for SCS over the nearest baseline. For search actions, the improvements were 32.3% and 2.2% over the closest machine learning baselines. The results of ablation analysis indicate that the best performance is achieved using all three channels for speech act prediction and metadata features only when predicting search actions. Individually, metadata features were most important, followed by lexical and syntactic features.

In this dissertation, we provide insights on two intelligent functionalities which are expected of conversational search systems: (i) how to better understand the natural language utterances of the user, in an information-seeking conversation; and (ii) if explicit clarifications or explanations from the system will improve the user-agent interaction during the search session. The observations and recommendations from this study will inform the future design and development of spoken conversational systems.

Jiqun Liu successfully defends his dissertation

Jiqun Liu, Ph.D. student

Our Ph.D. student, Jiqun Liu, has successfully defended his dissertation titled “A State-Based Approach to Supporting Users in Complex Search Tasks”. The committee included  Chirag Shah (University of Washington, Chair), Nick Belkin (Rutgers University), Kaitlin Costello (Rutgers University), and Dan Russell (Google).

Liu’s study focuses on understanding the multi-round search processes of complex search tasks by using computational models of interactive IR and on developing personalized recommendations to support task completion and search satisfaction. From the study, the team built a search recommendation model based on the Q-learning algorithm. The results demonstrated that the simulated search episodes can improve search efficiency to varying extents.


Previous work on task-based interactive information retrieval (IR) has mainly focused on what users found along the search process and the predefined, static aspects of complex search tasks (e.g., task goal, task product, cognitive task complexity), rather than how complex search tasks of different types can be better understood, examined, and disambiguated within the associated multi-round search processes. Also, it is believed that the knowledge about users’ cognitive variations in task-based search process can help tailor search paths and experiences to support task completion and search satisfaction. To adaptively support users engaging in complex search tasks, it is critical to connect theoretical, descriptive frameworks of search process with computational models of interactive IR and develop personalized recommendations for users according to their task states. Based on the data collected from two laboratory user studies, in this dissertation we sought to understand the states and state transition patterns in complex search tasks of different types and predict the identified task states using Machine Learning (ML) classifiers built upon observable search behavioral features. Moreover, through running Q-learning-based simulation of adaptive search recommendations, we also explored how the state-based framework could be applied in building computational models and supporting users with timely recommendations.

Based on the results from the dissertation study, we identified four intention-based task states and six problem-help-based task states, which depict the active, planned dimension and the situational, unanticipated dimension of search tasks respectively. We also found that 1) task state transition patterns, as features extracted from the interaction process, could be useful for disambiguating different types of search tasks; and 2) the implicit task states can be inferred and predicted using behavioral-feature-based ML classifiers. With respect to application, we built a search recommendation model based on the Q-learning algorithm and the knowledge we learned about task states. We then applied the model in simulating search sessions consisting of potentially useful query segments with high rewards from different users. Our results demonstrated that the simulated search episodes can improve search efficiency to varying extents in different task scenarios. However, in many task contexts, this improvement often comes at the price of hurting the diversity and fairness of information coverage.
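
For readers unfamiliar with Q-learning, the following is a minimal Python sketch of the tabular algorithm on a toy problem. It is only an illustration of the general technique, not the model built in the dissertation. The two task states, the two recommendation actions, the transitions, and the reward values are all hypothetical and chosen for readability.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: (state, action) -> (next_state, reward).
# States and actions are invented for illustration only.
ENV = {
    ("exploring", "suggest_query"): ("evaluating", 0.0),
    ("exploring", "suggest_document"): ("exploring", -1.0),
    ("evaluating", "suggest_query"): ("exploring", -1.0),
    ("evaluating", "suggest_document"): ("done", 10.0),
}
ACTIONS = ["suggest_query", "suggest_document"]
TERMINAL = "done"


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, max_steps=50, seed=0):
    """Learn a Q-table with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated value, default 0.0
    for _ in range(episodes):
        state = "exploring"
        for _ in range(max_steps):
            if state == TERMINAL:
                break
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward = ENV[(state, action)]
            # Value of the best action in the next state (0 if the episode ends).
            if next_state == TERMINAL:
                best_next = 0.0
            else:
                best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Standard Q-learning update rule.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action for each non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in ("exploring", "evaluating")}


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

The learned policy maps each state to the recommendation with the highest estimated long-term reward. In a state-based setting like the one described above, the states would instead be predicted task states and the rewards would come from logged search interactions.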

This dissertation presents a comprehensive study of a state-based approach to understanding and supporting complex search tasks: from task state and state transition pattern identification, through task state prediction, all the way to the application of a computational state-based model in simulating dynamic search recommendations. Our process-oriented, state-based framework can be further extended with studies in a variety of contexts (e.g., multi-session search, collaborative search, conversational search) and deeper knowledge about users’ cognitive limits and search decision-making.