Our lab director’s new co-authored book has arrived!

Abstract

Since user study design has been widely applied in search interactions and information retrieval (IR) systems evaluation studies, a deep reflection and meta-evaluation of interactive IR (IIR) user studies is critical for sharpening the instruments of IIR research and improving the reliability and validity of the conclusions drawn from IIR user studies. To this end, we developed a faceted framework for supporting user study design, reporting, and evaluation based on a systematic review of the state-of-the-art IIR research papers recently published in several top IR venues (n=462). Within the framework, we identify three major types of research focuses, extract and summarize facet values from specific cases, and highlight the under-reported user study components which may significantly affect the results of research. Then, we employ the faceted framework in evaluating a series of IIR user studies against their respective research questions and explain the roles and impacts of the underlying connections and “collaborations” among different facet values. Through bridging diverse combinations of facet values with the study design decisions made for addressing research problems, the faceted framework can shed light on IIR user study design, reporting, and evaluation practices and help students and young researchers design and assess their own studies.

Authors: Jiqun Liu (Rutgers University) and Chirag Shah (Rutgers University)

Get yours online at: http://www.morganclaypoolpublishers.com/catalog_Orig/product_info.php?products_id=1418

“An Integrated Model of Task, Information Needs, Sources and Uncertainty to Design Task-Aware Search Systems” presented by Shawon Sarkar at the ICTIR conference

The varieties of information seeking behavior encompass a range of practices and constructs such as the realization of an information need, selecting the nature of information, as well as information sources. Most of the past works have studied various constructs of the information seeking process, i.e., information, information need, and information sources individually. However, a person forms and re-forms his or her information seeking strategy based on continually shifting values of these dimensions associated with information seeking. This preliminary study conducted a survey with 15 search scenarios and multiple-choice characteristics completed by 114 Amazon’s Mechanical Turk workers to find out more about how these constructs play a role in people’s preferences regarding information seeking strategies. The study took an exploratory and inferential research approach to investigate how different forms of information and information needs might lead to different information sources by building binary classification models. The results show that the choice of sources can be predicted (with 80% accuracy) if the information need, representation, and form of information are apparent.

An Integrated Model of Task, Information Needs, Sources and Uncertainty to Design Task-Aware Search Systems
 Shawon Sarkar, Chirag Shah, ICTIR 2021

Shawon Sarkar successfully defended her dissertation proposal!

Congratulations to our PhD student, Shawon Sarkar, on successfully defending her dissertation proposal titled “An Integrated Model of Tasks and Uncertainties to Design Task-aware Intelligent Search Assistance”.

Abstract

Search behaviors are generally motivated by some tasks that prompt users in search processes. Complex tasks often initiate lengthy, intermittently changing, interactive search processes with shifting goals at various search stages. At these different stages of the search, users’ search strategies are influenced by their search intentions, encountered problems, as well as knowledge states. However, search systems are primarily designed to optimize one request at a time, disregarding the underlying overarching task, shifting states of the task, or even the holistic nature of a search session. Although a set of descriptive and theoretical models of the search process can be found in the literature that characterizes tasks, there is a gap in research focused on leveraging dynamic task features in search ranking and recommendation processes. More importantly, there is a lack of support for users to complete their tasks in an adaptive, dynamic way across multiple devices and modalities. To address this issue, the proposed dissertation aims to develop new methods for constructing unified task representation using implicit search behavioral data and applying the task representation to improve existing search and recommendation systems and address emerging problems of conversational and interactive search. Specifically, this study creates a task-information need-strategy-problem map that can be leveraged to provide task-based support in various information formats (e.g., suggesting query, document, or people) to overcome problems and lead toward tasks completion. The main focus of this dissertation work is to develop task-aware search systems capable of understanding and extracting tasks and supporting user’s complex search task completion. Therefore, this research revolves around three broad objectives. 
First, developing a conceptual model for understanding how different types of tasks trigger particular information needs that may lead to different methods and strategies of seeking different forms of information, information sources, and various problems. Second, drawing knowledge from the first stage, developing computational models for extracting task states from users’ search behaviors. Third, leveraging the acquired task knowledge in designing scalable and efficient task-based proactive search systems to meet users’ task goals and provide relevant information in various formats (i.e., query, document, people).

Analyzing users’ perceptions of search engine biases and their satisfaction when the biases are regulated

In our survey study, we paired a real Bing search results page with a synthesized page whose results were more diverse (i.e., less biased). Both pages showed the top 10 results for a given search query. We asked participants which page they preferred and why. Statistical analyses revealed that, overall, participants preferred the original Bing pages. Additionally, the position at which diversity was introduced was significantly associated with users’ preferences.

We found that users prefer results that are more consistent and relevant to the search queries. Introducing diversity undermined the relevance of the search results and impaired users’ satisfaction to some degree. Users also tended to pay more attention to the top portion of the results than to the bottom, which is consistent with some previous findings.

Han, B., Shah, C., & Saelid, D. (2021). Users’ Perception of Search-Engine Biases and Satisfaction. Second International workshop on algorithmic bias in search and recommendation (Bias 2021). April 1, 2021.

Interested in learning more? Read the full paper here: https://arxiv.org/abs/2105.02898

Taking a step towards fairness-aware ranking by defining latent groups using inferred features.

At the BIAS @ ECIR 2021 Workshop, our lab members continued to investigate fairness in search and recommendation, a topic that has drawn increasing attention in recent years.

The paper explores how to define latent groups, which cannot be determined by self-contained features but must be inferred from external data sources, for fairness-aware ranking. In particular, taking the Semantic Scholar dataset released in TREC 2020 Fairness Ranking Track as a case study, we infer and extract multiple fairness-related dimensions of author identity including gender and location to construct groups.

Results

We propose a fairness-aware re-ranking algorithm incorporating both weighted relevance and diversity of returned items for given queries. Our experimental results demonstrate that different combinations of relative weights assigned to relevance, gender, and location groups perform as expected.
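As a rough illustration of how relative weights on relevance and group diversity can interact, here is a minimal greedy re-ranking sketch. It is not the algorithm from the paper: the scoring rule, the `rerank` function, the field names, and the toy data are all assumptions made for this example.

```python
def rerank(candidates, weights):
    """Greedily re-rank items by weighted relevance plus group diversity.

    candidates: list of dicts with 'id', 'relevance', and group attributes.
    weights: dict with a 'relevance' weight and one weight per group attribute.
    NOTE: hypothetical scoring rule, for illustration only.
    """
    group_keys = [k for k in weights if k != "relevance"]
    remaining = list(candidates)
    ranked = []

    def score(item):
        total = weights["relevance"] * item["relevance"]
        for key in group_keys:
            if ranked:
                same = sum(1 for r in ranked if r[key] == item[key])
                # Share of already-ranked items that belong to other groups.
                diversity = 1.0 - same / len(ranked)
            else:
                diversity = 1.0
            total += weights[key] * diversity
        return total

    while remaining:
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Toy candidates (made-up values).
papers = [
    {"id": "p1", "relevance": 0.9, "gender": "m", "location": "US"},
    {"id": "p2", "relevance": 0.8, "gender": "m", "location": "US"},
    {"id": "p3", "relevance": 0.7, "gender": "f", "location": "IN"},
]

relevance_only = rerank(papers, {"relevance": 1.0, "gender": 0.0, "location": 0.0})
balanced = rerank(papers, {"relevance": 0.4, "gender": 0.3, "location": 0.3})
print([p["id"] for p in relevance_only])
print([p["id"] for p in balanced])
```

With all weight on relevance the original order is kept, while shifting weight to the gender and location groups moves the paper from the not-yet-represented groups up the list.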

Future work

Because some group classifications were inaccurate, in future work we propose to explore publicly available personal location data, such as Twitter profile locations.

Interested in learning more?

Read the full research paper here or watch the full presentation.

Lab members participated in the CHIIR 2021 Virtual Conference

Several lab members participated in the ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR) 2021 last week. CHIIR focuses on elements such as human involvement in search activities, and information seeking and use in context. Our lab director, Dr. Shah, is the Chair of the CHIIR Steering Committee. He presented a paper written in collaboration with Microsoft Research (MSR) AI.

Several InfoSeeking students also worked as volunteers for the conference. “It was helpful to get feedback and questions from expert mentors in the field. As always, it was fun to be at CHIIR and seize opportunities to meet friends and colleagues. I had some productive discussions in front of the aquarium in the lobby at Gather Town,” said Shawon Sarkar, a PhD student at the InfoSeeking Lab. She presented her dissertation work, “A context-independent representation of task”, which aims to explicate task information from user behaviors and apply task knowledge to search and recommendations in order to support users in completing their tasks, especially complex tasks, across multiple devices.

For more information about CHIIR visit: https://acm-chiir.github.io/chiir2021/

iSchool’s unique approach to teaching data science by focusing on human values, transparency, privacy, and fairness

When people talk about data science programs, what do you think of? Artificial intelligence, machine learning, or coding is probably the most popular answer for those outside the discipline. What if we told you there is more to it than meets the eye? At the iSchool, we continue to empower students to understand the implications of using such a powerful tool. With our unique approach, we have designed a program that teaches data science through a human-centered lens, promoting solutions that are socially responsible and grounded in an understanding of where they get implemented. We encourage students to focus on human values such as privacy, human rights, and ethics while working on data problems. This means asking not just what technology could do, but also what it should do. And this means acknowledging and addressing the individuals, organizations, and communities behind the production and consumption of data and technology. A small group of iSchool faculty, the iSchool Data Science Curriculum Committee (iDSCC), continues to create a more cohesive and comprehensive program. In doing so, the iSchools are paving a path for data science (DS) that can create informative, insightful, and impactful solutions for the whole of humanity for generations to come. We believe foregrounding human needs and business understanding in this way will lead not only to more ethical data science practices but also to more successful (and profitable) outcomes.

Our lab director, Dr. Shah, wrote an article with his international collaborators on what it means to do and teach data science in an iSchool.

Shah, C., Anderson, T., Hagen, L., & Zhang, Y. An iSchool approach to data science: Human-centered, socially responsible, and context-driven. Journal of the Association for Information Science and Technology (JASIST).

Read the full article here.

Developing a search system that knows what you are looking for before you do.

Have you ever spent time searching to plan a trip or a wedding, hunt for a job, or find your next apartment? This kind of search can take hours, days, or even weeks. Inevitably, it gets interrupted by our daily routines: a coffee break, a trip to the restroom, a meal, or sleep. Completing the search therefore requires picking up where we left off. These kinds of searches are called “Interrupted Search Tasks”.

We, as well as many other scientists, are working on tackling this problem. Our approach is to identify and predict the sub-tasks of complex search tasks and, based on that, provide solutions that make the tasks easier to complete. Take planning a wedding as an example. You need different kinds of information, e.g., about food, dress, and venue. If you forget about the food, which is a sub-task of wedding planning, during your search, the system proactively gives suggestions for food.

And how do we know when to suggest these things to you? First, we try to identify whether you are encountering problems during a search. We found that the longer people spend on the search result page, the higher the chance that they are having problems. To illustrate, imagine a person searching for “Churches in Seattle”. They spend a long time on the search result page and, without clicking through any of the results, enter another search query, “places for a wedding”, and so on. The more queries the person enters without clicking through to any pages, the more likely it is that they are encountering problems. However, if the person interacts with the result page, e.g., clicks a result to see inside the page, we look at the number of pages that the person bookmarked. The more of the subsequent pages get bookmarked, the more relevant results the person found and the fewer problems they encountered. This is how we can tell whether people are finding what they are looking for. If we see that you are having problems, we recommend things that you might have missed.
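To make these signals concrete, here is a small sketch of a heuristic that combines them into a single score. It is only an illustration, not the model from our paper: the `QueryEvent` fields, the 30-second dwell threshold, and the point values are all assumptions chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class QueryEvent:
    query: str
    dwell_seconds: float  # time spent on the search result page
    clicks: int           # results opened from this page
    bookmarks: int        # opened pages that the person bookmarked


def struggle_score(events, long_dwell=30.0):
    """Return a score in [0, 1]; higher means the person is more likely stuck.

    NOTE: hypothetical heuristic with made-up thresholds and weights.
    """
    if not events:
        return 0.0
    points = 0.0
    for event in events:
        if event.clicks == 0:
            points += 1.0      # query abandoned without clicking any result
            if event.dwell_seconds >= long_dwell:
                points += 0.5  # long stay on the result page adds evidence
        elif event.bookmarks > 0:
            points -= 1.0      # bookmarked pages suggest relevant results
    # 1.5 is the maximum number of points a single query can contribute.
    return max(0.0, min(1.0, points / (1.5 * len(events))))


stuck = [
    QueryEvent("Churches in Seattle", 45.0, 0, 0),
    QueryEvent("places for a wedding", 40.0, 0, 0),
]
doing_fine = [QueryEvent("wedding venues in Seattle", 10.0, 2, 2)]

print(struggle_score(stuck))       # two long, click-free queries
print(struggle_score(doing_fine))  # clicked and bookmarked results
```

A system could start offering suggestions once the score crosses some threshold, which would itself need to be tuned on real search logs.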

So how do we know what you missed? In other words, how do we know that “food”, “dress”, and “venue” are related to planning a wedding? We use what people have searched for in the past. The more frequently two topics are searched together, the stronger their relationship. Let’s say 1,000 people searched for “wedding” along with “dress food”, while 5 people searched for “wedding” along with “black dress”. We can tell that “dress food” has a strong relationship to the topic “wedding”, whereas “black dress” does not. Therefore, we can recommend “dress food” when the next person searches for “wedding”.
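The counting idea in this example can be sketched in a few lines. This is a simplified illustration that assumes the search log is available as a list of per-person query lists; the `cooccurrence_counts` function and the log format are hypothetical.

```python
from collections import Counter


def cooccurrence_counts(search_log, topic):
    """Count how many people searched each other query together with `topic`."""
    counts = Counter()
    for queries in search_log:
        if topic in queries:
            counts.update(q for q in set(queries) if q != topic)
    return counts


# Toy log mirroring the numbers in the example above.
log = [["wedding", "dress food"]] * 1000 + [["wedding", "black dress"]] * 5

counts = cooccurrence_counts(log, "wedding")
suggestion = counts.most_common(1)[0][0]
print(counts["dress food"], counts["black dress"], suggestion)
```

The query with the highest count is the strongest candidate to recommend to the next person who searches for the topic.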

If you are interested in more detail on this topic, see the paper that we published recently, “Identifying and Predicting the States of Complex Search Tasks”.

Challenging the status quo in search engine ranking algorithms

How can we bring more fairness to search result ranking? This was the question tackled by our FATE (Fairness Accountability Transparency Ethics) group in the 2020 Text REtrieval Conference’s (TREC) Fairness Ranking Track. In the context of searching for academic papers, the assigned goal of the track was to develop an algorithm that provides fair exposure to different groups of authors while ensuring that the papers are relevant to the search queries.

The Approach

To achieve that goal, the group decided to use “gender” and “country” as key attributes because they were general enough to be applied to all author groups. From there, the group created a fairness-aware algorithm that was used to run two specific tasks:

  1. An information retrieval task where the goal was to return a ranked list of papers to serve as the candidate papers for re-ranking
  2. A re-ranking task where the goal was to rank the candidate papers based on their relevance to a given query, while accounting for fair author group exposure

To evaluate the relevance of the academic papers, the group relied on BM25, which is an algorithm frequently used by search engines.
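For readers unfamiliar with BM25, the sketch below shows the standard Okapi BM25 scoring formula on a toy collection. It is not the group's actual pipeline: the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75` (common defaults), the smoothed IDF variant, and the sample documents are assumptions for illustration, and the collection is assumed to be non-empty.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Term frequency saturates and is normalized by document length.
            norm = term_freq[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "fair ranking of academic papers",
    "neural networks for image recognition",
    "fair exposure in ranking",
]
scores = bm25_scores("fair ranking", docs)
print(scores)
```

Documents that contain none of the query terms score zero, and among documents with the same matches the shorter one scores higher because of the length normalization.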

The Findings

Randomly shuffling the academic papers resulted in high levels of fairness when only the gender of the authors was considered. In contrast, when only the country of the authors was considered, fairness was relatively lower. With the proposed algorithm, data can be re-ranked based on an arbitrary number of group definitions. However, to fully provide fair and relevant results, more attributes need to be explored.

Why is fairness in search rankings important?

We use search engines every day to find information and answers for almost everything in our lives, and the ranking of the search results determines what kind of content we are likely to consume. This poses a risk because ranking algorithms often leave out underrepresented groups, whether it’s a small business or a research lab that is not yet established. At the same time, the results tend to show only information we like to see or agree with, which can reduce diversity and contribute to bias.

Interested in learning more? Check out the full research paper here: https://arxiv.org/pdf/2011.02066.pdf 

Soumik Mandal successfully defends his dissertation.

Soumik Mandal, Ph.D. student

Our Ph.D. student, Soumik Mandal, has successfully defended his dissertation titled “Clarifying user’s information need in conversational information retrieval”. The committee included Chirag Shah (University of Washington, Chair), Nick Belkin (Rutgers University), Katya Ognyanova (Rutgers University), and Michel Galley (Microsoft).

Abstract

With traditional information retrieval systems, users are expected to express their information need adequately and accurately to get an appropriate response from the system. This setup generally works well for simple tasks. In complex task scenarios, however, users face difficulties in expressing their information need as accurately as the system requires. Therefore, the need to clarify the user’s information need arises. In current search engines, support in such cases is provided in the form of query suggestion or query recommendation. In conversational information retrieval systems, however, the interaction between the user and the system takes the form of a dialogue. Thus, it is possible for the system to better support such cases by asking clarifying questions. However, current research in both natural language processing and information retrieval does not adequately explain how to form such questions or at what stage of the dialogue clarifying questions should be asked of the user. To address this gap, this proposed research will investigate the nature of the conversation between a user and an expert intermediary to model the functions the expert performs to address the user’s information need. More specifically, this study will explore how an intermediary can ask the user questions to clarify their information need in complex task scenarios.