Tackling Complex Search Tasks

To tackle complex search tasks, we work to identify task states and study the connection between those states and search behaviors. We found that a task’s state can be predicted from the user’s search behaviors. Read about our study in this article by Jiqun Liu, Shawon Sarkar, and Chirag Shah:

Liu, J., Sarkar, S., & Shah, C. (2020, March). Identifying and Predicting the States of Complex Search Tasks. In Proceedings of the 2020 Conference on Human Information Interaction and Retrieval (pp. 193-202).

Complex search tasks that involve an uncertain solution space and multi-round search iterations are integral to everyday life and information-intensive workplace practices, affecting how people learn, work, and resolve problematic situations. However, current search systems still face plenty of challenges when applied to supporting users engaged in complex search tasks. To address this issue, we seek to explore the dynamic nature of complex search tasks from a process-oriented perspective by identifying and predicting implicit task states. Specifically, based upon the Web search logs and user annotation data (regarding information seeking intentions in local search steps, in-situ search problems, and help needed) collected from 132 search sessions in two controlled lab studies, we developed two task state frameworks, based on intention states and problem-help states respectively, and examined the connection between task states and search behaviors. We report that (1) complex search tasks of different types can be deconstructed and disambiguated based on the associated nonlinear state transition patterns; and (2) the identified task states, which cover multiple subtle factors of user cognition, can be predicted from search behavioral signals using supervised learning algorithms. This study reveals the way in which complex search tasks unfold and are manifested in users’ search interactions, and paves the way for developing state-aware adaptive search support and system evaluation frameworks.
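
To make the prediction setup concrete, here is a minimal sketch of the kind of pipeline the abstract describes: behavioral features per search step, a supervised classifier, and cross-validated accuracy. The feature names, state labels, and data below are hypothetical stand-ins, not the study’s actual dataset or model.

```python
# Hedged sketch: predicting task states from search-behavior features with a
# supervised classifier. Features, labels, and data are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# One row per search step: [dwell_time_sec, query_length, clicks, scrolls]
X = rng.random((500, 4))
# Hypothetical intention-state label per step: 0=explore, 1=refine, 2=verify
y = rng.integers(0, 3, size=500)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.2f}")
```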

Creating a fairer search engine

It’s becoming increasingly important to understand, evaluate, and perhaps rethink search results, as they continue to show bias of various kinds. Given that so much of our decision-making relies on search engine results, this is a problem that touches almost all aspects of our lives. Read about some of our new work in an article by InfoSeekers Ruoyuan Gao and Chirag Shah:

Gao, R., & Shah, C. (2020). Toward Creating a Fairer Ranking in Search Engine Results. Information Processing & Management (IP&M), 57(1).

With the increasing popularity and social influence of search engines in IR, various studies have raised concerns about the presence of bias in search engines and the social responsibilities of IR systems. As an essential component of a search engine, ranking is a crucial mechanism for presenting search results or recommending items in a fair fashion. In this article, we focus on top-k diversity fairness ranking in terms of statistical parity fairness and disparate impact fairness. The former fairness definition provides a balanced overview of search results, where the numbers of documents from different groups are equal; the latter enables a realistic overview, where the proportion of documents from different groups reflects the overall proportion. Using 100 queries and the top 100 results per query from Google as the data, we first demonstrate how topical diversity bias is present in the top web search results. Then, with our proposed entropy-based metrics for measuring the degree of bias, we reveal that the top search results are unbalanced and disproportionate to their overall diversity distribution. We explore several fairness ranking strategies to investigate the relationship between fairness, diversity, novelty, and relevance. Our experimental results show that using a variant of the fair ε-greedy strategy, we can bring more fairness and enhance diversity in search results without a cost to relevance. In fact, we can improve relevance and diversity by introducing diversity fairness. Additional experiments with TREC datasets containing 50 queries demonstrate the robustness of our proposed strategies and our findings on the impact of fairness. We present a series of correlation analyses on the amount of fairness and diversity, showing that statistical parity fairness highly correlates with diversity while disparate impact fairness does not. This provides clear and tangible implications for future work seeking to balance fairness, diversity, and relevance in search results.
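
As a concrete illustration of the entropy idea, the toy snippet below scores how balanced the topical groups in a ranked list are; a large gap between observed and maximum entropy signals imbalance. The group labels are made up, and this is only a sketch of the general idea, not the paper’s exact metric.

```python
# Toy entropy-based balance score for the topical groups in a result list.
import math
from collections import Counter

def group_entropy(groups):
    """Shannon entropy (bits) of the group distribution in a result list."""
    counts = Counter(groups)
    n = len(groups)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical topical labels for the top-10 results of one query
top_k = ["health", "health", "news", "health", "health",
         "health", "recipe", "health", "health", "health"]

observed = group_entropy(top_k)
balanced = math.log2(len(set(top_k)))  # entropy if groups were equally sized
print(f"observed={observed:.2f} bits, balanced={balanced:.2f} bits")
# A large gap means the ranking over-represents some topical group(s).
```

Roughly speaking, a strategy like the fair ε-greedy variant mentioned in the abstract can be read as re-ranking results to close such a gap.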

Connecting information need to recommendations

A new article published by InfoSeekers Shawon Sarkar, Matt Mitsui, Jiqun Liu, and Chirag Shah in Information Processing & Management (IP&M) shows how we could use behavioral signals from a user in a search episode to explicate their information need, their perceived problems, and the potential help they may need.

Here are some highlights.

  • The amount of time spent on previous search results can indicate potential problems in the following stage of an information search process: difficulty articulating needs into queries, perceiving results as useless, and not finding useful sources.
  • While performing social tasks, users mostly searched with entirely new queries, whereas for cognitive tasks and tasks of moderate to high complexity, users used both new and substituted queries.
  • From users’ search behaviors, it is possible to predict the problems they are going to face later in the session.
  • Users’ search behaviors can be mapped to a searcher’s situational need, along with their perceived barriers and needed help at different stages of an information search process.
  • By combining perceived problem(s) and search behavioral features, it is possible to infer the help(s) users need in search with a certain level of accuracy (78%); see the sketch after this list.
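
Below is a minimal sketch of that last idea, assuming hypothetical behavioral features and problem flags with synthetic labels; the study’s real features and the 78% figure come from its lab data, not from code like this.

```python
# Hedged sketch: infer whether a searcher needs help by combining behavioral
# features with perceived-problem flags. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
behavior = rng.random((300, 3))          # e.g., time on results, query length, bookmarks
problems = rng.integers(0, 2, (300, 2))  # e.g., articulation problem, useless results
X = np.hstack([behavior, problems.astype(float)])
y = rng.integers(0, 2, size=300)         # hypothetical "needs help" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
model = LogisticRegression().fit(X_tr, y_tr)
print(f"held-out accuracy: {model.score(X_te, y_te):.2f}")
```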

Read more about it at https://www.sciencedirect.com/science/article/pii/S0306457319300457

A new NSF grant for explainable recommendations

Dr. Yongfeng Zhang from Rutgers University and Dr. Chirag Shah from the University of Washington are recipients of a new NSF grant (3 years, $500k) to work on explainable recommendations. It’s a step toward curing the “runaway AI” problem!

https://www.nsf.gov/awardsearch/showAward?AWD_ID=1910154

Recommendation systems are essential components of our daily life. Today, intelligent recommendation systems are used in many Web-based systems, providing personalized information to help human decisions. Leading examples include e-commerce recommendations for everyday shopping, job recommendations for employment markets, and social recommendations to make people better connected. However, most recommendation systems merely suggest recommendations to users. They rarely tell users why such recommendations are provided. This is primarily due to the closed nature of the algorithms behind these systems, which are difficult to explain. The lack of good explainability sacrifices transparency, effectiveness, persuasiveness, and trustworthiness of recommendation systems. This research will allow personalized recommendations to be provided in more explainable manners, improving search performance and transparency. The research will benefit users in real systems through the researchers’ industry collaborations with e-commerce and social networks. New algorithms and datasets developed in the project will supplement courses in computer science and iSchool programs. Presentation of the work and demos will help to engage wider audiences that are interested in computational research. Ultimately, the project will make it easier for humans to understand and trust machine decisions.

This project will explore a new framework for explainable recommendation that involves both system designers and end users. The system designers will benefit from structured explanations that are generated for model diagnostics. The end users will benefit from receiving natural language explanations for various algorithmic decisions. This project will address three fundamental research challenges. First, it will create new machine learning methods for explainable decision making. Second, it will develop new models to generate free-text natural language explanations. Third, it will identify key factors to evaluate the quality of explanations. In the process, the project will also develop aggregated explainability measures and release evaluation benchmarks to support reproducible explainable recommendation research. The project will result in the dissemination of shared data and benchmarks to the Information Retrieval, Data Mining, Recommender System, and broader AI communities.

It’s a new chapter for us – at UW in Seattle

It’s been a bit quiet on iBlog lately, and there is a good reason. The lab, along with me, has moved from Rutgers University in NJ to the University of Washington (UW) in Seattle. This happened over the end of the summer and the beginning of the fall. Things were so chaotic at the time that we even missed noticing, let alone celebrating, nine years of the lab!

This transition is still in progress. Most of the PhD students are still in NJ, but new students and projects are starting up with the lab in Seattle. Over the course of the next few weeks and months, we will be bringing more updates to our websites and social media channels.

It is a new chapter for us, indeed, but the journey goes on. We are still seekers!

InfoSeekers Publish an Article and Have an ICTIR Paper Accepted

First, we must congratulate InfoSeeker Ruoyuan Gao on having her paper accepted at ICTIR 2019!

Next, we are excited to share the news that our InfoSeekers, Shawon Sarkar, Matthew Mitsui, Jiqun Liu and Chirag Shah, have published a new article! The title of the article is: Implicit information need as explicit problems, help, and behavioral signals.

ABSTRACT

Information need is one of the most fundamental aspects of information seeking, traditionally conceptualized as the initiation phase of an individual’s information seeking behavior. However, the very elusive and inexpressible nature of information need makes it hard to elicit from the information seeker or to extract through an automated process. One approach to understanding how a person realizes and expresses information need is to observe their seeking behaviors, their engagement with information retrieval systems, and their situated performative actions. Using Dervin’s Sense-Making theory and a conceptualization of information need based on existing studies, the work reported here tries to understand and explore the concept of information need from a fresh methodological perspective by examining users’ perceived barriers and desired help in different stages of information search episodes through analyses of various implicit and explicit user search behaviors. In a controlled lab study, each participant performed three simulated online information search tasks. Participants’ implicit behaviors were collected through search logs, and explicit feedback was elicited through pre-task and post-task questionnaires. A total of 208 query segments were logged, along with users’ annotations on perceived problems and help. Data collected from the study were analyzed by applying both quantitative and qualitative methods. The findings identified several behaviors – such as the number of bookmarks, query length, number of unique queries, and time spent on search results in the previous segment, the current segment, and throughout the session – strongly associated with participants’ perceived barriers and help needed. The findings also showed that it is possible to build accurate predictive models to infer perceived problems of articulating queries, useless and irrelevant information, and unavailability of information from users’ previous-segment, current-segment, and whole-session behaviors. Finally, the findings demonstrated that by combining perceived problem(s) and search behavioral features, it is possible to infer the help(s) users need in search with a certain level of accuracy (78%).

KEYWORDS

Information need
Information searching
Interactive IR

Implicit information need as explicit problems, help, and behavioral signals is available via the following link: https://doi.org/10.1016/j.ipm.2019.102069

Bias, Fairness, Diversity and Novelty

When dealing with bias in IR systems, we are often faced with the question of how bias, fairness, diversity, and novelty differ from and connect to one another. We briefly talked about the relationship between bias and fairness in the previous article. Let us now look at diversity and novelty.

DIVERSITY aims to increase the topical coverage of the search results. Because information need varies by individual user, predicting the intention based on the query alone is a difficult and sometimes impossible task for the IR system. Even with additional information such as a personal profile, social identity and networks, geolocation at the time of the search request, and browsing history, it is still hard for the system to precisely accommodate each individual’s information need. One perspective is the ambiguity associated with the search query. For example, “apple” may mean the fruit or the company named Apple. Apart from the ambiguity inherent in the language, a query may also be ambiguous in its search intent. For instance, a user searching for “rutgers” may be looking for the official website of Rutgers University, or for the location, description, ranking, or recent news about the university. The IR system must consider all the possibilities of user search intent and return the most relevant results. Another perspective is the subtopics or topic aspects of the searched topic. For instance, different opinion polarities about “gun control” should be included in the search results so that the information presented provides a comprehensive view of the topic. Increasing diversity means including as many topical subgroups as possible. As a result, diversity can alleviate some bias by enriching the results with more perspectives and avoiding results that all come from the same group. Meanwhile, diversity can increase fairness because it accounts for all subgroups/aspects of the topic.
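
As a toy illustration of that goal, the snippet below scores a result list by how many distinct subtopics it covers, using the ambiguous “apple” query; the documents and subtopic labels are invented for the example.

```python
# Toy diversity measure: count the distinct subtopics a result list covers.
def coverage(results):
    """Number of distinct subtopics among (doc, subtopic) pairs."""
    return len({subtopic for _, subtopic in results})

results_a = [("doc1", "apple-fruit"), ("doc2", "apple-company"),
             ("doc3", "apple-music")]
results_b = [("doc4", "apple-company"), ("doc5", "apple-company"),
             ("doc6", "apple-company")]
print(coverage(results_a))  # 3 subtopics covered: more diverse
print(coverage(results_b))  # 1 subtopic covered: less diverse
```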

NOVELTY aims to reduce redundancy in the retrieved information. For instance, given the search query “gun control”, if two consecutive results are from the same website, or one is a forwarded or cited version of another, then users may find the second one redundant with the first. In other words, novelty tries to bring as much “new” information as possible into the set of retrieved results. From this perspective, we can see that diversity and novelty can benefit each other to some extent, but neither guarantees the other. On the one hand, diversity brings new information by introducing different subtopics/aspects, but it does not address in-group redundancy, i.e., it does not care how many results are in the same topical group as long as the results cover as many groups as possible. So diversity does not guarantee novelty. Novelty, on the other hand, can surface different subtopics/aspects by reducing redundant information, but it does not care about the topical groups as long as each result introduces new information compared to the previous results. In terms of bias, a skewed view may be avoided by increasing novelty, but since there is no guarantee on the “group” distribution, there is no guarantee of removing the bias.
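
One simple way to operationalize this, sketched below, is to flag a candidate result as redundant when its term overlap with an already-shown result exceeds a threshold; the documents and the 0.5 threshold are arbitrary choices for illustration.

```python
# Toy novelty check: a candidate is novel only if it overlaps little with
# every result already shown (Jaccard similarity over terms).
def jaccard(a, b):
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

shown = ["gun control debate in congress this week"]
candidate = "congress gun control debate continues this week"
is_novel = all(jaccard(candidate, s) < 0.5 for s in shown)
print(is_novel)  # False: mostly the same terms, so little new information
```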

FAIRNESS aims to bring balance to the retrieved results according to a subjective design need. If the goal is to enforce topical fairness, then fairness requires all subtopics/aspects to be covered, hence maximum diversity. But fairness does not necessarily have to be concerned with topical groups; it can be imposed on other groups such as gender, race, and religion. So achieving fairness and achieving diversity can be different goals. In addition, the key point of diversity fairness is to balance the number of results from each topical group, while diversity only aims to maximize the total number of groups covered. For example, if there are two subtopic groups for a given query, then diversity can be achieved by including one result from one group and taking the rest of the results from the other group. But fairness may require the same number of results from each group, depending on the notion of fairness chosen by the system’s design need.
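
The toy check below makes that distinction concrete: a list can cover every group (diverse) while still failing an equal-counts notion of fairness. The equal-counts test is just one possible encoding of statistical parity.

```python
# Toy statistical-parity check: equal representation of groups in the top-k.
from collections import Counter

def parity_fair(groups):
    counts = Counter(groups).values()
    return max(counts) == min(counts)

print(parity_fair(["A", "B", "A", "B"]))  # True: two results per group
print(parity_fair(["A", "A", "A", "B"]))  # False: covers both groups
                                          # (diverse) but unbalanced
```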

To sum up, while diversity and novelty can potentially reduce bias and improve fairness, their goals are essentially different from the concepts and goals of unbiasedness and fairness.

InfoSeekers Publish a Book!

We’re excited to share the news that our InfoSeekers, Jiqun Liu and Chirag Shah, have published a new book! The title of the book is Interactive IR User Study Design, Evaluation, and Reporting.

Abstract

Since user study design has been widely applied in search interactions and information retrieval (IR) systems evaluation studies, a deep reflection and meta-evaluation of interactive IR (IIR) user studies is critical for sharpening the instruments of IIR research and improving the reliability and validity of the conclusions drawn from IIR user studies. To this end, we developed a faceted framework for supporting user study design, reporting, and evaluation based on a systematic review of the state-of-the-art IIR research papers recently published in several top IR venues (n=462). Within the framework, we identify three major types of research focuses, extract and summarize facet values from specific cases, and highlight the under-reported user study components which may significantly affect the results of research. Then, we employ the faceted framework in evaluating a series of IIR user studies against their respective research questions and explain the roles and impacts of the underlying connections and “collaborations” among different facet values. Through bridging diverse combinations of facet values with the study design decisions made for addressing research problems, the faceted framework can shed light on IIR user study design, reporting, and evaluation practices and help students and young researchers design and assess their own studies.

Table of Contents: Preface / Acknowledgments / Introduction / Interactive Information Retrieval / Methodology: Paper Selection and Coding Scheme / Faceted Framework of IIR User Studies / Evaluating IIR User Studies of Different Types / Implications and Limitations of the Faceted Framework / Conclusion and Future Directions / Appendix / Bibliography / Authors’ Biographies

Jiqun Liu shared his thoughts on the release of his new book: “Synthesis Lectures on Information Concepts, Retrieval, and Services includes a variety of interesting topics that are highly relevant to my research. I am thrilled and honored to have my own book published as part of the Synthesis Lectures book series.”

Interactive IR User Study Design, Evaluation, and Reporting is available via the following link: https://www.morganclaypool.com/doi/pdf/10.2200/S00923ED1V01Y201905ICR067

Graduation Celebrations for our InfoSeekers

2019 commencement celebrations have arrived. First, we must extend our enormous congratulations to both Dr. Matthew Mitsui and Dr. Ziad Matni for completing their PhDs!

Dr. Ziad Matni with Professor Chirag Shah.
Dr. Matthew Mitsui with Professor Chirag Shah.

Additionally, we must celebrate InfoSeeker Ruoyuan Gao for passing her qualifying exams this semester! And InfoSeeker Jiqun Liu won the outstanding continuing doctoral student award in the area of Information Science this semester.

Finally, we would like to acknowledge the great work of our undergraduate InfoSeekers. Divya Parikh has been working on our social media system SOCRATES. Samantha Lee worked on a project that assessed the variety of approaches to improve community Q&A platforms as part of Project SUPER. Ruchi Khatri worked on a project as part of Project SUPER that investigated which factors affect stress in human computer interaction, interactive information retrieval, health search, and interface design. And, Gayeon Yoo is working on a project for our system, SOCRATES.

Congratulations again to all of our InfoSeekers and their hard work this year!

Bias and Fairness in Data Science and Machine Learning

Where does bias come from?

Bias in data science and machine learning may come from the source data, from algorithmic or system bias, and from cognitive bias. Imagine that you are analyzing criminal records for two districts. The records include 10,000 residents from district A and 1,000 residents from district B. 100 district A residents and 50 district B residents have committed crimes in the past year. Will you conclude that people from district A are more likely to be criminals than people from district B? If you simply compare the numbers of criminals, you are very likely to reach this conclusion. But if you look at the crime rate, you will find that district A’s rate is 1%, which is lower than district B’s 5%. Based on this analysis, the earlier conclusion is biased against district A residents. This type of bias is generated by the analysis method, so we call it algorithmic bias or system bias.

Does the rate-based analysis guarantee an unbiased conclusion? The answer is no. It could be that both districts have a population of 10,000, meaning the criminal records hold complete statistics for district A yet only partial statistics for district B. Depending on how the records were collected, 5% may or may not be the true crime rate for district B. As a consequence, we may still arrive at a biased conclusion. This type of bias is inherent in the data we are examining, so we call it data bias.

The third type of bias is cognitive bias, which arises from our perception of the presented data. For example, given the conclusions of two criminal analysis agencies, you may tend to believe one over the other because the former has a higher reputation, even though it may be the one with the biased conclusion. Read a real-world case of machine learning algorithms being racially biased on recidivism here: https://www.nytimes.com/2017/10/26/opinion/algorithm-compas-sentencing-bias.html.
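
The numbers in the district example work out as follows; comparing raw counts and comparing rates point in opposite directions.

```python
# The district example in numbers: counts favor one conclusion, rates the other.
population = {"A": 10_000, "B": 1_000}
offenders = {"A": 100, "B": 50}

for district in population:
    rate = offenders[district] / population[district]
    print(f"district {district}: {offenders[district]} offenders, "
          f"rate {rate:.1%}")
# district A: 100 offenders, rate 1.0%
# district B: 50 offenders, rate 5.0%
# Counts alone (100 > 50) suggest A is worse; the rates say the opposite.
```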

Bias is everywhere

With the explosion of data and technologies, we are immersed in all kinds of data applications. Think of the news you read every day on the internet, the music you listen to through service providers, the ads displayed while you browse webpages, the products recommended to you when you shop online, and the information you find through search engines: bias can be present everywhere without people’s awareness. Like “you are what you eat”, the data you consume is so powerful that it can in fact shape your views, preferences, judgments, and even decisions in many aspects of your life. Say you want to know whether some food is good or bad for health. A search engine returns 10 pages of results. The first result and most of the results on the first page state that the food is healthy. To what extent do you believe the search results? After glancing at the results on the first page, will you conclude that the food is beneficial, or at least that the benefits outweigh the harm? How likely are you to continue to check results on the second page? Are you aware that the second page may contain results about the harm of the food, so that the results on the first page are biased? As a data scientist, it is important to be careful to avoid biased outcomes. But as a human being living in the world of data, it is even more important to be aware of the bias that may exist in your daily data consumption.

Bias vs. Fairness

It is possible that bias leads to unfairness, but can something be biased and still fair? The answer is yes. Think of bias as a skewed view of the protected groups, while fairness is a subjective measurement of the data or of the way data is handled. In other words, bias and fairness are not necessarily contradictory to each other. Consider employee diversity in a US company where all but one of the employees are US citizens. Is the employment structure biased toward US citizens? Yes, if this is a result of US citizens being favored during the hiring process. Is it a fair structure? Yes and no. According to the Rooney Rule, this is fair, since the company hired at least one minority candidate. But according to statistical parity, this is unfair, since the numbers of US citizens and noncitizens are not equal; a toy version of this example follows below. In general, bias is relatively easy and direct to measure, yet fairness is subtler due to various subjective concerns. There are many different fairness definitions to choose from, and some of them even contradict each other. Check out this tutorial https://www.youtube.com/watch?v=jIXIuYdnyyk for some examples and helpful insights on fairness definitions from the perspective of a computer scientist.
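
Here is the hiring example written out as two toy fairness checks. Both functions are simplified encodings of the definitions named above, for illustration only.

```python
# Toy encodings of two fairness definitions applied to the same hiring outcome.
from collections import Counter

def rooney_rule(hires, minority):
    """Satisfied if at least one hire belongs to the minority group."""
    return any(h in minority for h in hires)

def statistical_parity(hires, group_of):
    """Satisfied if every group receives the same number of hires."""
    counts = Counter(group_of[h] for h in hires).values()
    return max(counts) == min(counts)

hires = ["c1", "c2", "c3", "n1"]  # three citizens, one noncitizen
group_of = {"c1": "citizen", "c2": "citizen",
            "c3": "citizen", "n1": "noncitizen"}
print(rooney_rule(hires, minority={"n1"}))  # True: fair under the Rooney Rule
print(statistical_parity(hires, group_of))  # False: unfair under parity
```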