Developing a search system that knows what you are looking for before you do.

Have you ever searched online to plan a trip or a wedding, hunt for a job, or find your next apartment? This kind of search can take hours, days, or even weeks, and inevitably it gets interrupted by daily life: a coffee break, a trip to the restroom, a meal, or a night’s sleep. Each time, we have to pick up the search where we left off. These kinds of searches are called “Interrupted Search Tasks”.

We, along with many other scientists, are working on this problem. Our approach is to identify and predict the sub-tasks of complex search tasks and, based on that, help people complete those tasks more easily. Take planning a wedding as an example. You need different kinds of information, e.g., food, dress, and venue. Perhaps during the search you forget about the food, which is a sub-task of wedding planning. The system would then proactively suggest food.

And how do we know when to suggest these things to you? First, we try to identify whether you are encountering problems during a search. We found that the longer people spend on the search result page, the higher the chance that they are having problems. To illustrate, imagine a person searching “Churches in Seattle”. They spend a long time on the search result page and, without clicking through any of the results, enter another query, “places for a wedding”, and so on. The more queries the person enters without clicking through to any pages, the more likely it is that they are encountering problems. However, if the person interacts with the result page, e.g., clicks through to see inside a page, we look at the number of pages that the person bookmarked. The more of the subsequent pages they bookmarked, the more relevant results they found and the fewer problems they encountered. This is how we can tell whether people are finding what they are looking for. If we see that you are having problems, we will recommend things that you might have missed.
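The signals described above can be combined into a rough score. The following is only a minimal sketch of that idea, not the model from our paper: the `QueryEvent` record, the `struggle_score` function, the 30-second dwell threshold, and the equal weighting of signals are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QueryEvent:
    """One query in a search session (hypothetical record layout)."""
    query: str
    seconds_on_results_page: float
    clicks: int      # results the person clicked through to
    bookmarks: int   # clicked pages the person bookmarked


def struggle_score(events: Sequence[QueryEvent],
                   long_dwell_seconds: float = 30.0) -> float:
    """Return a score in [0, 1]; higher suggests the searcher is having trouble.

    The threshold and weights are illustrative assumptions, not tuned values.
    """
    if not events:
        return 0.0
    signals = 0
    for event in events:
        if event.clicks == 0:
            signals += 1  # moved on to a new query without clicking anything
            if event.seconds_on_results_page >= long_dwell_seconds:
                signals += 1  # and lingered on the result page first
        elif event.bookmarks == 0:
            signals += 1  # clicked through but found nothing worth saving
    # Each query can contribute at most two signals.
    return signals / (2 * len(events))


session = [
    QueryEvent("Churches in Seattle", 45.0, clicks=0, bookmarks=0),
    QueryEvent("places for a wedding", 50.0, clicks=0, bookmarks=0),
]
print(struggle_score(session))
```

A system could start offering suggestions once the score passes some cutoff; where to put that cutoff would have to be learned from real session data.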

So how do we know what you missed? In other words, how do we know that “food”, “dress”, and “venue” are related to planning a wedding? We use what people have searched for in the past. The more frequently two topics are searched together, the stronger the relationship between them. Let’s say 1,000 people searched for “wedding” along with “dress food”, whereas only 5 people searched for “wedding” along with “black dress”. We can tell that “dress food” has a strong relationship to the topic “wedding”, while “black dress” does not. Therefore, we can recommend “dress food” when the next person searches for “wedding.”
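Counting how often topics are searched together can be sketched in a few lines. This is an illustrative simplification, assuming each past session has already been reduced to a list of topic labels; the function names `build_cooccurrence` and `suggest` and the toy sessions are hypothetical.

```python
from collections import Counter, defaultdict


def build_cooccurrence(sessions):
    """Count, for each topic, how many sessions also contained each other topic."""
    counts = defaultdict(Counter)
    for topics in sessions:
        unique = set(topics)  # count each topic once per session
        for a in unique:
            for b in unique:
                if a != b:
                    counts[a][b] += 1
    return counts


def suggest(counts, topic, k=3):
    """Return up to k topics most often searched together with `topic`."""
    return [other for other, _ in counts.get(topic, Counter()).most_common(k)]


# Toy search history (made-up data for illustration only).
history = (
    [["wedding", "dress"]] * 3
    + [["wedding", "venue"]] * 2
    + [["wedding", "black dress"]]
)
counts = build_cooccurrence(history)
print(suggest(counts, "wedding", k=2))
```

Raw counts favor topics that are popular everywhere, so a production system would likely normalize them, but the raw frequency is enough to show the principle.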

If you are interested in more detail on this topic, here is the paper that we published recently: “Identifying and Predicting the States of Complex Search Tasks”.

Challenging the status quo in search engine ranking algorithms

How can we bring more fairness to search result ranking? This was the question tackled by our FATE (Fairness Accountability Transparency Ethics) group in the 2020 Text REtrieval Conference’s (TREC) Fairness Ranking Track. In the context of searching for academic papers, the assigned goal of the track was to develop an algorithm that provides fair exposure to different groups of authors while ensuring that the papers are relevant to the search queries.

The Approach

To achieve that goal, the group decided to use “gender” and “country” as key attributes because they were general enough to be applied to all author groups. From there, the group created a fairness-aware algorithm that was used to run two specific tasks:

  1. An information retrieval task where the goal was to return a ranked list of papers to serve as the candidate papers for re-ranking
  2. A re-ranking task where the goal was to rank the candidate papers based on their relevance to a given query, while accounting for fair author group exposure

To evaluate the relevance of the academic papers, the group relied on BM25, which is an algorithm frequently used by search engines.
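For readers unfamiliar with BM25, here is a minimal sketch of one common textbook form of the scoring function. It is not the group's actual configuration: the parameter values `k1=1.5` and `b=0.75`, the whitespace tokenization, and the function name `bm25_scores` are assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with a basic BM25 formula."""
    docs = [doc.lower().split() for doc in documents]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a higher inverse document frequency weight.
            idf = math.log((n_docs - doc_freq[term] + 0.5)
                           / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates and is normalized by document length.
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


papers = [
    "fair ranking in search engines",
    "deep learning for image recognition",
    "fair exposure in ranking",
]
print(bm25_scores("fair ranking", papers))
```

Documents that share no terms with the query score zero, and among matching documents of similar content the shorter one scores slightly higher because of the length normalization.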

The Findings

Randomly shuffling the academic papers produced high levels of fairness when only the gender of the authors was considered. In contrast, when only the country of the authors was considered, fairness was relatively lower. With the proposed algorithm, data can be re-ranked based on an arbitrary number of group definitions. However, to fully provide fair and relevant results, more attributes need to be explored.
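To make “fair exposure” more concrete, the sketch below measures how much attention each author group receives in a ranked list, assuming attention decays logarithmically with rank. This is a generic illustration and not the official metric of the track; the discount function and the name `exposure_share` are assumptions.

```python
import math
from collections import defaultdict


def exposure_share(ranking):
    """Return each group's share of total exposure in a ranked list.

    `ranking` is a list of group labels ordered from top to bottom.
    Exposure at rank r is assumed to be 1 / log2(r + 1).
    """
    exposure = defaultdict(float)
    for rank, group in enumerate(ranking, start=1):
        exposure[group] += 1.0 / math.log2(rank + 1)
    total = sum(exposure.values())
    return {group: value / total for group, value in exposure.items()}


# Two groups with two papers each: grouped at the top vs. interleaved.
print(exposure_share(["A", "A", "B", "B"]))
print(exposure_share(["A", "B", "A", "B"]))
```

Both rankings contain the same papers, yet interleaving the groups brings their exposure shares closer together, which is the kind of effect a fairness-aware re-ranker tries to achieve without giving up relevance.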

Why is fairness in search rankings important?

We use search engines every day to find information and answers for almost everything in our lives, and the ranking of the search results determines what kind of content we are likely to consume. This poses a risk because ranking algorithms often leave out underrepresented groups, whether it’s a small business or a research lab that is not yet established. At the same time, the results tend to show only information we like to see or agree with, which can reduce diversity and contribute to bias.

Interested in learning more? Check out the full research paper here: 

Hands-On Introduction to Data Science, Dr. Shah, our lab director’s new book

If you are looking to get started in data science, or are at an entry to intermediate level, this book is just the right fit for you. “Hands-On Introduction to Data Science”, the newly published book by our lab director, Dr. Shah, is filled with hands-on examples, a wide range of practice exercises, and real-life applications that will help you develop a solid understanding of the subject. No prior technical background or computing knowledge is needed for this book.

If you are an instructor looking for a good textbook for your class, the book also provides end-to-end support for teaching a data science course: curriculum suggestions, slides for each chapter, datasets, program scripts, and solutions to each exercise, as well as sample exams and projects.

Reviews & Endorsements
‘Dr. Shah has written a fabulous introduction to data science for a broad audience. His book offers many learning opportunities, including explanations of core principles, thought-provoking conceptual questions, and hands-on examples and exercises. It will help readers gain proficiency in this important area and quickly start deriving insights from data.’ Ryen W. White, Microsoft Research AI.

Now available at

Book Summary:
This book introduces the field of data science in a practical and accessible manner, using a hands-on approach that assumes no prior knowledge of the subject. The foundational ideas and techniques of data science are provided independently from technology, allowing students to easily develop a firm understanding of the subject without a strong technical background, as well as being presented with material that will have continual relevance even after tools and technologies change. Using popular data science tools such as Python and R, the book offers many examples of real-life applications, with practice ranging from small to big data. A suite of online material for both instructors and students provides a strong supplement to the book, including datasets, chapter slides, solutions, sample exams and curriculum suggestions. This entry-level textbook is ideally suited to readers from a range of disciplines wishing to build a practical, working knowledge of data science.

  • Almost everything in the book is accompanied with examples and practice – both in-chapter and end-of-chapter so students are more engaged because they can use hands-on experiences to see how theories relate to solving practical problems
  • Assumes no prior technical background or computing knowledge and lowers the barrier for entering the field of data science so that students from a range of disciplines can benefit from a more accessible introduction to data science
  • Supplemented by a generous set of material for instructors, including curriculum suggestions and syllabi, slides for each chapter, datasets, program scripts, answers and solutions to each exercise, as well as sample exams and projects which gives instructors end-to-end support for teaching a data science course

Tackling Complex Search Tasks

To tackle complex search tasks, we try to identify task states and study the connection between those states and search behaviors. We found that the task state can be predicted from the user’s search behaviors. Read about our study in this article by Jiqun Liu, Shawon Sarkar, and Chirag Shah:

Liu, J., Sarkar, S., & Shah, C. (2020, March). Identifying and Predicting the States of Complex Search Tasks. In Proceedings of the 2020 Conference on Human Information Interaction and Retrieval (pp. 193-202).

Complex search tasks that involve uncertain solution space and multi-round search iterations are integral to everyday life and information-intensive workplace practices, affecting how people learn, work, and resolve problematic situations. However, current search systems still face plenty of challenges when applied in supporting users engaging in complex search tasks. To address this issue, we seek to explore the dynamic nature of complex search tasks from process-oriented perspective by identifying and predicting implicit task states. Specifically, based upon the Web search logs and user annotation data (regarding information seeking intentions in local search steps, in-situ search problems, and help needed) collected from 132 search sessions in two controlled lab studies, we developed two task state frameworks based on intention state and problem-help state respectively and examined the connection between task states and search behaviors. We report that (1) complex search tasks of different types can be deconstructed and disambiguated based on the associated nonlinear state transition patterns; and (2) the identified task states that cover multiple subtle factors of user cognition can be predicted from search behavioral signals using supervised learning algorithms. This study reveals the way in which complex search tasks are unfolded and manifested in users’ search interactions and paves the way for developing state-aware adaptive search supports and system evaluation frameworks.

Creating a fairer search engine

It is increasingly important to understand, evaluate, and perhaps rethink our search results as they continue to show bias of various kinds. Given that so much of our decision-making relies on search engine results, this is a problem that touches almost all aspects of our lives. Read about some of our new work in a new article by InfoSeekers Ruoyuan Gao and Chirag Shah:

Gao, R. & Shah, C. (2020). Toward Creating a Fairer Ranking in Search Engine Results. Journal of Information Processing and Management (IP&M), 57 (1).

With the increasing popularity and social influence of search engines in IR, various studies have raised concerns on the presence of bias in search engines and the social responsibilities of IR systems. As an essential component of search engine, ranking is a crucial mechanism in presenting the search results or recommending items in a fair fashion. In this article, we focus on the top-k diversity fairness ranking in terms of statistical parity fairness and disparate impact fairness. The former fairness definition provides a balanced overview of search results where the number of documents from different groups are equal; the latter enables a realistic overview where the proportion of documents from different groups reflect the overall proportion. Using 100 queries and top 100 results per query from Google as the data, we first demonstrate how topical diversity bias is present in the top web search results. Then, with our proposed entropy-based metrics for measuring the degree of bias, we reveal that the top search results are unbalanced and disproportionate to their overall diversity distribution. We explore several fairness ranking strategies to investigate the relationship between fairness, diversity, novelty and relevance. Our experimental results show that using a variant of fair ε-greedy strategy, we could bring more fairness and enhance diversity in search results without a cost of relevance. In fact, we can improve the relevance and diversity by introducing the diversity fairness. Additional experiments with TREC datasets containing 50 queries demonstrate the robustness of our proposed strategies and our findings on the impact of fairness. We present a series of correlation analysis on the amount of fairness and diversity, showing that statistical parity fairness highly correlates with diversity while disparate impact fairness does not. This provides clear and tangible implications for future works where one would want to balance fairness, diversity and relevance in search results.
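The two fairness definitions in the abstract can be illustrated with a small sketch. Note that the paper uses entropy-based metrics; the code below swaps in a simpler stand-in (total variation distance between group distributions) purely for illustration, and the names `fairness_gaps`, `distribution`, and `total_variation` are hypothetical.

```python
from collections import Counter


def distribution(groups, labels):
    """Fraction of items belonging to each label."""
    counts = Counter(groups)
    return {label: counts[label] / len(groups) for label in labels}


def total_variation(p, q):
    """Half the summed absolute difference between two distributions."""
    return 0.5 * sum(abs(p[label] - q[label]) for label in p)


def fairness_gaps(ranked_groups, k):
    """Compare the top-k group mix with two fairness targets (0 = target met).

    statistical parity: every group is equally represented in the top k.
    disparate impact:   the top k mirrors the overall group proportions.
    """
    if not 1 <= k <= len(ranked_groups):
        raise ValueError("k must be between 1 and the length of the ranking")
    labels = sorted(set(ranked_groups))
    top = distribution(ranked_groups[:k], labels)
    overall = distribution(ranked_groups, labels)
    uniform = {label: 1 / len(labels) for label in labels}
    return {
        "statistical_parity_gap": total_variation(top, uniform),
        "disparate_impact_gap": total_variation(top, overall),
    }


# Group label of each result, in ranked order (made-up example).
results = ["a", "a", "a", "b", "a", "a", "a", "b"]
print(fairness_gaps(results, k=4))
```

In this example the top four results mirror the overall 3:1 mix of groups, so the disparate impact target is met while the statistical parity target is not, which shows that the two definitions can pull a ranking in different directions.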

Connecting information need to recommendations

A new article published by InfoSeekers Shawon Sarkar, Matt Mitsui, Jiqun Liu, and Chirag Shah in the Journal of Information Processing and Management (IP&M) shows how we could use behavioral signals from a user in a search episode to explicate their information need, their perceived problems, and the potential help they may need.

Here are some highlights.

  • The amount of time spent on previous search results could be an indicator of potential problems in articulation of needs into queries, perceiving useless results, and not getting useful sources in the following search stage in an information search process.
  • While performing social tasks, users mostly searched with an entirely new query, whereas, for cognitive and moderate to high complexity tasks, users used both new and substituted queries as well.
  • From users’ search behaviors, it is possible to predict the potential problem that they are going to face in the future.
  • Users’ search behaviors can map an information searcher’s situational need, along with their perception of barriers and help at different stages of an information search process.
  • By combining perceived problem(s) and search behavioral features, it is possible to infer users’ needed help(s) in search with a certain level of accuracy (78%).

Read more about it at