Analyzing users’ perceptions of search engine biases and their satisfaction when the biases are regulated

In our survey study, we paired a real search results page from Bing with a synthesized page containing more diverse (i.e., less biased) results. Both pages showed the top 10 results for a given search query. We asked participants which page they preferred and why they preferred it. Statistical analyses revealed that, overall, participants preferred the original Bing pages. Additionally, the location in the ranking where the diversity was introduced was significantly associated with users’ preferences.

We found that users prefer results that are consistent with and relevant to their search queries. Introducing diversity can undermine the relevance of the results and impair users’ satisfaction to some degree. Interestingly, users tend to pay more attention to the top portion of the result list than to the bottom, which is consistent with previous findings.
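As a rough illustration of the kind of analysis involved, here is a minimal sketch of testing whether the location of the introduced diversity is associated with page preference, using a chi-square test. The counts below are hypothetical, not the study’s actual data.

```python
# Minimal sketch (hypothetical data, not the study's actual numbers):
# test whether where diversity is introduced (top vs. bottom of the page)
# is associated with which page participants prefer.
from scipy.stats import chi2_contingency

# Rows: diversity introduced at the top vs. the bottom of the ranking.
# Columns: count of participants preferring the original Bing page
#          vs. the synthesized (more diverse) page.
contingency = [
    [70, 30],   # diversity at top:    70 prefer Bing, 30 prefer synthesized
    [55, 45],   # diversity at bottom: 55 prefer Bing, 45 prefer synthesized
]

chi2, p_value, dof, expected = chi2_contingency(contingency)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
# A small p-value would indicate that preference depends on where
# the diversity was introduced.
```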

Han, B., Shah, C., & Saelid, D. (2021). Users’ perception of search-engine biases and satisfaction. In Proceedings of the Second International Workshop on Algorithmic Bias in Search and Recommendation (BIAS 2021), April 1, 2021.

Interested in learning more? Read the full paper here: https://arxiv.org/abs/2105.02898

Taking a step towards fairness-aware ranking by defining latent groups using inferred features.

At the BIAS @ ECIR 2021 Workshop, our lab members continue to investigate fairness in search and recommendation, a topic that has drawn increasing attention in recent years.

The paper explores how to define latent groups, which cannot be determined from self-contained features but must be inferred from external data sources, for fairness-aware ranking. In particular, taking the Semantic Scholar dataset released in the TREC 2020 Fairness Ranking Track as a case study, we infer and extract multiple fairness-related dimensions of author identity, including gender and location, to construct groups.

Results

We propose a fairness-aware re-ranking algorithm incorporating both weighted relevance and diversity of returned items for given queries. Our experimental results demonstrate that different combinations of relative weights assigned to relevance, gender, and location groups perform as expected.
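To give a flavor of what such a re-ranking might look like, here is a minimal greedy sketch under our own simplifying assumptions; the weights, group labels, and scoring below are illustrative and are not the paper’s exact algorithm.

```python
# Minimal greedy re-ranking sketch: at each position, pick the candidate
# that maximizes a weighted mix of relevance and group diversity.
# Weights and group labels are illustrative assumptions.

def rerank(candidates, w_rel=0.6, w_gender=0.2, w_loc=0.2):
    """candidates: list of dicts with 'relevance', 'gender', 'location'."""
    ranked, seen_gender, seen_loc = [], set(), set()
    remaining = list(candidates)
    while remaining:
        def score(doc):
            # Reward documents whose author groups are not yet represented.
            gender_bonus = 1.0 if doc["gender"] not in seen_gender else 0.0
            loc_bonus = 1.0 if doc["location"] not in seen_loc else 0.0
            return (w_rel * doc["relevance"]
                    + w_gender * gender_bonus
                    + w_loc * loc_bonus)
        best = max(remaining, key=score)
        ranked.append(best)
        seen_gender.add(best["gender"])
        seen_loc.add(best["location"])
        remaining.remove(best)
    return ranked

docs = [
    {"id": 1, "relevance": 0.9, "gender": "mixed",  "location": "advanced"},
    {"id": 2, "relevance": 0.8, "gender": "female", "location": "developing"},
    {"id": 3, "relevance": 0.7, "gender": "mixed",  "location": "advanced"},
]
print([d["id"] for d in rerank(docs)])
```

Changing the relative weights shifts the trade-off between relevance and exposure for the gender and location groups.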

Future work

Because these group classifications can be inaccurate, in future work we propose to explore publicly available personal location data, such as Twitter profile locations.

Interested in learning more?

Read the full research paper here or watch the full presentation.

Lab members participated in the CHIIR 2021 Virtual Conference

Several of the lab members participated in the ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR) 2021 last week. CHIIR focuses on elements such as human involvement in search activities and information seeking and use in context. Our lab director, Dr. Shah, is the Chair of the CHIIR Steering Committee. He presented a paper that was a collaboration with Microsoft Research (MSR) AI.

Several InfoSeeking students also volunteered at the conference. Shawon Sarker, a PhD student in the InfoSeeking Lab, presented her dissertation work, “A context-independent representation of task,” which aims to explicate task information from user behaviors and apply task knowledge to search and recommendation in order to help users complete their tasks, especially complex tasks, across multiple devices. “It was helpful to get feedback and questions from expert mentors in the field,” she said. “As always, it was fun to be at CHIIR and seize opportunities to meet friends and colleagues. I had some productive discussions in front of the aquarium in the lobby at Gather Town.”

For more information about CHIIR visit: https://acm-chiir.github.io/chiir2021/

iSchool’s unique approach to teaching data science by focusing on human values, transparency, privacy, and fairness

When people talk about data science programs, what do you think of? Artificial intelligence, machine learning, or coding is probably the most popular answer for those outside the discipline. But there is more to it than meets the eye. At the iSchool, we continue to empower students to understand the implications of using such a powerful tool. With our unique approach, we have designed a program that teaches data science through a human-centered lens, promoting solutions that are socially responsible and attentive to the contexts in which they are implemented. We encourage students to focus on human values such as privacy, human rights, and ethics while working on data problems. This means asking not just what technology could do, but also what it should do. And it means acknowledging and addressing the individuals, organizations, and communities behind the production and consumption of data and technology.

A small group of iSchool faculty, the iSchool Data Science Curriculum Committee (iDSCC), continues to build a more cohesive and comprehensive program. In doing so, the iSchools are paving a path for data science that can create informative, insightful, and impactful solutions for the whole of humanity for generations to come. We believe that foregrounding human needs and business understanding in this way will lead not only to more ethical data science practices but also to more successful (and profitable) outcomes.

Our lab director, Dr. Shah, wrote an article with his international collaborators on what it means to do and teach data science in an iSchool.

Shah, C., Anderson, T., Hagen, L., & Zhang, Y. An iSchool approach to data science: Human-centered, socially responsible, and context-driven. Journal of the Association for Information Science and Technology (JASIST).

Read the full article here.

Developing a search system that knows what you are looking for before you do.

Have you ever searched to plan a trip, a wedding, a job hunt, or your next apartment? This kind of search can take hours, days, or even weeks, and it inevitably gets interrupted by our daily routines: a coffee break, a trip to the restroom, a meal, or sleep. Resuming the search then requires us to pick up where we left off. These kinds of searches are called “interrupted search tasks.”

We, along with many other researchers, are working on this problem. Our approach is to identify and predict the sub-tasks of a complex search task and, based on that, help people complete the task more easily. Take planning a wedding, for example: you need different kinds of information, such as food, dress, and venue. If, during the search process, you forget about food, which is a sub-task of wedding planning, the system can proactively suggest it.

How do we know when to suggest these things? First, we try to identify whether you are encountering problems during a search. We found that the longer people spend on a search result page, the more likely they are to be having problems. To illustrate, imagine a person who searches “churches in Seattle,” spends a long time on the result page, and, without clicking on any of the results, enters another query, “places for a wedding,” and so on. The more queries a person issues without clicking through to any pages, the more likely they are encountering problems. If the person does interact with the result page, i.e., clicks through to see a page, we look at the number of pages they bookmark: the more subsequent pages they bookmark, the more relevant results they found and the fewer problems they encountered. This is how we can tell whether people are finding what they are looking for. If we see that you are having problems, we recommend things you might have missed.
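As a rough sketch of this idea, the signals above could be combined into a simple rule. The thresholds here are our own illustrative assumptions, not the trained models used in the actual work.

```python
# Illustrative heuristic only; the actual work uses learned models,
# and these thresholds are assumptions for the sketch.

def likely_struggling(dwell_seconds, queries_without_clicks, bookmarks):
    """Guess whether a searcher is having trouble finding what they need."""
    if bookmarks > 0:
        # Interaction happened: more bookmarked pages -> fewer problems.
        return bookmarks < 2
    # No clicks: long dwell time and repeated reformulations suggest trouble.
    return dwell_seconds > 60 or queries_without_clicks >= 3

print(likely_struggling(dwell_seconds=90, queries_without_clicks=3, bookmarks=0))  # True
print(likely_struggling(dwell_seconds=20, queries_without_clicks=1, bookmarks=4))  # False
```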

So how do we know what you missed? In other words, how do we know that “food,” “dress,” and “venue” are related to planning a wedding? We use what people have searched for in the past: the more frequently two topics are searched together, the stronger their relationship. Say 1,000 people searched for “wedding” along with “dress food,” while only 5 people searched for “wedding” along with “black dress.” We can tell that “dress food” has a much stronger relationship to “wedding” than “black dress” does, so we can recommend “dress food” the next time someone searches for “wedding.”
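A minimal sketch of this co-occurrence idea, using made-up counts rather than our actual query logs:

```python
# Minimal sketch with hypothetical counts: recommend the sub-topics that
# most frequently co-occur with the current topic in past searches.
from collections import Counter

# How often each sub-topic was searched together with "wedding" (assumed data).
co_occurrence = {
    "wedding": Counter({"dress food": 1000, "venue": 800, "black dress": 5}),
}

def suggest_subtopics(topic, top_k=2):
    counts = co_occurrence.get(topic, Counter())
    return [subtopic for subtopic, _ in counts.most_common(top_k)]

print(suggest_subtopics("wedding"))  # ['dress food', 'venue']
```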

If you are interested in more detail about this topic, here is the paper we published recently: “Identifying and Predicting the States of Complex Search Tasks”.

Challenging the status quo in search engine ranking algorithms

How can we bring more fairness to search result ranking? This was the question tackled by our FATE (Fairness Accountability Transparency Ethics) group in the 2020 Text REtrieval Conference (TREC) Fairness Ranking Track. In the context of searching for academic papers, the goal of the track was to develop an algorithm that provides fair exposure to different groups of authors while ensuring that the papers are relevant to the search queries.

The Approach

To achieve that goal, the group decided to use “gender” and “country” as key attributes because they were general enough to apply to all author groups. From there, the group created a fairness-aware algorithm that was used to run two specific tasks:

  1. An information retrieval task, where the goal was to return a ranked list of papers to serve as the candidates for re-ranking
  2. A re-ranking task, where the goal was to rank the candidate papers based on their relevance to a given query while accounting for fair author group exposure

To evaluate the relevance of the academic papers, the group relied on BM25, a ranking function frequently used by search engines.
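For readers unfamiliar with BM25, here is a minimal, self-contained scoring sketch using the standard formulation with typical parameter values; it is not the group’s actual retrieval pipeline, which would normally rely on an existing IR toolkit.

```python
# Minimal BM25 scoring sketch (standard formula, k1=1.5, b=0.75).
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N        # average document length
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)        # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)  # smoothed IDF
        freq = tf[term]
        denom = freq + k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * freq * (k1 + 1) / denom
    return score

corpus = [
    "fair ranking for academic search".split(),
    "neural networks for image classification".split(),
]
print(bm25_score("fair ranking".split(), corpus[0], corpus))
```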

The Findings

Randomly shuffling the academic papers produced high levels of fairness when only the gender of the authors was considered; when only the country of the authors was considered, fairness was relatively lower. With the proposed algorithm, results can be re-ranked based on an arbitrary number of group definitions. However, to provide fully fair and relevant results, more attributes need to be explored.

Why is fairness in search rankings important?

We use search engines every day to find information and answers for almost everything in our lives, and the ranking of the search results determines what kind of content we are likely to consume. This poses a risk because ranking algorithms often leave out underrepresented groups, whether it’s a small business or a research lab that is not yet established. At the same time, the results tend to show only information we like to see or agree with, which can lack diversity and contribute to bias.

Interested in learning more? Check out the full research paper here: https://arxiv.org/pdf/2011.02066.pdf 

Soumik Mandal successfully defends his dissertation.

Soumik Mandal, Ph.D. student

Our Ph.D. student, Soumik Mandal, has successfully defended his dissertation titled “Clarifying user’s information need in conversational information retrieval”. The committee included Chirag Shah (University of Washington, Chair), Nick Belkin (Rutgers University), Katya Ognyanova (Rutgers University), and Michel Galley (Microsoft).

Abstract

With traditional information retrieval systems, users are expected to express their information need adequately and accurately to get an appropriate response from the system. This setup generally works well for simple tasks; however, in complex task scenarios, users face difficulties expressing their information need as accurately as the system requires, and the need to clarify the user’s information need arises. In current search engines, support in such cases is provided in the form of query suggestion or query recommendation. In conversational information retrieval systems, however, the interaction between the user and the system happens in the form of a dialogue, so it is possible for the system to better support such cases by asking clarifying questions. Current research in both natural language processing and information retrieval does not adequately explain how to form such questions or at what stage of the dialogue they should be asked. To address this gap, this research investigates the nature of conversation between a user and an expert intermediary to model the functions the expert performs to address the user’s information need. More specifically, the study explores how an intermediary can ask questions to clarify the user’s information need in complex task scenarios.

InfoSeeking’s 10th Birthday

Looking back

In the fall of 2010, we started as a reading group where people came together every week to read papers on topics of information seeking, retrieval, and behavior. The group was called the “Information Seeking and Behavior Group”. Dr. Chirag Shah has led the group from the beginning.

The reading group quickly became a research group as students and faculty started identifying projects that interested them and pooled resources to design studies and experiments.

That same fall, as the group gained traction and attracted more students, resources, and funding, we became the InfoSeeking Lab.

In the beginning, the lab focused on issues of information seeking/retrieval and social media. As new members and interests were added, the lab explored many more areas, including wearable sensors, collaborative work, online communities, and conversational systems.

The methods for research also evolved from user studies to large-scale log analysis, and from ethnographic approaches to deep learning models.

Our achievements

We have been pushing forward the knowledge in information seeking/retrieval and other related topics in the Information and Data Sciences field for 10 years. 

Throughout those years, the lab has received more than 4 million dollars in grants and gifts from federal and state agencies as well as private organizations. 

So far, the lab has produced 13 excellent PhD students and countless undergraduate and master’s students who drive new ideas and innovations in the data sciences field. Our alumni have gone on to major universities around the world and reputable companies like Dropbox, eBay, Google, Sony, and TD Bank.

Some of the lab’s early work laid the foundation for collaborative and social information seeking by people from all walks of life. One of the outcomes was a system called Coagmento, which was extensively tested and deployed in classrooms. When it was used in a New York-based high school, the teachers found, for the first time, that they could gain valuable insights into their students’ work and help them in ways not possible before.

We have been at the forefront of developing new methodologies, tools, and solutions. We were one of the first to use the escape room as a method to understand how people seek information and solve problems.

We have been, and will continue, contributing to the community. The lab worked closely with the United Nations Data Analytics group to address several of the UN’s Sustainable Development Goals (SDGs). As a result of that collaboration, the lab launched Science for Social Good (S4SG). All of our work builds toward the SDGs.

We have also worked with several private foundations and startups over the years to solve real-world problems. One example is our collaboration with Brainly, a startup from Poland that focused on educational Q&A. With them, we worked on problems of assessing the quality of the content as well as detecting users with certain characteristics, such as those exhibiting struggle. The solutions to these problems are extremely useful in education.

Looking back at the last 10 years and how glorious they have been, we are confident that the next decade will be even more amazing.

People can’t identify COVID-19 fake news

A recent study conducted by our lab, the InfoSeeking Lab at the University of Washington, Seattle, shows that people can’t spot COVID-19 fake news in search results.

In the study, people chose between two sets of top-10 search results: one taken directly from Google and another manipulated by inserting one or two fake news results into the list.

This continues prior experiments from the lab in a similar setting, but with random information manipulated in the top 10 search results. The outcomes all point in the same direction: people can’t tell which search results have been manipulated.

“This means that I am able to sell this whole package of recommendations with a couple of bad things in it without you ever even noticing. Those bad things can be misinformation or whatever hidden agenda that I have,” said Chirag Shah, InfoSeeking Lab Director and Associate Professor at the University of Washington.

This brings up important problems that people don’t pay attention to. People believe that what they see is true because it comes from Google, Amazon, or some other system they use daily, especially when it appears in prime positions like the first 10 results: multiple studies show that more than 90% of searchers’ clicks concentrate on the first page. This means any manipulated information that makes it onto Google’s first page of search results is likely to be perceived as true.

In the current situation, people are worried about uncertainty. Many of us seek updates about the situation daily, and Google is the top search engine we turn to. People need trustworthy information; however, many are taking advantage of people’s fear and spreading misinformation for their own agenda. What would happen if the next piece of fake news claimed that the virus had mutated to an 80% fatality rate? What would it do to our community? Would people start hoarding food? Would people wearing masks in public be attacked? Would you be able to spot the fake news? The lab continues to explore these critical issues of public importance through its research work on FATE (Fairness Accountability Transparency Ethics).

For this finding, InfoSeeking researchers analyzed more than 10,000 answers, covering both randomly manipulated and fake-news-manipulated result lists, from more than 500 English-speaking people across the U.S.

InfoSeeking at The 14th RecSys Conference

RecSys is the premier international forum for the presentation of new research results, systems, and techniques in the broad field of recommender systems. 

We are thrilled to be involved in one of the most important annual conferences for the presentation and discussion of recommender systems research. This year, a paper by our lab director, Chirag Shah, in collaboration with Spotify, “Investigating Listeners’ Responses to Divergent Recommendations,” is being presented at the conference.

Moreover, two of our InfoSeekers, Ruoyuan Gao and Chirag Shah, are also giving a tutorial on “Counteracting Bias and Increasing Fairness in Search and Recommender Systems.”

About the Tutorial

Search and recommender systems have an unprecedented influence on how and what information people access. These gateways to information on the one hand create easy and universal access to online information, and on the other hand create biases that have been shown to cause knowledge disparity and ill-informed decisions for information seekers. Most of the algorithms for indexing, retrieval, ranking, and recommendation are heavily driven by underlying data that is itself biased. In addition, the ordering of search and recommendation results creates position bias and exposure bias due to the considerable focus on relevance and user satisfaction. These and other forms of biases that are implicitly, and sometimes explicitly, woven into search and recommender systems are becoming increasingly serious threats to information seeking and sense-making processes. In this tutorial, we will introduce the issues of biases in search and recommendation and show how we can think about and create systems that are fairer, with increased diversity and transparency. Specifically, the tutorial will present several fundamental concepts such as relevance, novelty, diversity, bias, and fairness using socio-technical terminology taken from various communities, and dive deeper into metrics and frameworks that allow us to understand, extract, and materialize them. The tutorial will cover some of the most recent work in this area and show how this interdisciplinary research has opened up new challenges and opportunities for communities such as RecSys.
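As a small illustration of the exposure and position-bias ideas discussed in the tutorial, here is a sketch using a common logarithmic position-discount model; this particular discount and grouping are assumptions for illustration, not the tutorial’s exact metrics.

```python
# Sketch of position-based exposure: items ranked higher receive more
# exposure (here discounted by 1/log2(rank+1)), and a group's exposure is
# the sum over the items belonging to that group. Illustrative only.
import math
from collections import defaultdict

def group_exposure(ranking):
    """ranking: list of group labels, ordered from rank 1 downward."""
    exposure = defaultdict(float)
    for rank, group in enumerate(ranking, start=1):
        exposure[group] += 1.0 / math.log2(rank + 1)
    total = sum(exposure.values())
    return {g: e / total for g, e in exposure.items()}

# Two rankings of the same items: the first concentrates group "A" at the top,
# the second interleaves the groups and distributes exposure more evenly.
print(group_exposure(["A", "A", "A", "B", "B", "B"]))
print(group_exposure(["A", "B", "A", "B", "A", "B"]))
```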

DATE

Session A on Sep 25 10:00 – Sep 25 11:00, Attend in Whova
Session B on Sep 25 21:00 – Sep 25 22:00, Attend in Whova