FATE Research Group: From Why to What and How

When the public first gained access to the internet, search engines quickly became part of daily life. Services such as Yahoo, AltaVista, and Google were used to satisfy people's curiosity. Although it was inconvenient to hop back and forth between different search engines, it seemed like magic that users could get so much information in such a short time. Users started using search engines without any prior training. Before search engines became popular, the public generally found information in libraries by reading the library catalog or asking a librarian for help. In contrast, typing a few keywords is enough to find answers on the internet. Since then, search engines have continually refined their algorithms and added powerful features, such as knowledge bases that enrich search results with information gathered from various sources.

Soon enough, Google became the first choice for many people due to its accuracy and high-quality results, and other search engines lost ground to it. However, while Google's results are high quality, they are also biased. According to a recent study, the top results from web search engines tend to be skewed: some results on the first page are placed there simply to capture users' attention, and users tend to click mostly on results that appear on that first page. The study gives an example with an everyday topic, coffee and health: of the first 20 results, 17 were about the health benefits of coffee, while only 3 mentioned its harms.

This problem led our team at the InfoSeeking Lab to start a new project known as Fairness, Accountability, Transparency, and Ethics (FATE). In this project, we have been exploring ways to counter the inherent bias found in search engines and provide fairer representation while maintaining a high degree of utility.

We started this experiment with one big goal: to improve fairness. For that, we designed a system that shows two sets of results, both styled to look very similar to Google's results page (as illustrated in the picture below). We collected 100 queries on general topics such as sports, food, and travel, along with the top 100 results per query from Google. One of the two sets shown to the user comes directly from Google; the other is generated by an algorithm that reduces bias. The experiment has 20 rounds, and in each round the user has 30 seconds to choose the set they prefer.

For this experiment, we recruited around 300 participants. The goal is to see whether participants can notice a difference between our algorithm's results and Google's. Early results show that participants preferred our algorithm over Google, and we will share more details as soon as we finish the analysis. We are also in the process of writing a technical paper and an academic article.

We have also designed a game that looks very similar to our system. The game tests your ability to notice bad results, gives you a score, and offers some advice. In the game, users can also challenge their friends or family members. To try the game, click here: http://fate.infoseeking.org/googleornot.php

For many years, the InfoSeeking Lab has worked on issues related to information retrieval, information behavior, data science, social media, and human-computer interaction. Visit the InfoSeeking Lab website to learn more about our projects: https://www.infoseeking.org

For more information about the experiment, visit the FATE project website: http://fate.infoseeking.org

FATE project – What is it about and why does it matter?

FATE is one of our InfoSeeking Lab's most active projects. It stands for Fairness, Accountability, Transparency, and Ethics. FATE aims to address the bias found in search engines like Google and to discover ways to de-bias the information presented to the end user while maintaining a high degree of utility.

Why does it matter?

There is plenty of past evidence of search algorithms reinforcing biased assumptions about certain groups of people. Below are some past examples of search bias related to the Black community.

Search engine autocomplete suggested unpleasant words about Black women, recommending terms like 'angry', 'loud', 'mean', or 'attractive'. These autocompletions reinforced biased assumptions about Black women.

Credit: Safiya Noble

Search results showed images of Black people's natural hair as unprofessional while showing images of white Americans' straight hair as professional hairstyles for work.

Credit: Safiya Noble

Search results for "three black teenagers" were represented by mug shots of Black teens, while the results for "three white teenagers" were represented by images of smiling, happy white teenagers.

Credit: Safiya Noble

These issues had been around for many years until someone uncovered them, which then sparked changes to address them.

At FATE, we aim to address these issues and find ways to bring fairness when seeking information.

If you want to learn more about what we do or get updates on our latest findings, check out our FATE website.

A new NSF grant for explainable recommendations

Dr. Yongfeng Zhang from Rutgers University and Dr. Chirag Shah from the University of Washington are recipients of a new NSF grant (3 years, $500k) to work on explainable recommendations. It's a step toward curing "runaway AI"!

https://www.nsf.gov/awardsearch/showAward?AWD_ID=1910154

Recommendation systems are essential components of our daily life. Today, intelligent recommendation systems are used in many Web-based services, providing personalized information to support human decisions. Leading examples include e-commerce recommendations for everyday shopping, job recommendations for employment markets, and social recommendations to keep people better connected. However, most recommendation systems merely suggest items to users; they rarely tell users why those recommendations are provided. This is primarily due to the closed nature of the algorithms behind these systems, which are difficult to explain. The lack of good explainability sacrifices the transparency, effectiveness, persuasiveness, and trustworthiness of recommendation systems. This research will allow personalized recommendations to be provided in more explainable ways, improving search performance and transparency. The research will benefit users in real systems through the researchers' industry collaborations with e-commerce and social network companies. New algorithms and datasets developed in the project will supplement courses in computer science and iSchool programs. Presentations of the work and demos will help engage wider audiences interested in computational research. Ultimately, the project will make it easier for humans to understand and trust machine decisions.

This project will explore a new framework for explainable recommendation that involves both system designers and end users. The system designers will benefit from structured explanations that are generated for model diagnostics. The end users will benefit from receiving natural language explanations for various algorithmic decisions. This project will address three fundamental research challenges. First, it will create new machine learning methods for explainable decision making. Second, it will develop new models to generate free-text natural language explanations. Third, it will identify key factors to evaluate the quality of explanations. In the process, the project will also develop aggregated explainability measures and release evaluation benchmarks to support reproducible explainable recommendation research. The project will result in the dissemination of shared data and benchmarks to the Information Retrieval, Data Mining, Recommender System, and broader AI communities.

Bias, Fairness, Diversity and Novelty

When dealing with bias in IR systems, we are often faced with the question of what the differences and connections are between bias, fairness, diversity, and novelty. We briefly talked about the relationship between bias and fairness in the previous article. Let us now look at diversity and novelty.

DIVERSITY aims to increase the topical coverage of the search results. Because information needs vary across individual users, predicting intent from the query alone is a difficult, and sometimes impossible, task for the IR system. Even with additional information such as personal profiles, social identity and networks, geolocation at the time of the search request, and browsing history, it is still hard for the system to precisely accommodate each individual's information need. One perspective is the ambiguity associated with the query itself. For example, "apple" may mean the fruit, or the company named Apple. Apart from the ambiguity inherent in language, a query may also be ambiguous in its search intent. For instance, a user searching for "rutgers" may be looking for the official website of Rutgers University, or for its location, description, ranking, or recent news about the university. The IR system must consider all the possibilities of user search intent and return the most relevant results. Another perspective is the subtopics, or aspects, of the searched topic. For instance, different opinion polarities about "gun control" should be included in the search results so that the information presented provides a comprehensive view of the topic. Increasing diversity means including as many topical subgroups as possible. As a result, diversity can alleviate some bias by enriching the results with more perspectives and avoiding results that all come from the same group. Meanwhile, diversity can increase fairness because it accounts for all subgroups/aspects of the topic.
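To make the idea concrete, here is a minimal Python sketch of diversity measured as subtopic coverage. The query, the subtopic labels, and the coverage formula are illustrative assumptions, not the lab's actual algorithm.

```python
# A minimal sketch of topical diversity as subtopic coverage.
# The subtopic labels below are illustrative assumptions.

def topical_coverage(results, all_subtopics):
    """Fraction of known subtopics that appear at least once in the results."""
    covered = {r["subtopic"] for r in results}
    return len(covered & set(all_subtopics)) / len(all_subtopics)

# Toy example for the query "gun control" with three assumed subtopics.
subtopics = ["pro-regulation", "anti-regulation", "statistics"]
results = [
    {"title": "Editorial in favor of stricter laws", "subtopic": "pro-regulation"},
    {"title": "Op-ed against new restrictions", "subtopic": "anti-regulation"},
    {"title": "Another op-ed against restrictions", "subtopic": "anti-regulation"},
]

print(topical_coverage(results, subtopics))  # ~0.67: two of the three subtopics are covered
```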

NOVELTY aims to reduce redundancy in the retrieved information. For instance, given the search query "gun control", if two consecutive results come from the same website, or one is a forwarded or cited version of the other, users may find the second one redundant. In other words, novelty tries to bring as much "new" information as possible into the set of retrieved results. From this perspective, we can see that diversity and novelty can benefit each other to some extent, but neither guarantees the other. On the one hand, diversity brings new information by introducing different subtopics/aspects, but it does not address in-group information: it does not care how many results fall into the same topical group, as long as the results cover as many groups as possible. So diversity does not guarantee novelty. Novelty, on the other hand, can surface different subtopics/aspects by reducing redundant information, but it does not care about topical groups, as long as each result introduces new information compared to the previous results. In terms of bias, a skewed view may be avoided by increasing novelty, but since there is no guarantee on the "group" distribution, there is no guarantee of removing the bias.
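Here is a similarly minimal sketch of novelty as redundancy reduction: a result counts as novel only if its word overlap with everything shown before it stays below a threshold. The whitespace tokenization and the 0.5 threshold are illustrative assumptions.

```python
# A minimal sketch of novelty as redundancy reduction using word overlap (Jaccard similarity).
# Tokenization and the 0.5 threshold are illustrative assumptions.

def is_novel(candidate, shown, threshold=0.5):
    cand_words = set(candidate.lower().split())
    for prev in shown:
        prev_words = set(prev.lower().split())
        jaccard = len(cand_words & prev_words) / len(cand_words | prev_words)
        if jaccard >= threshold:
            return False  # too similar to something already shown
    return True

shown = ["senate debates gun control bill"]
print(is_novel("gun control bill debated in senate", shown))       # False: mostly redundant
print(is_novel("history of firearm regulation in the US", shown))  # True: new information
```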

FAIRNESS aims to bring balance to the retrieved results according to a subjective design need. If the goal is to enforce topical fairness, then fairness requires all subtopics/aspects to be covered, hence maximum diversity. But fairness does not necessarily concern topical groups; it can be imposed on other groups such as gender, race, and religion. So achieving fairness and achieving diversity can be different goals. In addition, the key point of diversity fairness is to balance the number of results from each topical group, while diversity only aims to maximize the total number of groups covered. For example, if the search results for a given query fall into two subtopic groups, then diversity can be achieved by including one result from one group and taking the rest of the results from the other group. But fairness may require the same number of results from each group, depending on the notion of fairness chosen by the system designer.
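The following sketch contrasts the two notions on the example above: a ranked list that covers both subtopic groups is fully "diverse" by coverage, yet far from balanced. The group labels and the parity tolerance are assumptions made for illustration only.

```python
# A minimal sketch contrasting diversity (how many groups are covered) with
# a parity-style notion of fairness (how evenly results are spread across groups).

from collections import Counter

def coverage(groups_in_results, all_groups):
    return len(set(groups_in_results)) / len(all_groups)

def is_balanced(groups_in_results, all_groups, tolerance=1):
    counts = Counter(groups_in_results)
    per_group = [counts[g] for g in all_groups]
    return max(per_group) - min(per_group) <= tolerance

all_groups = ["subtopic_A", "subtopic_B"]
ranked = ["subtopic_A"] + ["subtopic_B"] * 9  # 1 result from A, 9 from B

print(coverage(ranked, all_groups))     # 1.0   -> fully "diverse" by coverage
print(is_balanced(ranked, all_groups))  # False -> far from equal representation
```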

To sum up, while diversity and novelty can potentially reduce bias and improve fairness, their goals are essentially different from the concepts and goals of unbiasedness and fairness.

Graduation Celebrations for our InfoSeekers

2019 commencement celebrations have arrived. First, we must extend our enormous congratulations to both Dr. Matthew Mitsui and Dr. Ziad Matni for completing their PhDs!

Dr. Ziad Matni with Professor Chirag Shah.
Dr. Matthew Mitsui with Professor Chirag Shah.

Additionally, we must celebrate InfoSeeker Ruoyuan Gao for passing her qualifying exams this semester! And InfoSeeker Jiqun Liu won the outstanding continuing doctoral student award in the area of Information Science this semester.

Finally, we would like to acknowledge the great work of our undergraduate InfoSeekers. Divya Parikh has been working on our social media system SOCRATES. Samantha Lee worked on a project, as part of Project SUPER, that assessed a variety of approaches to improving community Q&A platforms. Ruchi Khatri worked on a Project SUPER project that investigated which factors affect stress in human-computer interaction, interactive information retrieval, health search, and interface design. And Gayeon Yoo is working on a project for our system, SOCRATES.

Congratulations again to all of our InfoSeekers and their hard work this year!

Bias and Fairness in Data Science and Machine Learning

Where does bias come from?

Bias in data science and machine learning may come from the source data, from algorithmic or system bias, or from cognitive bias. Imagine that you are analyzing criminal records for two districts. The records include 10,000 residents from district A and 1,000 residents from district B. 100 district A residents and 50 district B residents have committed crimes in the past year. Will you conclude that people from district A are more likely to be criminals than people from district B? If you simply compare the number of offenders in the past year, you are very likely to reach this conclusion. But if you look at the crime rate, you will find that district A's rate is 1%, which is lower than district B's. Based on this analysis, the earlier conclusion is biased against district A residents. This type of bias is generated by the method of analysis, so we call it algorithmic bias or system bias. Does the rate-based analysis guarantee an unbiased conclusion? The answer is no. It could be that both districts actually have a population of 10,000, which would mean the records contain complete statistics for district A but only partial statistics for district B. Depending on how the records were collected, 5% may or may not be the true crime rate for district B, so we may still arrive at a biased conclusion. This type of bias is inherent in the data we are examining, so we call it data bias. The third type of bias is cognitive bias, which arises from our perception of the presented data. For example, given conclusions from two crime-analysis agencies, you may tend to believe one over the other because it has a higher reputation, even though its conclusion may be the biased one. Read about a real-world case of machine learning algorithms being racially biased on recidivism here: https://www.nytimes.com/2017/10/26/opinion/algorithm-compas-sentencing-bias.html.
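The arithmetic behind the district example can be spelled out in a few lines. This is just a worked version of the numbers in the text; the counts are the hypothetical figures given above.

```python
# A minimal sketch of the district example: comparing raw counts versus rates.

records = {
    "A": {"population": 10_000, "offenders": 100},
    "B": {"population": 1_000,  "offenders": 50},
}

for district, d in records.items():
    rate = d["offenders"] / d["population"]
    print(district, d["offenders"], f"{rate:.1%}")

# District A has more offenders in absolute terms (100 vs 50) but a lower
# rate (1.0% vs 5.0%); comparing counts alone is the algorithmic bias described above.
# If district B's true population were also 10,000, the observed 5% would itself
# be an artifact of incomplete records -- the data bias described above.
```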

Bias is everywhere

With the explosion of data and technologies, we are immersed in all kinds of data applications. Think of the news you read every day on the internet, the music you listen to through streaming services, the ads displayed while you browse webpages, the products recommended to you when you shop online, the information you find through search engines, and so on: bias can be present everywhere without people's awareness. Just as "you are what you eat," the data you consume is so powerful that it can shape your views, preferences, judgments, and even decisions in many aspects of your life. Say you want to know whether some food is good or bad for your health. A search engine returns 10 pages of results, and the first result, along with most of the results on the first page, states that the food is healthy. To what extent do you believe the search results? After glancing at the first page, will you conclude that the food is beneficial, or at least that its benefits outweigh its harms? How likely are you to check the results on the second page? Are you aware that the second page may contain results about the harms of the food, so that the first page alone gives a biased picture? As a data scientist, it is important to be careful to avoid biased outcomes. But as a human being living in a world of data, it is even more important to be aware of the bias that may exist in your daily data consumption.

Bias vs. Fairness

It is possible that bias leads to unfairness, but can something be biased yet fair? The answer is yes. Think of bias as a skewed view of the protected groups, while fairness is a subjective measurement of the data or of the way the data is handled. In other words, bias and fairness are not necessarily contradictory. Consider employee diversity in a US company where all but one of the employees are US citizens. Is the employment structure biased toward US citizens? Yes, if it is the result of US citizens being favored during the hiring process. Is it a fair structure? Yes and no. According to the Rooney Rule, it is fair, since the company hired at least one member of a minority group. According to statistical parity, it is unfair, since the numbers of citizen and noncitizen employees are not equal. In general, bias is relatively easy and direct to measure, while fairness is subtler because of the various subjective concerns involved. There are many different fairness definitions to choose from, and some of them contradict each other. Check out this tutorial for some examples and helpful insights into fairness definitions from the perspective of a computer scientist: https://www.youtube.com/watch?v=jIXIuYdnyyk
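As a concrete illustration of how two definitions can disagree, here is a minimal sketch that applies a simplified Rooney Rule and a simplified statistical-parity check to the hiring example above. The encodings are deliberately crude assumptions, meant only to show the disagreement, not to formalize either definition.

```python
# A minimal sketch of how two fairness definitions can disagree on the same data.
# The employee counts come from the hypothetical example in the text.

employees = {"citizen": 49, "non_citizen": 1}

def rooney_rule(counts):
    # Simplified: satisfied if at least one member of the minority group is included.
    return counts["non_citizen"] >= 1

def statistical_parity(counts):
    # Simplified: satisfied only if both groups are equally represented.
    return counts["citizen"] == counts["non_citizen"]

print(rooney_rule(employees))         # True  -> "fair" under the Rooney Rule
print(statistical_parity(employees))  # False -> "unfair" under statistical parity
```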

Creating social impact through research

Over the years, our lab has done some really groundbreaking work in the fields of information seeking, interactive information retrieval, social and collaborative search, social media, and human-computer interaction. Almost all of it had been geared toward scholarly communities. It makes sense. After all, we are operating in an academic research setting.

But lately, I at least have been pondering how what we do could and should benefit society. And I don't mean that in subtle, indirect, or hypothetical ways. Sure, everything we do has a positive impact on people, starting with the people doing the work: it earns them class credits, diplomas, and salaries, and it helps educate students and train professionals in certain skills. But that is still a very small sample of the population. Beyond that, some of our research, and the technologies developed through that work, have helped various government, educational, research, for-profit, and non-profit organizations further their agendas.

And yes, from time to time we have helped out the United Nations (UN) and a few other organizations more directly with their data problems.

That is still not enough. There are many important issues in the world to address and those of us in privileged positions should do more.

And that is why we launched a new effort called Science for Social Good (S4SG). Under this umbrella, we started rethinking some of our existing work and how it could help address issues of societal importance. Since we already had ties with the UN, and I regularly participate in some of their activities, it made sense to start with what the UN considers a set of important issues. As it happens, the UN has a list of 17 Sustainable Development Goals (SDGs), which it hopes will be met by the year 2030. And we decided to be a part of the solution.

The UN's list seemed comprehensive enough, so we started from it and first identified a few organizations that aim to address at least some of those SDGs. Then we looked inward to see which of our own activities could help with these SDGs. The result was a pleasant surprise: several of our projects directly connect to one or more of them. In other words, by solving those research and development problems, we are directly or indirectly helping the UN (and the world) meet those SDGs. Some of the SDGs our projects most commonly address include Good Health and Well-being (SDG-3), Quality Education (SDG-4), and Reduced Inequalities (SDG-10).

More importantly, creating the S4SG platform has allowed us to rethink some of our future research activities and see if we could better align them with the societal impact in mind. This is not always easy, but it’s almost always worth doing.

Visit S4SG.org to learn more.

Information Retrieval (IR) Fairness: What is it and what can we do?

When you search for information in a search engine such as Google, a list of results is displayed for you to explore further. This process is called Information Retrieval. The contents of the search results are selected based on criteria catered to you, such as your past search history (to match your interests), your geographic location (to surface what is relevant near you), and advertising targeted to your interests and location. These criteria are coded into algorithms that automate the information retrieval process and cater to your needs, or to what you would potentially consider relevant.

For example, let's say you are searching for information about the healthiness of coffee and you search for "is coffee good for your health." You may be looking for information that confirms your belief about the benefits of coffee, or you may simply be asking whether coffee is good or bad for your health. If you asked this question of a human expert on coffee, you would likely get an answer that weighs both the benefits and the harms. Ideally, when you enter the same question in a search engine, it should return both the good and the bad about coffee.

Unfortunately, searching for "is coffee good for your health" and "is coffee bad for your health" will return different sets of information, each catered to your needs and the phrasing of your query rather than to the question as a whole. Catering specifically to the user in this way can create bias: if you only see information that relates to what you are already interested in, or to what is geographically near you, other perspectives are intentionally filtered out of the results list.

So, how can we improve the algorithms that are used in the information retrieval process to incorporate more perspectives to reduce bias?

InfoSeeker Ruoyuan Gao is currently working on addressing the presence of bias in search engine results. She is exploring several strategies for investigating the relationship between information usefulness and fairness in search engines such as Google, and she proposes developing tools to measure the degree of bias in order to create a more balanced list of search results that includes many relevant perspectives on a search topic.
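As one illustrative, deliberately simplified example of what such a measurement could look like (this is not Ruoyuan's actual method), the sketch below scores a result page by the imbalance between "benefit" and "harm" results, using the coffee-and-health numbers from the study mentioned earlier. The stance labels are assumed to come from some upstream classifier.

```python
# An illustrative sketch of one possible bias measure for a result page:
# the imbalance between viewpoints shown on page one.
# Stance labels are assumed to come from an upstream classifier (not shown here).

def stance_imbalance(stances):
    """Return a value in [0, 1]: 0 = perfectly balanced, 1 = completely one-sided."""
    pro = stances.count("benefit")
    con = stances.count("harm")
    total = pro + con
    return abs(pro - con) / total if total else 0.0

# The coffee-and-health example: 17 "benefit" results vs 3 "harm" results in the top 20.
page_one = ["benefit"] * 17 + ["harm"] * 3
print(stance_imbalance(page_one))  # 0.7 -> heavily skewed toward benefits
```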

A Summer of Productive Fun for InfoSeekers

Activities included travel, classes, lab meetings and socializing!

Where does one begin to describe the summer of 2018? Chirag summed it up when he said, “We had a fantastic, fun, and productive summer. I think even having lab meetings every week throughout the summer is an achievement. We learned a lot from each other and had fun doing so. InfoSeekers have won awards, presented papers, and traveled to different corners of the world. Even our alumni have done some wonderful things.”

InfoSeekers line up for an end-of-summer group photo.

The following captures just the highlights of InfoSeekers at work and play, keeping things interesting as they moved their studies forward.

Manasa Rath went to summer school in Los Angeles, where her team was runner-up for a project award in the "Summer Methods Course on Computational Social Sciences." Before attending the course, Manasa had secured full funding for her travel, accommodation, and other support. (Only 11 percent of those who apply for this support receive it.) While there, she met other graduate students from the U.S. and Europe who were learning about automated textual analysis. Her team's project used word embeddings to measure ethnic stereotypes in various news corpora, including NPR (National Public Radio) and The New York Times.

Meanwhile, Souvick Ghosh did a ten-week internship as part of the LEADS-4-NDP (National Digital Platform) Fellowship Program, in which each intern worked with a different industry partner on data science problems. Vic collaborated with OCLC Research to cluster publisher names using MARC records. (OCLC is the global library cooperative that provides shared technology services; MARC stands for MAchine-Readable Cataloging and has provided the national standard for describing items in digital library catalogs since 1971.) In their internship work, they attempted to cluster instances of MARC records that contain different information, such as a book's title, author, publisher, and ISBN. The idea was to cluster instances belonging to the same publisher entity, exploring different hashing and machine-learning approaches and evaluating the relative importance of various features for classifying entities.

InfoSeekers continued lab meetings throughout the summer.

In other updates, Jiqun Liu and Shawon Sarkar started the recruitment phase for a study on people’s search experience and preferred supports in information seeking, the purpose of which is to improve Web search. So far, four people have completed the study. Recruitment and running the study will likely continue through mid-October.

InfoSeeking Lab Director and all-around inspired leader Chirag Shah did his share of travel this summer including a visit to Ryerson University in Toronto, where he gave a talk about data and algorithmic biases. (See his August 6 blog.) But the real fun was being able to finish his goal of making it to all 50 of the states in the U.S.

Please be sure to scroll all the way down to see the fun capper snapshot!

Diana Soltani presenting her summer research on Coagmento with a poster and a demo.

Never let it be said that InfoSeekers are anti-social. Lunch out helps punctuate the end of a great summer.

Alaska was the final frontier for Chirag's quest to visit all 50 U.S. states. Here are Chirag and Lori Shah in a kayak in the mountains in White Pass, which is actually in the Canadian Yukon territories if you want to get picky. (Did you know that the kayak comes to us from the native peoples of Alaska, Canada and Greenland?)

By the way, have a wonderful fall semester, InfoSeekers, and a very Happy Birthday to Chirag!

Addressing bias by bringing in diversity and inclusion in search

When it comes to Web search, the Matthew Effect (the rich get richer) applies quite well. Things that show up at the top of a ranked list are likely to get clicked more, and as they get clicked more, they keep appearing at the top. Of course, that's not bad in itself; if things relevant to one's query show up at the top every time, why not have them there? But it becomes problematic when objectionable and deceptive items reach the top for some reason.

Think about sensational news. People like it; more specifically, people like clicking on it. When the title of a story says "You won't believe what NASA is hiding", most people are naturally enticed to click on it. It doesn't matter whether the story actually has anything substantive; it got clicked, and in many instances, shared. This all probably started with someone searching for 'NASA', or, increasingly, with such a story appearing as paid content (an ad) next to a real story. A search system then treats the clicks as a signal of relevance, and the next time the story gets an even more prominent position, starting the vicious cycle of the Matthew Effect. We are victims of our own curiosity and our desire to get attention from others!

There are many instances of this and other kinds of intentional and unintentional bias in today's Web search, social media, and recommender systems, as people resort to dirty SEO (search engine optimization) techniques and click-bait, or simply want to spread rumors and "fake news". It's becoming an increasingly frustrating problem for service providers and users, and, even more seriously, a threat to our democracy and free speech (and, some may even say, free will).

There is no single solution to this, but we have to try all that we can to fight such issues of bias and fairness on the Web. For my part, I have been addressing this through collaboration, specifically by bringing diversity and inclusion into search. For example, in 2007-2008, as part of my work with Jeremy Pickens and the late Gene Golovchinsky at FXPAL, I looked at having people with different backgrounds and skill sets work together in collaborative search, with algorithmic mediation. During 2008-2010, I continued exploring this idea from the user side (people identifying and leveraging different roles). With my doctoral student Roberto Gonzalez-Ibanez, we then worked on identifying opportunities for collaboration in search (2010-2012), so we could start creating synergies instead of biases. In 2013-2014, I worked with Laure Soulier and Lynda Tamine from IRIT in France to bring the system side and the user side together in an attempt to have people take on diverse roles while working on search tasks.

While this work on collaboration was going on, I have also been exploring a parallel thread on social information seeking. Here I have studied how communities could come together to create more synergistic solutions, drawing on the "wisdom of the crowd" rather than relying on any single individual's ideas and opinions. This thread has continued from 2008 to the present day.

Now, we have started a new thread in my lab aimed at addressing bias in search in an individual searcher's setting. It's not easy. First, the notions of bias and fairness are not clearly defined, and our own understanding of them keeps evolving. Second, there are several layers of complexity involving the data as well as the algorithms that rank and organize information, and it's not always clear which of these layers is responsible for introducing bias. Third, there is a large amount of personalization in pretty much everything we see and consume on the Web today, and since this layer of personalization is, as the name suggests, personal, it's hard to peel away.

Still, I think this issue of bias and fairness on the Web is very important, and together with my colleagues and students, I am continuing to invest a lot of effort in this problem. It's been more than a decade of working on it from many different angles, and I feel like we are just getting started!