Browsed by
Month: May 2019

Graduation Celebrations for our InfoSeekers

2019 commencement celebrations have arrived. First, we must extend our enormous congratulations to both Dr. Matthew Mitsui and Dr. Ziad Matni for completing their PhDs!

Dr. Ziad Matni with Professor Chirag Shah.
Dr. Matthew Mitsui with Professor Chirag Shah.

Additionally, we must celebrate InfoSeeker Ruoyuan Gao for passing her qualifying exams this semester! InfoSeeker Jiqun Liu also won the outstanding continuing doctoral student award in the area of Information Science this semester.

Finally, we would like to acknowledge the great work of our undergraduate InfoSeekers. Divya Parikh has been working on our social media system, SOCRATES. Samantha Lee worked on a project, part of Project SUPER, that assessed a variety of approaches to improving community Q&A platforms. Ruchi Khatri worked on a project, also part of Project SUPER, that investigated which factors affect stress in human-computer interaction, interactive information retrieval, health search, and interface design. Gayeon Yoo is also working on a project for SOCRATES.

Congratulations again to all of our InfoSeekers on their hard work this year!

Bias and Fairness in Data Science and Machine Learning

Where does bias come from?

Bias in data science and machine learning may come from three places: the source data, the algorithm or system, and human cognition.

Imagine that you are analyzing criminal records for two districts. The records include 10,000 residents from district A and 1,000 residents from district B. In the past year, 100 district A residents and 50 district B residents committed crimes. Would you conclude that people from district A are more likely to be criminals than people from district B? If you simply compare the number of offenders in the past year, you are very likely to reach this conclusion. But if you look at the crime rate, you will find that district A's rate is 1%, which is lower than district B's 5%. Based on this analysis, the previous conclusion is biased against district A residents. This type of bias is produced by the method of analysis, so we call it algorithmic bias or system bias.

Does the rate-based analysis guarantee an unbiased conclusion? The answer is no. It is possible that both districts actually have a population of 10,000. In that case the criminal records contain complete statistics for district A but only partial statistics for district B. Depending on how the records were collected, 5% may or may not be the true crime rate for district B. As a consequence, we may still arrive at a biased conclusion. This type of bias is inherent in the data we are examining, so we call it data bias.

The third type of bias is cognitive bias, which arises from our perception of the presented data. For example, suppose you are given the conclusions of two crime analysis agencies. You tend to believe one over the other because it has a better reputation, even though it may be the one with the biased conclusion.

Read about a real-world case of machine learning algorithms being racially biased in recidivism prediction here.
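The arithmetic in the district example can be worked through in a few lines of Python. This is only a sketch: the counts come from the example above, while the function names `crime_rate` and `rate_bounds` are made up for illustration. The bound calculation assumes district B really has 10,000 residents and that nothing at all is known about the 9,000 who are missing from the records.

```python
def crime_rate(offenders: int, population: int) -> float:
    """Share of a population recorded as having committed a crime."""
    if population <= 0:
        raise ValueError("population must be positive")
    return offenders / population


def rate_bounds(offenders: int, recorded: int, true_population: int) -> tuple[float, float]:
    """Lowest and highest possible rate when only `recorded` residents
    out of `true_population` appear in the records (hypothetical helper)."""
    unrecorded = true_population - recorded
    if unrecorded < 0:
        raise ValueError("recorded residents cannot exceed the true population")
    low = crime_rate(offenders, true_population)                 # no unrecorded offenders
    high = crime_rate(offenders + unrecorded, true_population)   # every unrecorded resident offended
    return low, high


# (offenders, residents in the records), taken from the example in the text
records = {"A": (100, 10_000), "B": (50, 1_000)}

# Comparing raw counts points at district A ...
more_by_count = max(records, key=lambda name: records[name][0])

# ... while comparing rates points at district B.
rates = {name: crime_rate(o, n) for name, (o, n) in records.items()}
more_by_rate = max(rates, key=rates.get)

# Data bias: if district B actually has 10,000 residents, the recorded 5%
# only tells us the true rate lies somewhere between these two bounds.
b_low, b_high = rate_bounds(50, 1_000, 10_000)

print(f"higher by count: {more_by_count}, higher by rate: {more_by_rate}")
print({name: f"{rate:.1%}" for name, rate in rates.items()})
print(f"district B true rate is between {b_low:.1%} and {b_high:.1%}")
```

The same records give opposite answers depending on whether counts or rates are compared, and the wide range for district B shows how little a partial sample pins down on its own.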

Bias is everywhere

With the explosion of data and technologies, we are immersed in all kinds of data applications. Think of the news you read every day on the internet, the music you listen to through service providers, the ads displayed while you are browsing webpages, the products recommended to you when shopping online, the information you find through search engines, and so on. Bias can be present in all of them without people being aware of it. Like "you are what you eat", the data you consume is so powerful that it can in fact shape your views, preferences, judgements, and even decisions in many aspects of your life. Say you want to know whether some food is good or bad for your health. A search engine returns 10 pages of results. The first result and most of the results on the first page state that the food is healthy. To what extent do you believe the search results? After glancing at the results on the first page, will you conclude that the food is beneficial, or at least that the benefits outweigh the harm? How likely are you to continue to the second page? Are you aware that the second page may contain results about the harm of the food, so that the results on the first page are biased? As a data scientist, it is important to be careful to avoid biased outcomes. But as a human being who lives in a world of data, it is even more important to be aware of the bias that may exist in your daily data consumption.

Bias vs. Fairness

Bias can lead to unfairness, but can something be biased and also fair? The answer is yes. Think of bias as a skewed view of the protected groups, and of fairness as a subjective measurement of the data or of the way the data is handled. In other words, bias and fairness are not necessarily contradictory. Consider employee diversity in a US company where all but one of the employees are US citizens. Is the employment structure biased toward US citizens? Yes, if this is a result of US citizens being favored during the hiring process. Is it a fair structure? Yes and no. Under a Rooney Rule-style criterion, which asks that at least one minority candidate be included, this is fair, since the company hired at least one noncitizen. Under statistical parity, read here as equal representation, it is unfair, since the numbers of US citizens and noncitizens are not equal. In general, bias is easy and direct to measure, while fairness is subtler because of the various subjective concerns involved. There are many different fairness definitions to choose from, and some of them contradict each other. Check out this tutorial for some examples and helpful insights into fairness definitions from the perspective of a computer scientist.
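The way two fairness criteria can disagree on the same workforce can be sketched as follows. The headcount of ten is a made-up figure (the example only says that all but one employee is a US citizen), and both checks are deliberately simplified readings of the criteria as they are described in this post, not formal definitions. The names `at_least_one` and `equal_representation` are hypothetical.

```python
from collections import Counter

# Hypothetical workforce: nine US citizens and one noncitizen.
employees = ["citizen"] * 9 + ["noncitizen"]


def at_least_one(groups: list[str], minority: str) -> bool:
    """Rooney Rule-style check as used in the text: is at least one
    member of the minority group present?"""
    return Counter(groups)[minority] >= 1


def equal_representation(groups: list[str], categories: list[str]) -> bool:
    """Equal-count reading of statistical parity as used in the text:
    does every category have the same number of members?"""
    counts = Counter(groups)
    return len({counts[c] for c in categories}) == 1


rooney_style_ok = at_least_one(employees, "noncitizen")
equal_counts_ok = equal_representation(employees, ["citizen", "noncitizen"])

print(f"Rooney Rule-style check passes: {rooney_style_ok}")
print(f"Equal-representation check passes: {equal_counts_ok}")
```

With these assumed numbers the first check passes and the second fails, which is the "yes and no" answer given above.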

InfoSeekers attend ECIR 2019!

InfoSeeker Souvick Ghosh attended the 41st Annual European Conference on Information Retrieval in Cologne, Germany. Souvick presented “Exploring Result Presentation in Conversational IR using a Wizard-of-Oz Study” at ECIR as part of the Doctoral Consortium.

Souvick Ghosh (center) in the Doctoral Consortium group photo at the 41st Annual European Conference on Information Retrieval (ECIR 2019).

Souvick Ghosh presented work that reflects on recent research in conversational IR, which has explored problems related to context enhancement, question answering, and query reformulation. His work focuses on result presentation over audio channels. The linear and transient nature of speech makes it cognitively challenging for the user to process a large amount of information. Presenting the search results (from a SERP) is equally challenging, as it is not feasible to read out the whole list of results. He proposes a study to evaluate users’ preferences among modalities when using conversational search systems. The study aims to understand how results should be presented in a conversational search system. By observing how users search using audio queries, interact with the intermediary, and process the results presented, insight can be developed into how to present results more efficiently in a conversational search setting. Additionally, there are plans to explore the effectiveness and consistency of different media in a conversational search setting. Observations from this work will inform future designs and help to create a better understanding of such systems.

Souvick had a few words to reflect on his experience at ECIR 2019: “I was lucky to have Dr. Udo Kruschwitz as my mentor and we had some great discussions about my dissertation ideas, research in general, and the life of a Ph.D. student. It also gave me the opportunity to catch up with some old friends in Europe and make some new ones.”