How can we bring more fairness to search result ranking? This was the question tackled by our FATE (Fairness, Accountability, Transparency, Ethics) group in the 2020 Text REtrieval Conference’s (TREC) Fairness Ranking Track. In the context of searching for academic papers, the assigned goal of the track was to develop an algorithm that provides fair exposure to different groups of authors while ensuring that the papers are relevant to the search queries.
To achieve that goal, the group decided to use “gender” and “country” as key attributes because they were general enough to be applied to all author groups. From there, the group created a fairness-aware algorithm that was used to run two specific tasks:
An information retrieval task where the goal was to return a ranked list of papers to serve as the candidate papers for re-ranking
A re-ranking task where the goal was to rank the candidate papers based on their relevance to a given query, while accounting for fair author group exposure
To evaluate the relevance of the academic papers, the group relied on BM25, which is an algorithm frequently used by search engines.
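For readers unfamiliar with BM25, the following is a minimal sketch of the standard Okapi BM25 scoring formula, not the group's actual code. The function name `bm25_scores`, the parameter values `k1=1.5` and `b=0.75` (conventional defaults), the smoothed IDF variant, and the pre-tokenized toy input are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    query: list of terms; docs: list of documents, each a list of terms.
    k1 and b are the usual free parameters (values here are assumed defaults).
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1.0" keeps the value positive.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates with k1 and is normalized by length via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy usage with made-up documents.
docs = [
    ["fair", "ranking", "search"],
    ["deep", "learning"],
    ["fair", "fair", "exposure", "ranking"],
]
print(bm25_scores(["fair", "ranking"], docs))
```

Documents that share no terms with the query score zero, and the remaining documents can be sorted by score to form the candidate list for re-ranking.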
Randomly shuffling the academic papers produced high levels of fairness when only the gender of the authors was considered. In contrast, when only the country of the authors was considered, fairness was relatively lower. With the proposed algorithm, data can be re-ranked based on an arbitrary number of group definitions. However, to fully provide fair and relevant results, more attributes need to be explored.
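To make the idea of "exposure" concrete, here is a small sketch of one way to measure how much attention each author group receives in a ranked list. It assumes a logarithmic position discount of 1/log2(rank + 1), a common choice in ranking metrics that may differ from the measure used in the track. The function name `group_exposure` and the group labels are hypothetical.

```python
import math
from collections import defaultdict


def group_exposure(ranking):
    """Return each group's share of total exposure in a ranked list.

    ranking: list of group labels, one per result, in rank order.
    Exposure at rank r is assumed to be 1 / log2(r + 1).
    """
    exposure = defaultdict(float)
    for rank, group in enumerate(ranking, start=1):
        exposure[group] += 1.0 / math.log2(rank + 1)
    total = sum(exposure.values())
    if total == 0:
        return {}
    return {group: value / total for group, value in exposure.items()}


# Toy usage: two hypothetical author groups alternating in the ranking.
print(group_exposure(["A", "B", "A", "B"]))
```

Even when two groups have the same number of results, the group that appears higher in the list receives the larger share, which is why a fairness-aware re-ranker has to consider positions and not just counts.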
Why is fairness in search rankings important?
We use search engines every day to find information and answers for almost everything in our lives, and the ranking of the search results determines what kind of content we are likely to consume. This poses a risk because ranking algorithms often leave out underrepresented groups, whether it’s a small business or a research lab that is not yet established. At the same time, the results tend to show only information we like to see or agree with, which could reduce diversity and contribute to bias.
Our lab director, Prof. Chirag Shah, is receiving the Microsoft BCS/BCS IRSG Karen Spärck Jones Award (KSJ Award) 2019 and he is giving a keynote this Wednesday at the 42nd European Conference on Information Retrieval (ECIR 2020).
About the KSJ Award
The KSJ Award was created by the British Computer Society Information Retrieval Specialist Group (BCS IRSG) in conjunction with the BCS and has been in place since 2008. The award is also sponsored by Microsoft Research. See more details at https://irsg.bcs.org/ksjaward.php
About the keynote
“Task-Based Intelligent Retrieval and Recommendation”
While the act of looking for information happens within a context of a task from the user side, most search and recommendation systems focus on user actions (‘what’), ignoring the nature of the task that covers the process (‘how’) and user intent (‘why’). For long, scholars have argued that IR systems should help users accomplish their tasks and not just fulfill a search request. But just as keywords have been good enough approximators for information need, satisfying a set of search requests has been deemed to be good enough to address the task. However, with changing user behaviors and search modalities, specifically found in conversational interfaces, the challenge and opportunity to focus on task have become critically important and central to IR. In this talk, I will discuss some of the key ideas and recent works — both theoretical and empirical — to study and support aspects of task. I will show how we could derive user’s search path or strategy and intentions, and how they could be instrumental in not only creating more personalized search and recommendation solutions, but also solving problems not possible otherwise. Finally, I will extend this to the realm of intelligent assistants with our recent work in a new area called Information Fostering, where our knowledge of the user and the task can help us address another classical problem in IR — people don’t know what they don’t know.
To tackle complex search tasks, we try to identify task states and study the connection between those states and search behaviors. We found that the task state can be predicted from the user’s search behaviors. Read about our study in this article by Jiqun Liu, Shawon Sarkar, and Chirag Shah:
Complex search tasks that involve uncertain solution space and multi-round search iterations are integral to everyday life and information-intensive workplace practices, affecting how people learn, work, and resolve problematic situations. However, current search systems still face plenty of challenges when applied in supporting users engaging in complex search tasks. To address this issue, we seek to explore the dynamic nature of complex search tasks from process-oriented perspective by identifying and predicting implicit task states. Specifically, based upon the Web search logs and user annotation data (regarding information seeking intentions in local search steps, in-situ search problems, and help needed) collected from 132 search sessions in two controlled lab studies, we developed two task state frameworks based on intention state and problem-help state respectively and examined the connection between task states and search behaviors. We report that (1) complex search tasks of different types can be deconstructed and disambiguated based on the associated nonlinear state transition patterns; and (2) the identified task states that cover multiple subtle factors of user cognition can be predicted from search behavioral signals using supervised learning algorithms. This study reveals the way in which complex search tasks are unfolded and manifested in users’ search interactions and paves the way for developing state-aware adaptive search supports and system evaluation frameworks.
InfoSeekers Publish an Article and an ICTIR Paper Acceptance
First, we must congratulate InfoSeeker Ruoyuan Gao for having her paper accepted at ICTIR 2019!
Next, we are excited to share the news that our InfoSeekers Shawon Sarkar, Matthew Mitsui, Jiqun Liu, and Chirag Shah have published a new article!
The title of the article is: Implicit information need as explicit problems, help, and behavioral signals.
Information need is one of the most fundamental aspects of information seeking, which traditionally conceptualizes as the initiation phase of an individual’s information seeking behavior. However, the very elusive and inexpressible nature of information need makes it hard to elicit from the information seeker or to extract through an automated process. One approach to understanding how a person realizes and expresses information need is to observe their seeking behaviors, to engage processes with information retrieval systems, and to focus on situated performative actions. Using Dervin’s Sense-Making theory and conceptualization of information need based on existing studies, the work reported here tries to understand and explore the concept of information need from a fresh methodological perspective by examining users’ perceived barriers and desired helps in different stages of information search episodes through the analyses of various implicit and explicit user search behaviors. In a controlled lab study, each participant performed three simulated online information search tasks. Participants’ implicit behaviors were collected through search logs, and explicit feedback was elicited through pre-task and post-task questionnaires. A total of 208 query segments were logged, along with users’ annotations on perceived problems and help. Data collected from the study was analyzed by applying both quantitative and qualitative methods. The findings identified several behaviors – such as the number of bookmarks, query length, number of the unique queries, time spent on search results observed in the previous segment, the current segment, and throughout the session – strongly associated with participants’ perceived barriers and help needed. The findings also showed that it is possible to build accurate predictive models to infer perceived problems of articulation of queries, useless and irrelevant information, and unavailability of information from users’ previous segment, current segment, and whole session behaviors. The findings also demonstrated that by combining perceived problem(s) and search behavioral features, it was possible to infer users’ needed help(s) in search with a certain level of accuracy (78%).
Keywords: Information need; Information searching; Interactive IR
InfoSeeker Souvick Ghosh attended the 41st Annual European Conference on Information Retrieval in Cologne, Germany. Souvick presented “Exploring Result Presentation in Conversational IR using a Wizard-of-Oz Study” at ECIR as part of the Doctoral Consortium.
Souvick Ghosh presented work that reflects on recent research in conversational IR exploring problems related to context enhancement, question-answering, and query reformulations. His work focused on result presentation over audio channels. The linear and transient nature of speech makes it cognitively challenging for the user to process a large amount of information. Presenting the search results (from the SERP) is equally challenging, as it is not feasible to read out the list of results. He proposes a study to evaluate users’ preferences among modalities when using conversational search systems. The study aims to understand how results should be presented in a conversational search system. Through observation of how users search using audio queries, interact with the intermediary, and process the results presented, insight can be developed on how to present results more efficiently in a conversational search setting. Additionally, there are plans to explore the effectiveness and consistency of different media in a conversational search setting. Observations in this work will inform future designs and help to create a better understanding of such systems.
Souvick had a few words to reflect on his experience at ECIR 2019: “I was lucky to have Dr. Udo Kruschwitz as my mentor and we had some great discussions about my dissertation ideas, research in general, and the life of a Ph.D. student. It also gave me the opportunity to catch up with some old friends in Europe and make some new ones.”
InfoSeekers attend the New Jersey Big Data Alliance Symposium!
InfoSeeker Matthew Mitsui attended the 6th Annual New Jersey Big Data Alliance Symposium. The title of the symposium this year was The Future of Big Data: Artificial Intelligence and Machine Learning, and it was hosted at New Jersey City University.
Matthew Mitsui presented “Multi-Faceted Information Seeking Leveraging Big Data” at the symposium. It was co-authored by some of our other InfoSeekers: Souvick Ghosh, Ruoyuan Gao, and Chirag Shah.
Their presentation addressed the complexities of the search process and the multitude of obstacles and issues an information seeker can encounter; viz., information task and resource limitations; information quality; information bias. They identified that those obstacles are often presented to the user through the tools employed during the search process, and their aim was to explore how search tools can be improved in order to foster collaboration with the user and surmount these obstacles. They addressed the need for search tools that can assist the user through three primary approaches: search task assistance, assessing information quality, and counterbalancing bias.
This month, some of our InfoSeekers attended the 2019 annual CHIIR conference in Glasgow, Scotland. Here are some of the highlights!
Rutgers University InfoSeeking students Jiqun Liu, Souvick Ghosh, and Diana Soltari attended CHIIR, along with InfoSeekers Chirag Shah and Matthew Mitsui.
InfoSeeker Diana Soltari presented Coagmento 3.0, an interactive web application that allows researchers to prototype web-search behavior studies through a GUI. The demonstration presented the front-end administrative functionality of Coagmento, including, but not limited to, stage and questionnaire creation.
InfoSeeker Jiqun Liu presented several papers this year at CHIIR: one full paper, one short paper, and one doctoral consortium paper. Jiqun’s papers reported user studies on the interactions between task, information seeking intentions, and user search behavior in information seeking episodes.
InfoSeekers Attend ASIS&T and CSCW November Conferences!
This month, some of our InfoSeekers attended the 2018 annual ASIS&T conference in Vancouver and the annual CSCW in Jersey City, NJ. Here are some highlights!
Rutgers University InfoSeeking students Jiqun Liu, Soumick Mandal, and Yiwei Wang attended CSCW (Conference on Computer-Supported Cooperative Work and Social Computing) in Jersey City, NJ on Nov 3 – 7. The team presented their poster titled “Persuasion by Peer or Expert for Web Search.” They presented their preliminary findings on the persuasiveness of two sources of search advice, cognitive authority and peer advice, and their influences on search behaviors.
Next, our InfoSeekers attended ASIS&T in Vancouver on Nov 10 – 14. Souvick Ghosh and Manasa Rath participated in the SIGInfoLearn workshop, where they discussed relevant research that supports searching as learning.
Yiwei Wang also attended ASIS&T and co-organized this year’s SIG-USE Symposium with Annie Chen (University of Washington), Melissa Ocepek (University of Illinois Urbana Champaign), and Devendra Potnis (University of Tennessee at Knoxville). The SIG-USE Symposium is an annual workshop held by the ASIS&T Special Interest Group on Information Needs, Seeking, and Use. It focuses on the behavioral and cognitive activities of users and their affective states as they interact with information. The theme this year was Moving Toward the Future of Information Behavior Research and Practice. It was an engaging and inspiring event, and they had 42 participants this year!
Finally, we are very excited to announce that Manasa Rath received the New Leader Award at ASIS&T. She was one of six students to receive the award. Additionally, Manasa will be working with the ASIS&T Board of Directors on the Professional Development committee. Congratulations, Manasa!
This year, Schloss Dagstuhl – Leibniz Center for Informatics in Saarland, Germany hosted another successful Autumn School for Information Retrieval and Information Foraging (ASIRF). The event took place from September 16 to 21, 2018. Sponsored by the ACM Special Interest Group on Information Retrieval (SIGIR), German Informatics Society (GI), Center for Informatics Research and Technology (CIRT), and the University of Trier, ASIRF 2018 attracted twenty graduate students with diverse backgrounds from five countries. There were nine lectures given by leading academics in their respective fields. These lectures gave the attendees an opportunity to advance their knowledge in the domains of Information Retrieval (IR), Interactive Information Retrieval (IIR), Human Information Behavior, Collaborative IR, and Computational Social Science. They also had ample time to expand their research networks in a remote, serene area of the German countryside.
On Sunday, I arrived at Schloss Dagstuhl after an eighteen-hour journey from New York, which included a flight from New York to Frankfurt, a scenic train ride from Frankfurt to St. Wendel through the hilly regions of Hesse, Rhineland-Palatinate, and Saarland, a short bus trip among farms and wineries to Dagstuhl Bahnhofstrasse, and finally a walk to Schloss Dagstuhl. However, the weariness of the journey vanished immediately at the prospect of spending the whole week in a single room on the top floor of an eighteenth-century manor house. The evening started with a delicious meal and ended with the keynote lecture by Dr. Michael Ley on the evolution of the conceptual data model of the dblp Computer Science bibliography system.
Throughout the week, leading scholars in the field delivered a range of lectures and presentations that covered a broad spectrum of topics related to information retrieval and information seeking behavior. On Monday, Dr. Ingo Frommholz provided an introduction to IR models – from Vector Space Model to some more advanced information retrieval models. Dr. Ralf Schenkel offered a foundation course on IR systems and efficiency methods of query processing. The lectures on theoretical foundations of system-centric IR were followed up with a comprehensive presentation by Prof. Dr. Norbert Fuhr on techniques for modeling Interactive IR and Information Seeking and Searching Behaviors. He presented an overview of the Probability Ranking Principle for Interactive IR, Markov models, Hidden Markov models, and parameter estimation based on Markov models. He also introduced the participants to the concept of information seeking and searching, some well-known cognitive search models such as Ellis’ behavioral model of information seeking strategies, Bates’ Berrypicking model, Belkin’s ASK model, Ingwersen’s Cognitive model, and user interface design based on cognitive models.
The next day, Dr. Pia Borlund presented a comprehensive overview of the Interactive IR research field, from how IIR relates to IR and information seeking to the IIR evaluation model and simulated task situations. Finally, Dr. Borlund walked us through her process of designing and conducting IIR user studies. Later, Dr. Christa Womser-Hacker provided a thorough introduction to information behavior and various information seeking models, including information foraging theory. The following lecture was given by Dr. Andreas Henrich on digital research infrastructures for the humanities and social sciences. Dr. Henrich discussed challenges and opportunities for retrieval systems and specialized databases in support of digitally-enabled research and teaching in the arts and humanities.
Later in the week, Dr. Henning Wachsmuth presented on argumentation analysis and retrieval using natural language processing (NLP). He introduced the key concepts of argument mining, assessment, and retrieval, and highlighted their applications in various domains such as intelligent personal assistants and fact-checking. On a similar note, Dr. Ahmet Aker started off by providing an overview of fake news and online rumors and explaining how these can be detected using machine learning methods such as support vector machines, decision trees, k-nearest neighbor classifiers, and deep neural networks (CNN and LSTM). Then, he offered a practical tutorial with case studies on deep learning, including a brief overview of programming interfaces like TensorFlow and Keras, in which we had opportunities to use Keras and apply RNN and LSTM to real research problems.
The last tutorial of ASIRF 2018 was presented by Dr. Stefanie Elbeshausen on Collaborative Information Seeking (CIS). Dr. Elbeshausen introduced the topic by defining collaboration and its characteristics, as well as collaborative browsing and seeking. She also covered some of the prominent CIS systems, including our own Coagmento.
Apart from the tutorials, there were also opportunities for hands-on experience in applying the concepts we had learned. The participants were divided into small groups and assigned tasks to propose a new ranking algorithm, design a user interface for an IR system, or raise an empirical research question, and then they presented their ideas in front of fellow students. The assignment helped us to apply the concepts and techniques that we had learned about throughout the week. Moreover, all participants had the opportunity to give a presentation of their work.
Other than these scholarly activities, we also enjoyed the cordial hospitality of the Schloss Dagstuhl staff, including randomized seating arrangements during our delicious meals and wine and cheese events at the wine cellar, which facilitated further discussion among students and faculty. We also had a chance to explore the beautiful area around Schloss Dagstuhl. We enjoyed a hike through the woods to the ruins of the old Dagstuhl castle on top of the hill. We also went on a guided tour of Trier, the oldest city in Germany and birthplace of Karl Marx, followed by a pleasant visit to an old winery to taste some of the best wines from the famous Mosel wine region.
For me, the Autumn School was a very productive and pleasant experience. It helped me to enhance my knowledge in information seeking and retrieval, establish valuable research connections, and make new friends. My experience would not have been possible without the generous grants from the ACM Special Interest Group on Information Retrieval (SIGIR), and our PhD program at the School of Communication and Information.
Activities included travel, classes, lab meetings and socializing!
Where does one begin to describe the summer of 2018? Chirag summed it up when he said, “We had a fantastic, fun, and productive summer. I think even having lab meetings every week throughout the summer is an achievement. We learned a lot from each other and had fun doing so. InfoSeekers have won awards, presented papers, and traveled to different corners of the world. Even our alumni have done some wonderful things.”
The following captures just the highlights of InfoSeekers at work and play, keeping things interesting as they moved their studies forward.
Manasa Rath went to summer school in Los Angeles, where her team was named runner-up for a project award at the “Summer Methods Course on Computational Social Sciences.” Before attending the course, Manasa had scored full funding for her travels, accommodation and other support. (Only 11 percent of those who apply for this support receive it.) While there, she met other graduate students from the U.S. and Europe who were learning about automated textual analysis. Her team’s project concerned using word embeddings to measure ethnic stereotypes from various news corpora, including NPR (National Public Radio) and The New York Times.
Meanwhile, Souvick Ghosh did a ten-week internship as part of the LEADS-4-NDP (National Digital Platform) Fellowship Program. Each intern in the program worked with different industry partners focusing on data science problems. Vic collaborated with OCLC Research to cluster publisher names using MARC records. (OCLC is the global library cooperative that provides shared technology services; MARC stands for Machine-Readable Cataloging and has provided the national standard for describing items in libraries’ digital catalogs since 1971.) The internship work attempted to cluster MARC records, which contain information such as the title of a book, the author, the publisher, and the ISBN. The idea was to cluster instances of the same publisher entity, exploring different hashing and machine-learning approaches and evaluating the relative importance of various features for classifying entities.
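As a rough illustration of what a hashing approach to publisher-name clustering can look like, the sketch below groups names that collapse to the same normalized key. This is a generic key-collision technique chosen for illustration, not the method used in the internship. The function names `fingerprint` and `cluster_names` are hypothetical, and the sample strings are made up and not taken from any MARC data.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(name):
    """Reduce a name to a normalized key: no accents, case, punctuation, or word order."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(sorted(set(tokens)))


def cluster_names(names):
    """Group names whose fingerprints collide, preserving input order."""
    clusters = defaultdict(list)
    for name in names:
        clusters[fingerprint(name)].append(name)
    return list(clusters.values())


# Toy usage with illustrative publisher-name variants.
names = [
    "Oxford University Press",
    "oxford university press.",
    "University Press, Oxford",
    "Springer",
]
for cluster in cluster_names(names):
    print(cluster)
```

Exact key collisions only catch superficial variation such as casing, punctuation, and word order. Abbreviations or misspellings would need fuzzier matching or a learned model, which is presumably where machine-learning approaches come in.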
In other updates, Jiqun Liu and Shawon Sarkar started the recruitment phase for a study on people’s search experience and preferred supports in information seeking, the purpose of which is to improve Web search. So far, four people have completed the study. Recruitment and running the study will likely continue through mid-October.
InfoSeeking Lab Director and all-around inspired leader Chirag Shah did his share of travel this summer including a visit to Ryerson University in Toronto, where he gave a talk about data and algorithmic biases. (See his August 6 blog.) But the real fun was being able to finish his goal of making it to all 50 of the states in the U.S.
Please be sure to scroll all the way down to see the fun capper snapshot!
By the way, have a wonderful fall semester, InfoSeekers, and a very Happy Birthday to Chirag!