Panel featuring Kimon Drakopoulos (University of Southern California), Moon Duchin (Tufts University), Philip LeClerc (U.S. Census Bureau), Samir Shah (VolunteerMatch), Alex Teytelboym (University of Oxford); moderated by Vahideh Manshadi (Yale University).
Prediction models in healthcare are being used for many tasks. However, using these models for medical decision-making warrants special considerations that are less critical when prediction models are used in other domains. Two of these considerations, which we will discuss in the talk, are fairness and explainability. We will discuss these considerations from the viewpoint of a large healthcare organization that uses prediction models ubiquitously on a daily basis. We will also describe how academic collaborations can expand our toolbox for handling these issues in practice.
As various post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of if and when the explanations output by these methods disagree with each other, why these disagreements occur, and how to address them in a rigorous fashion. However, there is little to no research that provides answers to these critical questions. In this talk, I will present some of our recent research addressing them. More specifically, I will discuss (i) a novel quantitative framework to formalize the disagreement between state-of-the-art feature attribution based explanation methods (e.g., LIME, SHAP, gradient-based methods); I will also touch on how this framework was constructed by leveraging inputs from interviews and user studies with data scientists who utilize explanation methods in their day-to-day work; (ii) an online user study to understand how data scientists resolve disagreements in explanations output by the aforementioned methods; (iii) a novel function approximation framework to explain why explanation methods often disagree with each other; I will demonstrate that all the key feature attribution based explanation methods are essentially performing local function approximations, albeit with different loss functions and notions of neighborhood; and (iv) a set of guiding principles on how to choose explanation methods and resulting explanations when they disagree in real-world settings. I will conclude this talk by presenting a brief overview of an open-source framework that we recently developed called Open-XAI, which enables researchers and practitioners to seamlessly evaluate and benchmark both existing and new explanation methods based on various characteristics such as faithfulness, stability, and fairness.
We introduce the pipeline intervention problem, defined by a layered directed acyclic graph and a set of stochastic matrices governing transitions between successive layers. The graph is a stylized model for how people from different populations are presented opportunities, eventually leading to some reward. In our model, individuals are born into an initial position (i.e., some node in the first layer of the graph) according to a fixed probability distribution, and then stochastically progress through the graph according to the transition matrices, until they reach a node in the final layer of the graph; each node in the final layer has a reward associated with it. The pipeline intervention problem asks how to best make costly changes to the transition matrices governing people's stochastic transitions through the graph, subject to a budget constraint. We consider two objectives: social welfare maximization, and a fairness-motivated maximin objective that seeks to maximize the value to the population (starting node) with the least expected value. We consider two variants of the maximin objective that turn out to be distinct, depending on whether we demand a deterministic solution or allow randomization. For each objective, we give an efficient approximation algorithm (an additive FPTAS) for constant-width networks. We also tightly characterize the "price of fairness" in our setting: the ratio between the highest achievable social welfare and the highest social welfare consistent with a maximin optimal solution. Finally, we show that for polynomial-width networks, approximating the maximin objective to any constant factor is NP-hard, even for networks with constant depth. This shows that the restriction on the width in our positive results is essential.
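The expected-value computation underlying this model can be sketched in a few lines (a minimal illustration; the function and the toy numbers are my own, not from the paper): given the layer-to-layer stochastic matrices and the final-layer rewards, backward induction yields each starting node's expected value, whose minimum is what the maximin objective targets.

```python
import numpy as np

def expected_values(transitions, rewards):
    """Expected reward for each starting node of a layered DAG.

    transitions: list of row-stochastic matrices; transitions[t][i, j] is
    the probability of moving from node i in layer t to node j in layer t+1.
    rewards: reward vector over the final layer's nodes.
    """
    v = np.asarray(rewards, dtype=float)
    # Walk backward: a node's value is the expected value one layer ahead.
    for P in reversed(transitions):
        v = P @ v
    return v  # one entry per node in the first layer

# Toy 3-layer network: 2 start nodes -> 2 middle nodes -> 2 reward nodes.
P1 = np.array([[0.9, 0.1],
               [0.2, 0.8]])
P2 = np.array([[0.7, 0.3],
               [0.1, 0.9]])
vals = expected_values([P1, P2], rewards=[1.0, 0.0])
# The maximin objective would raise min(vals) by (costly) edits to P1, P2.
```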
Algorithmic decision-making in societal contexts, such as retail pricing, loan administration, recommendations on online platforms, etc., often involves experimentation with decisions for the sake of learning, which results in perceptions of unfairness among people impacted by these decisions. It is hence necessary to embed appropriate notions of fairness in such decision-making processes. The goal of this paper is to highlight the rich interface between temporal notions of fairness and online decision-making through a novel meta-objective of ensuring fairness at the time of decision. Given some arbitrary comparative fairness notion for static decision-making (e.g., students should pay at most 90% of the general adult price), a corresponding online decision-making algorithm satisfies fairness at the time of decision if the said notion of fairness is satisfied for any entity receiving a decision in comparison to all the past decisions. We show that this basic requirement introduces new methodological challenges in online decision-making. We illustrate the novel approaches necessary to address these challenges in the context of stochastic convex optimization with bandit feedback under a comparative fairness constraint that imposes lower bounds on the decisions received by entities depending on the decisions received by everyone in the past. The talk will showcase some novel research opportunities in online decision-making stemming from temporal fairness concerns. This is based on joint work with Vijay Kamble and Jad Salem.
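The abstract's running example can be made concrete with a small check (a hypothetical helper written for illustration, not the paper's formulation): a new decision satisfies fairness at the time of decision if the static comparative constraint holds against every past decision.

```python
def fair_at_decision(student_price, past_adult_prices, ratio=0.9):
    """Illustrative check of the abstract's example constraint:
    a student should pay at most 90% of the general adult price,
    enforced against all adult prices charged in the past."""
    if not past_adult_prices:
        return True  # no past decisions to compare against
    return student_price <= ratio * min(past_adult_prices)

fair_at_decision(8.0, [10.0, 12.0])   # within 90% of every past price
fair_at_decision(9.5, [10.0])         # violates the 90% bound
```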
The current refugee resettlement system is inefficient because there are too few resettlement places and because refugees are resettled to locations where they might not thrive. I will overview some recent efforts to improve employment outcomes of refugees arriving in the United States. I will then describe some recent efforts to incorporate refugees' preferences in processes that match them to locations.
The U.S. Census Bureau adopted formally private methods to protect the principal products released based on the 2020 Decennial Census of Population and Housing. These include the Public Law 94-171 Redistricting Data Summary File (already released), the Demographic and Housing Characteristics File (DHC; in its final phase of privacy budget tuning), as well as the Detailed Demographic and Housing Characteristics File and Supplemental Demographic and Housing Characteristics File releases (in earlier phases of design, testing, and planning). Additional, smaller product releases based on the 2020 confidential data are also expected, with sub-state releases currently required to use differentially private methods. In this talk, I describe the design and a few of the major technical issues encountered in developing the TopDown algorithm (TDA), the principal formally private algorithm used to protect the PL94-171 release and expected to be used to protect the DHC release. TDA was designed by a joint team of academic, contractor, and government employees; I discuss how this collaboration worked, what worked well, and what was challenging, and briefly touch on the role of industry in algorithm design outside of TDA. I close with some general thoughts on ways to help form productive collaborations between academic, government, and industry expertise in formally private methods.
Kerfuffle (/kərˈfʌfəl/): a commotion or fuss, especially one caused by conflicting views. "There was a kerfuffle over the use of differential privacy for the 2020 Census." This talk will give a too-brief introduction to some of the issues that played out in tweets, court proceedings, and academic preprints. We'll also discuss approaches and challenges to understanding the effect of differential privacy on downstream policy.
Assuming no particular background, I'll give a high-level introduction to the problem of electoral redistricting in the U.S. and the helpful and not-so-helpful ways that algorithmic district generation has intervened on law and policy.
This talk will set up the following talk, in which Aloni Cohen will talk about the panic about differential privacy in the redistricting data.
Vahideh Manshadi is an Associate Professor of Operations at Yale School of Management. She is also affiliated with the Yale Institute for Network Science, the Department of Statistics and Data Science, and the Cowles Foundation for Research in Economics. Her current research focuses on the operation of online and matching platforms in both the private and public sectors. Professor Manshadi serves on the editorial boards of Management Science, Operations Research, and Manufacturing & Service Operations Management. She received her Ph.D. in electrical engineering at Stanford University, where she also received MS degrees in statistics and electrical engineering. Before joining Yale, she was a postdoctoral scholar at the MIT Operations Research Center.
Alex Teytelboym is an Associate Professor at the Department of Economics, University of Oxford, a Tutorial Fellow at St. Catherine’s College, and a Senior Research Fellow at the Institute for New Economic Thinking at the Oxford Martin School. His research interests lie in market design and the economics of networks, as well as their applications to environmental economics and energy markets. His policy work has been on designing matching systems for refugee resettlement and environmental auctions. He is co-founder of Refugees.AI, an organization that is developing new technology for refugee resettlement.
Kimon Drakopoulos is the Robert R. Dockson Assistant Professor in Business Administration at the Data Sciences and Operations department at USC Marshall School of Business. His research focuses on the operations of complex networked systems, social networks, stochastic modeling, game theory, and information economics. In 2020, he served as the Chief Data Scientist of the Greek National COVID-19 Scientific Task Force and as a Data Science and Operations Advisor to the Greek Prime Minister. He has been awarded the Wagner Prize for Excellence in Applied Analytics and the Pierskalla Award for contributions to Healthcare Analytics.
Moon Duchin is a Professor of Mathematics at Tufts University, and runs the MGGG Redistricting Lab, an interdisciplinary research group at Tisch College of Civic Life of Tufts University. The lab's research program centers on Data For Democracy, bridging math, CS, geography, law, and policy to build models of elections and redistricting. She has worked to support commissions, legislatures, and other line-drawing bodies and has served as an expert witness in redistricting cases around the country.
Philip Leclerc is an operations research analyst working in the Center for Enterprise Dissemination-Disclosure Avoidance (CEDDA) at the U.S. Census Bureau. He graduated with a B.A. in mathematical economics and psychology from Christopher Newport University, and later completed his Ph.D. in Systems Modeling and Analysis at Virginia Commonwealth University. He joined the U.S. Census Bureau 6 years ago, where he first learned about differential privacy, and for the last 5 years has served as the internal scientific lead on the project for modernizing the disclosure avoidance system used in the first two major releases from the Decennial Census.
Samir Shah is Vice President, Partnerships & Customer Success at VolunteerMatch, where, for over a decade, he has contributed to a vision of developing the global digital volunteering backbone. Samir uses technology, networks, and data to empower volunteers, nonprofits, governments, companies, and brands to create value from VolunteerMatch's products and services. He has negotiated complex partnerships with Fidelity, California Volunteers, Office of the Governor, and STEM Next, and manages trusted relationships with VolunteerMatch's Open API Network of third-party platform partners. Samir has a BA in Economics from UT Austin, an MA in Asian Studies from UC Berkeley, and an MBA from the Haas School of Business.
Deep learning algorithms that achieve state-of-the-art results on image and text recognition tasks tend to fit the entire training dataset (nearly) perfectly, including mislabeled examples and outliers. This propensity to memorize seemingly useless data and the resulting large generalization gap have puzzled many practitioners and are not explained by existing theories of machine learning. We provide a simple conceptual explanation and a theoretical model demonstrating that memorization of outliers and mislabeled examples is necessary for achieving close-to-optimal generalization error when learning from long-tailed data distributions. Image and text data are known to follow such distributions, and therefore our results establish a formal link between these empirical phenomena. We then demonstrate the utility of memorization and support our explanation empirically. These results rely on a new technique for efficiently estimating memorization and influence of training data points. Our results allow us to quantify the cost of limiting memorization in learning and explain the disparate effects that privacy and model compression have on different subgroups.
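The memorization quantity at the heart of this line of work can be written as follows (the standard leave-one-out definition, with notation assumed rather than taken from the abstract): it measures how much including a training point changes the algorithm's accuracy on that very point.

```latex
\mathrm{mem}(\mathcal{A}, S, i)
  \;=\; \Pr_{h \sim \mathcal{A}(S)}\bigl[h(x_i) = y_i\bigr]
  \;-\; \Pr_{h \sim \mathcal{A}(S \setminus i)}\bigl[h(x_i) = y_i\bigr]
```

Here $\mathcal{A}$ is the (randomized) learning algorithm, $S$ the training set, and $S \setminus i$ the set with example $(x_i, y_i)$ removed; a value near 1 indicates the label is essentially memorized.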
We study the concurrent composition properties of interactive differentially private mechanisms, whereby an adversary can arbitrarily interleave its queries to the different mechanisms. We prove that all composition theorems for non-interactive differentially private mechanisms extend to the concurrent composition of interactive differentially private mechanisms for all standard variants of differential privacy, including $(\varepsilon,\delta)$-DP with $\delta>0$, Rényi DP, and $f$-DP, thus answering the open question of Vadhan and Wang (2021). For $f$-DP, which captures $(\varepsilon,\delta)$-DP as a special case, we prove the concurrent composition theorems by showing that every interactive $f$-DP mechanism can be simulated by interactive post-processing of a non-interactive $f$-DP mechanism. For Rényi DP, we use a different approach, showing that the optimal adversary against the concurrent composition can be decomposed as a product of the optimal adversaries against each interactive mechanism.
In this talk I will review some of the psychological and economic factors influencing consumers’ desire and ability to manage their privacy effectively. Contrary to depictions of online sharing behaviors as careless, consumers fundamentally care about online privacy, but technological developments and economic forces have made it prohibitively difficult to attain desired, or even desirable, levels of privacy through individual action alone. The result does not have to be what some have called "digital resignation" though: a combination of individual and institutional efforts can change what seems to be the inevitability of the death of privacy into effective privacy protection.