This page provides more information on the Longtermism Fund’s ($110,000) grant to support Professor Martin Wattenberg and Professor Fernanda Viégas to develop their AI interpretability work at Harvard University.
This grant will fund research into methods for looking at the inner workings of an AI system and inferring from this why the AI system produces the results it does. (This is often called mechanistic interpretability.)
Currently, we have almost no ability to understand why the most powerful systems make the decisions they make. They are mostly a “black box”. For example:
This grant supports work to change this. Specifically, it will support Prof. Wattenberg and Prof. Viégas to:
Concretely, most of this funding will help them to (A) set up a compute cluster needed to work on these projects at the cutting edge of AI, and (B) hire graduate students or possibly postdoctoral fellows.
One of Prof. Wattenberg and Prof. Viégas’ dreams is to make possible dashboards that display what an AI system believes about its user and about itself. This is very ambitious, but would have a wide range of benefits. Notably for the Fund’s goals, progress toward this goal could help AI developers to detect unwanted processes within AI systems, such as deception. Perhaps it could even lead to a more reliable understanding of what goals systems have learned. The feasibility of this type of progress remains to be seen.
Longview Philanthropy — one of Giving What We Can’s trusted evaluators — recommended this grant after investigating its strength compared to other projects. This investigation involved:
Please note that GWWC does not evaluate individual charities. Our recommendations are based on the research of third-party, impact-focused charity evaluators our research team has found to be particularly well-suited to help donors do the most good per dollar, according to their recent evaluator investigations. Our other supported programs are those that align with our charitable purpose — they are working on a high-impact problem and take a reasonably promising approach (based on publicly-available information).
At Giving What We Can, we focus on the effectiveness of an organisation's work -- what the organisation is actually doing and whether their programs are making a big difference. Some others in the charity recommendation space focus instead on the ratio of admin costs to program spending, part of what we’ve termed the “overhead myth.” See why overhead isn’t the full story and learn more about our approach to charity evaluation.