## Latent Variables and Model Mis-specification

Machine learning is very good at optimizing predictions to match an observed signal — for instance, given a dataset of input images and labels of the images (e.g. dog, cat, etc.), machine learning is very good at correctly predicting the label of a new image. However, performance can quickly break down as soon as we care about criteria other than predicting observables. There are several cases where we might care about such criteria:

• In scientific investigations, we often care less about predicting a specific observable phenomenon, and more about what that phenomenon implies about an underlying scientific theory.
• In economic analysis, we are most interested in what policies will lead to desirable outcomes. This requires predicting what would counterfactually happen if we were to enact the policy, which we (usually) don’t have any data about.
• In machine learning, we may be interested in learning value functions which match human preferences (this is especially important in complex settings where it is hard to specify a satisfactory value function by hand). However, we are unlikely to observe information about the value function directly, and instead must infer it implicitly. For instance, one might infer a value function for autonomous driving by observing the actions of an expert driver.

In all of the above scenarios, the primary object of interest (the scientific theory, the effects of a policy, and the value function, respectively) is not part of the observed data. Instead, we can think of it as an unobserved (or “latent”) variable in the model we are using to make predictions. While we might hope that a model that makes good predictions will also place correct values on unobserved variables, this need not be the case in general, especially if the model is mis-specified.

I am interested in latent variable inference because I think it is a potentially important sub-problem for building AI systems that behave safely and are aligned with human values. The connection is most direct for value learning, where the value function is the latent variable of interest and the fidelity with which it is learned directly impacts the well-behavedness of the system. However, one can imagine other uses as well, such as making sure that the concepts that an AI learns sufficiently match the concepts that the human designer had in mind. It will also turn out that latent variable inference is related to counterfactual reasoning, which has a large number of tie-ins with building safe AI systems that I will elaborate on in forthcoming posts.

The goal of this post is to explain why problems show up if one cares about predicting latent variables rather than observed variables, and to point to a research direction (counterfactual reasoning) that I find promising for addressing these issues. More specifically, in the remainder of this post, I will: (1) give some formal settings where we want to infer unobserved variables and explain why we can run into problems; (2) propose a possible approach to resolving these problems, based on counterfactual reasoning.

## 1 Identifying Parameters in Regression Problems

Suppose that we have a regression model $p_{\theta}(y \mid x)$, which outputs a probability distribution over $y$ given a value for $x$. Also suppose we are explicitly interested in identifying the “true” value of $\theta$ rather than simply making good predictions about $y$ given $x$. For instance, we might be interested in whether smoking causes cancer, and so we care not just about predicting whether a given person will get cancer ($y$) given information about that person ($x$), but specifically whether the coefficients in $\theta$ that correspond to a history of smoking are large and positive.

In a typical setting, we are given data points $(x_1,y_1), \ldots, (x_n,y_n)$ on which to fit a model. Most methods of training machine learning systems optimize predictive performance, i.e. they will output a parameter $\hat{\theta}$ that (approximately) maximizes $\sum_{i=1}^n \log p_{\theta}(y_i \mid x_i)$. For instance, for a linear regression problem with Gaussian noise we have, up to additive and multiplicative constants, $\log p_{\theta}(y_i \mid x_i) = -(y_i - \langle \theta, x_i \rangle)^2$. Various more sophisticated methods might employ some form of regularization to reduce overfitting, but they are still fundamentally trying to maximize some measure of predictive accuracy, at least in the limit of infinite data.

Call a model well-specified if there is some parameter $\theta^*$ for which $p_{\theta^*}(y \mid x)$ matches the true distribution over $y$, and call a model mis-specified if no such $\theta^*$ exists. One can show that for well-specified models, maximizing predictive accuracy works well (modulo a number of technical conditions). In particular, maximizing $\sum_{i=1}^n \log p_{\theta}(y_i \mid x_i)$ will (asymptotically, as $n \to \infty$) lead to recovering the parameter $\theta^*$.
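As a rough illustration of the well-specified case, here is a minimal sketch in which the data really are generated by a line plus Gaussian noise, so that least squares (the maximum-likelihood fit under that noise model) should approach the generating parameters as $n$ grows. The particular values of the slope, intercept, and noise level, and the helper names `fit_line` and `simulate`, are assumptions made up for this example.

```python
import random

def fit_line(xs, ys):
    """Least-squares slope and intercept (the maximum-likelihood fit under Gaussian noise)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def simulate(n, slope, intercept, noise_sd, rng):
    """Draw n points from a linear model with Gaussian noise (well-specified for fit_line)."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

# Hypothetical "true" parameter theta* = (slope, intercept), chosen for illustration.
TRUE_SLOPE, TRUE_INTERCEPT = 2.0, 1.0
rng = random.Random(0)

# Fit on increasingly large samples; the estimates should tighten around theta*.
fits = {}
for n in (20, 2000, 20000):
    fits[n] = fit_line(*simulate(n, TRUE_SLOPE, TRUE_INTERCEPT, 0.5, rng))
    print(n, fits[n])
```

The point is only that predictive fitting and parameter recovery coincide here because the model family contains the data-generating process.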

However, if a model is mis-specified, then it is not even clear what it means to correctly infer $\theta$. We could declare the $\theta$ maximizing predictive accuracy to be the “correct” value of $\theta$, but this has issues:

1. While $\theta$ might do a good job of predicting $y$ in the settings we’ve seen, it may not predict $y$ well in very different settings.
2. If we care about determining $\theta$ for some scientific purpose, then good predictive accuracy may be an unsuitable metric. For instance, even though margarine consumption might correlate well with (and hence be a good predictor of) divorce rate, that doesn’t mean that there is a causal relationship between the two.

The two problems above also suggest a solution: we will say that we have done a good job of inferring a value for $\theta$ if $\theta$ can be used to make good predictions in a wide variety of situations, and not just the situation we happened to train the model on. (For the latter case of predicting causal relationships, the “wide variety of situations” should include the situation in which the relevant causal intervention is applied.)

Note that both of the problems above are different from the typical statistical problem of overfitting. Classically, overfitting occurs when a model is too complex relative to the amount of data at hand, but even if we have a large amount of data the problems above could occur. This is illustrated in the following graph:

Here the blue line is the data we have ($x,y$), and the green line is the model we fit (with slope and intercept parametrized by $\theta$). We have more than enough data to fit a line to it. However, because the true relationship is quadratic, the best linear fit depends heavily on the distribution of the training data. If we had fit to a different part of the quadratic, we would have gotten a potentially very different result. Indeed, in this situation, there is no linear relationship that can do a good job of extrapolating to new situations, unless the domain of those new situations is restricted to the part of the quadratic that we’ve already seen.

I will refer to the type of error in the diagram above as mis-specification error. Again, mis-specification error is different from error due to overfitting. Overfitting occurs when there is too little data and noise is driving the estimate of the model; in contrast, mis-specification error can occur even if there is plenty of data, and instead occurs because the best-performing model is different in different scenarios.
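The dependence of the best linear fit on the training distribution can be checked numerically. The sketch below assumes, for illustration, that the true relationship is exactly $y = x^2$ with no noise, and fits a line on two different input ranges; the ranges and helper names are my own choices, not taken from the figure.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def even_grid(lo, hi, n=101):
    """n evenly spaced inputs on [lo, hi]; plenty of data, no noise."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def truth(x):
    """Assumed true relationship: quadratic, so any linear model is mis-specified."""
    return x ** 2

xs_low = even_grid(0.0, 1.0)
xs_high = even_grid(2.0, 3.0)
slope_low, intercept_low = fit_line(xs_low, [truth(x) for x in xs_low])
slope_high, intercept_high = fit_line(xs_high, [truth(x) for x in xs_high])

print("fit on [0, 1]:", slope_low, intercept_low)
print("fit on [2, 3]:", slope_high, intercept_high)

# Extrapolating the [0, 1] fit to x = 3 misses the true value 9 by a wide margin.
print("prediction at x = 3:", slope_low * 3.0 + intercept_low, "truth:", truth(3.0))
```

For inputs spread symmetrically over $[a, b]$, the least-squares slope for $x^2$ works out to $a + b$, so the two fits have slopes of about 1 and 5 even though neither suffers from a shortage of data.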

## 2 Structural Equation Models

We will next consider a slightly subtler setting, which in economics is referred to as a structural equation model. In this setting we again have an output $y$ whose distribution depends on an input $x$, but now this relationship is mediated by an unobserved variable $z$. A common example is a discrete choice model, where consumers make a choice among multiple goods ($y$) based on a consumer-specific utility function ($z$) that is influenced by demographic and other information about the consumer ($x$). Natural language processing provides another source of examples: in semantic parsing, we have an input utterance ($x$) and output denotation ($y$), mediated by a latent logical form $z$; in machine translation, we have input and output sentences ($x$ and $y$) mediated by a latent alignment ($z$).

Symbolically, we represent a structural equation model as a parametrized probability distribution $p_{\theta}(y, z \mid x)$, where we are trying to fit the parameters $\theta$. Of course, we can always turn a structural equation model into a regression model by using the identity $p_{\theta}(y \mid x) = \sum_{z} p_{\theta}(y, z \mid x)$, which allows us to ignore $z$ altogether. In economics this is called a reduced form model. We use structural equation models if we are specifically interested in the unobserved variable $z$ (for instance, in the examples above we are interested in the value function for each individual, or in the logical form representing the sentence’s meaning).

In the regression setting where we cared about identifying $\theta$, it was obvious that there was no meaningful “true” value of $\theta$ when the model was mis-specified. In this structural equation setting, we now care about the latent variable $z$, which can take on a meaningful true value (e.g. the actual utility function of a given individual) even if the overall model $p_{\theta}(y,z \mid x)$ is mis-specified. It is therefore tempting to think that if we fit parameters $\theta$ and use them to impute $z$, we will have meaningful information about the actual utility functions of individual consumers. However, this is a notational sleight of hand: just because we call $z$ “the utility function” does not make it so. The variable $z$ need not correspond to the actual utility function of the consumer, nor do the consumer’s preferences even need to be representable by a utility function.

We can understand what goes wrong by considering the following procedure, which formalizes the proposal above:

1. Find $\theta$ to maximize the predictive accuracy on the observed data, $\sum_{i=1}^n \log p_{\theta}(y_i \mid x_i)$, where $p_{\theta}(y_i \mid x_i) = \sum_z p_{\theta}(y_i, z \mid x_i)$. Call the result $\theta_0$.
2. Using this value $\theta_0$, treat $z_i$ as being distributed according to $p_{\theta_0}(z \mid x_i,y_i)$. On a new value $x_+$ for which $y$ is not observed, treat $z_+$ as being distributed according to $p_{\theta_0}(z \mid x_+)$.
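The two steps above can be sketched on a toy model. Everything concrete here is an assumption for illustration: $x$, $y$, and $z$ are binary, $p_{\theta}(z = 1 \mid x)$ is logistic in $x$ with $\theta = (a, b)$, the probabilities $p(y = 1 \mid z)$ are fixed constants, the data are made-up counts, and step 1 uses a crude grid search rather than a real optimizer.

```python
import math

# Assumed, fixed emission probabilities p(y = 1 | z); not part of theta in this toy model.
EMIT = {0: 0.2, 1: 0.9}

def p_z_given_x(theta, x):
    """p_theta(z | x): logistic in x with theta = (a, b)."""
    a, b = theta
    p1 = 1.0 / (1.0 + math.exp(-(a + b * x)))
    return {0: 1.0 - p1, 1: p1}

def p_y_given_z(y, z):
    return EMIT[z] if y == 1 else 1.0 - EMIT[z]

def marginal(theta, x, y):
    """p_theta(y | x) = sum_z p_theta(y, z | x)."""
    pz = p_z_given_x(theta, x)
    return sum(pz[z] * p_y_given_z(y, z) for z in (0, 1))

def log_lik(theta, data):
    return sum(math.log(marginal(theta, x, y)) for x, y in data)

def posterior_z(theta, x, y):
    """Step 2 for observed pairs: p_theta(z | x, y) by Bayes' rule."""
    pz = p_z_given_x(theta, x)
    joint = {z: pz[z] * p_y_given_z(y, z) for z in (0, 1)}
    total = sum(joint.values())
    return {z: joint[z] / total for z in joint}

# Made-up observations: y = 1 in 30% of cases when x = 0 and 70% when x = 1.
data = [(0, 1)] * 30 + [(0, 0)] * 70 + [(1, 1)] * 70 + [(1, 0)] * 30

# Step 1: maximize the marginal log-likelihood over a coarse grid of (a, b).
grid = [i * 0.25 for i in range(-16, 17)]
theta0 = max(((a, b) for a in grid for b in grid), key=lambda t: log_lik(t, data))

# Step 2: impute z for an observed pair, and for a new x_+ where y is unobserved.
post_seen = posterior_z(theta0, 1, 1)
prior_new = p_z_given_x(theta0, 1)
print("theta0:", theta0)
print("p(z | x=1, y=1):", post_seen)
print("p(z | x_+=1):", prior_new)
```

Note that nothing in this procedure checks whether $z$ means what we want it to mean; the imputed distribution is whatever makes the marginal predictions of $y$ come out well.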

As before, if the model is well-specified, one can show that such a procedure asymptotically outputs the correct probability distribution over $z$. However, if the model is mis-specified, things can quickly go wrong. For example, suppose that $y$ represents what choice of drink a consumer buys, and $z$ represents consumer utility (which might be a function of the price, attributes, and quantity of the drink). Now suppose that individuals have preferences which are influenced by unmodeled covariates: for instance, a preference for cold drinks on warm days, while the input $x$ does not have information about the outside temperature when the drink was bought. This could cause any of several effects:

• If there is a covariate that happens to correlate with temperature in the data, then we might conclude that that covariate is predictive of preferring cold drinks.
• We might increase our uncertainty about $z$ to capture the unmodeled variation in $y$.
• We might implicitly increase uncertainty by moving utilities closer together (allowing noise or other factors to more easily change the consumer’s decision).

In practice we will likely have some mixture of all of these, and this will lead to systematic biases in our conclusions about the consumers’ utility functions.

The same problems as before arise: while we by design place probability mass on values of $z$ that correctly predict the observation $y$, under model mis-specification this could be due to spurious correlations or other perversities of the model. Furthermore, even though predictive performance is high on the observed data (and data similar to the observed data), there is no reason for this to continue to be the case in settings very different from the observed data, which is particularly problematic if one is considering the effects of an intervention. For instance, while inferring preferences between hot and cold drinks might seem like a silly example, the design of timber auctions constitutes a much more important example with a roughly similar flavour, where it is important to correctly understand the utility functions of bidders in order to predict their behaviour under alternative auction designs (the model is also more complex, allowing even more opportunities for mis-specification to cause problems).

## 3 A Possible Solution: Counterfactual Reasoning

In general, under model mis-specification we have the following problems:

• It is often no longer meaningful to talk about the “true” value of a latent variable $\theta$ (or at the very least, not one within the specified model family).
• Even when there is a latent variable $z$ with a well-defined meaning, the imputed distribution over $z$ need not match reality.

We can make sense of both of these problems by thinking in terms of counterfactual reasoning. Without defining it too formally, counterfactual reasoning is the problem of making good predictions not just in the actual world, but in a wide variety of counterfactual worlds that “could” exist. (I recommend this paper as a good overview for machine learning researchers.)

While typically machine learning models are optimized to predict well on a specific distribution, systems capable of counterfactual reasoning must make good predictions on many distributions (essentially any distribution that can be captured by a reasonable counterfactual). This stronger guarantee allows us to resolve many of the issues discussed above, while still thinking in terms of predictive performance, which historically seems to have been a successful paradigm for machine learning. In particular:

• While we can no longer talk about the “true” value of $\theta$, we can say that a value of $\theta$ is a “good” value if it makes good predictions on not just a single test distribution, but many different counterfactual test distributions. This allows us to have more confidence in the generalizability of any inferences we draw based on $\theta$ (for instance, if $\theta$ is the coefficient vector for a regression problem, any variable with positive sign is likely to robustly correlate with the response variable for a wide variety of settings).
• The imputed distribution over a variable $z$ must also lead to good predictions for a wide variety of distributions. While this does not force $z$ to match reality, it is a much stronger condition and does at least mean that any aspect of $z$ that can be measured in some counterfactual world must correspond to reality. (For instance, any aspect of a utility function that could at least counterfactually result in a specific action would need to match reality.)
• We will successfully predict the effects of an intervention, as long as that intervention leads to one of the counterfactual distributions considered.

(Note that it is less clear how to actually train models to optimize counterfactual performance, since we typically won’t observe the counterfactuals! But it does at least define an end goal with good properties.)
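To make the evaluation criterion concrete, here is a minimal sketch that scores a fitted model by its worst-case error over several test distributions instead of a single one. It assumes a simulated setting where the counterfactual distributions can actually be sampled (here, a noiseless quadratic evaluated on shifted input ranges), which is exactly what we usually lack in practice; the ranges and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def even_grid(lo, hi, n=101):
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def truth(x):
    """Assumed data-generating relationship for the simulation."""
    return x ** 2

def mse(predict, xs):
    return sum((predict(x) - truth(x)) ** 2 for x in xs) / len(xs)

def worst_case_error(predict, ranges):
    """Largest mean squared error over a family of test distributions."""
    return max(mse(predict, even_grid(lo, hi)) for lo, hi in ranges)

# Train a (mis-specified) linear model on the "actual" distribution, inputs in [0, 1].
train_xs = even_grid(0.0, 1.0)
slope, intercept = fit_line(train_xs, [truth(x) for x in train_xs])

def linear(x):
    return slope * x + intercept

# Hypothetical counterfactual test distributions: the training range plus two shifts.
ranges = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]

in_dist = mse(linear, train_xs)
linear_worst = worst_case_error(linear, ranges)
quadratic_worst = worst_case_error(truth, ranges)  # a well-specified model, for contrast

print("linear, training range:", in_dist)
print("linear, worst case:", linear_worst)
print("quadratic, worst case:", quadratic_worst)
```

The linear fit looks fine on the range it was trained on and poor in the worst case, while a model from the correct family does well everywhere; the worst-case score is what separates them.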

Many people have a strong association between the concepts of “counterfactual reasoning” and “causal reasoning”. It is important to note that these are distinct ideas; causal reasoning is a type of counterfactual reasoning (where the counterfactuals are often thought of as centered around interventions), but I think of counterfactual reasoning as any type of reasoning that involves making robustly correct statistical inferences across a wide variety of distributions. On the other hand, some people take robust statistical correlation to be the definition of a causal relationship, and thus do consider causal and counterfactual reasoning to be the same thing.

I think that building machine learning systems that can do a good job of counterfactual reasoning is likely to be an important challenge, especially in cases where reliability and safety are important, and necessitates changes in how we evaluate machine learning models. In my mind, while the Turing test has many flaws, one thing it gets very right is the ability to evaluate the accuracy of counterfactual predictions (since dialogue provides the opportunity to set up counterfactual worlds via shared hypotheticals). In contrast, most existing tasks focus on repeatedly making the same type of prediction with respect to a fixed test distribution. This latter type of benchmarking is of course easier and more clear-cut, but fails to probe important aspects of our models. I think it would be very exciting to design good benchmarks that require systems to do counterfactual reasoning, and I would even be happy to incentivize such work monetarily.

Acknowledgements

Thanks to Michael Webb, Sindy Li, and Holden Karnofsky for providing feedback on drafts of this post. If any readers have additional feedback, please feel free to send it my way.

## Individual Project Fund: Further Details

In my post on where I plan to donate in 2016, I said that I would set aside $2000 for funding promising projects that I come across in the next year: The idea behind the project fund is … [to] give in a low-friction way on scales that are too small for organizations like Open Phil to think about. Moreover, it is likely good for me to develop a habit of evaluating projects I come across and thinking about whether they could benefit from additional money (either because they are funding constrained, or to incentivize an individual who is on the fence about carrying the project out). Finally, if this effort is successful, it is possible that other EAs will start to do this as well, which could magnify the overall impact. I think there is some danger that I will not be able to allocate the $2000 in the next year, in which case any leftover funds will go to next year’s donor lottery.

In this post I will give some further details about this fund. My primary goal is to give others an idea of what projects I am likely to consider funding, so that anyone who thinks they might be a good fit for this can get in contact with me. (I also expect many of the best opportunities to come from people that I meet in person but don’t necessarily read this blog, so I plan to actively look for projects throughout the year as well.)

I am looking to fund or incentivize projects that meet several of the criteria below:

• The project is in the area of computer science, especially one of machine learning, cyber security, algorithmic game theory, or computational social choice. [Some other areas that I would be somewhat likely to consider, in order of plausibility: economics, statistics, political science (especially international security), and biology.]
• The project either wouldn’t happen, or would seem less worthwhile / higher-effort without the funding.
• The organizer is someone who either I or someone I trust has an exceptionally high opinion of.
• The project addresses a topic that I personally think is highly important. High-level areas that I tend to care about include international security, existential risk, AI safety, improving political institutions, improving scientific institutions, and helping the global poor. Technical areas that I tend to care about include reliable machine learning, machine learning and security, counterfactual reasoning, and value learning. On the other hand, if you have a project that you feel has a strong case for importance but doesn’t fit into these areas, I am interested in hearing about it.
• It is unlikely that this project or a substantially similar project would be done by someone else at a similar level of quality. (Or, whoever else is likely to do it would instead focus on a similarly high-value project, if this one were to be taken care of.)
• The topic pertains to a technical area that I or someone I trust has a high degree of expertise in, and can evaluate more quickly and accurately than a non-specialized funder.

It isn’t necessary to meet all of the criteria above, but I would probably want most things I fund to meet at least 4 of these 6.

Here are some concrete examples of things I might fund:

• Someone is thinking of doing a project that is undervalued (in terms of career benefits) but would be very useful. They don’t feel excited about allocating time to a non-career-relevant task but would feel more excited if getting an award of $1000 for their efforts.
• Someone I trust is starting a new discussion group in an area that I think is important, but can’t find anyone to sponsor it, and wants money for providing food at the meetings.
• Someone wants to do an experiment that I find valuable, but needs more compute resources than they have, and could use money for buying AWS hours.
• Someone wants to curate a valuable dataset and needs money for hiring mechanical turkers.
• Someone is organizing a workshop and needs money for securing a venue.
• One project I am particularly interested in is a good survey paper at the intersection of machine learning and cyber security. If you might be interested in doing this, I would likely be willing to pay you.
• There are likely many projects in the area of political activism that I would be interested in funding, although (due to crowdedness concerns) I have a particularly high bar for this area in terms of the criteria I laid out above.

If you think you might have a project that could use funding, please get in touch with me at jacob.steinhardt@gmail.com. Even if you are not sure if your project would be a good target for funding, I am very happy to talk to you about it. In addition, please feel free to comment either here or via e-mail if you have feedback on this general idea, or thoughts on types of small-project funding that I missed above.

## Donations for 2016

The following explains where I plan to donate in 2016, with some of my thinking behind it. This year, I had $10,000 to allocate (the sum of my giving from 2015 and 2016, which I lumped together for tax reasons; although I think this was a mistake in retrospect, both due to discount rates and because I could have donated in January and December 2016 and still received the same tax benefits).

To start with the punch line: I plan to give $4000 to the EA donor lottery, $2500 to GiveWell for discretionary granting, $2000 to be held in reserve to fund promising projects, $500 to GiveDirectly, $500 to the Carnegie Endowment (earmarked for the Carnegie-Tsinghua Center), and $500 to the Blue Ribbon Study Panel.

For those interested in donating to any of these: instructions for the EA donor lottery and the Blue Ribbon Study Panel are in the corresponding links above, and you can donate to both GiveWell and GiveDirectly at this page. I am looking into whether it is possible for small donors to give to the Carnegie Endowment, and will update this page when I find out.

At a high level, I partitioned my giving into two categories, which are roughly (A) “help poor people right now” and (B) “improve the overall trajectory of civilization” (these are meant to be rough delineations rather than rigorous definitions). I decided to split my giving into 30% category A and 70% category B. This is because while I believe that category B is the more pressing and impactful category to address in some overall utilitarian sense, I still feel a particular moral obligation towards helping the existing poor in the world we currently live in, which I don’t feel can be discharged simply by giving more to category B. The 30-70 split is meant to represent the fact that while category B seems more important to me, category A still receives substantial weight in my moral calculus (which isn’t fully utilitarian or even consequentialist).

The rest of this post treats categories A and B each in turn.

Category A: The Global Poor

Out of $3000 in total, I decided to give $2500 to GiveWell for discretionary regranting (which will likely be disbursed roughly but not exactly according to GiveWell’s recommended allocation), and $500 to some other source, with the only stipulation being that it did not exactly match GiveWell’s recommendation. The reason for this was the following: while I expect GiveWell’s recommendation to outperform any conclusion that I personally reach, I think there is substantial value in the exercise of personally thinking through where to direct my giving. A few more specific reasons:

• Most importantly, while I think that offloading giving decisions to a trusted expert is the correct decision to maximize the impact of any individual donation, collectively it leads to a bad equilibrium where substantially fewer and less diverse brainpower is devoted to thinking about where to give. I think that giving a small but meaningful amount based on one’s own reasoning largely ameliorates this effect without losing much direct value.
• In addition, I think it is good to build the skills to in principle think through where to direct resources, even if in practice most of the work is outsourced to a dedicated organization.
• Finally, having a large number of individual donors check GiveWell’s work and search for alternatives creates stronger incentives for GiveWell to do a thorough job (and allows donors to have more confidence that GiveWell is doing a thorough job). While I know many GiveWell staff and believe that they would do an excellent job independently of external vetting, I still think this is good practice.

Related to the last point: doing this exercise gave me a better appreciation for the overall reliability, strengths, and limitations of GiveWell’s work.
In general, I found that GiveWell’s work was incredibly thorough (more-so than I expected despite my high opinion of them), and moreover that they have moved substantial money beyond the publicized annual donor recommendations. An example of this is their 2016 grant to IDinsight. IDinsight ended up being one of my top candidates for where to donate, such that I thought it was plausibly even better than a GiveWell top charity. However, when I looked into it further it turned out that GiveWell had already essentially filled their entire funding gap.

I think this anecdote serves to illustrate a few things: first, as noted, GiveWell is very thorough, and does substantial work beyond what is apparent from the top charities page. Second, while GiveWell had already given to IDinsight, the grant was made in 2016. I think the same process I used would not have discovered IDinsight in 2015, but it’s possible that other processes would have. So, I think it is possible that a motivated individual could identify strong giving opportunities a year ahead of GiveWell. As a point against this, I think I am in an unusually good position to do this and still did not succeed. I also think that even if an individual identified a strong opportunity, it is unlikely that they could be confident that it was strong, and in most cases GiveWell’s top charities would still be better bets in expectation (but I think that merely identifying a plausibly strong giving opportunity should count as a huge success for the purposes of the overall exercise).

To elaborate on why my positioning might be atypically good: I already know GiveWell staff and so have some appreciation for their thinking, and I work at Stanford and have several friends in the economics department, which is one of the strongest departments in the world for Development Economics.
In particular, I discussed my giving decisions extensively with a student of Pascaline Dupas, who is one of the world experts in the areas of economics most relevant to GiveWell’s recommendations.

Below are specifics on organizations I looked into and where I ultimately decided to give.

Object-level Process and Decisions (Category A)

My process for deciding where to give mostly consisted of talking to several people I trust, brainstorming and thinking things through myself, and a small amount of online research. (I think that I should likely have done substantially more online research than I ended up doing, but my thinking style tends to benefit from 1-on-1 discussions, which I also find more enjoyable.) The main types of charities that I ended up considering were:

• GiveDirectly (direct cash transfers)
• IPA/JPAL and similar groups (organizations that support academic research on international development)
• IDinsight and similar groups (similar to the previous group, but explicitly tries to do the “translational work” of going from academic research to evidence-backed large-scale interventions)
• public information campaigns (such as Development Media International)
• animal welfare
• start-ups or other small groups in the development space that might need seed funding
• meta-charities such as CEA that try to increase the amount of money moved to EA causes (or evidence-backed charity more generally)

I ultimately felt unsure whether animal welfare should count in this category, and while I felt that CEA was a potentially strong candidate in terms of pure cost-effectiveness, directing funds there felt overly insular/meta to me in a way that defeated the purpose of the giving exercise. (Note: two individuals who reviewed this post encouraged me to revisit this point; as a result, next year I plan to look into CEA in more detail.)

While looking into the “translational work” category, I came across one organization other than IDinsight that did work in this area and was well-regarded by at least some economists. While I was less impressed by them than I was by IDinsight, they seemed plausibly strong, and it turned out that GiveWell had not yet evaluated them. While I ended up deciding not to give to them (based on feeling that IDinsight was likely to do substantially better work in the same area) I did send GiveWell an e-mail bringing the organization to their attention.

When looking into IPA, my impression was that while they have been responsible for some really good work in the past, this was primarily while they were a smaller organization, and they have now become large and bureaucratic enough that their future value will be substantially lower. However, I also found out about an individual who was running a small organization in the same space as IPA, and seemed to be doing very good work. While I was unable to offer them money for reasons related to conflict of interest, I do plan to try to find ways to direct funds to them if they are interested.

While public information campaigns seem like they could a priori be very effective, briefly looking over GiveWell’s page on DMI gave me the impression that GiveWell had already considered this area in a great deal of depth and prioritized other interventions for good reasons.

I ultimately decided to give my money to GiveDirectly. While in some sense this violates the spirit of the exercise, I felt satisfied about having found at least one potentially good giving opportunity (the small IPA-like organization) even if I was unable to give to it personally, and overall felt that I had done a reasonable amount of research. Moreover, I have a strong intuition that 0% is the wrong allocation for GiveDirectly, and it wasn’t clear to me that GiveWell’s reasons for recommending 0% were strong enough to override that intuition.
So, overall, $2500 of my donation will go to GiveWell for discretionary re-granting, and $500 to GiveDirectly.

Trajectory of Civilization (Category B)

First, I plan to put $2000 into escrow for the purpose of supporting any useful small projects (specifically in the field of computer science / machine learning) that I come across in the next year. For the remaining $5000, I plan to allocate $4000 of it to the donor lottery, $500 to the Carnegie Endowment, and $500 to the Blue Ribbon Study Panel on Biodefense. For the latter, I wanted to donate to something that improved medium-term international security, because I believe that this is an important area that is relatively under-invested in by the effective altruist community (both in terms of money and cognitive effort). Here are all of the major possibilities that I considered:

• Donating to the Future of Humanity Institute, with funds earmarked towards their collaboration with Allan Dafoe. I decided against this because my impression was that this particular project was not funding-constrained. (However, I am very excited by the work that Allan and his collaborators are doing, and would like to find ways to meaningfully support it.)
• Donating to the Carnegie Endowment, restricted specifically to the Carnegie-Tsinghua Center. My understanding is that this is one of the few western organizations working to influence China’s nuclear policy (though this is based on personal conversation and not something I have looked into myself). My intuition is that influencing Chinese nuclear policy is substantially more tractable than U.S. nuclear policy, due to far fewer people trying to do so. In addition, from looking at their website, I felt that most of the areas they worked in were important areas, which I believe to be unusual for large organizations with multiple focuses (as a contrast, for other organizations with a similar number of focus areas, I felt that roughly half of the areas were obviously orders of magnitude less important than the areas I was most excited about). I had some reservations about donating (due to their size: $30 million in revenue per year, and $300 million in assets), but I decided to donate $500 anyways because I am excited about this general type of work. (This organization was brought to my attention by Nick Beckstead; Nick notes that he doesn’t have strong opinions about this organization, primarily due to not knowing much about them.)
• Donating to the Blue Ribbon Study Panel: I am basically trusting Jaime Yassif that this is a strong recommendation within the area of biodefense.
• Donating to the ACLU: The idea here would be to decrease the probability that a President Trump seriously erodes democratic norms within the U.S. I however currently expect the ACLU to be well-funded (my understanding is that they got a flood of donations after Trump was elected).
• Donating to the DNC or the Obama/Holder redistricting campaign: This is based on the idea that (1) Democrats are much better than Republicans for global stability / good U.S. policy, and (2) Republicans should be punished for helping Trump to become president. I basically agree with both, and could see myself donating to the redistricting campaign in particular in the future, but this intuitively feels less tractable/underfunded than non-partisan efforts like the Carnegie Endowment or Blue Ribbon Study Panel.
• Creating a prize fund for incentivizing important research projects within computer science: I was originally planning to allocate $1000 to $2000 to this, based on the idea that computer science is a key field for multiple important areas (both AI safety and cyber security) and that as an expert in this field I would be in a unique position to identify useful projects relative to others in the EA community. However, after talking to several people and thinking about it myself, I decided that it was likely not tractable to provide meaningful incentives via prizes at such a small scale, and opted to instead set aside $2000 to support promising projects as I come across them.

(As a side note: it isn’t completely clear to me whether the Carnegie Endowment accepts small donations. I plan to contact them about this, and if they do not, allocate the money to the Blue Ribbon Study Panel instead.)

In the remainder of this post I will briefly describe the $2000 project fund, how I plan to use it, and why I decided it was a strong giving opportunity. I also plan to describe this in more detail in a separate follow-up post.

Credit goes to Owen Cotton-Barratt for suggesting this idea. In addition, one of Paul Christiano’s blog posts inspired me to think about using prizes to incentivize research, and Holden Karnofsky further encouraged me to think along these lines.

The idea behind the project fund is similar to the idea behind the prize fund: I understand research in computer science better than most other EAs, and can give in a low-friction way on scales that are too small for organizations like Open Phil to think about. Moreover, it is likely good for me to develop a habit of evaluating projects I come across and thinking about whether they could benefit from additional money (either because they are funding constrained, or to incentivize an individual who is on the fence about carrying the project out). Finally, if this effort is successful, it is possible that other EAs will start to do this as well, which could magnify the overall impact. I think there is some danger that I will not be able to allocate the $2000 in the next year, in which case any leftover funds will go to next year’s donor lottery.

When I meet someone who works in a field outside of computer science, I usually ask them a lot of questions about their field that I’m curious about. (This is still relevant even if I’ve already met someone in that field before, because it gives me an idea of the range of expert consensus; for some questions this ends up being surprisingly variable.) I often find that, as an outsider, I can think of natural-seeming questions that experts in the field haven’t thought about, because their thinking is confined by their field’s paradigm while mine is not (pessimistically, it’s instead constrained by a different paradigm, i.e. computer science).

Usually my questions are pretty naive, and are basically what a computer scientist would think to ask based on their own biases. For instance:

• Neuroscience: How much computation would it take to simulate a brain? Do our current theories of how neurons work allow us to do that even in principle?
• Political science: How does the rise of powerful multinational corporations affect theories of international security (typical past theories assume that the only major powers are states)? How do we keep software companies (like Google, etc.) politically accountable? How will cyber attacks / cyber warfare affect international security?
• Materials science: How much of the materials design / discovery process can be automated? What are the bottlenecks to building whatever materials we would like to? How can different research groups effectively communicate and streamline their steps for synthesizing materials?

When I do this, it’s not unusual for me to end up asking questions that the other person hasn’t really thought about before. In this case, responses range from “that’s not a question that our field studies” to “I haven’t thought about this much, but let’s try to think it through on the spot”. Of course, sometimes the other person has thought about it, and sometimes my question really is just silly or ill-formed for some reason (I suspect this is true more often than I’m explicitly made aware of, since some people are too polite to point it out to me).

I find the cases where the other person hasn’t thought about the question to be striking, because it means that I as a naive outsider can ask natural-seeming questions that haven’t been considered before by an expert in the field. I think what is going on here is that I and my interlocutor are using different paradigms (in the Kuhnian sense) for determining what questions are worth asking in a field. But while there is a sense in which the other person’s paradigm is more trustworthy — since it arose from a consensus of experts in the relevant field — that doesn’t mean that it’s absolutely reliable. Paradigms tend to blind one to evidence or problems that don’t fit into that paradigm, and paradigm shifts in science aren’t really that rare. (In addition, many fields including machine learning don’t even have a single agreed-upon paradigm.)

I think that as a scientist (or really, even as a citizen) it is important to be able to see outside one’s own paradigm. I currently think that I do a good job of this, but it seems to me that there’s a big danger of becoming more entrenched as I get older. Based on the above experiences, I plan to use the following test: When someone asks me a question about my field, how often have I not thought about it before? How tempted am I to say, “That question isn’t interesting”? If these start to become more common, then I’ll know something has gone wrong.

A few miscellaneous observations:

• There are several people I know who routinely have answers to whatever questions I ask. Interestingly, they tend to be considered slightly “crackpot-ish” within their field; and they might also be less successful by conventional metrics, relative to how smart they are considered by their colleagues. I think this is a result of the fact that most academic fields over-reward progress within that field’s paradigm and under-reward progress outside of it.
• Beyond “slightly crackpot-ish academics”, the other set of people who routinely have answers to my questions are philosophers and some people in program manager roles (this includes certain types of VCs as well).
• I would guess that in general technical fields that overlap with the humanities are more likely to take a broad view and not get stuck in a single paradigm. For instance, I would expect political scientists to have thought about most of the political science questions I mentioned above; however, I haven’t talked to enough political scientists (or social scientists in general) to have much confidence in this.

## Two Strange Facts

Here are two strange facts about matrices, which I can prove but not in a satisfying way.

1. If $A$ and $B$ are symmetric matrices satisfying $0 \preceq A \preceq B$, then $A^{1/2} \preceq B^{1/2}$, and $B^{-1} \preceq A^{-1}$, but it is NOT necessarily the case that $A^2 \preceq B^2$. Is there a nice way to see why the first two properties should hold but not necessarily the third? In general, do we have $A^p \preceq B^p$ if $p \in [0,1]$?
2. Given a rectangular matrix $W \in \mathbb{R}^{n \times d}$, and a set $S \subseteq [n]$, let $W_S$ be the submatrix of $W$ with rows in $S$, and let $\|W_S\|_*$ denote the nuclear norm (sum of singular values) of $W_S$. Then the function $f(S) = \|W_S\|_*$ is submodular, meaning that $f(S \cup T) + f(S \cap T) \leq f(S) + f(T)$ for all sets $S, T$. In fact, this is true if we take $f_p(S)$, defined as the sum of the $p$th powers of the singular values of $W_S$, for any $p \in [0,2]$. The only proof I know involves trigonometric integrals and seems completely unmotivated to me. Is there any clean way of seeing why this should be true?
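Both facts are easy to sanity-check numerically, which at least confirms the statements even if it offers no insight into why they hold. The sketch below is only an illustration: the particular matrices `A` and `B`, the random matrix `W`, and the helper names `sym_power`, `loewner_leq`, and `nuclear_norm_rows` are my own choices, not anything canonical. (For the first fact, the $p \in [0,1]$ case is known as the Löwner-Heinz inequality.)

```python
import itertools
import numpy as np

def sym_power(M, p):
    """Return M**p for a symmetric positive definite M via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.T

def loewner_leq(X, Y, tol=1e-9):
    """True if X <= Y in the Loewner order, i.e. Y - X is PSD up to tol."""
    return np.linalg.eigvalsh(Y - X).min() >= -tol

# Fact 1: an illustrative pair with 0 <= A <= B (B - A is a rank-one PSD matrix).
A = np.array([[1.1, 1.0], [1.0, 1.1]])
B = A + np.array([[1.0, 0.0], [0.0, 0.0]])

order_holds = loewner_leq(np.zeros((2, 2)), A) and loewner_leq(A, B)
sqrt_holds = loewner_leq(sym_power(A, 0.5), sym_power(B, 0.5))
inv_holds = loewner_leq(sym_power(B, -1.0), sym_power(A, -1.0))
square_holds = loewner_leq(A @ A, B @ B)  # fails for this pair

# Fact 2: brute-force check of submodularity of S -> ||W_S||_* on a small random W.
def nuclear_norm_rows(W, rows):
    """Nuclear norm of the submatrix of W with the given rows (0 for no rows)."""
    if not rows:
        return 0.0
    return np.linalg.svd(W[list(rows)], compute_uv=False).sum()

rng = np.random.default_rng(0)
n, d = 5, 3
W = rng.standard_normal((n, d))
subsets = [frozenset(c) for r in range(n + 1)
           for c in itertools.combinations(range(n), r)]
f = {S: nuclear_norm_rows(W, sorted(S)) for S in subsets}
submodular_holds = all(
    f[S | T] + f[S & T] <= f[S] + f[T] + 1e-9
    for S in subsets for T in subsets
)

print(order_holds, sqrt_holds, inv_holds, square_holds, submodular_holds)
```

For this pair, $B^2 - A^2$ has determinant $-1$, so it has a negative eigenvalue and the squared ordering fails even though the square-root and inverse orderings hold.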

If anyone has insight into either of these, I’d be very interested!

## Difficulty of Predicting the Maximum of Gaussians

Suppose that we have a random variable $X \in \mathbb{R}^d$, such that $\mathbb{E}[XX^{\top}] = I_{d \times d}$. Now take k independent Gaussian random variables $Z_1, \ldots, Z_k \sim \mathcal{N}(0, I_{d \times d})$, and let J be the argmax (over j in 1, …, k) of $Z_j^{\top}X$.

It seems that it should be very hard to predict J well, in the following sense: for any function $q(j \mid x)$, the expectation $\mathbb{E}_{x}[q(J \mid x)]$ should, with high probability over the randomness in $Z$, be very close to $\frac{1}{k}$. In fact, Alex Zhai and I think that the probability of the expectation exceeding $\frac{1}{k}$ should be at most $\exp(-C(\epsilon/k)^2d)$ for some constant C. (We can already show this to be true where we replace $(\epsilon/k)^2$ with $(\epsilon/k)^4$.) I will not sketch a proof here, but the idea is pretty cool: it basically uses Lipschitz concentration of Gaussian random variables.
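To get a feel for the setup, here is a small Monte Carlo sketch. It is an illustration under my own assumptions, not evidence for the conjectured rate: I take $X \sim \mathcal{N}(0, I_{d \times d})$ as one distribution with $\mathbb{E}[XX^{\top}] = I_{d \times d}$, and I use one hypothetical predictor, fixed before $Z$ is drawn, that puts all of its mass on $\arg\max_j U_j^{\top}x$ for a fixed matrix `U`. The names `hit_rate`, `guess`, and `rates` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_samples = 400, 4, 5000

# Assumption: X ~ N(0, I_d), which satisfies E[X X^T] = I.
X = rng.standard_normal((n_samples, d))

# Hypothetical predictor chosen independently of Z:
# q(j | x) = 1 if j == argmax_j U_j^T x, else 0.
U = rng.standard_normal((k, d))
guess = np.argmax(X @ U.T, axis=1)

def hit_rate(Z):
    """Monte Carlo estimate of E_x[q(J | x)] for one draw of Z (shape k x d)."""
    J = np.argmax(X @ Z.T, axis=1)
    return np.mean(guess == J)

# Repeat over independent draws of Z to see how tightly the value concentrates.
rates = np.array([hit_rate(rng.standard_normal((k, d))) for _ in range(20)])
print("1/k =", 1 / k)
print("min, mean, max over draws of Z:", rates.min(), rates.mean(), rates.max())
```

Increasing `d` should tighten the spread of `rates` around $\frac{1}{k}$, since the rows of `U` and of each draw of $Z$ become closer to orthogonal.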

I’m mainly posting this problem because I think it’s pretty interesting, in case anyone else is inspired to work on it. It is closely related to the covering number of exponential families under the KL divergence, where we are interested in coverings at relatively large radii ($\log(k) - \epsilon$ rather than $\epsilon$).

## Maximal Maximum-Entropy Sets

Consider a probability distribution ${p(y)}$ on a space ${\mathcal{Y}}$. Suppose we want to construct a set ${\mathcal{P}}$ of probability distributions on ${\mathcal{Y}}$ such that ${p(y)}$ is the maximum-entropy distribution over ${\mathcal{P}}$:

$\displaystyle H(p) = \max_{q \in \mathcal{P}} H(q),$

where ${H(p) = \mathbb{E}_{p}[-\log p(y)]}$ is the entropy. We call such a set a maximum-entropy set for ${p}$. Furthermore, we would like ${\mathcal{P}}$ to be as large as possible, subject to the constraint that ${\mathcal{P}}$ is convex.

Does such a maximal convex maximum-entropy set ${\mathcal{P}}$ exist? That is, is there some convex set ${\mathcal{P}}$ such that ${p}$ is the maximum-entropy distribution in ${\mathcal{P}}$, and for any ${\mathcal{Q}}$ satisfying the same property, ${\mathcal{Q} \subseteq \mathcal{P}}$? It turns out that the answer is yes, and there is even a simple characterization of ${\mathcal{P}}$:

Proposition 1 For any distribution ${p}$ on ${\mathcal{Y}}$, the set

$\displaystyle \mathcal{P} = \{q \mid \mathbb{E}_{q}[-\log p(y)] \leq H(p)\}$

is the maximal convex maximum-entropy set for ${p}$.

To see why this is, first note that ${\mathcal{P}}$ is convex, since its defining constraint is linear in ${q}$. Clearly ${p \in \mathcal{P}}$, and for any ${q \in \mathcal{P}}$ we have

$\displaystyle \begin{array}{rcl} H(q) &=& \mathbb{E}_{q}[-\log q(y)] \\ &\leq& \mathbb{E}_{q}[-\log p(y)] \\ &\leq& H(p), \end{array}$

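so ${p}$ is indeed the maximum-entropy distribution in ${\mathcal{P}}$. (The first inequality is Gibbs' inequality, i.e. non-negativity of ${\mathrm{KL}(q \,\|\, p)}$; the second is the definition of ${\mathcal{P}}$.) On the other hand, let ${\mathcal{Q}}$ be any other convex set whose maximum-entropy distribution is ${p}$. Then in particular, for any ${q \in \mathcal{Q}}$ and any ${\epsilon \in [0,1]}$, convexity puts ${(1-\epsilon)p + \epsilon q}$ in ${\mathcal{Q}}$, so we must have ${H((1-\epsilon)p + \epsilon q) \leq H(p)}$. Let us suppose for the sake of contradiction that ${q \not\in \mathcal{P}}$, so that ${\mathbb{E}_{q}[-\log p(y)] > H(p)}$. Then we have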

$\displaystyle \begin{array}{rcl} H((1-\epsilon)p + \epsilon q) &=& \mathbb{E}_{(1-\epsilon)p+\epsilon q}[-\log((1-\epsilon)p(y)+\epsilon q(y))] \\ &=& \mathbb{E}_{(1-\epsilon)p+\epsilon q}[-\log(p(y) + \epsilon (q(y)-p(y)))] \\ &=& \mathbb{E}_{(1-\epsilon)p+\epsilon q}\left[-\log(p(y)) - \epsilon \frac{q(y)-p(y)}{p(y)} + \mathcal{O}(\epsilon^2)\right] \\ &=& H(p) + \epsilon(\mathbb{E}_{q}[-\log p(y)]-H(p)) - \epsilon \mathbb{E}_{(1-\epsilon)p+\epsilon q}\left[\frac{q(y)-p(y)}{p(y)}\right] + \mathcal{O}(\epsilon^2) \\ &=& H(p) + \epsilon(\mathbb{E}_{q}[-\log p(y)]-H(p)) - \epsilon^2 \mathbb{E}_{q}\left[\frac{q(y)-p(y)}{p(y)}\right] + \mathcal{O}(\epsilon^2) \\ &=& H(p) + \epsilon(\mathbb{E}_{q}[-\log p(y)]-H(p)) + \mathcal{O}(\epsilon^2). \end{array}$

Since ${\mathbb{E}_{q}[-\log p(y)] - H(p) > 0}$, for sufficiently small ${\epsilon}$ this will exceed ${H(p)}$, which is a contradiction. Therefore we must have ${q \in \mathcal{P}}$ for all ${q \in \mathcal{Q}}$, and hence ${\mathcal{Q} \subseteq \mathcal{P}}$, so that ${\mathcal{P}}$ is indeed the maximal convex maximum-entropy set for ${p}$.
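Both halves of the argument can be checked numerically on a small finite space. The sketch below is only an illustration under assumptions of my own: the three-point distribution `p`, the outside point `q_out`, the step size `eps`, and the helper names `entropy` and `cross_entropy` are all hypothetical choices. It samples random distributions, keeps those in ${\mathcal{P}}$, and confirms none has entropy above ${H(p)}$; it then takes a ${q \not\in \mathcal{P}}$ whose own entropy is below ${H(p)}$ and shows that a small mixture with ${p}$ still pushes the entropy above ${H(p)}$.

```python
import numpy as np

def entropy(q):
    """Shannon entropy H(q) in nats; terms with q(y) = 0 contribute 0."""
    q = np.asarray(q, dtype=float)
    nz = q > 0
    return -np.sum(q[nz] * np.log(q[nz]))

def cross_entropy(q, p):
    """E_q[-log p(y)], assuming p has full support."""
    return -np.sum(np.asarray(q) * np.log(p))

p = np.array([0.5, 0.3, 0.2])  # illustrative choice of p
H_p = entropy(p)

# Part 1: every sampled q inside P = {q : E_q[-log p] <= H(p)} has H(q) <= H(p).
rng = np.random.default_rng(0)
samples = rng.dirichlet(np.ones(3), size=2000)
in_P = np.array([cross_entropy(q, p) <= H_p for q in samples])
max_entropy_in_P = max(entropy(q) for q in samples[in_P])

# Part 2: a q outside P (even one with low entropy) cannot share a convex
# maximum-entropy set with p, because mixing it in slightly raises the entropy.
q_out = np.array([0.05, 0.05, 0.9])
eps = 0.01
mix = (1 - eps) * p + eps * q_out

print("H(p) =", H_p, " max H(q) over sampled q in P =", max_entropy_in_P)
print("E_q[-log p] for q_out =", cross_entropy(q_out, p), " H(q_out) =", entropy(q_out))
print("H(mix) - H(p) =", entropy(mix) - H_p)
```

The second part is the point of the proposition: membership in ${\mathcal{P}}$ is decided by the cross-entropy ${\mathbb{E}_{q}[-\log p(y)]}$, not by ${H(q)}$ itself.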