You can’t tell whether climate models are oversensitive just by looking at temperature charts


Those who argue for the reliability of climate models and those who argue against it usually try to drive the point home the same way: by showing charts that contrast the evolution of global air temperature in the historical record with the temperature “projections” of climate models. (I’ll explain the reason for the quotation marks later.)

Their respective arguments, though often not made explicitly, are that:

a) If the models more or less track the historic evolution of global atmospheric temperatures, then their sensitivity to greenhouse gas forcing must be approximately right, and therefore their projections for future temperature change should be trusted.

b) If the models deviate from said temperature evolution, then their sensitivity must be wrong, and the same applies to the models’ projections. Pretty much all the people making this argument claim that models are oversensitive.

Both sides are wrong because air temperature changes, in the real world as in models, do not depend only on sensitivity to greenhouse gases; they also hinge on the quantity of forcing and on the level of ocean heat uptake. You need to know how much forcing and ocean heat uptake occur in order to calculate climate sensitivity. If you don’t know these measures, you can’t make claims about how well or badly models perform.

In the rest of the article I illustrate the issue with some numbers — worry not, the math is trivial.

Estimating climate sensitivity

We’re going to see examples of how the estimated climate sensitivity can change radically as a result of small variations in the input figures (air temperature, forcing, ocean heat uptake). The numbers are made up but in the right neighborhood; if you want to familiarize yourself with the quantities involved in climate sensitivity, you can check out this app of mine.

Before we start, a couple of notes. First, keep in mind that we’re always talking about changes: how much air temperatures, or forcing, or global heat uptake have increased.

Second, consider that for the calculation of transient sensitivity, or transient climate response (TCR), the only things that matter are atmospheric temperatures and forcing. But if you want to calculate equilibrium climate sensitivity (ECS) then you need to know ocean heat uptake or, more precisely, global heat uptake. Over 90% of global heat uptake happens in the ocean, so both measures are similar. Global heat uptake is more commonly called energy imbalance or radiative imbalance — these terms are interchangeable, because radiation is the only process whereby the Earth can gain or lose energy.

Now, how would one go about such a calculation?

Suppose that you’re trying to infer climate sensitivity using air temperature figures going back to 1850. Data from that era is so sparse and unreliable that you decide to use the 1850–1900 average as the starting point; as the end point you take the 2009–2018 average. Data is better nowadays, but it’s still unwise to use a single year as a reference point, since it may be affected by volcanoes, El Niño, etc. A longer period will tend to smooth out those variations and hopefully give a clearer look at the global warming signal.
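To make the period-averaging idea concrete, here is a minimal Python sketch. The function names and the synthetic series are my own assumptions for illustration: the series is just a steady trend with a year-to-year wiggle added, not real temperature data.

```python
from statistics import mean


def period_mean(anomalies, first_year, last_year):
    """Average the annual anomalies over an inclusive range of years."""
    return mean(anomalies[year] for year in range(first_year, last_year + 1))


def warming_between(anomalies, baseline, recent):
    """Change in temperature between two multi-year reference periods."""
    return period_mean(anomalies, *recent) - period_mean(anomalies, *baseline)


# Hypothetical series (NOT real data): a linear trend of 0.006 C per year
# plus an alternating +/-0.1 C wiggle standing in for year-to-year noise.
synthetic = {
    year: 0.006 * (year - 1850) + (0.1 if year % 2 else -0.1)
    for year in range(1850, 2019)
}

change = warming_between(synthetic, baseline=(1850, 1900), recent=(2009, 2018))
trend_only = 0.006 * (2013.5 - 1875)  # difference between the period midpoints

print(f"Change between period averages: {change:.3f} C")
print(f"Change implied by the trend alone: {trend_only:.3f} C")
```

In this toy series the period averages recover the underlying trend almost exactly, because the wiggles largely cancel out within each period.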

You crunch the numbers and it turns out that, according to the observational record, air temperatures between these two periods have increased by 1ºC. Climate models, on the other hand, show a temperature rise of 1.1ºC. That is a gap of only 0.1ºC, or 10% of the observed rise; it certainly seems a small mismatch!

(You can also see why I used quotation marks around “projections”: obviously the current generation of climate models hadn’t been developed by the year 1850! Rather, for the CMIP5 group of models, the period up to and including 2005 is a hindcast; that is to say, the modelers already knew what the observational record said about 1850–2005. Only the period since 2006 is a forecast.)

Now consider forcing. According to observational estimates, forcing has risen by 2.5 watts per square meter; in the models, the rise has been 2 watts per square meter. Would people say this difference is big or small? Well, it’s hard to know what people would say because in practice they don’t say anything about this issue; googling for comparisons of forcing levels between models and reality yields very, very few results.

(One curious fact about climate models is that calculating the forcings that arise in their simulations is complicated — there doesn’t seem to be any definitive dataset compiling these figures for most models. So it’s hard to say with certainty whether forcing in the real world has been stronger than in the models; again, I’m using these figures only as illustration.)

Finally, the models almost nail the change in global heat uptake: it was 0.7 W/m2 according to our estimates of the real world, and 0.8 W/m2 in the models.

You may have guessed where this is going: the errors can compound. For a calculation of transient sensitivity, you have to “scale” the forcing of the period involved, so that it matches the forcing that results from a doubling of CO2 (3.8 W/m2). The result would be a TCR figure 37.5% higher in models than in observations:

· Real world: (3.8 / 2.5) * 1 = 1.52ºC

· Model: (3.8 / 2) * 1.1 = 2.09ºC
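The TCR arithmetic above can be sketched in a few lines of Python. The function name is mine, and the inputs are the made-up illustrative figures from this article, not real estimates.

```python
F_2X = 3.8  # W/m2, forcing from a doubling of CO2 (figure used in this article)


def transient_climate_response(delta_t, delta_f, f_2x=F_2X):
    """Scale the warming over a period to the forcing of a CO2 doubling."""
    return (f_2x / delta_f) * delta_t


# Illustrative inputs: warming in C, forcing in W/m2
tcr_real = transient_climate_response(delta_t=1.0, delta_f=2.5)
tcr_model = transient_climate_response(delta_t=1.1, delta_f=2.0)
tcr_overshoot = tcr_model / tcr_real - 1

print(f"Real world TCR: {tcr_real:.2f} C")
print(f"Model TCR:      {tcr_model:.2f} C")
print(f"Model overshoot: {tcr_overshoot:.1%}")
```

Note that a 10% mismatch in warming and a 20% mismatch in forcing multiply rather than add: 1.1 / 0.8 = 1.375.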

And what about equilibrium sensitivity? The mismatch widens because, from the forcing figure, we have to subtract the heat uptake value for the same period:

· Real world: (3.8 / (2.5 - 0.7)) * 1 = 2.11ºC

· Model: (3.8 / (2 - 0.8)) * 1.1 = 3.48ºC

A person who had looked at the three inputs one by one might have concluded that the models were approximately right in every case, yet the models’ ECS estimate ended up 65% higher than the real-world figure.
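Here is a small Python sketch of how the three modest input mismatches compound into the large ECS mismatch. The function and variable names are my own; the inputs are again the illustrative figures from this article.

```python
F_2X = 3.8  # W/m2, forcing from a doubling of CO2 (figure used in this article)


def equilibrium_climate_sensitivity(delta_t, delta_f, heat_uptake, f_2x=F_2X):
    """Like TCR, but heat uptake is subtracted from the forcing first."""
    return f_2x / (delta_f - heat_uptake) * delta_t


# Illustrative inputs: warming in C, forcing and heat uptake in W/m2
real = {"delta_t": 1.0, "delta_f": 2.5, "heat_uptake": 0.7}
model = {"delta_t": 1.1, "delta_f": 2.0, "heat_uptake": 0.8}

# Relative mismatch of each input, model versus real world
input_mismatch = {name: model[name] / real[name] - 1 for name in real}

ecs_real = equilibrium_climate_sensitivity(**real)
ecs_model = equilibrium_climate_sensitivity(**model)
ecs_mismatch = ecs_model / ecs_real - 1

for name, value in input_mismatch.items():
    print(f"{name:12s} model vs real world: {value:+.0%}")
print(f"ECS          model vs real world: {ecs_mismatch:+.0%}")
```

Each input is off by 20% or less, but all three errors push the result in the same direction, and the subtraction in the denominator magnifies them.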

What if the models’ values for atmospheric temperature and global heat uptake exactly match reality?

If that is the case but the models’ forcing figures do not match reality, then their sensitivity results will still be wrong. And the error will be worse for ECS than for TCR. Let’s redo the exercise above, but now assuming that the only difference between models and reality is in forcing (again 2.5 W/m2 in the observational record, 2 W/m2 in models).

For TCR, as you might expect, the difference between model and reality is 25%.

· Real world: (3.8 / 2.5) * 1 = 1.52ºC

· Model: (3.8 / 2) * 1 = 1.9ºC

And for ECS this grows to 38%.

· Real world: (3.8 / (2.5 - 0.7)) * 1 = 2.11ºC

· Model: (3.8 / (2 - 0.7)) * 1 = 2.92ºC
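Why is the error bigger for ECS than for TCR? A quick Python sketch, using a helper function of my own, shows that subtracting the same heat uptake from both forcing figures widens the relative gap between them. The heat uptake values other than 0.7 are hypothetical, chosen only to show the trend.

```python
def sensitivity_overshoot(delta_f_real, delta_f_model, heat_uptake=0.0):
    """Relative overshoot of the model's sensitivity when only forcing differs.

    With identical warming and identical F_2x, both cancel out of the ratio,
    leaving only forcing minus heat uptake. heat_uptake=0 is the TCR case.
    """
    return (delta_f_real - heat_uptake) / (delta_f_model - heat_uptake) - 1


tcr_overshoot = sensitivity_overshoot(2.5, 2.0)
ecs_overshoot = sensitivity_overshoot(2.5, 2.0, heat_uptake=0.7)

# Hypothetical heat uptake values (W/m2); 0.0 is TCR, 0.7 is the ECS example
for uptake in (0.0, 0.35, 0.7, 1.0):
    overshoot = sensitivity_overshoot(2.5, 2.0, heat_uptake=uptake)
    print(f"heat uptake {uptake:.2f} W/m2 -> overshoot {overshoot:.0%}")
```

The larger the heat uptake, the more a fixed 0.5 W/m2 forcing gap matters relative to what is left in the denominator.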

Now, if a model depicts twice as much warming as has happened, or only half as much, then it’s likely that its sensitivity is indeed significantly wrong. But you cannot be certain unless you know the models’ forcings and heat uptake.

February 2020 update: what if the forcing caused by a doubling of CO2 differs between models and reality?

This issue didn’t cross my mind until I read Hausfather’s paper on old climate models. The radiative forcing caused by a doubling of CO2 is normally denoted F_2x and, as mentioned earlier, is currently estimated to be around 3.8 watts per square meter. The calculations above, which one could call naïve, assumed the same F_2x for the real world and for climate models. In fact, F_2x is different in each climate model but, on average, apparently lower than in reality. For example, Dessler & Forster (2018) give an average value of 3.45 W/m2 for CMIP5 models (see Table S1).

In the naïve example, forcing was 20% smaller in the models than in reality. Now let’s measure forcing as a percentage of F_2x instead. The 2.5 W/m2 of real-world forcing is 65.8% of the real-world F_2x (2.5 / 3.8). The same percentage of the models’ F_2x (3.45 W/m2) is 2.27 W/m2 but, to stay consistent with the naïve estimate, let’s assume the models’ forcing is 20% smaller than that: 1.82 W/m2.

This issue does not affect the estimation of TCR: the values remain the same as in the naïve calculation, and the overshoot in models (with respect to reality) is still 25%.

· Real world: (3.8 / 2.5) * 1 = 1.52ºC

· Model: (3.45 / 1.82) * 1 = 1.9ºC

But it does affect the calculation of ECS, because heat uptake is subtracted in absolute W/m2 rather than as a fraction of F_2x. In the naïve calculation the overshoot was 38%, and now it’s 46%.

· Real world: (3.8 / (2.5 - 0.7)) * 1 = 2.11ºC

· Model: (3.45 / (1.82 - 0.7)) * 1 = 3.08ºC
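The February 2020 calculation can be reproduced with the Python sketch below. The constant names are mine and the inputs are the illustrative figures from this section. One caveat: the code keeps the model forcing unrounded (about 1.816 W/m2 rather than 1.82), so the model ECS comes out at roughly 3.09ºC instead of 3.08ºC; the overshoot still rounds to 46%.

```python
F_2X_REAL = 3.8    # W/m2, real-world forcing from a doubling of CO2
F_2X_MODEL = 3.45  # W/m2, CMIP5 average cited from Dessler & Forster (2018)
DELTA_T = 1.0      # C, warming, assumed identical in models and reality
HEAT_UPTAKE = 0.7  # W/m2, assumed identical in models and reality
FORCING_REAL = 2.5     # W/m2
MODEL_SHORTFALL = 0.2  # model forcing is 20% smaller, as a share of F_2x

# Express real-world forcing as a fraction of a CO2 doubling (about 0.658)
forcing_fraction = FORCING_REAL / F_2X_REAL
# Same fraction of the models' F_2x, then reduced by 20% (about 1.82 W/m2)
forcing_model = forcing_fraction * F_2X_MODEL * (1 - MODEL_SHORTFALL)

tcr_real = F_2X_REAL / FORCING_REAL * DELTA_T
tcr_model = F_2X_MODEL / forcing_model * DELTA_T
ecs_real = F_2X_REAL / (FORCING_REAL - HEAT_UPTAKE) * DELTA_T
ecs_model = F_2X_MODEL / (forcing_model - HEAT_UPTAKE) * DELTA_T

print(f"TCR: real {tcr_real:.2f} C, model {tcr_model:.2f} C, "
      f"overshoot {tcr_model / tcr_real - 1:.0%}")
print(f"ECS: real {ecs_real:.2f} C, model {ecs_model:.2f} C, "
      f"overshoot {ecs_model / ecs_real - 1:.0%}")
```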

Another way to look at the issue: if models have a smaller F_2x than reality, then their heat uptake or energy imbalance should be correspondingly smaller. You should not expect models to match real-world heat uptake. However, keep in mind that the models’ F_2x is itself uncertain. Other sources report higher numbers than Dessler & Forster, and if the models’ F_2x were above 3.8 W/m2 then the error in ECS estimation would go the other way (the difference between model and reality would be smaller than the naïve calculation shows).
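For readers who prefer symbols, here is a short derivation of why the models’ F_2x drops out of TCR but not out of ECS. The notation is my own: f is real-world forcing as a fraction of the real-world F_2x (0.658 in the example), r is the ratio of model forcing to real-world forcing when both are measured as a share of F_2x (0.8 in the example), N is heat uptake, and ΔT is the warming, with N and ΔT assumed identical in models and reality.

```latex
\begin{align*}
f &= \frac{\Delta F_{\text{real}}}{F_{2x}^{\text{real}}}, \qquad
\Delta F_{\text{model}} = r \, f \, F_{2x}^{\text{model}} \\[4pt]
\mathrm{TCR}_{\text{model}}
  &= \frac{F_{2x}^{\text{model}}}{\Delta F_{\text{model}}} \, \Delta T
   = \frac{\Delta T}{r f} \\[4pt]
\mathrm{ECS}_{\text{model}}
  &= \frac{F_{2x}^{\text{model}}}{\Delta F_{\text{model}} - N} \, \Delta T
   = \frac{\Delta T}{r f - N / F_{2x}^{\text{model}}}
\end{align*}
```

The models’ F_2x cancels out of the TCR expression entirely. In the ECS expression it survives in the term N / F_2x: the larger the models’ F_2x, the smaller that term, the larger the denominator, and the lower the models’ ECS. With a model F_2x of 3.8 W/m2 this reproduces the naïve 2.92ºC; with 3.45 W/m2 it gives a higher value, and with anything above 3.8 W/m2 a lower one.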

You’re just a guy on the internet. Why am I supposed to believe what you wrote?

What I’ve described in this article is the standard way of estimating climate sensitivity. Papers that use the thermometer record and papers that work from paleoclimate data follow the same basic approach: look at the change in temperatures over a given period and compare it with the change in forcing over the same period. The main difference is that, in the thermometer-era papers, authors also look at heat uptake in order to distinguish between TCR and ECS. By contrast, in the paleo papers the timeframe is so long that, by the end of the period studied, the climate has (or is assumed to have) regained equilibrium; besides, with paleo data there is no way to estimate energy imbalance anyway. For this reason, paleo papers only calculate ECS, not TCR.

So the math of this article has been done many times before, though obviously using real data and in a more sophisticated, comprehensive manner. There are plenty of papers, presentations, etc. on the issue, but you probably don’t want to slog through a dozen links; if you’re only going to read one technical article, I recommend this one.