RS 240 - David Manheim on "Goodhart's Law and why metrics fail"

Release date: September 16th, 2019

David Manheim

If you want to understand why things go wrong in business, government, education, psychology, AI, and more, you need to know Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure." In this episode, decision theorist David Manheim explains the dynamics behind Goodhart's Law and some potential solutions to it.


David's Twitter

A Ribbonfarm blog post by David about Goodhart's Law

"Bureaucracy" by James Q Wilson

Edited by Brent Silk

Music by Miracles of Modern Science

Full Transcripts 

Reader Comments (12)

You get what you prioritize. Your anecdotes about nails and telemarketing showed prioritization of quantity over quality. In engineering there's the iron triangle: "Fast, good, cheap, pick two."

People don't know what the heck they want. They
September 21, 2019 | Unregistered CommenterMax
People complain about IQ tests because they measure natural ability rather than learning, and they complain about the SAT because it measures learning - advantaging those who can afford test prep and tutors.
The same people who dismiss math tests complain that the U.S. is behind other countries in math, based on math test scores.
They say money doesn't buy happiness but complain about income inequality and pay gaps. Is there a gender happiness gap? If men prioritize money, and women prioritize work-life balance, who's happier?
They cry about biased algorithms, and the alternative is to let people use their very biased discretion?

Of course a lot of important things are tricky to rate, like doctors, hospitals, universities, nonprofits. How do you measure a company's innovation? The number of patents may be a simplistic metric, but what's a better one? Lines of code are still used to measure productivity and project size, even though everyone knows that it's a simplistic metric, and again, what's a better one? More complicated metrics are not always better.
September 21, 2019 | Unregistered CommenterMax
Using the bathroom before weighing yourself is fine as long as you do it consistently. You'll underestimate your average weight, but you'll still see whether it's going up or down.
September 22, 2019 | Unregistered CommenterMax
Using the bathroom before weighing yourself is fine as long as you do it consistently. You'll underestimate your average weight, but you'll know if it's going up or down.
September 22, 2019 | Unregistered CommenterMax
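As a rough illustration of the weighing point in the comments above, here is a minimal Python sketch. The weights and the size of the offset are invented for the example, and the offset is assumed to be perfectly constant, which real measurements will only approximate. It shows that a consistent bias shifts every reading but leaves the day-to-day changes untouched.

```python
# Hypothetical true body weights over five mornings, in grams (made-up numbers).
true_grams = [80000, 79800, 79900, 79500, 79400]

# Assumed constant bias from always weighing after using the bathroom.
OFFSET_GRAMS = -300

# Every reading is shifted by the same amount.
measured_grams = [w + OFFSET_GRAMS for w in true_grams]


def daily_changes(series):
    """Return the difference between each reading and the one before it."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# The absolute readings differ, but the trend (the changes) is identical.
print(daily_changes(true_grams))      # [-200, 100, -400, -100]
print(daily_changes(measured_grams))  # [-200, 100, -400, -100]
```

If the offset varied from day to day (for example, weighing at different times), the two lists of changes would no longer match, which is the case the comment warns against.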
Clearly the BEST way to measure scientific progress is to count the number of beakers society produces.
September 22, 2019 | Unregistered CommenterEran
Of course if you measure quality over quantity, then you need to quantify quality. Like, how do you measure the quality of a scientific paper? Count the number of times it's cited? Its weight in a systematic review? That's another quantity.
September 22, 2019 | Unregistered CommenterMax
The discussion about setting effective metrics/goals in organizations reminded me of something Joshua Greene stated in "Moral Tribes" - that selfish competition often leads to the best results. An example was that adversarial prosecuting and defending attorneys achieved better outcomes than a person seeking balanced justice.

I recall personal experiences about the tension between product teams and quality teams debating when a product is ready to ship. Even though the company wants to make the best decision balancing the trade-offs, asking everyone to do this may not work as well as creating competitive tension with focused metrics. I am not sure how this fits in with Goodhart's Law, though it seems related.

Building on David's comments about his paper on multiple metrics, perhaps part of the solution is to put different groups/people in charge of competing metrics where there are natural tension points and trade-offs. These groups then have to negotiate to achieve overall results. This process also allows the different groups to evolve their strategies as others attempt to game the system. A higher level manager/entity would need to arbitrate issues that cannot be resolved without a broader perspective. This may be obvious, and it is how many markets work. That said I did not hear the role of competitive tension discussed during the podcast.
September 23, 2019 | Unregistered Commentercc
Very interesting interview, thank you. Can you provide the reference for the study mentioned at 38:55? I feel a similar dynamic sometimes takes place between university professors and their PhD students.
September 23, 2019 | Unregistered CommenterPierre Dragicevic
WOW Julia! Remind me to never work at your company. ;) Gave me a good laugh because I've worked both for and beside some pretty horrific micromanagers and some nasty Never-Good-Enough types before, but nothing quite that extreme.

(Yes, to be fair, I understand that you were trying to explain it off the top of your head without really sounding it out beforehand, but it did come out like the plot to some psychological terror: “I know exactly how I want every person in the organization to act, *but* instead of telling you I'll just see what you do and punish or reward you based on _what I *then* say_ is how I would have done it... but really only after seeing if I like the results or not.”)

Another thing that strikes me is that so many businesses seem a bit OCD about defining exactly how to do _every single thing_, sometimes down to exactly how employees wash their hands after using the restroom or how a janitor must inspect a mop head at the start of each shift, etc. It's true that "common sense" isn't the same for everyone, but some executives seem to be into extreme soul-crushing at times.

Regardless, thanks for another great podcast (and now I know the name of that rule).
September 23, 2019 | Unregistered Commenterswan
Fascinating topic.

The first paragraph of Swan's post resonates highly with me. This attitude usually manifests itself with scenarios where, during employee feedback, you point out some technical issues that are ambiguous or conflicting and in asking for clarification for how it should be done or why it's done that way, the discussion devolves to "It is what it is" or "It's always been done that way" <grrrr>

My job appears to simply have two metrics, the number of jobs per day and QC failure rate prior to dispatching a job. Other employee metrics that seem intuitively important to me but are not tracked or even discussed during the employee performance review meetings are:

* Number and cost of parts used.
* Number of 'beyond economic repair' write-offs.
* Number of jobs repaired that get returned due to customer complaint about the repair.
* Record and recognition of employee contribution to procedure improvement or implementation of new repair procedures.

High numbers in any of the first three areas indicate problems with either employee training or aptitude, and usually both. The thing is, they also manifest as a higher number of jobs done per day. The first one because it's quicker to fit a brand new part than to spend time diagnosing a fault and repairing that part. The second because, if not tracked and challenged, it's easy to initiate a fault in a unit that has none and claim it's 'BER' because you're having a 'bad day' and need to bang out jobs deemed complete. The third is usually due to superficial testing and should be picked up by QC, but only if QC is doing the same level of testing prior to dispatch, which isn't always the case. So the unit gets to a customer, works for a short period, the fault reappears, and the product is returned.

The situation then arises with non-technical managers not being able to differentiate between quantity vs quality and the common problem of apples to oranges comparisons emerging. The confidence in and the validity of the metrics or their analysis is then thrown into question, possibly leading to productivity issues due to low morale at the disingenuousness of it all.

It appears to me then that:

* The number of metrics being used needs to be sufficient to reflect the complexity of the process. Measure what matters?

* The interpretation of the results of measurements should be done by people who have practical experience of the very thing they are responsible for measuring. No tick-boxing bean counter approaches to analysis.

* Measure the metrics? Any persistent or new 'failure modes' with the current metrics in place should mean revisiting the metrics and / or analysis to fix validity concerns. This should kill off any "It is what it is" or "It's always been done like that" attitudes and responses by those responsible for providing feedback to employees.
September 29, 2019 | Unregistered CommenterDarren Evans
That was so good! Super dense and fascinating episode. You have to bring this guy back at some point.
September 29, 2019 | Unregistered Commentersty.silver
Someone told me that Google recently studied its engineers, and found no correlation between GPA and real world performance.

Teaching to the test does teach students something. Namely, skills necessary to pass the test. So, as David Manheim already suggests in this podcast, design a test that requires more comprehensive skills on the part of the student. These types of tests will cost more to create and grade, but would improve both actual student learning and the test itself as a more accurate measurement of learning. For instance, instead of multiple choice tests, have "show your work" tests where any well justified answer counts as correct. This would test not only mere test preparation, but also general skill in a particular subject and the ability to creatively solve a problem in the subject.

Paying employees per call will generate useless calls. Paying employees per unit revenue will generate low profitability transactions. So, quite obviously, pay employees per unit PROFIT. Even better, since generating reliable profits requires cooperation throughout an organization, pay employees a percentage of the profits of the entire organization (Profit Share Bonus).
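A toy Python sketch of this argument follows. The three employee strategies and all of their numbers are invented purely for illustration, and the pay schemes are simplified to "whichever strategy scores highest on the paid metric wins". It only shows how the rewarded behavior changes as the paid metric changes, not how any real compensation plan works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """A hypothetical way an employee could spend a week (made-up numbers)."""
    name: str
    calls: int
    revenue: int
    cost: int

    @property
    def profit(self) -> int:
        return self.revenue - self.cost


strategies = [
    Strategy("many short calls", calls=200, revenue=1000, cost=900),
    Strategy("deep discounts", calls=60, revenue=5000, cost=4900),
    Strategy("fewer, careful sales", calls=40, revenue=3000, cost=1500),
]

# Each pay scheme rewards a different quantity.
pay_metrics = {
    "per call": lambda s: s.calls,
    "per unit revenue": lambda s: s.revenue,
    "per unit profit": lambda s: s.profit,
}


def best_strategy(scheme: str) -> str:
    """Return the strategy an employee maximizing pay would choose."""
    return max(strategies, key=pay_metrics[scheme]).name


for scheme in pay_metrics:
    print(f"{scheme}: {best_strategy(scheme)}")
```

With these made-up numbers, paying per call favors many short calls, paying per unit revenue favors deep discounts, and only paying per unit profit favors the careful sales.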

Perhaps a measure of subjectivity, often disregarded as analytical error, could actually solve some of these measurement problems. For the janitorial function, instead of insisting that a janitor clean the floor X times per week, merely require the janitor to inspect the floor each evening, and if the janitor personally finds the floor substantially clean, to skip it and move to another floor. Then the janitor will only spend time cleaning substantially dirty floors. Then the company need only hire enough janitors to clean a nightly average number of floors, not the larger number needed to clean each floor X nights per week, and save on janitorial expense.

Dolphins obviously have abstract thought, as well as the ability to plan and calculate. Dolphins are people.

The Mandate for Employer provided healthcare has a LOT of negative consequences. It has led to people working in jobs they don't really want just to keep the healthcare. It has indeed led to employers keeping hours below the full time threshold for the healthcare requirement. Worst of all, it has led to an effective increase in the cost of hiring a US citizen as an employee. This in turn has led to a massive increase in outsourcing to non US jurisdictions that do not have requirements such as mandatory employer provided healthcare and minimum wages.

Number of Papers published has much more to do with career advancement than actual scientific advancement.

Body Weight provides a somewhat poor measure of physical fitness. Body Fat % provides a better measure. Furthermore, after a hard workout, the muscles of the body retain water and this increases body weight short term but increases muscle mass in the long term. With proper diet, it also reduces body fat %. So actually just looking at the body and summarizing the body fat and muscle/bone mass provides a much better estimate of health than the scale does.

Also, to get an accurate measure of weight change over time, a person must weigh themselves at the same time of day, since human body weight varies over the course of the day. So it makes sense to always weigh oneself only in the morning, just after waking up and going to the bathroom, but before eating or drinking. This will provide a more accurate measurement of weight change over time.

Wrongful Termination Lawsuits create a lot of perverse incentives within companies. Objective performance metrics do make these cases more defensible. Therefore, the law should perhaps change to adapt to a more flexible risk model.

Since paying employees a share of annual profits increases short term risk, perhaps employees should instead receive a pension contribution based on the profits of the company over the next 25 years.

The massive size and general mission of the US Military essentially requires some degree of redundancy and inefficiency.

Really great podcast.
September 30, 2019 | Unregistered CommenterJameson

