An accessible introduction to the problem of multiple testing

From Cam Harvey, with applications to portfolio management:

We provide some new tools to evaluate trading strategies. When it is known that many strategies and combinations of strategies have been tried, we need to adjust our evaluation method for these multiple tests. Sharpe Ratios and other statistics will be overstated. Our methods are simple to implement and allow for the real-time evaluation of candidate trading strategies.
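To make "Sharpe Ratios ... will be overstated" concrete, here is a minimal sketch of a multiple-testing "haircut" using a Bonferroni correction, which is one standard adjustment. It is an illustration of the general idea and not presented as the authors' exact procedure. It assumes i.i.d. returns, so that the t-statistic of an annualized Sharpe ratio is roughly `SR × sqrt(years)`. The function name `haircut_sharpe` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def haircut_sharpe(sr_annual: float, years: float, n_tests: int) -> float:
    """Hypothetical helper: Bonferroni-adjusted annualized Sharpe ratio.

    Assumes i.i.d. returns, so the t-statistic is about sr_annual * sqrt(years).
    n_tests is the number of strategies (tests) tried before picking this one.
    """
    if n_tests < 1 or years <= 0:
        raise ValueError("n_tests must be >= 1 and years must be positive")
    t_stat = abs(sr_annual) * sqrt(years)
    p_single = 2.0 * _STD_NORMAL.cdf(-t_stat)       # two-sided p-value, one test
    p_adjusted = min(1.0, n_tests * p_single)       # Bonferroni: scale by n_tests
    if p_adjusted == 0.0:                           # p underflowed: no visible haircut
        return sr_annual
    # Back out the t-statistic implied by the adjusted p-value (never negative).
    t_adjusted = max(0.0, -_STD_NORMAL.inv_cdf(p_adjusted / 2.0))
    sign = 1.0 if sr_annual >= 0 else -1.0
    return sign * t_adjusted / sqrt(years)


if __name__ == "__main__":
    # A Sharpe ratio of 1.0 over 10 years, judged alone vs. as the best of 100 tries.
    print(haircut_sharpe(1.0, 10, 1))
    print(haircut_sharpe(1.0, 10, 100))
```

Under these assumptions, a 10-year Sharpe ratio of 1.0 is left unchanged when it is the only strategy tested, but falls to roughly 0.45 if it was the best of 100 candidates. Bonferroni is deliberately conservative; less severe procedures exist.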

Recommended. The same problem looms large wherever we try to identify causal mechanisms: in medicine, in evidence-based policymaking, in management, and elsewhere.

On the one hand, any single “experiment” (read: regression result) may be faulty because it depends on the particular sample; on the other hand, when we run many tests, we need principled ways of aggregating their results.
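One standard way to aggregate many tests is to adjust their p-values jointly. Below is a minimal sketch of Holm's step-down procedure, which controls the family-wise error rate (the chance of even one false "discovery"). The function name `holm_adjust` and the example p-values are hypothetical. Other procedures, such as Benjamini-Hochberg, which controls the false discovery rate instead, may suit some settings better.

```python
from typing import List, Sequence


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Hypothetical helper: Holm step-down adjusted p-values, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):  # rank 0 is the smallest p-value
        candidate = min(1.0, (m - rank) * p_values[idx])  # multiplier m, m-1, ..., 1
        running_max = max(running_max, candidate)  # adjusted p-values never decrease
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # four hypothetical tests
    adj = holm_adjust(raw)
    print(adj)
    print([p < 0.05 for p in raw])  # every test looks significant on its own
    print([p < 0.05 for p in adj])  # only two survive the joint adjustment
```

In this made-up example, all four results clear the usual 5% bar when judged one at a time, but only two do once the fact that four tests were run is taken into account.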

If we have this problem with inference in finance, where enormous amounts of time, attention, and money go into finding the best trading strategies, what hope is there for evidence-based policymaking, where in most cases we have, at best, one experiment?