The Hidden Costs of Complexity in Data Science



  • Author: Chris Walsh
  • Full Title: The Hidden Costs of Complexity in Data Science
  • Document Note: Loved the example comparing Sherman and Tiger tanks. You should start a project by setting a complexity limit guided by:
    • What is the minimum required accuracy of the model?
    • Will this project require some sort of scaling?
    • How many resources are available for the project?
  • URL: the-hidden-costs-of-complexity-in-data-science-6b5958117bfb


  • It’s not an unfamiliar problem in other disciplines. Engineers have dealt with analogous problems for decades. In World War II, the US-designed Sherman tank used highly interchangeable parts: if one tank was knocked out, parts could be taken from another [1]. The Sherman was a relatively simple machine; if it was damaged in battle, it could be quickly repaired in the field [2]. Germany, on the other hand, designed the Tiger. The Tiger was powerful, precise, and complicated [2]. It was a tank as finely tuned as a Swiss watch. Allied forces deeply feared a fully operational Tiger. The problem was, how do you keep a Tiger running? Precision engineering means customization of both parts and knowledge. If you want to fix a Tiger, you need Tiger-specific parts. If you want to fix a Tiger, you need Tiger-specific expertise.
  • The Sherman is the data science model built with little customization. Its developer continually monitors the costs of complexity. Its parts are recognizable, interpretable, and fixable by our colleagues. It is built efficiently and gets the job done with the appropriate level of accuracy.
  • The Tiger is that data science model, so often built with great passion, whose developer was blind to the costs of complexity. It is an incredible machine when it works. Its precision means it will break, and when it breaks, only a very small group of people will be capable of fixing it.