Balancing the Weight of Variables in a Decision Tree
Author: BBVA AI Factory
Full Title: Balancing the Weight of Variables in a Decision Tree
Document Note: In an industrialised process where the output of the model is used to make decisions, it is bad practice to over-rely on a small number of variables, because these may fail to be reported or may be affected by a technical failure. For this reason, it may be important to dilute the importance of a variable to make the model more robust. In linear regression models we can impose penalties or constraints on the magnitude of the coefficients, but this is not possible in decision trees. Workarounds include introducing noise into the data or using the most important variables only in the last steps of the decision. Proposed solution, extended trees: fit a base tree without the most important variables, then extend the leaves of that base tree using the variables that were left out.
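The extended-tree idea in the note can be sketched in a few lines. This is a minimal illustration, not the article's implementation: it uses a toy greedy regression tree written with the standard library only, synthetic data, and hypothetical names (`fit_tree`, `extend_leaves`, and the features `dominant`, `a`, `b`). The base tree is grown without the dominant variable, and each of its leaves is then refit as a small subtree that may use only that variable.

```python
import random
from statistics import mean

def sse(ys):
    """Sum of squared errors around the mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(rows, ys, features):
    """Return the (feature, threshold) that most reduces squared error, or None."""
    best, best_err = None, sse(ys)
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            err = sse(left) + sse(right)
            if err < best_err - 1e-12:
                best, best_err = (f, t), err
    return best

def fit_tree(rows, ys, features, depth):
    """Grow a regression tree (nested dicts) that may split only on `features`."""
    split = best_split(rows, ys, features) if depth > 0 and len(ys) > 1 else None
    if split is None:
        # Leaves keep their training rows so they can be extended later.
        return {"leaf": True, "value": mean(ys), "rows": rows, "ys": ys}
    f, t = split
    go_left = [r[f] <= t for r in rows]
    def side(flag):
        keep = [i for i, g in enumerate(go_left) if g == flag]
        return fit_tree([rows[i] for i in keep], [ys[i] for i in keep], features, depth - 1)
    return {"leaf": False, "feature": f, "threshold": t,
            "left": side(True), "right": side(False)}

def extend_leaves(node, held_out, depth):
    """Replace every leaf with a subtree fitted on the held-out variables."""
    if node["leaf"]:
        return fit_tree(node["rows"], node["ys"], held_out, depth)
    return {**node,
            "left": extend_leaves(node["left"], held_out, depth),
            "right": extend_leaves(node["right"], held_out, depth)}

def predict(node, row):
    while not node["leaf"]:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]

def mse(tree, rows, ys):
    return mean((predict(tree, r) - y) ** 2 for r, y in zip(rows, ys))

def split_depths(node, depth=0):
    """List (depth, feature) for every internal node of the tree."""
    if node["leaf"]:
        return []
    return ([(depth, node["feature"])]
            + split_depths(node["left"], depth + 1)
            + split_depths(node["right"], depth + 1))

# Synthetic data (an assumption for illustration): `dominant` drives most of y.
rng = random.Random(0)
rows = [{"dominant": rng.random(), "a": rng.random(), "b": rng.random()}
        for _ in range(200)]
ys = [5 * r["dominant"] + r["a"] + 0.5 * r["b"] + rng.gauss(0, 0.1) for r in rows]

base = fit_tree(rows, ys, ["a", "b"], depth=2)        # dominant variable left out
extended = extend_leaves(base, ["dominant"], depth=2)  # used only below the base leaves

print("base MSE:", mse(base, rows, ys))
print("extended MSE:", mse(extended, rows, ys))
print("splits (depth, feature):", split_depths(extended))
```

In this sketch the dominant variable can only appear in the lower levels of the extended tree, so a missing or misreported value for it changes a prediction only within the region already selected by the other variables. How many variables to hold out, and how deep each stage should be, are tuning choices the note does not specify.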
- Sometimes, when facing the challenge of modelling the operation of a process where different variables are involved, we find that some of them dominate over the others. These variables will thus exert greater influence compared to the rest.
- In an industrialised process where the output of a model is used to make decisions, it is generally bad practice to over-rely on a low number of variables as these might fail to be reported (or may be misreported) due to technical failure.
- Tags: favorite