Variable selection using random forests

Bibliographic Details
Title: Variable selection using random forests
Authors: Genuer, Robin (1), Robin.Genuer@math.u-psud.fr; Poggi, Jean-Michel (1, 2), Jean-Michel.Poggi@math.u-psud.fr; Tuleau-Malot, Christine (3), malot@unice.fr
Source: Pattern Recognition Letters. Oct. 2010, Vol. 31, Issue 14, pp. 2225-2236. 12 pp.
Subjects: Statistics, Regression analysis, Prediction theory, Tree graphs, Mathematical models, Mathematical variables
Abstract: Focusing on random forests, the increasingly used statistical method for classification and regression problems introduced by Leo Breiman in 2001, this paper investigates two classical issues of variable selection. The first is to find the variables that are important for interpretation; the second, more restrictive, is to design a good parsimonious prediction model. The main contribution is twofold: to provide experimental insights into the behavior of the variable importance index based on random forests, and to propose a strategy that ranks the explanatory variables by their random forests importance score and then introduces them in a stepwise ascending manner. [Copyright © Elsevier]
Database: Engineering Source
ISSN: 0167-8655
DOI: 10.1016/j.patrec.2010.03.014
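
Illustration: The abstract describes a two-step strategy: rank the explanatory variables by an importance score, then introduce them one at a time in ranked order. The sketch below shows one generic way such a procedure might look. It is not the authors' exact method, whose thresholds and error estimates are not given in this record. The function names, the `min_gain` parameter, the importance scores, and the `toy_error` function are all hypothetical; in practice the scores and the error estimate would come from fitted random forests.

```python
from typing import Callable, Dict, List, Sequence


def rank_by_importance(importance: Dict[str, float]) -> List[str]:
    """Return variable names sorted by decreasing importance score."""
    return sorted(importance, key=lambda name: importance[name], reverse=True)


def stepwise_ascending_selection(
    ranked: Sequence[str],
    estimate_error: Callable[[List[str]], float],
    min_gain: float = 0.0,
) -> List[str]:
    """Introduce variables in ranked order, keeping each one only if it
    lowers the estimated error by more than `min_gain` (an assumed rule)."""
    selected: List[str] = []
    best_error = float("inf")
    for name in ranked:
        candidate = selected + [name]
        error = estimate_error(candidate)
        if best_error - error > min_gain:
            selected = candidate
            best_error = error
    return selected


# Hypothetical importance scores, standing in for a random forest's output.
importance = {"x1": 0.40, "x2": 0.05, "x3": 0.30, "x4": 0.01}


def toy_error(variables: List[str]) -> float:
    """Made-up error estimate: x1 and x3 help, any other variable adds noise."""
    error = 0.5
    if "x1" in variables:
        error -= 0.20
    if "x3" in variables:
        error -= 0.15
    noise = [v for v in variables if v not in ("x1", "x3")]
    return error + 0.01 * len(noise)


ranked = rank_by_importance(importance)
selected = stepwise_ascending_selection(ranked, toy_error)
print(ranked)
print(selected)
```

In this toy setting the two informative variables are retained and the two noise variables are rejected, because adding either of them raises the made-up error estimate.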