mget-help - RE: [mget-help] random forest and missing data

Subject: Marine Geospatial Ecology Tools (MGET) help

From: "Jason Roberts" <>
To: "'Liza Hoos'" <>, <>
Subject: RE: [mget-help] random forest and missing data
Date: Mon, 18 Nov 2013 15:18:49 -0500

Hi Liza,

MGET explicitly calls na.omit(trainingData) before fitting a random forest model with the R randomForest package. This drops every row that has a missing value for any predictor variable. MGET does not attempt to fill in missing values using rfImpute or any other technique, so the model is fitted only to records that have values for all predictor variables.
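
As a rough illustration of this listwise deletion, here is a minimal Python sketch that keeps only complete records. It is not MGET's code (MGET calls R's na.omit); the helper names is_missing and complete_cases and the column names presence, sst and depth are hypothetical. Note that na.omit checks every column of the data frame it is given, so if the response is part of trainingData, rows missing the response are dropped as well.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical record type: column name -> value (None or NaN means missing).
Row = Dict[str, Optional[float]]


def is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as missing, analogous to NA in R."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def complete_cases(rows: Sequence[Row]) -> List[Row]:
    """Keep only rows with no missing value in any column (listwise deletion)."""
    return [row for row in rows if not any(is_missing(v) for v in row.values())]


# Hypothetical training records: a presence/absence response plus two predictors.
training_data: List[Row] = [
    {"presence": 1.0, "sst": 18.2, "depth": 120.0},
    {"presence": 0.0, "sst": None, "depth": 85.0},          # missing sst: dropped
    {"presence": 1.0, "sst": 17.9, "depth": float("nan")},  # missing depth: dropped
    {"presence": 0.0, "sst": 19.4, "depth": 60.0},
]

fitted_rows = complete_cases(training_data)
print(len(fitted_rows))  # 2
```

Nothing is imputed here: the two incomplete records simply never reach the model, which matches the randomForest behaviour described above.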

MGET can also fit random forests using the R party package. When that package is used, MGET explicitly calls na.omit as above, provided the "Number of Surrogate Splits" parameter is not specified or is zero. If the Number of Surrogate Splits is greater than zero, all training records are passed to the model fitting function, and it is up to that function to decide how to handle the missing values. The party package is capable of dealing with missing data; please see its documentation for details.
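
The decision described for the party package can be sketched the same way. This is a hypothetical Python outline, not MGET's implementation: prepare_party_training_data and its n_surrogate_splits argument are illustrative names standing in for the tool's "Number of Surrogate Splits" parameter. That parameter presumably maps to a surrogate-split setting such as maxsurrogate in party's control functions, but that mapping is an assumption.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical record type: column name -> value (None or NaN means missing).
Row = Dict[str, Optional[float]]


def is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as missing, analogous to NA in R."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def prepare_party_training_data(
    rows: Sequence[Row], n_surrogate_splits: Optional[int] = None
) -> List[Row]:
    """Hypothetical sketch of the party-package branch described above.

    If no surrogate splits are requested (None or 0), incomplete rows are
    dropped, as na.omit would do. Otherwise every row is passed through and
    the model fitting function decides how to handle the missing values.
    """
    if not n_surrogate_splits:  # None or 0
        return [row for row in rows if not any(is_missing(v) for v in row.values())]
    return list(rows)


records: List[Row] = [
    {"presence": 1.0, "sst": 18.2},
    {"presence": 0.0, "sst": None},
]

print(len(prepare_party_training_data(records)))     # 1
print(len(prepare_party_training_data(records, 0)))  # 1
print(len(prepare_party_training_data(records, 3)))  # 2
```

Only the non-zero case hands incomplete records to the fitting function; how party then uses them is governed by that package, not by this sketch.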

Jason

From: Liza Hoos [mailto:]
Sent: Monday, November 18, 2013 2:12 PM
To: ; Jason Roberts
Subject: [mget-help] random forest and missing data

Hi Jason,

I just wanted to double-check that when using the Fit Random Forest Model tool in MGET with the randomForest package, records that contain missing values are not used in the model fitting process. In other words, the rfImpute function in the randomForest package is not being used, and therefore the model is fitted only to records with values for all predictor (independent) variables. Is that correct?

Thanks,

Liza