mget-help - RE: [mget-help] Question about Conditional Interference Trees - 'Fit Random Forest Model' tool

Subject: Marine Geospatial Ecology Tools (MGET) help

  • From: Jason Roberts <>
  • To: Marc Nelson <>
  • Cc: "" <>
  • Subject: RE: [mget-help] Question about Conditional Interference Trees - 'Fit Random Forest Model' tool
  • Date: Tue, 21 Mar 2017 15:29:50 +0000

Hi Marc,

 

Yes, as I understand it, the party package does use conditional inference trees (that is its main raison d'être), and they will be used regardless of whether you enable the “Use conditional variable importance” option in MGET. That option is not used during model fitting, only afterward, when variable importance is estimated. Specifically, MGET passes its value to the conditional parameter of the party package’s varimp() function. The relevant documentation is:

 

Function varimp can be used to compute variable importance measures similar to those computed by [the function] importance. Besides the standard version, a conditional version is available, that adjusts for correlations between predictor variables.

 

If conditional = TRUE, the importance of each variable is computed by permuting within a grid defined by the covariates that are associated (with 1 - p-value greater than threshold) to the variable of interest. The resulting variable importance score is conditional in the sense of beta coefficients in regression models, but represents the effect of a variable in both main effects and interactions. See Strobl et al. (2008) for details.

 

Note, however, that all random forest results are subject to random variation. Thus, before interpreting the importance ranking, check whether the same ranking is achieved with a different random seed – or otherwise increase the number of trees ntree in ctree_control.
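To illustrate the idea in the quoted passage, here is a minimal, simplified sketch in Python (standard library only) of permutation importance computed marginally and conditionally. It is not party's implementation, and every name in it (permutation_importance, quantile_bins, predict) is hypothetical. Assumptions: the conditioning covariates are chosen by hand rather than by the 1 - p-value > threshold rule; the grid cells are quantile bins rather than the split points of the fitted trees; the "fitted model" is a fixed linear function standing in for a forest; and the error increase is measured on the same rows rather than on out-of-bag observations.

```python
import random
from collections import defaultdict


def mse(predict, X, y):
    """Mean squared error of a prediction function over the rows of X."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def quantile_bins(values, n_bins):
    """Assign each value to one of n_bins roughly equal-count bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def permutation_importance(predict, X, y, j, cond=(), n_bins=4, seed=0):
    """Increase in MSE after shuffling column j (hypothetical helper).

    cond=() gives ordinary (marginal) permutation importance: one cell holds
    every row. Otherwise column j is shuffled only among rows that fall in the
    same quantile bin of every conditioning covariate, i.e. within a grid cell.
    """
    rng = random.Random(seed)
    binned = [quantile_bins([row[k] for row in X], n_bins) for k in cond]
    cells = defaultdict(list)
    for i in range(len(X)):
        cells[tuple(b[i] for b in binned)].append(i)
    X_perm = [list(row) for row in X]
    for rows in cells.values():
        values = [X[i][j] for i in rows]
        rng.shuffle(values)
        for i, v in zip(rows, values):
            X_perm[i][j] = v
    return mse(predict, X_perm, y) - mse(predict, X, y)


# Toy data (assumption): x2 is a noisy copy of x1, and y depends on x1 only.
data_rng = random.Random(42)
X, y = [], []
for _ in range(2000):
    x1 = data_rng.gauss(0, 1)
    X.append((x1, x1 + data_rng.gauss(0, 0.3)))
    y.append(x1)


def predict(row):
    # Hypothetical stand-in for a fitted forest that, given two correlated
    # predictors, spreads its reliance across both x1 and x2.
    return 0.5 * row[0] + 0.5 * row[1]


marginal = permutation_importance(predict, X, y, j=1)
conditional = permutation_importance(predict, X, y, j=1, cond=(0,), n_bins=10)
print(f"x2 marginal: {marginal:.3f}  conditional on x1: {conditional:.3f}")

# As the documentation advises, check that the ranking holds across seeds.
for seed in range(3):
    scores = {name: permutation_importance(predict, X, y, j, cond=(1 - j,),
                                           n_bins=10, seed=seed)
              for j, name in enumerate(("x1", "x2"))}
    print(seed, sorted(scores, key=scores.get, reverse=True))
```

Because x2 is nearly a copy of x1, shuffling x2 across the whole data set breaks its correlation with x1 and inflates its apparent importance; shuffling it only within x1 bins should leave a much smaller increase. The final loop mirrors the advice to confirm that the ranking does not change with the random seed.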

 

I attached some relevant papers.

 

I have had mixed success with conditional variable importance. Its implementation seems to be very memory-intensive. Because MGET runs inside ArcGIS’s traditional 32-bit Windows applications (ArcMap, ArcCatalog), it can only use the 32-bit version of R. In practice, this limits MGET to about 1 to 2 GB of memory, regardless of how much memory the operating system itself has available. Memory-hungry code can quickly exhaust this. It is surprising that 900 records x 9 predictors would do that, but if any code would, it is the conditional importance code.

 

If it is a borderline situation, you might be able to get it to succeed by starting a fresh copy of ArcGIS and running the tool before doing anything else. You could also try running it from ArcCatalog instead of ArcMap; that might leave more memory available to R that ArcMap would otherwise take up for the map itself, etc.

 

At the end of the day, if you believe conditional variable importance is the way to go, your best bet is probably to fit the model yourself in a 64-bit version of R, which would allow the algorithm to access more memory. It is not a guaranteed solution, but it might work.

 

Best,

Jason

 

From: [mailto:] On Behalf Of Marc Nelson
Sent: Tuesday, March 21, 2017 8:47 AM
To:
Subject: [mget-help] Question about Conditional Interference Trees - 'Fit Random Forest Model' tool

 

Hi Jason / team,

 

A quick question about the 'Fit Random Forest Model' tool. In the model options, there is an option to use conditional variable importance (party pkg only). The help dialogue states that this option enables conditional inference trees.

 

Inline image 1

 

Inline image 2

 

As the help dialogue warns may happen, ArcMap crashes when I select this option (I have 900 records x 9 satellite bands). However, when I run the party pkg without selecting the conditional variable importance box, the summary file still states that the model was fit using conditional inference trees:

 

Inline image 3

So, my question is: does the party package use conditional inference trees after all? I am wondering because my own preliminary research confirms that the randomForest pkg does indeed do a poor job of estimating variable importance for my data, which is a bit odd since it is all continuous satellite band data.

 

Thanks again for the help on the several occasions I've reached out!

 

/Marc

Attachment: Strobl_et_al_2008.pdf
Description: Strobl_et_al_2008.pdf

Attachment: Genuer_et_al_2010_Variable_selection_using_random_forests.pdf
Description: Genuer_et_al_2010_Variable_selection_using_random_forests.pdf

Attachment: Strobel_2009_Introduction_To_Recursive_Partitioning.pdf
Description: Strobel_2009_Introduction_To_Recursive_Partitioning.pdf

Attachment: Strobl_et_al_2007.pdf
Description: Strobl_et_al_2007.pdf




Archive powered by MHonArc 2.6.19.
