Subject: Marine Geospatial Ecology Tools (MGET) help
| From: | "Jason Roberts" <> |
|---|---|
| To: | "'Liza Hoos'" <> |
| Cc: | <> |
| Subject: | RE: [mget-help] RandomForestPredictFromArcGISRasters using model built with Party package |
| Date: | Mon, 2 Dec 2013 15:57:02 -0500 |
Hi Liza,

The problem is not obvious. Most likely, something is implicitly coercing one of the predictors to a factor, and the coercion is occurring not only on the training data at model fitting time but again on the raster data at prediction time. The rasters do not have the same set of unique values as the training data, so the factor levels at prediction time differ from those of the training data: a major party foul, so to speak.

That is the theory, at any rate. It could also be a bug in MGET’s special handling of factors, which tries to prevent this exact situation. It could very well be party itself doing the coercion internally; I have seen some other things about it that suggest it is mainly designed for predicting categorical responses with categorical predictors. For example, when you try to do conditional variable importance with a model that uses continuous predictors, party tends to blow up due to insufficient memory. From what I could tell, this was because it was creating many-leveled factors internally for ranges of the continuous predictors, and the memory requirement was proportional to the product of the number of levels of each factor.

Short on immediate ideas, I consulted my new favorite guide to R to see if it had a hint, and to kill some time while mulling what to do next. It boosted my morale but did not provide an answer, as I was already aware of all of the pitfalls with factors that are noted there.

Could you please repro the problem with as compact a model formula as you can, and then send me the output model file for both the party model that fails and the randomForest model that succeeds?

Thanks,
Jason

From: Liza Hoos [mailto:]

Hi Jason,

I am having trouble with the RandomForestPredictFromArcGISRasters tool. If I build a model specifying the use of the randomForest package in the FitToArcGISTable tool, then use that model in the PredictFromArcGISRasters tool, everything runs smoothly.
However, if I try to use PredictFromArcGISRasters using a model built with the party package in the FitToArcGISTable tool, I receive the error: "Classes of new data do not match original data". I thought that perhaps the one categorical variable in my model was somehow the cause of this error message, but I tried using the party package to build the model again, leaving out the categorical variable, and still received the same error message. I also thought that perhaps specifying the Surrogate Splits parameter as greater than 0 could be causing the problem, but I tried specifying it as 0 and running the model again and still received the same error message.

Do you have any guesses as to what could be causing this problem? I attached a txt file with output from the run with the party package and randomForest package. All parameters are exactly the same between these two runs except for which R package is used.

Thanks so much,
Liza
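
The factor-level theory in the reply above can be illustrated with a small sketch. It is written in Python only to imitate how a factor encodes values; it is not MGET or party code. The helper names `as_factor` and `as_factor_with_levels` and the sample values are hypothetical. It assumes a factor stores each value as a 1-based integer code into a sorted list of levels, so coercing the raster values separately from the training values can assign the same code to different underlying values.

```python
def as_factor(values):
    """Imitate implicit coercion to a factor: the levels are the sorted
    unique values seen in *this* data, and each value is stored as a
    1-based integer code into that list of levels."""
    levels = sorted(set(values))
    index = {level: i + 1 for i, level in enumerate(levels)}
    return levels, [index[v] for v in values]


def as_factor_with_levels(values, levels):
    """Encode values against a fixed list of levels (here, the training
    levels). Values absent from the levels become None, a stand-in for
    a missing value."""
    index = {level: i + 1 for i, level in enumerate(levels)}
    return levels, [index.get(v) for v in values]


# Hypothetical values of one predictor in the training table and in a raster.
training = [3, 7, 7, 12]
raster = [3, 5, 12, 12]

train_levels, train_codes = as_factor(training)

# Coercing the raster on its own yields a different set of levels.
naive_levels, naive_codes = as_factor(raster)

# Reusing the training levels keeps the codes consistent with the model.
fixed_levels, fixed_codes = as_factor_with_levels(raster, train_levels)

print("training levels:", train_levels, "codes:", train_codes)
print("raster, coerced separately:", naive_levels, "codes:", naive_codes)
print("raster, training levels reused:", fixed_levels, "codes:", fixed_codes)
```

In this sketch, code 2 stands for the value 7 in the training data but for the value 5 when the raster is coerced on its own, which is the kind of mismatch a "Classes of new data do not match original data" check would be expected to catch. Reusing the training levels avoids the mismatch, at the cost of marking unseen values as missing.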