mget-help - RE: [mget-help] Random Forest predictions

Subject: Marine Geospatial Ecology Tools (MGET) help

From:	"Jason Roberts" <>
To:	"'Anders Knudby'" <>
Cc:	<>
Subject:	RE: [mget-help] Random Forest predictions
Date:	Fri, 2 Nov 2012 10:48:17 -0400

Hi Anders,

Here is the critical piece of code, from the file PredictModel.r (if you have
ArcGIS 10.0 it will be in
C:\Python26\ArcGIS10.0\Lib\site-packages\GeoEco\Statistics\PredictModel.r):

    # Handle random forest models fitted with randomForest or party.

    else if (!is.null(rPackage) && rPackage %in% c("randomForest", "party"))
    {
        # For binary classification forests, get the predicted probability of
        # the second class and apply the cutoff.

        if (rPackage == "randomForest" && model$type == "classification" &&
            minResponseValue == 0 && maxResponseValue == 1)
        {
            predictedResponse <- as.vector(suppressWarnings(predict(model,
                newdata=predictorValues, type="prob")[,2]))
            if (!is.null(cutoff))
                predictedResponse <- as.integer(predictedResponse >= cutoff)
        }
        else if (rPackage == "party" &&
                 (model@responses@is_nominal[1] ||
                  model@responses@is_ordinal[1]) &&
                 minResponseValue == 0 && maxResponseValue == 1)
        {
            predictedResponse <- sapply(suppressWarnings(predict(model,
                newdata=predictorValues, type="prob")), "[", i = 2)
            if (!is.null(cutoff))
                predictedResponse <- as.integer(predictedResponse >= cutoff)
        }

        # For other models (regression forests or classification forests with
        # more than two classes), get the predicted response. For
        # classification forests, coerce it to an integer.

        else
        {
            predictedResponse <- suppressWarnings(predict(model,
                newdata=predictorValues, type="response"))

            if (is.factor(predictedResponse))
                predictedResponse <- as.integer(as.vector(predictedResponse))
            else
                predictedResponse <- as.vector(predictedResponse)
        }
    }

So, what MGET does depends on whether it is a binary classification
forest--that is, a classification forest with two classes, 0 and 1. If it is,
then MGET calls predict(..., type="prob") and extracts the second column of
the result. That column contains the predicted probability of the second
class (i.e. the predicted probability, a continuous value ranging from 0 to
1, that the result was a 1 rather than a 0). If the user provided a cutoff
value (e.g. for ROC analysis), MGET applies it to convert the continuous
probability to a binary value of 0 or 1.
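
To make the binary case concrete, here is a minimal sketch in base R. The
matrix `probs` is a made-up stand-in for what predict(..., type="prob")
returns for a two-class forest (one column per class), and the cutoff of 0.5
is only an illustrative value, not something MGET chooses for you.

```r
# Hypothetical stand-in for predict(model, newdata=predictorValues, type="prob")
# on a binary forest: one row per record, one column per class ("0" and "1").
probs <- matrix(c(0.9, 0.1,
                  0.4, 0.6,
                  0.5, 0.5),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("0", "1")))

# Keep only the second column: the predicted probability of class 1.
predictedResponse <- as.vector(probs[, 2])

# Illustrative cutoff; set it to NULL to keep the continuous probabilities.
cutoff <- 0.5

# Apply the cutoff only if one was provided, as the MGET code does.
if (!is.null(cutoff)) {
    binaryResponse <- as.integer(predictedResponse >= cutoff)
} else {
    binaryResponse <- predictedResponse
}
```

So when no cutoff is given, the value to compare against your own
predict(..., type="prob") output is its second column, not the whole matrix.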

For all other models--classification forests with more than two classes, or
regression forests in which the response is a continuous value (e.g. the
weight of an animal)--MGET calls predict(..., type="response"). This returns
the predicted response. For classification forests it is a factor, so the
bit of code after predict() converts it to a vector of integers by way of
its labels; for regression forests the code coerces it to a plain vector of
floats, because later parts of MGET like vectors rather than matrices. (But
I might be misremembering the reason for that code.)
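
As a small illustration of why the code goes through as.vector() before
as.integer() when the prediction is a factor, here is a sketch in base R. The
factor `predicted` and the one-column matrix `regression` are made-up values,
not output from a real model.

```r
# Hypothetical multi-class prediction: a factor whose labels are class numbers.
predicted <- factor(c("0", "2", "5", "2"))

# as.integer() on a factor returns the internal level codes, not the labels.
viaCodes <- as.integer(predicted)

# Converting to a character vector first recovers the labels themselves,
# which is what as.integer(as.vector(...)) does in PredictModel.r.
viaLabels <- as.integer(as.vector(predicted))

# For non-factor results, as.vector() drops matrix dimensions, leaving a
# plain numeric vector.
regression <- matrix(c(1.5, 2.5, 3.5), ncol = 1)
flattened <- as.vector(regression)
```

Here viaCodes holds the level positions (1, 2, 3, 2) while viaLabels holds
the class values (0, 2, 5, 2), which is what you want written to a raster.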

I hope that helps. If it does not, let me know. We should be able to obtain
similar results (accounting for the randomness you mentioned), and I would
like to see that happen.

Best,

Jason

-----Original Message-----
From: Anders Knudby
[mailto:]

Sent: Thursday, November 01, 2012 8:32 PM
To:

Subject: [mget-help] Random Forest predictions

Hi, I've been comparing your Random Forest tools to an R script I wrote
myself. Here's what I find.

From "Fit Random Forest Model" I get very similar results to my own
(accounting for slight differences from RF inherent randomness). All good.

From "Predict Random Forest from Rasters" I've tried very hard and I
absolutely can't get the same results as I get with this command from R:
predictions = predict(rf_model, newdata=df, type="prob"). Obviously
"rf_model" is my model (which I can get from the randomForest command or
MGET, no difference). "df" has the data I read from my rasters (presumably
identical to what MGET does, I also use .asc files). Is it the type="prob"
command that is the issue? In the randomForest help I see type options of
"response", "prob" or "vote". My assumption is that if I don't provide a
"binary classification cutoff", MGET would use type="prob". Right?

Any pointers?

Anders
