Subject: Marine Geospatial Ecology Tools (MGET) help
From: "Jason Roberts" <>
To: "'Anders Knudby'" <>
Cc: <>
Subject: RE: [mget-help] Random Forest predictions
Date: Fri, 2 Nov 2012 10:48:17 -0400
Hi Anders,
Here is the critical piece of code, from the file PredictModel.r (if you have
ArcGIS 10.0 it will be in
C:\Python26\ArcGIS10.0\Lib\site-packages\GeoEco\Statistics\PredictModel.r):
    # Handle random forest models fitted with randomForest or party.
    else if (!is.null(rPackage) && rPackage %in% c("randomForest", "party"))
    {
        # For binary classification forests, get the predicted probability of
        # the second class and apply the cutoff.
        if (rPackage == "randomForest" && model$type == "classification" &&
            minResponseValue == 0 && maxResponseValue == 1)
        {
            predictedResponse <- as.vector(suppressWarnings(predict(model,
                newdata=predictorValues, type="prob")[,2]))
            if (!is.null(cutoff))
                predictedResponse <- as.integer(predictedResponse >= cutoff)
        }
        else if (rPackage == "party" &&
                 (model@responses@is_nominal[1] ||
                  model@responses@is_ordinal[1]) &&
                 minResponseValue == 0 && maxResponseValue == 1)
        {
            predictedResponse <- sapply(suppressWarnings(predict(model,
                newdata=predictorValues, type="prob")), "[", i = 2)
            if (!is.null(cutoff))
                predictedResponse <- as.integer(predictedResponse >= cutoff)
        }
        # For other models (regression forests or classification forests with
        # more than two classes), get the predicted response. For
        # classification forests, coerce it to an integer.
        else
        {
            predictedResponse <- suppressWarnings(predict(model,
                newdata=predictorValues, type="response"))
            if (is.factor(predictedResponse))
                predictedResponse <- as.integer(as.vector(predictedResponse))
            else
                predictedResponse <- as.vector(predictedResponse)
        }
    }
So, what MGET does depends on whether it is a binary classification
forest--that is, a classification forest with two classes, 0 and 1. If it is,
then MGET calls predict(..., type="prob") and extracts the second column of
the result. That column contains the predicted probability of the second
class (i.e. the predicted probability, a continuous value ranging from 0 to
1, that the result was a 1 rather than a 0). If the user provided a cutoff
value (e.g. for ROC analysis), MGET applies it to convert the continuous
probability to a binary value of 0 or 1.
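To make the binary case concrete, here is a small base-R sketch of the column extraction and the cutoff step. The matrix `probs` is invented stand-in data shaped like what predict(..., type="prob") returns from randomForest (one row per cell, one column per class), and the `cutoff` value is hypothetical; neither is taken from MGET itself.

```r
# Stand-in for predict(rf_model, newdata=df, type="prob"):
# one row per raster cell, one column per class. Values are invented.
probs <- matrix(c(0.9, 0.1,
                  0.4, 0.6,
                  0.2, 0.8),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("0", "1")))

# The step from PredictModel.r: keep only the second column
# (the probability of class 1) as a plain vector.
predictedResponse <- as.vector(probs[, 2])

# If a cutoff is supplied (hypothetical value here), the
# probabilities are converted to 0/1.
cutoff <- 0.5
binaryResponse <- as.integer(predictedResponse >= cutoff)

print(predictedResponse)
print(binaryResponse)
```

A like-for-like comparison against a direct predict(..., type="prob") call would therefore use only the second column of that matrix, not the whole matrix.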
For all other models--classification forests with more than two classes, or
regression forests in which the response is a continuous value (e.g. the
weight of an animal)--MGET calls predict(..., type="response"). This returns
the predicted response. If I recall correctly, it is a matrix of floats, so
the bit of code after predict() coerces those into a vector of integers for
classification forests or a vector of floats for regression forests, because
later parts of MGET like vectors rather than matrices. (But I might be
misremembering the reason for that code.)
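As a rough illustration of the coercion in the final branch, the base-R sketch below uses an invented factor in place of the predict() output. The is.factor() check in the code suggests that a classification forest returns a factor here; the class labels below are made up. Going through as.vector() first matters because as.integer() on a factor returns the internal level codes rather than the labels.

```r
# Stand-in for predict(..., type="response") on a classification
# forest: a factor of class labels. Labels are invented.
predicted <- factor(c("5", "0", "2", "5"))

# Coercing the factor directly yields level codes (1, 2, 3, ...),
# not the original class values.
levelCodes <- as.integer(predicted)

# The approach in PredictModel.r: convert to a character vector
# first, then to integer, which recovers the class values.
classValues <- as.integer(as.vector(predicted))

print(levelCodes)
print(classValues)
```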
I hope that helps. If it does not, let me know. We should be able to obtain
similar results (accounting for the randomness you mentioned), and I would
like to see that happen.
Best,
Jason
-----Original Message-----
From: Anders Knudby [mailto:]
Sent: Thursday, November 01, 2012 8:32 PM
To:
Subject: [mget-help] Random Forest predictions
Hi, I've been comparing your Random Forest tools to an R script I wrote
myself, here's what I find.
From "Fit Random Forest Model" I get very similar results to my own
(accounting for slight differences from RF inherent randomness). All good.
From "Predict Random Forest from Rasters" I've tried very hard and I
absolutely can't get the same results as I get with this command from R:
predictions = predict(rf_model, newdata=df, type="prob"). Obviously
"rf_model" is my model (which I can get from the randomForest command or
MGET, no difference). "df" has the data I read from my rasters (presumably
identical to what MGET does, I also use .asc files). Is it the type="prob"
command that is the issue? In the randomForest help I see type options of
"response", "prob" or "vote". My assumption is that if I don't provide a
"binary classification cutoff", MGET would use type="prob". Right?
Any pointers?
Anders