Subject: Marine Geospatial Ecology Tools (MGET) help
From: | "Jason Roberts" <> |
---|---|
To: | "'Scott Gunn'" <> |
Cc: | "'Ram Deo'" <>, "'Luis Miguel Verissimo'" <>, <> |
Subject: | RE: [mget-help] random forest in MGET |
Date: | Thu, 17 Jan 2013 17:31:04 -0500 |
Hi Scott,

I’m terribly sorry, but it is still broken. After Ram and Luis originally brought it up, I made some progress on a fix but could not complete it before a series of major events happened (birth of a child, winter holidays, getting temporarily assigned to work on a critical project that has fallen behind). Up to this point, I’ve had a pretty good record of responding quickly to problems, but in this case I wasn’t able to do so. I apologize to Ram and Luis, and to you for having spent your time on something that should have been working by now.

Part of the complexity was that the more I dug into the problem, the more I realized that categorical variables were broken in different ways. Realizing that, I determined to provide a comprehensive solution that would:

- Allow categorical predictor variables and response variables for all of the modeling frameworks MGET supports (GLM, GAM, trees, random forests [both the randomForest package and cforest], linear mixed models).
- Allow categorical variables to be strings or integers.
- If a string was used, automatically provide guidance to the user on how to construct integer predictor rasters in lieu of the strings, since a raster can only hold numbers (not strings).
- Detect when a categorical predictor raster contains a value that was not used to fit the model, and gracefully set this to No Data, instead of allowing R to fail with an obscure message.

Providing all of that in a timely fix proved difficult given my other responsibilities. At this point, since you and possibly others are still waiting for a fix, it might be best to provide a surgical fix that just lets you build and predict your model, rather than supporting all that stuff now. If you send me a private email with the table you’re providing to the Fit Random Forest tool and the other parameters you’re providing to that tool, I’ll see what I can do.
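For illustration only, the integer-coding and No Data ideas described above could look roughly like the following Python sketch. This is not MGET's code: the function names, the `-9999` No Data value, the land-cover class names, and the list-of-lists raster are all assumptions made for the example.

```python
# Illustrative sketch only; not MGET's implementation.
NODATA = -9999  # assumed No Data sentinel for an integer raster


def build_category_codes(training_values):
    """Map each distinct category string to an integer code (1, 2, 3, ...)."""
    distinct = sorted(set(training_values))
    return {name: code for code, name in enumerate(distinct, start=1)}


def mask_unseen_categories(raster, fitted_codes, nodata=NODATA):
    """Return a copy of the raster with cells not seen during fitting set to nodata."""
    fitted = set(fitted_codes)
    return [[cell if cell in fitted else nodata for cell in row] for row in raster]


# Hypothetical training column of land-cover classes stored as strings.
cover_training = ["conifer", "hardwood", "conifer", "mixed"]
codes = build_category_codes(cover_training)

# Hypothetical predictor raster; 7 is a class that was never used to fit the model.
cover_raster = [[1, 2, 7],
                [3, 1, 2]]
masked = mask_unseen_categories(cover_raster, codes.values())
```

Here `codes` is the lookup a user would need in order to build an integer raster from string categories, and `masked` is the raster that would be safe to pass to prediction, with the unknown class replaced by the No Data value rather than triggering an error.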
Best,
Jason

From: Scott Gunn [mailto:]

Hi Jason and all,

On Mon, Nov 5, 2012 at 2:36 PM, Jason Roberts <> wrote:

Dear Ram,

Thanks for contacting us about MGET. I’m sorry about the problem you are seeing. I am actively investigating this problem, among others, after being contacted by Luis Miguel Verissimo, a colleague at your institution. Do you know him? I added him to the CC list, in case he is interested in this problem as well.

The problem is that MGET uses the R randomForest package to fit these models, and that package seems to be limited in the syntax it will accept for model formulas. I can reproduce the error you obtained with your original formula:

    tot_volcuf ~ modslope + tmndvi20 + avbawht + factor(cover)

Apparently the problem relates to how the randomForest package parses the formula and uses it to build an internal representation of the data passed into it. You can work around it with an ugly hack:

    tot_volcuf ~ modslope + tmndvi20 + avbawht + factor(cover) - cover

In R syntax, the - cover means “do not use the variable cover as a predictor”. Simply omitting + cover accomplishes the same thing; it prevents cover from being used as a predictor. But by explicitly saying - cover, we trick R into not failing with the message you mentioned. I do not completely understand this at a deep level yet, and I have not provided a full explanation of what I think is going on with R here, but you can try this as a quick workaround.

Unfortunately, when you do, I suspect you will still encounter a problem when you try to run the MGET Predict Random Forest From Rasters tool. That tool fails when the model formula includes categorical variables, but for a different reason. I am working on that problem as well.

Sorry again for all of these problems. The Random Forest tools are new in MGET and I have not tested them very well. I should be able to provide a solution to these problems this week, however.
Best regards,
Jason

From: Ram Deo [mailto:]

Hello Sir,

I am a PhD student at Michigan Technological University, School of Forest Resources and Environmental Science. I found the ArcMap extension "Marine Geospatial Ecology Tools 0.8a45" very interesting and relevant for my research work. I am trying to apply the "Fit Random Forest Model" tool to my reference data, but I am continuously getting an error message. I have one response variable (continuous) and four explanatory variables (one of them categorical) in my training data set. I am trying to fit a model of the form:

    tot_volcuf ~ modslope + tmndvi20 + avbawht + factor(cover)

The error message that I am getting is:

    Error in factor(cover) : object 'cover' not found.

I have attached my data table. Could you please run the above model with my data? It would be a great help if you could suggest the necessary steps, and it would be highly acknowledged. Thanks in advance.

Sincerely,
Ram Kumar Deo
Ph.D. Student
School of Forest Resources and Environmental Science
Michigan Technological University
Houghton, MI-94431 |