Subject: Marine Geospatial Ecology Tools (MGET) help
| From: | "Jason Roberts" <> |
|---|---|
| To: | "'Mara'" <>, <> |
| Subject: | RE: [mget-help] GAM and GLM |
| Date: | Thu, 16 Apr 2009 15:37:33 -0400 |
Hi Mara,
I am the software engineer on the team here, not a statistics expert, but I
will try to answer your questions. If this doesn't help, we may be able to
pull someone else in.
I have not seen the errors you mention when fitting GAMs with MGCV using
splines with shrinkage, but I have only just started to use the shrinkage
smoothers. For more information on those, I recommend the MGCV documentation:
http://cran.r-project.org/web/packages/mgcv/mgcv.pdf, see gam.selection
discussion starting on page 39, and check out the references at the end of
the discussion. If you can reproduce the error reliably, we can enable some
additional logging output to obtain more details. If that does not provide
any clues, we can try contacting the MGCV author, Simon Wood, directly.
My colleague, a statistics postdoc, worked with the shrinkage smoothers last
summer. She found they worked OK in some situations but exhibited strange
behavior in certain circumstances. Because she was not able to fully
understand what was going on, she ended up implementing a more traditional
model selection strategy (stepwise model selection that minimized the UBRE
score).
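For illustration only, the stepwise idea can be sketched in plain Python as below. This is not the procedure the postdoc actually implemented. `backward_stepwise` and `toy_score` are hypothetical names, and `toy_score` is a made-up stand-in for "fit the GAM with these terms and return its UBRE score"; the term names and numbers in it are invented.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def backward_stepwise(
    terms: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float, List[Tuple[FrozenSet[str], float]]]:
    """Greedy backward elimination that minimizes a caller-supplied score.

    Starts from the model containing every term, repeatedly drops the single
    term whose removal lowers the score the most, and stops as soon as no
    single removal improves on the current score.
    """
    current = frozenset(terms)
    best = score(current)
    history = [(current, best)]
    while current:
        # Score every model that drops exactly one term from the current one.
        candidates = [(score(current - {t}), t) for t in sorted(current)]
        cand_score, dropped = min(candidates)
        if cand_score >= best:
            break  # no single removal improves the score
        current = current - {dropped}
        best = cand_score
        history.append((current, best))
    return current, best, history


def toy_score(model: FrozenSet[str]) -> float:
    """Invented score: useful terms lower it, every term costs a penalty."""
    gain = {"Depth": 0.30, "SST": 0.20}  # hypothetical "informative" terms
    return 1.0 - sum(gain.get(t, 0.0) for t in model) + 0.05 * len(model)


if __name__ == "__main__":
    final, best, history = backward_stepwise(
        ["Depth", "SST", "Slope", "Chl"], toy_score
    )
    for model, value in history:
        print(sorted(model), round(value, 3))
```

With a real score function, each call would refit the model, so the number of fits grows roughly with the square of the number of terms.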
The gam package predates MGCV and was written by the inventors of GAMs,
principally Trevor Hastie, I believe. It does not support shrinkage smoothers.
If you try to use those you should get an error, or at least some unexpected
results. My stats colleague seemed more excited about the MGCV package, and
said that Simon Wood is more actively working on improving it, while the gam
package does not seem to be under active development.
If you are seeing Df=1 for your terms in the gam package, I believe it means
the fitting algorithm determined that linear fits were appropriate for your
model terms. This will occur, of course, if you fit the model without using a
smoothing function. But if you do use a smoothing function and the Df
approaches 1 for all terms, you could just as well use a GLM for that model
and achieve similar results. (But remember, I am not the stats expert here!)
Regarding the Cook's distance plot, I will see if there is a way to get it to
label more than three samples. Ideally, how many would you like it to label?
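Until the plot can label more points, one workaround is to compute Cook's distance separately and list the top k rows by their original row number, so nothing has to be filtered out and renumbered. The sketch below is plain Python for a straight-line least-squares fit only; it is not MGET or R code. `cooks_distance_simple` and `top_influential` are hypothetical names, and a GLM with several predictors would need leverages from the full (weighted) hat matrix instead of the one-predictor formula used here.

```python
from typing import List, Sequence, Tuple


def cooks_distance_simple(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Cook's distance for each point of the least-squares line y = a + b*x.

    Assumes no single observation has leverage exactly 1.
    """
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    if n != len(y) or n <= p:
        raise ValueError("need equal-length x and y with more than 2 points")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)  # residual variance estimate
    if mse == 0:
        return [0.0] * n  # perfect fit: no point has any influence
    distances = []
    for xi, e in zip(x, residuals):
        h = 1.0 / n + (xi - x_mean) ** 2 / sxx  # leverage of this point
        distances.append((e * e / (p * mse)) * h / (1.0 - h) ** 2)
    return distances


def top_influential(
    row_ids: Sequence[int], distances: Sequence[float], k: int = 10
) -> List[Tuple[int, float]]:
    """Return the k (row_id, distance) pairs with the largest distances."""
    ranked = sorted(zip(row_ids, distances), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Invented data: the last row departs strongly from the trend.
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [3, 5, 7, 9, 11, 13, 15, 40]
    ids = [101, 102, 103, 104, 105, 106, 107, 108]  # original table row numbers
    for row_id, d in top_influential(ids, cooks_distance_simple(x, y), k=5):
        print(row_id, round(d, 3))
```

Because the row identifiers travel with the distances, the ranking always refers back to the input table, however many rows are listed.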
Jason
-----Original Message-----
From: Mara
[mailto:]
Sent: Thursday, April 16, 2009 1:23 PM
To:
Subject: [mget-help] GAM and GLM
Hello!
I use MGET to predict the occurrence of species and Jason already helped me
with initial problems. I have to admit that I am a beginner in statistical
modeling and have also just started using R. Please forgive me if I thus
address minor problems here (and that I wrote half a novel).
When I want to fit a GAM using the mgcv package I sometimes have problems when
adding splines with "shrinkage". (Models with exactly the same input but
without shrinking run perfectly.) I used variables separately to see if I can
track down the problem but here the models will always run. Depending on the
combination of variables I get one of the following error messages:
RPy_RException: Error: NA/NaN/Inf in foreign function call (arg 3)
or
RPy_RException: Error: no valid set of coefficients has been found:please
supply starting values
As far as I understand, this normally means there are missing values or maybe
typing errors, but like I said before, the input is always the same (and it
works without shrinkage).
On the other hand, using the same predictors but a different response will
work perfectly even with shrinkage...
Using the gam package I don't get results at all but Df=1 for all predictor
variables:
            Df
(Intercept)  1
Depth        1
...
What does this mean?
Last but not least, I have a comment/question regarding GLMs. I used Cook's
distance to identify outliers. Row numbers are used to identify samples, and
only the row numbers of the three samples with the highest values are plotted.
As far as I can see, there is no way to identify the other samples. Only when
I remove the first three outliers (using the "where clause") can I see the row
numbers of the next three samples with the highest values. Unfortunately, the
row numbers are then no longer identical to those in the input table, because
three samples were not considered. Identifying the proper row can be very time
consuming. Is there a way to improve this (other than changing the input
table)?
Thanks already for reading all of this! Am looking forward to help, thanks!