## How to interpret GAM output for two continuous variables?

5

I really need help with GAM. I have to find out whether the association is linear or non-linear by using a GAM. The predictor variable is temperature at lag0 and the outcome is cardiovascular admissions (a count variable). I have tried a lot, but I cannot work out how to interpret the graph and the output that I am getting.

I tried this formula using the mgcv package:

model1<- gam(cvd ~ s(templg0), family=poisson)
summary(model1)
plot(model1)


So here is the output for summary that I am getting:

Family: poisson

Formula:
cvd ~ s(templg0)

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.195669   0.004877   655.2   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
             edf Ref.df Chi.sq  p-value
s(templg0) 3.422  4.295  57.23 2.93e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.0152   Deviance explained = 1.68%
UBRE =  1.016  Scale est. = 1         n = 1722


Can someone please explain the output in detail? What is this output telling me? Can someone also explain what the plot (picture attached) is showing? Please be kind, as I have invested a lot of time but cannot find out how to interpret this.

0

What you are fitting here is a spline-based smooth (a penalized regression spline, closely related to smoothing splines). Have a look at the book "An Introduction to Statistical Learning", Section 7.5, for a very good overview of the method.

The s() function lets you specify how the smooth term in the GAM is constructed. You have not supplied k, so a default is chosen. The mgcv documentation describes k as follows:

the dimension of the basis used to represent the smooth term. The default depends on the number of variables that the smooth is a function of. k should not be less than the dimension of the null space of the penalty for the term (see null.space.dimension), but will be reset if it is. See choose.k for further information.

Loosely speaking, a GAM builds the fitted curve from several simple pieces along the x-axis, and k controls how many of these pieces (basis functions) are available. This lets a GAM capture even strong non-linearity. If you want to check for linearity in your data, fit the model with different values of k and look at the resulting plots.
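To make the role of k concrete, here is a minimal sketch using mgcv (the package from the question) on simulated data. The data, the variable names `temp` and `cvd`, the shape of the true curve, and the choice of `method = "REML"` are all assumptions made for illustration; none of them come from the question.

```r
library(mgcv)

set.seed(1)
n    <- 500
temp <- runif(n, 0, 35)                    # hypothetical temperature values
mu   <- exp(3 + 0.3 * sin(temp / 6))       # assumed non-linear effect on the log scale
cvd  <- rpois(n, mu)                       # simulated admission counts
dat  <- data.frame(cvd = cvd, temp = temp)

# Fit the same model with three different basis dimensions k
ks   <- c(5, 10, 20)
fits <- lapply(ks, function(k) {
  gam(cvd ~ s(temp, k = k), family = poisson, data = dat, method = "REML")
})

# Effective degrees of freedom (edf) of the smooth term for each k
edfs <- sapply(fits, function(m) as.numeric(summary(m)$edf))
print(data.frame(k = ks, edf = edfs))
```

With the default basis, k only sets an upper limit on the flexibility (the edf cannot exceed k - 1); the penalty decides how much of it is used. If the edf stays close to 1 for every k, the smooth has been shrunk to an essentially straight line, whereas values well above 1 point to non-linearity.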

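GAM example (note that this uses the gam package, not mgcv):

library(gam)
library(ISLR)
df <- ISLR::Auto

# GAM with a smoothing spline on 10 degrees of freedom
# (in the gam package the second argument of s() is df, not a number of knots)
gam5 <- gam(horsepower ~ s(mpg, 10), data = df)
summary(gam5)
plot(gam5, se = TRUE)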


GAM Result:

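Would you conclude that the relationship is linear? No. Between about mpg = 0 and mpg = 20 the relation is roughly linear, and the same holds between mpg = 20 and mpg = 40, but linearity does not hold over the entire range of the data. I would therefore treat these segments separately, e.g. with a dummy variable for the segment and an interaction term.

Note that the y-axis is rescaled: the plot shows the smooth term centered around zero, i.e. the contribution of mpg relative to the average. The values on the y-axis are therefore not horsepower values and have no natural interpretation here.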
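The segment-wise approach with a dummy variable and an interaction term could look roughly like the sketch below. It uses the built-in `mtcars` data as a stand-in for `ISLR::Auto` (an assumption, chosen so the code runs without extra packages), and the breakpoint of 20 is simply the value read off the plot above; the names `high`, `fit_seg` and `fit_lin` are made up for this example.

```r
# Stand-in data: built-in mtcars instead of ISLR::Auto (assumption for illustration)
dat <- data.frame(horsepower = mtcars$hp, mpg = mtcars$mpg)

# Dummy variable marking the upper segment; breakpoint 20 taken from the plot
dat$high <- as.integer(dat$mpg > 20)

# The interaction gives each segment its own intercept and slope
fit_seg <- lm(horsepower ~ mpg * high, data = dat)

# Single straight line over the whole range, for comparison
fit_lin <- lm(horsepower ~ mpg, data = dat)

# F-test: do the segment-specific terms improve on one straight line?
cmp <- anova(fit_lin, fit_seg)
print(coef(fit_seg))
print(cmp)
```

The coefficient `mpg:high` is the difference in slope between the two segments, so a clearly non-zero estimate supports treating the segments separately.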

Comparison to non-parametric (NP) estimation:

To deal with non-linearity, non-parametric regression is an obvious alternative. What happens if we use NP?

# Nonparametric regression
library(SemiPar)
fit <- spm(df$horsepower ~ f(df$mpg))
plot(fit)


NP Result:

As you can see, NP delivers almost the same result. However, the y-axis in this figure has a natural interpretation, which can be useful.

First, make sure that you try different values of k in s(..., k = ...), i.e. different basis dimensions (roughly, different numbers of knots), and see how the figure changes. Also have a look at the book to understand the background.

In your current figure, I see some kinks at about x=10 and x=20. However, I would not call this severe non-linearity (although there is some non-linearity in the data). As a rule of thumb, if you can draw a straight line across the whole plotted range of x that stays inside the confidence bands, the data are consistent with a linear relationship.
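The visual check can be complemented by comparing a model with a linear temperature term against one with a smooth term. The following is a minimal sketch on simulated data; the variable names, the assumed curved relationship, and the use of `method = "ML"` with an AIC comparison are choices made for this example, not something taken from the question.

```r
library(mgcv)

set.seed(42)
n    <- 400
temp <- runif(n, -5, 35)                                   # hypothetical temperatures
mu   <- exp(3.2 - 0.02 * temp + 0.001 * (temp - 15)^2)     # assumed curved effect
cvd  <- rpois(n, mu)                                       # simulated counts
dat  <- data.frame(cvd = cvd, temp = temp)

# Same Poisson model, once with a linear term and once with a smooth term
m_lin    <- gam(cvd ~ temp,    family = poisson, data = dat, method = "ML")
m_smooth <- gam(cvd ~ s(temp), family = poisson, data = dat, method = "ML")

# edf near 1 means the smooth is practically a straight line
edf_smooth <- as.numeric(summary(m_smooth)$edf)

# Lower AIC indicates the better trade-off between fit and complexity
aic_tab <- AIC(m_lin, m_smooth)
print(edf_smooth)
print(aic_tab)
```

If the smooth model has an edf close to 1 and no better AIC than the linear model, there is little evidence against linearity; a clearly larger edf together with a lower AIC points to a non-linear association.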

Peter, thank you very much for the detailed answer. Can you also please tell me what to look for in the output? What do the parametric coefficients and the other values show? The graph is also hard for me to understand: how should I interpret it, why are there three lines, what does each line depict, and what would be the criterion for a linear or non-linear association? – Hasan Sohail – 2019-07-25T12:42:56.507

@Peter I don't think that non-parametric regression offers a solution to the non-linear association. – Subhash C. Davar – 2020-03-29T08:37:15.193

Furthermore, I appreciate your answer to the unstructured question. Whether the association is linear or non-linear depends on the interpretation of the variables involved. Hasan is apparently concerned about how an increase or decrease in temperature relates to the number of admissions (count data). A fall in temperature results in higher admissions, i.e. an inverse relationship. Treating a count variable as a continuous variable in a parametric regression (GLM) is fraught with danger. The non-parametric regression that you suggested is better. I do not know whether there is any connection between GAMs and non-parametric models. – Subhash C. Davar – 2020-03-29T09:09:16.800

0

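The z-value in the parametric part of the output belongs to the intercept, not to temperature. The intercept is on the log scale, so exp(3.196) ≈ 24.4 is the expected number of admissions where the smooth term is zero. The effect of the predictor (temperature at lag0) is given in the smooth-terms table: the substantial chi-square value (57.23, p = 2.93e-11) indicates that temperature has a statistically significant effect on cardiovascular admissions. (The p-value here tests the null hypothesis that the smooth is flat, i.e. that temperature has no effect.) The graph shows the GAM part: the estimated smooth effect of temperature (solid line) with its confidence band (dashed lines, roughly ±2 standard errors). The plot by itself does not formally test whether the association is linear or non-linear, although an edf of 3.42, well above 1, suggests some non-linearity.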
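The individual pieces of the summary can also be extracted in code, which makes it easier to see which number belongs to which part of the model. The sketch below refits the model from the question on simulated data, since the original data are not available; the simulated values and the assumed true curve are purely illustrative.

```r
library(mgcv)

set.seed(7)
n       <- 300
templg0 <- runif(n, -5, 35)                               # hypothetical temperatures
cvd     <- rpois(n, exp(3.2 + 0.15 * cos(templg0 / 8)))   # simulated counts
dat     <- data.frame(cvd = cvd, templg0 = templg0)

model1 <- gam(cvd ~ s(templg0), family = poisson, data = dat)
sm     <- summary(model1)

# Parametric part: only the intercept (estimate, std. error, z value, p-value)
print(sm$p.table)

# Smooth part: edf, Ref.df, Chi.sq and p-value for s(templg0)
print(sm$s.table)

# The intercept is on the log scale; back-transform it to a count
baseline <- exp(coef(model1)[["(Intercept)"]])
print(baseline)

# Share of deviance explained by the model
print(sm$dev.expl)
```

Here `sm$s.table` is where the evidence about the temperature effect lives, while `sm$p.table` only describes the overall level of admissions.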

Please note that the p-value indicates the goodness of fit of the observed distribution at a specified value of alpha (say, a 5% significance level). The chi-square statistic indicates the association between the two variables. – Subhash C. Davar – 2020-03-28T13:50:15.517