## Extracting model equation and other data from 'glm' function in R

I've fitted a logistic regression to combine two independent variables in R, using the pROC package, and I obtained this:

    > summary(fit)

    Call: glm(formula = Case ~ X + Y, family = "binomial", data = data)

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -1.5751  -0.8277  -0.6095   1.0701   2.3080

    Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
    (Intercept) -0.153731   0.538511  -0.285 0.775281
    X           -0.048843   0.012856  -3.799 0.000145 ***
    Y            0.028364   0.009077   3.125 0.001780 **
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 287.44  on 241  degrees of freedom
    Residual deviance: 260.34  on 239  degrees of freedom
    AIC: 266.34

    Number of Fisher Scoring iterations: 4

    > fit

    Call:  glm(formula = Case ~ X + Y, family = "binomial", data = data)

    Coefficients:
    (Intercept)            X            Y
       -0.15373     -0.04884      0.02836

    Degrees of Freedom: 241 Total (i.e. Null);  239 Residual
    Null Deviance:      287.4
    Residual Deviance:  260.3        AIC: 266.3


Now I need to extract some information from this model, and I'm not sure how to do it. First, I need the model equation: if the fitted model defines a combined predictor called CP, would it be CP = -0.15 - 0.05X + 0.03Y?

Second, the combined predictor should have a value for every observation, so that I can compare its median between the two groups, Cases and Controls, that I used to fit the regression. (In other words, my X and Y variables each have N = N1 + N2 values, where N1 is the number of Controls, for which Case = 0, and N2 is the number of Cases, for which Case = 1.)

IMHO, this question better fits Cross Validated or StackOverflow SE sites. – Aleksandr Blekh – 2015-04-09T15:29:58.890

To extract data from the fitted glm model object, you first need to figure out where that data resides (use the documentation and str() for that). Some data is available from the summary.glm object, while more detailed data is available from the glm object itself. For extracting model parameters, you can use the coef() function or access the object's structure directly.
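As a minimal sketch of that inspection, the example below fits the same kind of model and then looks inside it. The original data set is not available, so `data` here is simulated (the seed, sample size, and the X and Y distributions are assumptions made only for illustration), and the numbers it prints will not match the output in the question.

```r
# Simulated stand-in for the asker's data (hypothetical values, not the real data set)
set.seed(1)
n <- 242
data <- data.frame(X = rnorm(n, mean = 40, sd = 15),
                   Y = rnorm(n, mean = 50, sd = 20))
data$Case <- rbinom(n, size = 1,
                    prob = plogis(-0.15 - 0.05 * data$X + 0.03 * data$Y))

fit <- glm(Case ~ X + Y, family = "binomial", data = data)

# See what the fitted object contains
names(fit)                # component names of the glm object
str(fit, max.level = 1)   # compact overview of its structure

# Coefficients: accessor function vs. direct access with $
b <- coef(fit)
identical(b, fit$coefficients)

# Full coefficient table (estimates, standard errors, z and p values)
coef(summary(fit))

# Write the linear predictor as text; it is on the log-odds (logit) scale
eq <- sprintf("logit(P(Case = 1)) = %.3f %+.3f*X %+.3f*Y",
              b["(Intercept)"], b["X"], b["Y"])
cat(eq, "\n")
```

Note that the equation built this way describes the logit of the probability that Case = 1, not the probability itself.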

UPDATE:

From the GLM section of Princeton's* introduction to R course website (see it for details and examples):

The functions that can be used to extract results from the fit include

- 'residuals' or 'resid', for the deviance residuals
- 'fitted' or 'fitted.values', for the fitted values (estimated probabilities)
- 'predict', for the linear predictor (estimated logits)
- 'coef' or 'coefficients', for the coefficients, and
- 'deviance', for the deviance.


Some of these functions have optional arguments; for example, you can extract five different types of residuals, called "deviance", "pearson", "response" (response - fitted value), "working" (the working dependent variable in the IRLS algorithm - linear predictor), and "partial" (a matrix of working residuals formed by omitting each term in the model). You specify the one you want using the type argument, for example residuals(lrfit,type="pearson").

*) More accurately, this website is by Germán Rodríguez from Princeton University.
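The extractor functions quoted above can be tried out roughly as follows. This is a sketch on simulated data (the `data` frame, seed, and distributions are assumptions, since the original data is not shown), checking how `predict()`, `fitted()`, and `residuals()` relate to each other for a binomial glm.

```r
# Simulated stand-in data (hypothetical, not the asker's data set)
set.seed(1)
n <- 242
data <- data.frame(X = rnorm(n, 40, 15), Y = rnorm(n, 50, 20))
data$Case <- rbinom(n, 1, plogis(-0.15 - 0.05 * data$X + 0.03 * data$Y))
fit <- glm(Case ~ X + Y, family = "binomial", data = data)

eta <- predict(fit)   # linear predictor (logits); default type = "link"
p   <- fitted(fit)    # estimated probabilities, between 0 and 1

# The linear predictor is the model equation evaluated for each row
b <- coef(fit)
eta_manual <- b["(Intercept)"] + b["X"] * data$X + b["Y"] * data$Y
all.equal(unname(eta), unname(eta_manual))

# Fitted probabilities are the inverse logit of the linear predictor
all.equal(unname(p), unname(plogis(eta)))
all.equal(unname(p), unname(predict(fit, type = "response")))

# Residual types are selected with the 'type' argument
r_dev     <- residuals(fit)                     # deviance residuals (default)
r_pearson <- residuals(fit, type = "pearson")
r_resp    <- residuals(fit, type = "response")  # observed minus fitted
all.equal(unname(r_resp), unname(data$Case - p))

# Residual deviance; equals the sum of squared deviance residuals
deviance(fit)
all.equal(sum(r_dev^2), deviance(fit))
```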

Given that fit <- glm(formula = Case ~ X + Y, family = "binomial", data = data), what is fitted(fit)? Could it be what I'm looking for? – Ciochi – 2015-04-09T17:55:57.983

@Ciochi: No. In your example above, a fitted glm model object would be the fit object. See UPDATE section in my answer. – Aleksandr Blekh – 2015-04-10T00:19:10.900

@Ciochi: So, my suggestion is to use standard access functions (see UPDATE) for extracting traditional information, and str() and low-level (direct) access (via $) if you need other information that is not accessible via high-level functions. I hope that this clarifies things. – Aleksandr Blekh – 2015-04-10T00:29:36.703

Thanks for the answers, I appreciate it. I've already read the UPDATE section you added, but I still don't get it, mostly because I'm pretty new to statistics; I just use basic stuff like t-tests. Indeed, I don't get what the fit object is. Is it a new variable? – Ciochi – 2015-04-10T00:42:13.957

@Ciochi: You're welcome. Feel free to accept/upvote my answer, if it is helpful. Yes, fit is a new variable that gets created and initialized with the return value of the glm() function (the return value is an object of class glm). – Aleksandr Blekh – 2015-04-10T01:49:19.313

I still have some problems getting the values of this new variable fit. From reading elsewhere, it seems to me that fit is a variable of estimated probabilities, thus ranging from 0 to 1, like the values I get when I run fitted(fit). – Ciochi – 2015-04-10T10:49:28.203

@Ciochi: Sorry, but I don't quite understand what your current issue is. – Aleksandr Blekh – 2015-04-10T10:52:04.373

The problem is that I need to use those values for an upcoming proceeding. Calling the new variable Z instead of fit, I need to present the data as Z: Controls vs Cases, mean ± sd vs mean ± sd, P<0.01, for example. – Ciochi – 2015-04-10T11:01:59.230

@Ciochi: I think that you might receive more attention and help with this question at the Cross Validated SE site. I can ask local moderator to migrate it, if you want. – Aleksandr Blekh – 2015-04-10T11:08:26.110
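For the Controls vs Cases summary asked about in the comments, one possible approach is sketched below. It assumes that the combined predictor is taken to be the linear predictor from `predict()`, uses simulated data in place of the original (the `data` frame, seed, and distributions are hypothetical), and uses a Wilcoxon rank-sum test as just one of several reasonable choices for the comparison.

```r
# Simulated stand-in data (hypothetical, not the asker's data set)
set.seed(1)
n <- 242
data <- data.frame(X = rnorm(n, 40, 15), Y = rnorm(n, 50, 20))
data$Case <- rbinom(n, 1, plogis(-0.15 - 0.05 * data$X + 0.03 * data$Y))
fit <- glm(Case ~ X + Y, family = "binomial", data = data)

# Combined predictor on the logit scale, one value per observation
data$CP <- predict(fit, type = "link")
data$group <- factor(data$Case, levels = c(0, 1),
                     labels = c("Control", "Case"))

# Median, mean and SD of the combined predictor in each group
med <- tapply(data$CP, data$group, median)
avg <- tapply(data$CP, data$group, mean)
std <- tapply(data$CP, data$group, sd)
print(rbind(median = med, mean = avg, sd = std))

# One possible test for a difference between the two groups
wt <- wilcox.test(CP ~ group, data = data)
wt$p.value
```

Because the model was fitted to separate these same two groups, a p-value computed this way is likely to be optimistic and is not an independent validation of the predictor.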