Whenever your task includes something like "...when XY will fail...", I'd say go for survival analysis first: it is easy and fast, and it will give you an overview of your data.

You can either turn your data into intervals so that you can plot survival curves, or proceed directly to Cox regression, which can work with continuous data and will yield the hazard ratio.

You can start with a Kaplan-Meier curve (as a bonus, you get confidence intervals):

```
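library(survival)
# Kaplan-Meier estimate over all machines, with 90% log-log confidence intervals
km <- survfit(Surv(datetime, Failed) ~ 1, conf.int = 0.90, conf.type = "log-log", data = Dataset)
summary(km)
plot(km, xlab = "month", ylab = "estimated S(t)", main = "Kaplan-Meier with log-log, C.I.=90%")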
```

The curve will look somewhat like this:

Further, you can split the curve to see whether any of your parameters has a different influence. You can do that by simply replacing the `~1` with something like `~AttributeX`.

So you should get this kind of plot:

Of course, R will give you various tests and p-values as well, such as the log-rank test (`survdiff`), to verify whether the influence is significant.
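A minimal sketch of the grouped curves and the log-rank test described above. The data here are simulated stand-ins: the column names `datetime`, `Failed` and `AttributeX` follow the answer, but the values, the two group labels and the lifetimes are made up for illustration only.

```r
library(survival)

# Simulated stand-in for `Dataset` (hypothetical values, illustration only)
set.seed(42)
n <- 200
Dataset <- data.frame(AttributeX = factor(sample(c("A", "B"), n, replace = TRUE)))
# Group B gets a shorter mean lifetime so that the curves separate
true_time <- rexp(n, rate = ifelse(Dataset$AttributeX == "B", 1 / 10, 1 / 20))
censor_time <- runif(n, 5, 40)
Dataset$datetime <- pmin(true_time, censor_time)        # observed time (months)
Dataset$Failed <- as.integer(true_time <= censor_time)  # 1 = failed, 0 = censored

# One Kaplan-Meier curve per level of AttributeX
km_by_group <- survfit(Surv(datetime, Failed) ~ AttributeX, data = Dataset)
plot(km_by_group, col = c("blue", "red"), xlab = "month", ylab = "estimated S(t)")
legend("topright", legend = levels(Dataset$AttributeX), col = c("blue", "red"), lty = 1)

# Log-rank test of equal survival across the groups
lr <- survdiff(Surv(datetime, Failed) ~ AttributeX, data = Dataset)
print(lr)

# p-value from the chi-square statistic (df = number of groups - 1)
p_value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
```

A small `p_value` suggests the survival curves differ between the levels of `AttributeX`. A continuous attribute would have to be cut into intervals first (for example with `cut()`) before it can be used this way.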

Then you can proceed to Cox regression, which will tell you the hazard ratio (i.e. whether the attribute influences the hazard positively or negatively, and to what extent). It looks like this in R:

```
# Cox proportional hazards model on the same Dataset used for the Kaplan-Meier fit
cox <- coxph(Surv(datetime, Failed) ~ AttributeX, data = Dataset)
summary(cox)
```
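To read the hazard ratio off the fitted model directly, something like the following sketch should work. The data are simulated (a continuous `AttributeX` with a made-up true log hazard ratio of 0.5); only the column names come from the answer.

```r
library(survival)

# Simulated stand-in for `Dataset` (hypothetical values, illustration only)
set.seed(1)
n <- 300
Dataset <- data.frame(AttributeX = rnorm(n))
# The hazard grows with AttributeX; the true log hazard ratio is 0.5
true_time <- rexp(n, rate = 0.05 * exp(0.5 * Dataset$AttributeX))
censor_time <- runif(n, 5, 60)
Dataset$datetime <- pmin(true_time, censor_time)
Dataset$Failed <- as.integer(true_time <= censor_time)

cox <- coxph(Surv(datetime, Failed) ~ AttributeX, data = Dataset)

hr <- exp(coef(cox))        # hazard ratio per one-unit increase in AttributeX
hr_ci <- exp(confint(cox))  # 95% confidence interval on the hazard-ratio scale
print(cbind(HR = hr, hr_ci))
```

A hazard ratio above 1 means a higher value of the attribute goes with a higher failure hazard; below 1 means a lower hazard.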

It is good practice to verify the assumptions, namely proportional hazards and functional form (again, R will give you p-values, or you can plot the residuals: Schoenfeld for proportional hazards, martingale for functional form).
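One possible way to run these checks, sketched on simulated data (the values are made up; only the column names follow the answer). `cox.zph` gives the Schoenfeld-residual test for proportional hazards, and the martingale residuals are plotted against the covariate to look at functional form.

```r
library(survival)

# Simulated stand-in for `Dataset` (hypothetical values, illustration only)
set.seed(1)
n <- 300
Dataset <- data.frame(AttributeX = rnorm(n))
true_time <- rexp(n, rate = 0.05 * exp(0.5 * Dataset$AttributeX))
censor_time <- runif(n, 5, 60)
Dataset$datetime <- pmin(true_time, censor_time)
Dataset$Failed <- as.integer(true_time <= censor_time)

cox <- coxph(Surv(datetime, Failed) ~ AttributeX, data = Dataset)

# Proportional hazards: test based on scaled Schoenfeld residuals
zph <- cox.zph(cox)
print(zph)   # a small p-value hints at non-proportional hazards
plot(zph)    # a roughly flat smooth supports the assumption

# Functional form: martingale residuals against the covariate
mart <- residuals(cox, type = "martingale")
plot(Dataset$AttributeX, mart, xlab = "AttributeX", ylab = "martingale residual")
lines(lowess(Dataset$AttributeX, mart), col = "red")  # strong curvature suggests a transform
```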

If you are interested in **WHEN** the event will occur, look into Accelerated Failure Time (AFT) models, which give you a parametric survival time distribution into which you can simply plug a time and obtain the probability.

In R:

```
# Parametric AFT model; survreg uses the Weibull distribution unless `dist` is given
wei <- survreg(Surv(datetime, Failed) ~ AttributeA + AttributeB + AttributeC, data = Dataset)
```

There are more possible distributions; you can check which one fits your data best. I have never done the prediction myself, but there is a `predict` function, which is described in the documentation, and there are already similar questions with answers on Cross Validated, such as this.
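As a rough sketch of what comparing distributions and predicting could look like (not something I have run on the original data): the example below simulates a `Dataset` with two made-up attributes, compares a Weibull and a log-normal fit by AIC, and then uses `predict` on a hypothetical new machine. The 12-month horizon and the attribute values are arbitrary assumptions.

```r
library(survival)

# Simulated stand-in for `Dataset` (hypothetical values, illustration only)
set.seed(7)
n <- 300
Dataset <- data.frame(AttributeA = rnorm(n), AttributeB = rnorm(n))
true_time <- rweibull(n, shape = 1.5,
                      scale = exp(3 + 0.4 * Dataset$AttributeA - 0.3 * Dataset$AttributeB))
censor_time <- runif(n, 10, 80)
Dataset$datetime <- pmin(true_time, censor_time)
Dataset$Failed <- as.integer(true_time <= censor_time)

# Fit two candidate distributions and compare them (lower AIC is better)
wei <- survreg(Surv(datetime, Failed) ~ AttributeA + AttributeB,
               data = Dataset, dist = "weibull")
lnorm <- survreg(Surv(datetime, Failed) ~ AttributeA + AttributeB,
                 data = Dataset, dist = "lognormal")
print(AIC(wei, lnorm))

# A hypothetical new machine
new_machine <- data.frame(AttributeA = 0.5, AttributeB = -1)

# Predicted median failure time under the Weibull model
median_time <- predict(wei, newdata = new_machine, type = "quantile", p = 0.5)

# Probability of surviving beyond 12 months: survreg's scale is 1 / Weibull shape,
# and exp(linear predictor) is the Weibull scale
lp <- predict(wei, newdata = new_machine, type = "lp")
surv_12 <- pweibull(12, shape = 1 / wei$scale, scale = exp(lp), lower.tail = FALSE)
print(c(median_time = unname(median_time), surv_12 = unname(surv_12)))
```

Note that this yields a distribution of failure times (quantiles and survival probabilities) for a machine, not a single certain failure date.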

Describe your data better. After a machine has "Failed=1", is that the end of it, like a death in a classical survival analysis? Do you have several days for each machine where it didn't fail, and then either no "Failed" for machines that are still working (in survival analysis these are "censored") or one "Failed" record for the failures? You may have data suitable for survival analysis with time-varying explanatory variables (your "pressure" etc). Standard methods exist, but can't predict *when* a machine will fail, only probabilities of failure within time spans. – Spacedman – 2016-09-29T07:46:13.127

Also, you might do better asking on the Statistics site (stats.stackexchange.com) rather than this Data Science one. – Spacedman – 2016-09-29T07:47:55.293

I'm voting to close this question as off-topic because it probably belongs on stats.stackexchange.com as it seems to be about simple classical statistics and there are zillions of experts on that over there. – Spacedman – 2016-09-29T07:48:41.457