Anomaly detection in cooling process data without exact labels


I have a data set in which I look at the cooling of a process. The starting temperature may vary between 180 and 580 degrees. I know that at some point the cooling system failed (see examples in the plot), and I tried to predict this failure.

As the data is completely unlabeled (the failure itself had to be detected manually, too), I ran unsupervised algorithms on it.

To give an idea of the data, here is a plot that includes the results of one unsupervised learning run. The upper plots show two parameters that I extracted for each process (the processes vary in duration). The colors, which correspond to the positions in the bottom plot, result from an unsupervised learning algorithm that used only 30 minutes of data per process; if less than 30 minutes were available, the last value was repeated as often as needed to fill the window.
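A minimal sketch of the padding step described above. It assumes one temperature reading per minute (so a 30-minute window is 30 values); the function name `pad_to_length` and the short example series are made up for illustration.

```python
import numpy as np

def pad_to_length(series, length=30):
    """Truncate or pad a 1-D temperature series to exactly `length` samples."""
    arr = np.asarray(series, dtype=float)
    if arr.size == 0:
        raise ValueError("series must contain at least one value")
    if arr.size >= length:
        # Longer processes are cut off after the window.
        return arr[:length]
    # Shorter processes: repeat the last observed value to fill the window.
    return np.pad(arr, (0, length - arr.size), mode="edge")

# Hypothetical short process, padded to 6 samples for readability.
short_run = [540.0, 539.0, 531.0, 263.0]
padded = pad_to_length(short_run, length=6)
print(padded)
```

Note that the repeated values create a flat tail with zero slope, which a clustering algorithm may pick up as a feature of its own.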

The top plot shows the target temperature of the process; the middle one shows the smallest value of the derivative, which gives a clear indication that the cooling system was broken around process no. 800. The different colors result from unsupervised clustering using AgglomerativeClustering with Ward linkage and 8 clusters as the target.
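For reference, the setup in this caption can be sketched roughly as follows. The curves here are synthetic exponential cooling curves, not the real data; the ambient temperature of 20 degrees, the cooling rates and the number of processes are assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Synthetic stand-in for the real data: 200 processes, 30 samples each.
rng = np.random.default_rng(0)
minutes = np.arange(30)
start = rng.uniform(180.0, 580.0, size=200)   # starting temperatures
rate = rng.uniform(0.03, 0.08, size=200)      # assumed cooling rates
curves = 20.0 + (start[:, None] - 20.0) * np.exp(-rate[:, None] * minutes)

# Smallest value of the discrete derivative per process
# (the steepest temperature drop between two consecutive samples).
min_slope = np.diff(curves, axis=1).min(axis=1)

# Ward-linkage agglomerative clustering on the 30-sample curves, 8 clusters.
model = AgglomerativeClustering(n_clusters=8, linkage="ward")
labels = model.fit_predict(curves)

print(labels[:10], min_slope[:3])
```

Ward linkage works on Euclidean distances between the raw curves, so the starting temperature is likely to dominate the grouping unless the curves are normalized first.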

For three of the clusters, the example results look like this (to give an idea of the shape of the data):

The temperatures of different clusters within the observed 30 minutes

Up to now, I have switched between the algorithms AgglomerativeClustering, Birch and DBSCAN with various parameters.
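A rough sketch of such a comparison, again on synthetic curves rather than the real data. The standardization step and all parameter values (`eps`, `min_samples`, the number of clusters) are assumptions for illustration, not the settings actually used.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch, DBSCAN
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 150 exponential cooling curves with 30 samples each.
rng = np.random.default_rng(1)
minutes = np.arange(30)
start = rng.uniform(180.0, 580.0, size=150)
rate = rng.uniform(0.03, 0.08, size=150)
curves = 20.0 + (start[:, None] - 20.0) * np.exp(-rate[:, None] * minutes)

# Standardize each time step so distance-based parameters such as eps
# do not depend on the absolute temperature scale (an assumption here).
scaled = StandardScaler().fit_transform(curves)

models = {
    "ward": AgglomerativeClustering(n_clusters=8, linkage="ward"),
    "birch": Birch(n_clusters=8),
    "dbscan": DBSCAN(eps=1.0, min_samples=5),
}
labels = {name: model.fit_predict(scaled) for name, model in models.items()}

for name, lab in labels.items():
    # DBSCAN marks noise points with the label -1.
    n_clusters = len(set(lab.tolist()) - {-1})
    n_noise = int(np.sum(lab == -1))
    print(f"{name}: {n_clusters} clusters, {n_noise} noise points")
```

DBSCAN differs from the other two in that it does not take a cluster count and can flag individual processes as noise, which may be closer to an anomaly-detection view of the problem.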

Almost all clusterings indicated the failure of the cooling system by showing new or unusual clusters during the time the problems occurred (see clusters 4 and 7 here), but none of them showed similar behaviour that would predict the failure.

This led me to the following questions:

  1. Would the same procedure using (obviously unsupervised) neural networks be an option, and if so, how (I am working in Python)?

  2. What other approaches could there be to handling this problem?

  3. At what point can I safely say that the information needed to predict the failure (not merely detect it) is not contained in the data? I assume this is the case, as the processes are very different and any minor change in cooling due to a system that is about to fail would remain undiscovered because other parameters have a much bigger influence, but I'd very much appreciate any opinion on this.

Edit: Sample Data

   [[565. , 565. , 564. , 555. , 542. , 527. , 511.5, 496. , 460. ,
    434. , 413. , 393. , 376. , 359. , 344. , 329. , 315. , 303. ,
    291. , 279. , 268. , 258. , 249. , 239. , 231. , 222. , 214. ,
    207. , 200. , 193. ], #would go on with 188,...
   [540. , 540. , 539. , 531. , 520. , 508. , 496. , 494. , 456. ,
    436. , 420. , 404. , 390. , 377. , 364. , 353. , 341. , 331. ,
    321. , 312. , 303. , 295. , 286. , 279. , 271. , 263. , 263. ,
    263. , 263. , 263. ], #the process was ended too early, 263 got repeated to match the format
   [530. , 530. , 529. , 520. , 509. , 495. , 455. , 427. , 405. ,
    384. , 365. , 348. , 332. , 317. , 302. , 288. , 275. , 263. ,
    252. , 242. , 232. , 222. , 213. , 204. , 196. , 188. , 181. ,
    174. , 168. , 161. ], #would go on with 154
   [181. , 174. , 165. , 158. , 152. , 147. , 146. , 141. , 137. ,
    132. , 128. , 125. , 121. , 118. , 114. , 111. , 109. , 106. ,
    103. , 101. ,  98. ,  96. ,  94. ,  92. ,  91. ,  89. ,  87. ,
     85. ,  84. ,  82. ]] #would go on with 81

Eulenfuchswiesel

Posted 2018-04-04T11:46:28.690

Reputation: 181

Add the sample data; otherwise the figures aren't that intuitive. – Aditya – 2018-04-04T15:40:45.753

No answers