This is a fun problem. You have a time series, and from it you want to identify the trigger of a certain event. So it is a binary classification problem: based on the information in the specified window, will a spike occur? Yes or no.
The first step is to set up your dataset. What you will have is a set of instances (which can have some overlap, but to avoid bias it is best for them to be independently drawn), and for each instance a human needs to label whether or not there was a spike.
Then you need to identify the time window you want to use for your time series analysis. You have done this and decided 30 minutes is a good start.
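Here is a rough sketch of how the instances could be cut from the raw recording. It assumes the waveforms sit in an array of shape `(n_samples, n_channels)` sampled once per minute, and that a spike counts as "following" a window if it lands within a short look-ahead `horizon`. The sampling rate, the `horizon` parameter and the function name are my assumptions for illustration, not something from your setup.

```python
import numpy as np

def make_instances(signals, spike_times, window=30, horizon=5):
    """Cut non-overlapping windows and label each by whether a spike follows.

    signals     : array of shape (n_samples, n_channels), one row per time step
    spike_times : sample indices at which a human marked a spike
    window      : window length in samples (30 = 30 minutes at 1 sample/min)
    horizon     : hypothetical look-ahead; a spike within this many samples
                  after the window end counts as a positive label
    """
    spike_times = np.asarray(spike_times)
    X, y = [], []
    # Step by a full window so that instances do not overlap.
    for start in range(0, signals.shape[0] - window - horizon + 1, window):
        end = start + window
        X.append(signals[start:end])
        y.append(int(np.any((spike_times >= end) & (spike_times < end + horizon))))
    return np.stack(X), np.array(y)

rng = np.random.default_rng(0)
signals = rng.normal(size=(600, 6))  # synthetic stand-in for 6 waveforms
X, y = make_instances(signals, spike_times=[62, 300, 453])
print(X.shape, int(y.sum()))
```

Stepping by a full window keeps the instances from sharing samples, which is the "independently drawn" point above. If you need more instances you can use a smaller step and accept some overlap.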
Now you have 6 waveforms in a 30 minute window from which you can extract information for your classification. You could use the raw data samples as your features, but that is WAY too many features and will lead to poor results. So you need some feature extraction and dimensionality reduction techniques.
There are a million ways you can extract data from these waveforms. First, ask yourself: as a human, what telltale signs would you look for in these other waveforms that suggest a spike is about to arise? For example, in seismic data, if you see agitation in a waveform from a neighboring town then you should expect to see agitation in your town soon.
In general, I like to extract all the basic statistics from my waveforms. Get the mean, standard deviation, fluctuation index, etc. Get whatever you think might help. Then check how these statistics correlate with your labels. The stronger the correlation, the more useful the feature is likely to be. Then there are some very good techniques for extracting time and frequency information from your time series. Look into envelope mode decomposition and empirical mode decomposition. I have used empirical mode decomposition successfully on some time series data and obtained far better results than I expected.
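A minimal sketch of the basic-statistics step, assuming the windows are stacked in an array of shape `(n_instances, n_samples, n_channels)`. The "fluctuation" feature here is just the mean absolute first difference, which is my stand-in and not a standard definition of a fluctuation index. The data is synthetic.

```python
import numpy as np

def basic_features(X):
    """Per-channel summary statistics for windows shaped (n_instances, n_samples, n_channels)."""
    mean = X.mean(axis=1)
    std = X.std(axis=1)
    # Mean absolute first difference: a simple stand-in for a "fluctuation"
    # measure (an assumption, not a standard definition).
    fluct = np.abs(np.diff(X, axis=1)).mean(axis=1)
    return np.hstack([mean, std, fluct])  # shape (n_instances, 3 * n_channels)

def label_correlation(F, y):
    """Pearson correlation between each feature column and the binary labels."""
    Fc = F - F.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Fc ** 2).sum(axis=0) * (yc ** 2).sum())
    return (Fc * yc[:, None]).sum(axis=0) / denom

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 30, 6))
X[y == 1, :, 0] *= 3.0  # synthetic: channel 0 is more agitated before a spike

F = basic_features(X)
corr = label_correlation(F, y)
ranking = np.argsort(-np.abs(corr))  # most label-correlated features first
print(ranking[:5])
```

Keep in mind that a plain correlation only picks up linear relationships with the label, so a low value does not prove a feature is useless.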
Now even though you have your reduced feature space you can do better! You can apply some dimensionality reduction techniques such as PCA or LDA to get a lower dimensional space which may better represent your data. This might help, no guarantees.
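As a sketch of the PCA option, here is a NumPy-only version based on the SVD of the standardized feature matrix. The function name and the synthetic data are mine. LDA is not shown; unlike PCA it also uses your labels.

```python
import numpy as np

def pca_fit_transform(F, n_components):
    """Project standardized features onto the top principal components."""
    mu, sigma = F.mean(axis=0), F.std(axis=0)
    Z = (F - mu) / np.where(sigma > 0, sigma, 1.0)  # standardize; guard constant columns
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = (S ** 2) / (S ** 2).sum()  # fraction of variance per component
    return Z @ Vt[:n_components].T, explained[:n_components]

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
# 18 hypothetical features that are noisy mixtures of only 2 underlying factors.
F = latent @ rng.normal(size=(2, 18)) + 0.1 * rng.normal(size=(200, 18))

F_low, explained = pca_fit_transform(F, n_components=2)
print(F_low.shape, explained.sum())
```

The explained-variance fractions give you a quick way to decide how many components to keep. If the first few already cover most of the variance, the rest are probably not worth feeding to the classifier.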
Now you have a small dataset whose instances are a Frankenstein concoction of features representing your 6 waveforms across the 30 minute window. You are all set to select your classifier. You will want a binary classification algorithm, and luckily that is the most common kind. There are many to choose from. How to choose?
How many instances do you have?
$\#\text{instances} > 100 \times \#\text{features}$?
Then you are all set to use a deep learning technique such as neural networks, 1D convolutional neural networks, stacked autoencoders, etc.
Fewer than that?
Then you should stick with shallow methods. Check out kernel support vector machines, random forests, k-nearest neighbors, etc.
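To make the rule of thumb concrete, here is a small sketch using scikit-learn. `suggest_family` is a hypothetical helper that just encodes the 100x ratio above, and the data comes from `make_classification`, so the numbers say nothing about your problem. The hyperparameters are arbitrary starting points.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def suggest_family(n_instances, n_features, ratio=100):
    """Rule of thumb: consider deep models only when instances > ratio * features."""
    return "deep" if n_instances > ratio * n_features else "shallow"

# Synthetic stand-in for the extracted feature table (not real data).
X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)
family = suggest_family(*X.shape)  # 300 is not > 100 * 10, so "shallow"

candidates = {
    "kernel SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}
# Mean 5-fold cross-validated accuracy for each shallow candidate.
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in candidates.items()}
for name, score in scores.items():
    print(f"{name}: mean 5-fold accuracy = {score:.3f}")
```

The SVM and k-NN are wrapped in a pipeline with a scaler because both depend on distances between instances. The random forest does not need that.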
A common misconception is that deep learning always wins. A shallow method can, and often will, perform better than a deep learning technique if you have properly selected your features. Feature extraction is the most important aspect of a machine learning architecture.
I want to use anomaly detection!
This would also work and there are some good techniques that would do this. However, the nature of anomaly detection is to learn the distribution of the nominal case. So you would feed your algorithm all the instances in your dataset that did not result in a spike. From this, your algorithm would be able to identify when a novel instance is significantly different from the nominal distribution and flag it as an anomaly. In your context, an anomaly would mean that a spike is expected to occur.
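A minimal sketch of that idea, assuming the nominal feature vectors are roughly Gaussian. It fits a mean and covariance to the no-spike instances only and flags anything whose squared Mahalanobis distance exceeds what 99% of the nominal data reaches. The class name, the 0.99 quantile and the synthetic data are all my choices for illustration. This is one simple option among many, not the only way to do it.

```python
import numpy as np

class GaussianNoveltyDetector:
    """Fit a multivariate Gaussian to nominal data; flag points far from it."""

    def fit(self, F_nominal, quantile=0.99):
        self.mu = F_nominal.mean(axis=0)
        cov = np.cov(F_nominal, rowvar=False)
        # Small ridge term keeps the covariance invertible.
        self.prec = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
        # Threshold: the distance that only 1% of nominal training data exceeds.
        self.threshold = np.quantile(self.score(F_nominal), quantile)
        return self

    def score(self, F):
        d = F - self.mu
        return np.einsum("ij,jk,ik->i", d, self.prec, d)  # squared Mahalanobis distance

    def predict(self, F):
        return self.score(F) > self.threshold  # True = anomaly, i.e. spike expected

rng = np.random.default_rng(0)
nominal = rng.normal(size=(500, 4))            # windows with no spike (synthetic)
pre_spike = rng.normal(loc=4.0, size=(20, 4))  # windows that preceded a spike (synthetic)

detector = GaussianNoveltyDetector().fit(nominal)
flags_nominal = detector.predict(nominal)
flags_spike = detector.predict(pre_spike)
print(flags_nominal.mean(), flags_spike.mean())
```

Note that the spike instances are never used for fitting. You only need them afterwards, to check how many of them the detector actually flags.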
You can also use more rudimentary anomaly detection techniques such as a generalized likelihood ratio test, but this is kind of old-school.