Interactive labeling/annotating of time series data

I have a time series data set. I'm looking for an annotation (or labeling) tool to visualize it and interactively add labels to it, in order to get annotated data I can use for supervised ML.

For example, the input is a CSV file and the output is another CSV file in the format timestamp,label.

So I need something that can:

1. visualize the data
2. select a specific region
3. export the labels with timestamps

As an example: [example image]

Building such a tool in Python would not take long, but I was wondering how other people solve this problem, and whether there are already good OS tools for it. Thank you!

If you're plotting in Python, chances are your best bet is to annotate in Python, not in the OS. – Adrian Keister – 2018-09-13T15:56:58.013

@AdrianKeister When I wrote OS, I meant an open source project. – mibrl12 – 2018-09-14T11:31:00.077

I was about to ask exactly the same question before I found yours. I also need such a tool to annotate data for my thesis. Have you solved the problem yet? I was about to use Django and write my own data labeler. – dataddicted – 2019-03-03T10:06:39.797

@dataddicted I started writing the tool, but since I had only a small amount of data, I just labeled it manually and have set it aside for now ^^ Please share the link to your GitHub if you start working on it seriously ;) – mibrl12 – 2019-03-05T16:45:35.677

7

We ran into this same problem again and again at Geocene, so we built an open-source web app called TRAINSET. You can use TRAINSET to brush labels onto time series data: you import data in a defined CSV format, label the data, and export a labeled CSV. You can also import a pre-labeled CSV if you only need to refine existing labels. You can use the hosted version of TRAINSET at https://trainset.geocene.com or deploy it yourself by following the README at https://github.com/geocene/trainset
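A minimal sketch of preparing an import file, assuming the TRAINSET CSV layout is the four columns series,timestamp,value,label with ISO 8601 UTC timestamps (my reading of the README; check it before relying on this). It reshapes a hypothetical "wide" CSV, with a timestamp column plus one column per channel, into one row per channel and timestamp with an empty label. The function name wide_to_trainset and the input layout are illustrative assumptions, not part of TRAINSET.

```python
import csv
from datetime import datetime, timezone


def wide_to_trainset(in_path, out_path, time_col="timestamp"):
    """Reshape a wide CSV (time column + one column per channel) into
    series,timestamp,value,label rows. The output layout is an ASSUMPTION
    about TRAINSET's import format; verify against its README."""
    with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        channels = [c for c in reader.fieldnames if c != time_col]
        writer = csv.writer(fout)
        writer.writerow(["series", "timestamp", "value", "label"])
        for row in reader:
            ts = datetime.fromisoformat(row[time_col])
            # Naive timestamps are ASSUMED to already be in UTC.
            if ts.tzinfo is None:
                ts = ts.replace(tzinfo=timezone.utc)
            else:
                ts = ts.astimezone(timezone.utc)
            # ISO 8601 with millisecond precision and a trailing "Z".
            iso = ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}Z"
            for ch in channels:
                writer.writerow([ch, iso, row[ch], ""])  # label left empty for brushing


if __name__ == "__main__":
    wide_to_trainset("sensors_wide.csv", "trainset_import.csv")  # hypothetical file names
```

Labels brushed in TRAINSET then come back in the same long layout in the exported CSV, so the same column assumption applies when reading the export.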

It is awesome! Thank you for sharing and thank you for even hosting it with an example! – mibrl12 – 2020-01-14T13:24:36.700

You're welcome @mibrl12 – daterdots – 2020-08-06T16:15:10.453

3

I am currently developing a set of tools to annotate and detect patterns in time series data: https://github.com/avenix/WDK

Check the AnnotationApp in 1-Annotation.

Thanks for your answer. However, I would prefer a standalone app, since Matlab is... well... Matlab :) – mibrl12 – 2019-03-22T13:00:20.370

I don't know what you mean by "Matlab is... Matlab". Have you found a standalone annotation tool yet? If not, paying for a Matlab licence and installing Matlab might be more cost-efficient than building your own tool. – Juan Haladjian – 2019-03-25T10:12:39.647

2

I also needed such a tool to annotate data but did not find a suitable one, so I wrote a small Python app myself, (ab)using matplotlib for the task.

I used matplotlib.use('TkAgg') and SpanSelector, with my own onselect(xmin, xmax) callback. See this code example: https://matplotlib.org/gallery/widgets/span_selector.html
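A minimal sketch of that approach, in the spirit of the linked gallery example but not the answerer's actual code: a SpanSelector whose onselect callback stores each dragged (xmin, xmax) range under a fixed label, shades it with axvspan, and writes one timestamp,label row per sample. The class name SpanAnnotator, the default label "none", and the output file name are hypothetical. If your default backend is not interactive, select one (e.g. matplotlib.use('TkAgg')) before importing pyplot.

```python
import csv

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.widgets import SpanSelector


class SpanAnnotator:
    """Collect horizontal spans dragged on `ax` and label the samples in `t`."""

    def __init__(self, ax, t, label="event"):
        self.ax = ax
        self.t = np.asarray(t)
        self.label = label
        self.spans = []  # list of (xmin, xmax, label)
        # Keep a reference: an unreferenced selector is garbage-collected and stops working.
        self.selector = SpanSelector(ax, self.onselect, "horizontal", useblit=True)

    def onselect(self, xmin, xmax):
        """Called by SpanSelector when the mouse is released."""
        self.spans.append((xmin, xmax, self.label))
        self.ax.axvspan(xmin, xmax, alpha=0.2, color="tab:orange")
        self.ax.figure.canvas.draw_idle()

    def rows(self, default="none"):
        """One (timestamp, label) pair per sample; later spans overwrite earlier ones."""
        labels = np.full(len(self.t), default, dtype=object)
        for xmin, xmax, label in self.spans:
            labels[(self.t >= xmin) & (self.t <= xmax)] = label
        return list(zip(self.t.tolist(), labels.tolist()))

    def save(self, path):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["timestamp", "label"])
            writer.writerows(self.rows())


if __name__ == "__main__":
    t = np.arange(0, 10, 0.1)
    fig, ax = plt.subplots()
    ax.plot(t, np.sin(t))
    annotator = SpanAnnotator(ax, t, label="peak")
    plt.show()  # drag across the plot to add spans, then close the window
    annotator.save("labels.csv")
```

Writing per-sample rows matches the timestamp,label output asked for in the question; storing annotator.spans directly would give a more compact start/end format instead.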

do you have it available on github? – mibrl12 – 2019-03-18T13:12:37.597

This actually works really well - you can combine with RadioButtons if you have different named regions. – David Waterworth – 2020-08-08T06:12:22.950
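A minimal sketch of that combination, assuming a side panel of RadioButtons whose on_clicked callback sets the "current" label that the SpanSelector callback then attaches to each dragged span. The function build_annotator and the label names (borrowed from the anomaly tags mentioned in the Grafana answer) are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.widgets import RadioButtons, SpanSelector


def build_annotator(t, y, labels=("normal", "anomaly_A", "anomaly_B")):
    """Plot y(t) next to a radio-button panel; each dragged span gets the
    label currently selected in the panel."""
    fig, (ax, rax) = plt.subplots(1, 2, gridspec_kw={"width_ratios": [5, 1]})
    ax.plot(t, y)
    state = {"label": labels[0], "spans": []}

    def on_label(label):
        state["label"] = label  # RadioButtons passes the clicked button's text

    def onselect(xmin, xmax):
        state["spans"].append((xmin, xmax, state["label"]))
        ax.axvspan(xmin, xmax, alpha=0.2)
        fig.canvas.draw_idle()

    radio = RadioButtons(rax, list(labels))
    radio.on_clicked(on_label)
    selector = SpanSelector(ax, onselect, "horizontal", useblit=True)
    # Widgets stop responding if garbage-collected, so hand them back to the caller.
    return fig, state, (radio, selector, onselect)


if __name__ == "__main__":
    t = np.linspace(0, 10, 500)
    fig, state, widgets = build_annotator(t, np.sin(t))
    plt.show()  # pick a label on the right, then drag over the plot
    print(state["spans"])
```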

1

There is an open-source visualization platform called Grafana, a very powerful and flexible piece of software that is also used for monitoring time series. It supports annotations.

It is versatile: you can read data from a wide variety of data sources.

Once the data is annotated (as in the picture), you can retrieve all the annotations/labels you added by querying Grafana's annotation database through the Grafana annotation API.
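A minimal sketch of that retrieval step using only the standard library. It assumes the GET /api/annotations endpoint with from/to (epoch milliseconds) and tags query parameters, bearer-token authentication, and response objects carrying time, timeEnd, tags and text fields, as I understand the Grafana HTTP API docs; check them for your Grafana version. The server URL, token, and both function names are hypothetical, and using the first tag as the label is an arbitrary choice.

```python
import json
import urllib.parse
import urllib.request


def fetch_annotations(base_url, token, params):
    """GET /api/annotations from a Grafana server (base_url/token are placeholders)."""
    query = urllib.parse.urlencode(params, doseq=True)  # tags=[a, b] -> tags=a&tags=b
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/annotations?{query}",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def annotations_to_segments(items):
    """Turn annotation objects into sorted (start_ms, end_ms, label) tuples.
    ASSUMES epoch-millisecond 'time'/'timeEnd' fields; first tag is the label,
    falling back to the annotation text when there are no tags."""
    segments = []
    for a in items:
        start = a["time"]
        end = a.get("timeEnd") or start  # point annotations: end equals start
        label = a["tags"][0] if a.get("tags") else a.get("text", "")
        segments.append((start, end, label))
    return sorted(segments)
```

Usage would look like fetch_annotations("http://localhost:3000", "<api-token>", {"from": start_ms, "to": end_ms, "tags": ["anomaly_A"]}) followed by annotations_to_segments on the result; a dict is used because "from" is a Python keyword and cannot be passed as a keyword argument.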

Bonus tip 1: you can add custom tags to your annotations to attach additional information to your data (e.g. anomaly_A, anomaly_B, flat_normal_data).

Bonus tip 2: thanks to this functionality, you can also display only one specific kind of anomaly, within the same platform.

Future improvements: extensions to these features are under discussion, which would make annotation even easier on panels that display multiple time series at once (e.g. an anomaly across many time series).

Applications: anomaly detection labelling, medical signal annotation, stock market annotation, etc.

Unfortunately there is no UI on Grafana Labs yet :( – thinwybk – 2020-08-27T15:24:56.667

We proposed a modification/patch for this feature and are discussing it with Grafana. If you are interested too, you are welcome to join the discussion and leave a like. Thanks in advance! https://github.com/grafana/grafana/issues/24674#issuecomment-655385423 – Matteo Paltenghi – 2020-08-28T21:49:48.293

Sounds interesting. I'll have a look at it. – thinwybk – 2020-08-29T10:32:23.420

0

I'm using the axvspan() function from matplotlib.pyplot. The main disadvantage is that text labels are difficult to configure.

```python
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0, 3.14, 0.01)
s = np.sin(t)
# Shade the labelled region between t[12] and t[100]
plt.axvspan(t[12], t[100], facecolor='blue', alpha=0.2)
plt.plot(t, s, color='red')
plt.show()
```
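One way to ease the text-label problem, sketched here as a suggestion rather than the answerer's method: draw each label with ax.text using the blended transform from ax.get_xaxis_transform(), so x is in data units and y in axes units (0 to 1). The label then stays at the top of its span whatever the data's y-range. The helper draw_labelled_spans and the span names are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np


def draw_labelled_spans(ax, spans, y=0.98):
    """Shade each (start, end, label) span and write its label near the top.
    x is in data coordinates, y in axes coordinates (0 = bottom, 1 = top)."""
    trans = ax.get_xaxis_transform()
    for start, end, label in spans:
        ax.axvspan(start, end, facecolor="blue", alpha=0.2)
        ax.text((start + end) / 2, y, label, transform=trans, ha="center", va="top")


if __name__ == "__main__":
    t = np.arange(0, 3.14, 0.01)
    fig, ax = plt.subplots()
    ax.plot(t, np.sin(t), color="red")
    draw_labelled_spans(ax, [(t[12], t[100], "rising"), (t[200], t[300], "falling")])
    plt.show()
```

This is still static; making it interactive needs a widget such as SpanSelector.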


How can this be made interactive? The author asked for "interactively add labels". – Martin Thøgersen – 2020-06-08T22:25:01.560

0

Nova can do this interactively (https://github.com/hcmlab/nova). It can do much more than label time series data, but you can use it just for labeling. I also suggest setting the sample rate to 1 Hz. Best of luck.

0

A little late to the party, but better late than never. We've released a major version update to our time series data labeling tool, Label Studio.

Now it supports a variable number of channels with millions of data points in each, with zoom/pan, region labeling, and instance (single event) labeling.

It handles different time formats (for example, time may come as a float or as an unusually formatted date), and it supports multiple users and multi-label classification.

Please visit https://heartex.ai for the commercial version and https://labelstud.io/ for the open-source version (which currently needs some hand compiling).
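Label Studio projects are configured with an XML labeling config. A minimal sketch that builds one for region labeling on a multi-channel CSV time series, assuming the tag and attribute names (View, TimeSeriesLabels with toName, Label value, TimeSeries with valueType="url" and timeColumn, Channel column) match Label Studio's time series template as I remember it; check the current docs before use. The helper timeseries_config, the channel and label names, and the $csv task variable are hypothetical.

```python
import xml.etree.ElementTree as ET


def timeseries_config(channels, labels, time_column="time", value_var="$csv"):
    """Build a labeling config string: region labels (control tag) attached
    to a CSV-backed time series (object tag) with one Channel per column.
    Tag/attribute names are ASSUMED from Label Studio's time series template."""
    view = ET.Element("View")
    control = ET.SubElement(view, "TimeSeriesLabels", name="label", toName="ts")
    for value in labels:
        ET.SubElement(control, "Label", value=value)
    ts = ET.SubElement(view, "TimeSeries", name="ts", value=value_var,
                       valueType="url", timeColumn=time_column)
    for col in channels:
        ET.SubElement(ts, "Channel", column=col)
    return ET.tostring(view, encoding="unicode")


if __name__ == "__main__":
    print(timeseries_config(["velocity", "acceleration"], ["Run", "Walk"]))
```

Generating the config from code keeps the channel list in sync with the CSV header when there are many channels; for a handful of channels, writing the XML by hand in the UI is just as easy.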

Is this example from the commercial version? I notice it mentions "Time series" but when I run the open source version the closest I can find is audio? – David Waterworth – 2020-08-08T05:05:21.383