
# Introduction

A vehicle moves past a station. When the vehicle passes the station (detected, for simplicity, via GPS) it sends RF signals to the station: "Hello, I am here now", marked as **A**, and "Bye, I am leaving", marked as **B**. The station is just a µ-controller logging each received signal to a file with a timestamp and the corresponding letter (**A** or **B**). The time on the µC is set by hand and *drifts over time*.
The vehicle also logs the sent messages with a timestamp.
Usually this works well: 98% of all pass-bys are good (**A** followed by **B**).

But for a few stations (5% or so), only about 60% of the pass-bys are good. My task is to find out why this is not working correctly!

# Approach

So I plotted the data:

On the x-axis is the *period under review*. On the y-axis is, for each station log entry, the minimal time delta in seconds to any corresponding (**A** or **B**) log entry on the vehicle.
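The y-axis values could be computed roughly like this (a sketch with hypothetical timestamps; in the real data the station and vehicle entries would first be filtered to the same letter):

```python
import numpy as np

def min_deltas(station_ts, vehicle_ts):
    """For each station timestamp, the minimal absolute difference (seconds)
    to any vehicle timestamp (entries already filtered to one letter)."""
    vehicle_sorted = np.sort(vehicle_ts)
    # index where each station timestamp would be inserted
    idx = np.searchsorted(vehicle_sorted, station_ts)
    left = np.clip(idx - 1, 0, len(vehicle_sorted) - 1)
    right = np.clip(idx, 0, len(vehicle_sorted) - 1)
    # nearest neighbour is either the entry just before or just after
    return np.minimum(np.abs(station_ts - vehicle_sorted[left]),
                      np.abs(station_ts - vehicle_sorted[right]))

# hypothetical epoch timestamps for "A" entries
station_a = np.array([100.0, 205.0, 310.0])
vehicle_a = np.array([101.5, 200.0, 300.0])
deltas = min_deltas(station_a, vehicle_a)  # one value per station entry
```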

## Time-drift correction

Assuming that the visible slope is the *drift over time* and that the offset is the time shift caused by manually setting the clock on the µC, I fitted the function `y = k*x + d` and corrected the shifts.
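The fit itself can be sketched with `np.polyfit` (a minimal example with hypothetical numbers; `offset` is assumed to be the signed difference between matched vehicle and station timestamps):

```python
import numpy as np

# hypothetical matched data: station timestamps and their signed offsets
# to the vehicle clock (vehicle clock assumed correct)
station_ts = np.array([0.0, 1000.0, 2000.0, 3000.0])
offset = np.array([5.0, 5.2, 5.4, 5.6])  # ~0.0002 s/s drift plus a 5 s shift

# least-squares fit of y = k*x + d
k, d = np.polyfit(station_ts, offset, deg=1)

# shift the station timestamps onto the vehicle's time base
corrected = station_ts + (k * station_ts + d)
```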

So I got this:

(which does not look better to be honest)

Same axes as before. In red is the data with a *minimal delta* of over 60 seconds.

## Synchronization

Now suspecting that the time is not synchronized well enough, I experimented with some algorithms, namely a DTW implementation from GitHub and Biopython's `pairwise2` (Needleman–Wunsch) alignment.

### DTW

The input data for DTW was the time-ordered sequence of differences between consecutive **A**/**B** timestamps:

`np.delete(n1.diff().to_numpy(), 0)`

whereby `n1` is a Pandas DataFrame.
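Since I don't have the details of the GitHub implementation, here is a minimal pure-NumPy DTW sketch of the same idea; `vehicle_diffs` and `station_diffs` are hypothetical inter-event gaps in seconds:

```python
import numpy as np

def dtw(a, b):
    """Minimal dynamic time warping over two 1-D sequences,
    using absolute difference as the local cost. Returns (cost, path)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # backtrack from the end to recover the warping path
    i, j, path = n, m, [(n - 1, m - 1)]
    while i > 1 or j > 1:
        if i == 1:
            j -= 1
        elif j == 1:
            i -= 1
        else:
            step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
            if step == 0:
                i, j = i - 1, j - 1
            elif step == 1:
                i -= 1
            else:
                j -= 1
        path.append((i - 1, j - 1))
    return D[n, m], path[::-1]

# hypothetical gaps; the station is missing one event, so two of its
# gaps merge into one larger value (10 + 50 -> 60)
vehicle_diffs = [10, 50, 10, 50, 10]
station_diffs = [10, 50, 60, 10]
cost, path = dtw(vehicle_diffs, station_diffs)
# a repeated station index in `path` points at the gap in the station log
```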

It found missing data, both on the vehicle side and on the station side.

### Needleman–Wunsch algorithm

The input data for the `align` class was just the sequence of **A**s and **B**s, without the timestamps.

It found missing data, both on the vehicle side and on the station side.
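To illustrate what the alignment does (a minimal pure-Python Needleman–Wunsch sketch, standing in for Biopython's `pairwise2`; the log strings are hypothetical):

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Minimal global alignment: returns (score, aligned1, aligned2),
    with gaps marked '-'."""
    n, m = len(seq1), len(seq2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # backtrack to recover the alignment
    a1, a2, i, j = "", "", n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and seq1[i - 1] == seq2[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            a1, a2, i, j = seq1[i - 1] + a1, seq2[j - 1] + a2, i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a1, a2, i = seq1[i - 1] + a1, "-" + a2, i - 1
        else:
            a1, a2, j = "-" + a1, seq2[j - 1] + a2, j - 1
    return score[n][m], a1, a2

# hypothetical logs: the station log is missing one "B"
score, aligned_vehicle, aligned_station = needleman_wunsch("ABABAB", "ABAAB")
# the '-' in aligned_station marks the missing entry
```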

### Realization

I then realized that the *drift* correction was unnecessary: the timestamp is either not used at all (`align`) or only a lagged difference of it is calculated (DTW). I verified this by plotting the DTW path and the alignment score of `align` before and after the correction.

# Question

What I want to know is: which algorithm can I use to determine a shift for synchronizing the data sets, assuming one data set (the vehicle side) is perfect? And which subset on the station side is probably the corresponding one?