
# Introduction

A vehicle moves past a station. When the vehicle passes the station (for simplicity, determined by GPS), it sends RF signals to the station: "Hello, I am here now", marked as A, and "Bye, I am leaving", marked as B. The station is just a microcontroller (µC) that logs each received signal to a file with a timestamp and the corresponding letter (A or B). The clock on the µC is set by hand and drifts over time. The vehicle also logs the sent messages with a timestamp. Usually this works well: 98% of all passes are good (A followed by B).

But for a few stations, about 5% of them, only about 60% of passes are good. My task is to find out why this is not working correctly.
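To make "good pass" concrete, here is a minimal sketch that computes the good-pass ratio from one log. It assumes a hypothetical log format of `(timestamp_seconds, letter)` tuples and treats every A that is immediately followed by a B as one good pass; any lone A or lone B counts as one broken pass.

```python
from typing import List, Tuple

# Hypothetical log format: (timestamp in seconds, letter "A" or "B").
LogEntry = Tuple[float, str]


def good_pass_ratio(log: List[LogEntry]) -> float:
    """Return the fraction of passes that are a clean A -> B pair.

    Assumption: a pass is good if an "A" is immediately followed (in time
    order) by a "B". A lone "A" or a lone "B" counts as one broken pass.
    """
    letters = [letter for _, letter in sorted(log)]
    good = total = 0
    i = 0
    while i < len(letters):
        total += 1
        if letters[i] == "A" and i + 1 < len(letters) and letters[i + 1] == "B":
            good += 1
            i += 2  # consume the whole A -> B pair
        else:
            i += 1  # lone A (B missing) or lone B (A missing)
    return good / total if total else 0.0


log = [(0, "A"), (10, "B"), (100, "A"), (200, "A"), (210, "B"), (300, "B")]
print(good_pass_ratio(log))  # 0.5
```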

# Approach

So I plotted the data. The x-axis is the period under review. The y-axis is, for each station entry, the minimal time delta in seconds between that entry and any vehicle log entry with the same letter (A or B).
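The minimal delta on the y-axis can be computed with NumPy's `searchsorted`. This is a sketch, not the code behind the plot: it assumes timestamps as float epoch seconds, is meant to be called once per letter (all A entries, then all B entries), and returns the signed difference (station minus vehicle) so that a clock drift shows up as a slope.

```python
import numpy as np


def min_deltas(station_t, vehicle_t):
    """For each station timestamp, return the signed difference to the
    nearest vehicle timestamp (station minus vehicle), in seconds.

    Both inputs hold timestamps for ONE letter only; vehicle_t must be
    non-empty.
    """
    station_t = np.asarray(station_t, dtype=float)
    vehicle_t = np.sort(np.asarray(vehicle_t, dtype=float))
    # Index of the first vehicle timestamp >= each station timestamp.
    idx = np.searchsorted(vehicle_t, station_t)
    # Neighbours on the left and right, clipped to valid indices.
    left = vehicle_t[np.clip(idx - 1, 0, len(vehicle_t) - 1)]
    right = vehicle_t[np.clip(idx, 0, len(vehicle_t) - 1)]
    d_left = station_t - left
    d_right = station_t - right
    # Keep whichever neighbour is closer in absolute terms.
    return np.where(np.abs(d_left) <= np.abs(d_right), d_left, d_right)


deltas = min_deltas([5, 98, 260], [0, 100, 200])  # deltas == [5, -2, 60]
outliers = np.abs(deltas) > 60  # entries that would be flagged
```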

## Time-drift correction

Assuming that the visible slope is the clock drift over time and the offset is the time shift caused by manually setting the clock on the µC, I fitted the line y = k*x + d and corrected the timestamps accordingly.
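One way to do this fit is a least-squares line with `np.polyfit`. The sketch below assumes the deltas are the station-minus-vehicle values from the plot and the station timestamps are float seconds; `max_abs_delta` is a hypothetical option for keeping outliers (for example entries more than 60 seconds off) from tilting the line.

```python
import numpy as np


def correct_drift(station_t, deltas, max_abs_delta=None):
    """Fit delta = k * t + d and subtract it from the station timestamps.

    station_t: station timestamps (seconds); deltas: station-minus-vehicle
    minimal deltas for the same entries. If max_abs_delta is given, only
    entries with |delta| <= max_abs_delta are used for the fit.
    Returns the corrected timestamps and the fitted (k, d).
    """
    t = np.asarray(station_t, dtype=float)
    y = np.asarray(deltas, dtype=float)
    mask = np.ones_like(y, dtype=bool)
    if max_abs_delta is not None:
        mask = np.abs(y) <= max_abs_delta
    # Least-squares line; polyfit returns coefficients highest degree first.
    k, d = np.polyfit(t[mask], y[mask], 1)
    return t - (k * t + d), (k, d)
```

Fitting only the inliers is a design choice: a few wildly wrong matches can otherwise pull the slope far away from the actual drift.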

This is the result (which, to be honest, does not look better):

The axes are the same as before. Entries whose minimal delta exceeds 60 seconds are shown in red.

## Synchronization

Next, assuming that the clocks are not perfectly synchronized, I experimented with some alignment algorithms, namely a DTW (dynamic time warping) implementation from GitHub and the Needleman–Wunsch ("needle") global alignment in Biopython's `pairwise2` module.

### DTW

The input for DTW was the sequence of time differences between consecutive A/B events, ordered by timestamp:

`np.delete(n1.diff().to_numpy(), 0)`, where `n1` is a pandas DataFrame. DTW found missing data on both the vehicle side and the station side.
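For reference, a plain O(n·m) DTW on such difference sequences is sketched below. It is not the GitHub implementation used here, only a minimal version with the absolute difference as the local cost and the three standard steps (match, insertion, deletion). For a 1-D numeric timestamp column, `np.diff` of the sorted timestamps gives the same input as the `np.delete(... .diff() ..., 0)` expression. A missing event on one side merges two differences into one, and the warping path can then map two vehicle differences onto a single station difference.

```python
import numpy as np


def dtw(a, b):
    """Classic dynamic time warping between two 1-D sequences.

    Returns the total cost and the warping path as (i, j) index pairs.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # acc[i, j] = cheapest cost of aligning a[:i] with b[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from the bottom-right corner to recover the path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]
        best = int(np.argmin(steps))  # ties prefer the diagonal step
        if best == 0:
            i, j = i - 1, j - 1
        elif best == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return acc[n, m], path


# Hypothetical data: the station missed the B at t = 10.
vehicle = np.diff([0, 10, 100, 110, 200, 210])
station = np.diff([3, 103, 113, 203, 213])
cost, path = dtw(vehicle, station)
```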

### Needleman–Wunsch algorithm

The input for the `align` class was just the sequence of A and B letters, without timestamps.
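To show what a global alignment does with A/B strings, here is a self-contained Needleman–Wunsch sketch instead of the `pairwise2` call. The scores (match +1, mismatch −1, gap −1) are assumptions chosen for illustration, not the parameters used with `align`. A `-` in the output marks an event that is present in one log but missing from the other.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-1):
    """Global alignment of two strings such as "ABAB..." event sequences.

    Returns (score, aligned_s, aligned_t); "-" marks a gap.
    """
    n, m = len(s), len(t)

    def pair(i, j):
        # Score for aligning s[i - 1] with t[j - 1].
        return match if s[i - 1] == t[j - 1] else mismatch

    # score[i][j] = best score aligning s[:i] with t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right corner.
    out_s, out_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_s.append(s[i - 1])
            out_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_s.append(s[i - 1])
            out_t.append("-")  # event missing in t
            i -= 1
        else:
            out_s.append("-")  # event missing in s
            out_t.append(t[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_s)), "".join(reversed(out_t))
```

For example, `needleman_wunsch("ABAB", "AAB")` returns `(2, "ABAB", "A-AB")`: the second log is missing the first B.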

It found missing data, both on the vehicle side and on the station side.

### Realization

I then realized that the drift correction was unnecessary, because the timestamps are either not used at all (align) or only enter as lagged differences (DTW). I verified this by plotting the DTW path and the alignment score from align before and after the correction.
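Why the correction drops out of the DTW input can be written in one line, assuming the correction subtracts the fitted line from each station timestamp $t_i$:

$$
t_i' = t_i - (k\,t_i + d)
\quad\Rightarrow\quad
t_{i+1}' - t_i' = (1 - k)\,(t_{i+1} - t_i)
$$

The offset $d$ cancels exactly, and the drift only rescales every difference by the same factor $1 - k$, which is close to 1 for a small drift.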

# Question

What I want to know is: which algorithm can I use to determine the shift that synchronizes the two data sets, assuming one data set (the vehicle side) is perfect? And how can I determine which subset on the station side most likely corresponds to it?