Decode SQLite database blobs: how to start?


Updated: see bottom of this question.

I'm not sure this is the right place to ask, but I hope someone here can help me.

I am trying to process and analyse data from a Footscan system that is exported from the Footscan 9 Gait Essentials software to a .rsdb database. This database is a standard SQLite database with a different file extension.

I have no problems accessing the data via Python but the most interesting data (the raw data from the sensor) is stored as a 'blob' (see sqlite documentation for more details).
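For anyone who wants to reproduce this step, here is a minimal sketch of reading blobs with Python's built-in sqlite3 module. The table and column names (`Blob`, `data`) are placeholders I made up, not the real Footscan schema, and the demo runs against an in-memory database rather than the actual .rsdb file.

```python
import sqlite3


def list_blob_sizes(conn, table, column):
    """Return (rowid, size in bytes) for every row whose column holds a BLOB."""
    # Identifiers cannot be bound as parameters, so they are quoted here.
    # `table` and `column` are hypothetical names supplied by the caller.
    query = (
        f'SELECT rowid, length("{column}") FROM "{table}" '
        f"WHERE typeof(\"{column}\") = 'blob' ORDER BY rowid"
    )
    return conn.execute(query).fetchall()


def read_blob(conn, table, column, rowid):
    """Return the blob stored in one row as a bytes object, or None."""
    query = f'SELECT "{column}" FROM "{table}" WHERE rowid = ?'
    row = conn.execute(query, (rowid,)).fetchone()
    return bytes(row[0]) if row and row[0] is not None else None


# Demo on an in-memory database (stand-in for sqlite3.connect("file.rsdb")).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Blob (id INTEGER PRIMARY KEY, data BLOB)")
demo.execute("INSERT INTO Blob (data) VALUES (?)",
             (bytes.fromhex("9c4e79cbcbd09de5"),))
print(list_blob_sizes(demo, "Blob", "data"))
print(read_blob(demo, "Blob", "data", 1).hex())
```

Working on the bytes object directly avoids the detour through a .csv export, where the blob is only shown as printed hexadecimal text.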

Exporting this data from SQLitePro to .csv shows that the data in the blob looks like this (although much longer):

<9c4e79cb cbd09de5 c380fc78 a74e819f ... ab2172c7 d9cf311a 357aebac>

My question is: what should my first steps be to discover how this data is encoded? And does anyone recognize this type of encoding?

Extra hints: The data represents all forces of a force plate with 64x64 force sensors, sampled at 300Hz, so it's most probably a multidimensional array.


Update

I am making some progress. As commented below, the values are most probably hexadecimal, but I'm not sure whether they really are base 16: most values seem to be >10^9...
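A note on the size of those values: each group in the export is 8 hexadecimal digits, which is 4 bytes, so read as one unsigned 32-bit integer it can be anything up to 2^32 - 1 (about 4.29 * 10^9). Values above 10^9 are therefore what you would expect from the grouping alone and say nothing about the underlying encoding. The sketch below converts the first four groups from the sample above into raw bytes and into 32-bit words. Reading them as big-endian words is only an assumption that mirrors the way the export prints them.

```python
import struct


def parse_hex_dump(dump):
    """Turn text like '<9c4e79cb cbd09de5>' into a bytes object."""
    cleaned = dump.strip().strip("<>")
    # bytes.fromhex ignores the spaces between the groups.
    return bytes.fromhex(cleaned)


def as_uint32_words(raw, byteorder=">"):
    """Interpret raw bytes as unsigned 32-bit integers ('>' big, '<' little)."""
    count = len(raw) // 4
    return list(struct.unpack(f"{byteorder}{count}I", raw[: count * 4]))


sample = "<9c4e79cb cbd09de5 c380fc78 a74e819f>"
raw = parse_hex_dump(sample)

print(len(raw), "bytes")
print("as bytes :", list(raw[:8]))
print("as uint32:", as_uint32_words(raw))
```

Looking at the same data as single bytes (0 to 255) is usually the better starting point, since the 8-digit grouping is just how the export tool prints it.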

Furthermore: I found a table in the database called 'Contact' which links to the entries in the blob table that contain the most hexadecimal values (>40000). My guess is that these are the entries which are recognized as actual steps. I extracted the data from one of these steps here (Contact id 2; blob id 24). This blob contains 50792 hexadecimal values.

This is the entry for this step in 'Contacts':

id: {35d9a73e-608d-4ba3-9823-de0540f11c93}
timestamp: 2
deleted: NULL
FrameWidth: 64
Frames: 24
FrameOffset: 12204
FrameCount: 214
FrameHeight: 64
FrameStoreWithContacts_Contacs: {5d10546e-5ef3-4469-b898-d7b9bd58458f}
OriginalId: {32ae0f73-8022-4151-92e7-510c0e86d09d}
orphan: false

Full export of the blob table can be downloaded here. The full .rsdb can be found here (you might have to rename to .sqlite). The tables 'Foot' and 'Region' in the full database might contain useful data too, but I'm not yet sure how...

Aart Goossens

Posted 2016-02-13T15:57:15.663


Do you mean the encoding beyond the hexadecimal system of encoding bytes, or have you never seen hexadecimal before? – Spacedman – 2016-02-13T16:15:36.193

I have seen hexadecimal before but I am not sure if I understand your question completely. Can you elaborate? – Aart Goossens – 2016-02-14T11:15:13.780

Convert the data to binary. If the data's not private, post it and I can show you what I mean. The first few bytes you provided don't suggest a file type, but emacs thinks it might be Japanese/JIS text. – None – 2016-02-14T15:49:07.197

It's not clear from your Q whether "9c4e79cb" means anything to you (i.e. whether you understand that it's hexadecimal). So there are (at least) two "encodings" going on: first the representation of the data as hex, and then however the force plate measurements are arranged into that data. It could just be 64x64xN bytes. There's not really enough context in the Q to figure it out... – Spacedman – 2016-02-14T23:22:52.130

I think it's a bit tricky ... blobs can have any internal structure. It helps that you expect a certain array of data (64x64) sampled at 300Hz, but how long is the data recorded for? If you know how many seconds are stored, and you know the blob size, you can estimate the resolution of each individual point. – Marcus D – 2016-02-15T13:22:40.260
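This estimate can be tried with the numbers from the update in the question (FrameWidth 64, FrameHeight 64, FrameCount 214, and a blob of 50792 hexadecimal values). A rough sketch follows. It assumes that each "hexadecimal value" is one 8-digit group, i.e. 4 bytes, which the question does not state explicitly.

```python
def bytes_per_sample(blob_bytes, width, height, frames):
    """Average number of stored bytes per sensor reading."""
    return blob_bytes / (width * height * frames)


# Assumption: 50792 groups of 8 hex digits, so 4 bytes per group.
blob_bytes = 50792 * 4

# Figures taken from the 'Contacts' entry in the question.
width, height, frames = 64, 64, 214
sample_rate_hz = 300

ratio = bytes_per_sample(blob_bytes, width, height, frames)
duration_s = frames / sample_rate_hz

print(f"blob size        : {blob_bytes} bytes")
print(f"sensor readings  : {width * height * frames}")
print(f"bytes per reading: {ratio:.3f}")
print(f"recording length : {duration_s:.2f} s")
```

Under that assumption the blob holds well under one byte per sensor reading, and counting the 50792 values as single bytes would make the ratio smaller still. A plain uncompressed 64x64xN array therefore does not fit; the data is either cropped to part of the plate, compressed, or both.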

Thanks for all your comments so far! I understand that some information is lacking: I will post more data and some more details I found about the data tomorrow. – Aart Goossens – 2016-02-15T20:44:50.553

I updated my question with extra information, actual data and new insights. Any help is appreciated. I will keep you updated on my progress. I might also post my progress as a separate answer to not pollute the question too much. – Aart Goossens – 2016-02-16T20:44:49.300

The blobs are assorted lengths which aren't multiples of 64. The blob data looks random, with no obvious pattern or repeat or structure. That makes me think the blob data is compressed or encoded in some unknown manner, which makes sense since there's likely to be a lot of redundancy in the original measurements. It's possibly a raw chunk of gzipped data, but any of a number of compression algorithms might have been used. Have you asked the vendor? – Spacedman – 2016-02-17T12:59:05.813
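One way to test the compression idea is sketched below, using only the Python standard library. It measures the byte entropy of a blob (values close to 8 bits per byte suggest compressed or encrypted data) and then tries a few common decompressors. Nothing here is known about the Footscan format; the list of codecs is a guess, and a vendor-specific scheme would fail all of them.

```python
import bz2
import lzma
import math
import zlib
from collections import Counter


def shannon_entropy(data):
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def try_decompress(data):
    """Return {codec name: decompressed size} for every codec that succeeds."""
    attempts = {
        "zlib": lambda d: zlib.decompress(d),
        "raw deflate": lambda d: zlib.decompress(d, -15),
        "gzip": lambda d: zlib.decompress(d, 31),
        "bz2": bz2.decompress,
        "lzma/xz": lzma.decompress,
    }
    results = {}
    for name, func in attempts.items():
        try:
            results[name] = len(func(data))
        except Exception:
            # Wrong format for this codec; move on to the next one.
            continue
    return results


# Demo on synthetic data, since the real blob is not included here.
original = bytes(64 * 64)          # one all-zero 64x64 frame
packed = zlib.compress(original)
print("entropy of packed data:", round(shannon_entropy(packed), 2))
print("codecs that worked    :", try_decompress(packed))
```

If the header is not at the very start of the blob, repeating `try_decompress` on `data[offset:]` for the first few dozen offsets is a cheap extra check.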

Hi @Spacedman thanks for your reply. I emailed the vendor but didn't get a response yet. I am still hoping that they just 'cropped' the data to the sensor area where the foot actually was. In the table 'Foot' there are columns AreaWidth, AreaHeight and FrameCount but these values multiplied give >4 times the expected number of data points. – Aart Goossens – 2016-02-17T16:57:12.357

No answers