Transition from MongoDB Time Series Collections to InfluxDB

Question

With version 5.0, MongoDB introduced specialized Time Series Collections for time series data such as sensor readings. As I already stored some sensor metadata (configuration, specification, ...) in MongoDB, I decided to use these collections to store the sensor readings next to the sensor metadata.

Following the docs, I stored each sensor reading as a single document like this (pseudo code):

{
    "timestamp": timestamp,
    "value": value,
    "metadata": {
        "sensorId": sensor_uid,
        "unit": sensor_unit,
        "type": sensor_type,
        "fromFile": reading_imported_from_file,
    },
}

Around 50 different sensors are read at the same time, which results in 50 documents with the same timestamp but different values and metadata.

I am currently migrating our time series data storage from MongoDB to InfluxDB, as it seems to provide a sleeker API and already includes some basic data visualization. As described above, in MongoDB I used a separate document per sensor reading, which might be considered bad practice in InfluxDB:

A measurement per sensor seems unnecessary and can add a significant amount of system overhead depending on the number of sensors you have. I’d suggest storing all sensor data in a single measurement with multiple fields, [...]

Based on this I came up with the following data structure to be passed to InfluxDB (Python dictionary pseudo code for influxdb-client):

{
    "time": 1,
    "measurement": measurement_name,
    "tags": {
        "location": location,
        "from_file": reading_imported_from_file,
    },
    "fields": {
        "sensor_1": reading_from_sensor_1,
        "sensor_2": reading_from_sensor_2,
        "sensor_3": reading_from_sensor_3,
    },
}
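As a minimal sketch of how readings that share a timestamp could be pivoted into that structure (standard library only; `build_point`, the measurement name, the sensor names, and all values are hypothetical and only serve as illustration):

```python
from typing import Any


def build_point(
    measurement: str,
    timestamp: int,
    readings: dict[str, float],
    location: str,
    from_file: bool,
) -> dict[str, Any]:
    """Pivot readings that share one timestamp into a single point dict."""
    return {
        "measurement": measurement,
        "time": timestamp,
        # InfluxDB stores tag values as strings, so convert the flag up front.
        "tags": {"location": location, "from_file": str(from_file).lower()},
        # One field per sensor, keyed by a (hypothetical) sensor name.
        "fields": dict(readings),
    }


point = build_point(
    measurement="sensor_readings",  # assumed name
    timestamp=1_647_360_000_000_000_000,  # arbitrary example, ns since epoch
    readings={"sensor_1": 21.5, "sensor_2": 0.98, "sensor_3": 1013.2},
    location="lab_a",
    from_file=False,
)
```

The resulting dictionary has the same shape as the pseudo code above; per-sensor metadata such as unit or type has no obvious place in it.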

However, I have not figured out how to store the other metadata like sensorId, unit, or type. On the one hand, I could easily solve this by ignoring the aforementioned suggestion and using a single measurement per sensor. On the other hand, from a relational perspective this metadata should be tied to the sensorId and therefore be accessible from a sensor configuration/specification database using the sensorId as a key. Unfortunately, these values can change during a single measurement or experiment because device configurations are changed on-site, and those changes are not reflected in the configuration database.
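For comparison, the first option (one measurement per sensor, against the quoted suggestion) might look roughly like the sketch below. It is only an illustration under assumptions: `build_points_per_sensor` and the sample readings are hypothetical, and the metadata is carried as tags, which InfluxDB stores as strings.

```python
from typing import Any


def build_points_per_sensor(
    timestamp: int,
    readings: list[dict[str, Any]],
    from_file: bool,
) -> list[dict[str, Any]]:
    """Create one point per sensor, named after the sensor id."""
    points = []
    for reading in readings:
        points.append(
            {
                # One measurement per sensor: the sensor id is the name.
                "measurement": str(reading["sensorId"]),
                "time": timestamp,
                # Metadata travels with every point, so on-site changes
                # to unit or type are recorded at the time they happen.
                "tags": {
                    "unit": str(reading["unit"]),
                    "type": str(reading["type"]),
                    "from_file": str(from_file).lower(),
                },
                "fields": {"value": reading["value"]},
            }
        )
    return points


sensor_points = build_points_per_sensor(
    timestamp=1_647_360_000_000_000_000,  # arbitrary example, ns since epoch
    readings=[
        {"sensorId": "s-001", "unit": "degC", "type": "temperature", "value": 21.5},
        {"sensorId": "s-002", "unit": "hPa", "type": "pressure", "value": 1013.2},
    ],
    from_file=True,
)
```

This keeps the changing metadata next to each reading, at the cost of one measurement per sensor.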

How could I solve this issue? Am I missing something or do I simply have to deal with this design/performance vs. ease-of-use tradeoff?

score 1 · Answer 1 · answered Mar 15 '22 at 16:06

I know the answer will be unpopular because it won't be an answer about InfluxDB as you might expect, but here I go.

Don't fit your use case to a database; instead, choose a database that fits your use case. If you feel like you are fighting with a database, it is not the right one for you. A sleeker API and visualization should be additional features, not the deciding factor, unless they are the primary thing you want from the database. In the long run, you'll be better off with the one that feels more natural for your data.

Whether migrating to InfluxDB makes your problem bigger or smaller is for you to decide.
