
I have a huge time series (about 30 million records) of network paths in the following format:

timestamp, path, latency

The path is a sequence of IP addresses, so it can be represented either as a string or as an array of integers. Currently the data is stored in text files, which makes analyzing and querying the paths very slow. It was suggested that I use a time-series database (TSDB), such as InfluxDB or OpenTSDB, to store them efficiently, but some background reading I did suggests that TSDBs are meant for numerical values. For instance, the OpenTSDB documentation says:

OpenTSDB is a time series database. A time series is a series of numeric data points of some particular metric over time.

Would I gain any optimization from using a TSDB instead of a relational DB in my case, and more generally for time series that include non-numerical values?

The main queries I plan to run are to get all the paths between two timestamps, check whether there are path changes, and see how these changes affect the latency. Additionally, I may need to search for paths with specific hops (e.g. select all records where the path includes the IP hop 1.2.3.4), or for all the paths with latency over a certain threshold.

Vasilis

1 Answer


Yes. Time-series databases support grouping over categorical (non-numeric) elements.

For example, suppose a time-series database stores temperature readings from multiple IoT sensors. The sensor name is a string (hence non-numeric), and because it is stored alongside each reading, you can filter or group by a particular sensor.
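To make the idea concrete, here is a minimal sketch of a group-by over a string field, written in plain Python rather than against any particular TSDB. The `readings` data and the `mean_by_sensor` helper are made up for illustration; a real database would run the equivalent operation through its own query language.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data: (timestamp, sensor name, temperature).
readings = [
    (1, "sensor-a", 20.0),
    (2, "sensor-b", 25.0),
    (3, "sensor-a", 22.0),
]

def mean_by_sensor(rows):
    """Group readings by the (non-numeric) sensor name and average each group."""
    groups = defaultdict(list)
    for _timestamp, sensor, temperature in rows:
        groups[sensor].append(temperature)
    return {sensor: mean(values) for sensor, values in groups.items()}

print(mean_by_sensor(readings))
```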

However, in your specific example, the non-numeric values are IP addresses.
IP addresses are numeric.
An IPv4 address spans 32 bits, so you can store each one as a 32-bit integer if you wish. And because searching and subnetting reduce to integer arithmetic, any operation you would do on an IP address can be done on its integer form.
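As a rough sketch of that conversion, Python's standard `ipaddress` module can turn a dotted IPv4 address into its 32-bit integer and back, and can give the integer bounds of a subnet. The helper names `ip_to_int`, `int_to_ip` and `subnet_bounds` are my own, not part of any database API.

```python
import ipaddress
from typing import Tuple

def ip_to_int(ip: str) -> int:
    """Convert a dotted IPv4 address to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip))

def int_to_ip(value: int) -> str:
    """Convert a 32-bit integer back to dotted IPv4 notation."""
    return str(ipaddress.IPv4Address(value))

def subnet_bounds(cidr: str) -> Tuple[int, int]:
    """Return the first and last address of a subnet as integers."""
    net = ipaddress.IPv4Network(cidr)
    return int(net.network_address), int(net.broadcast_address)

low, high = subnet_bounds("1.2.3.0/24")
print(ip_to_int("1.2.3.4"))                 # 16909060
print(low <= ip_to_int("1.2.3.4") <= high)  # True
```

Membership in a subnet then becomes a plain integer range check, which is the kind of comparison a numeric store handles well.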

If you want to search for paths with specific hops, just search the path's list of integers for the hop's integer value. You can even extend this to search for integers within a range (i.e. hops within a specific subnet).
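Here is a minimal in-memory sketch of both searches, assuming each record holds the path as a list of integers. The `Record` type, the sample records and the function names are hypothetical; in a real database the same filters would be expressed in its query language.

```python
import ipaddress
from typing import List, NamedTuple

class Record(NamedTuple):
    timestamp: int
    path: List[int]   # each hop stored as a 32-bit integer
    latency: float

def to_int(ip: str) -> int:
    return int(ipaddress.IPv4Address(ip))

def records_with_hop(records, hop_ip):
    """Records whose path contains exactly this hop."""
    hop = to_int(hop_ip)
    return [r for r in records if hop in r.path]

def records_with_hop_in_subnet(records, cidr):
    """Records whose path has at least one hop inside the subnet."""
    net = ipaddress.IPv4Network(cidr)
    low, high = int(net.network_address), int(net.broadcast_address)
    return [r for r in records if any(low <= h <= high for h in r.path)]

# Made-up sample data for illustration only.
records = [
    Record(1, [to_int(ip) for ip in ("10.0.0.1", "1.2.3.4", "8.8.8.8")], 12.5),
    Record(2, [to_int(ip) for ip in ("10.0.0.1", "1.2.4.9", "8.8.8.8")], 30.1),
]

print([r.timestamp for r in records_with_hop(records, "1.2.3.4")])
print([r.timestamp for r in records_with_hop_in_subnet(records, "1.2.0.0/16")])
```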