QuestDB High CPU Usage with Table-Per-Metric Approach (Ex-InfluxDB Refugee)

Background

We’re enterprise users who recently started considering our options after InfluxData limited features in their V3 core offering. Our application tracks system metrics in a radar control system, writing data at a modest rate (approximately 4 flushes per second, totaling ~612 rows per second). Each DB instance is self-contained on the same system the controller runs on. Without QuestDB running, resource usage on the system is extremely modest.

Problem

Despite this relatively low write volume, QuestDB is consuming nearly all available CPU on our 12-core test VM during write operations, leaving insufficient resources for other processes. According to QuestDB's published performance benchmarks, our write volume should be well within its capabilities, and we have not changed the amount of data going into QuestDB compared to what we were writing to InfluxDB v1. With the same approach on Influx, the load was barely noticeable in htop or top and the system remained very usable.

Implementation Details

Schema Approach

We’re using a table-per-metric approach (similar to InfluxDB V1), where each metric gets its own table:

CREATE TABLE IF NOT EXISTS ${metricName} (
  timestamp TIMESTAMP,
  value ${dataType}, -- BOOLEAN, STRING, or DOUBLE based on metric type
  flags INT,
  faultable BOOLEAN,
  faulted BOOLEAN,
  interlock STRING
) TIMESTAMP(timestamp)
PARTITION BY HOUR
TTL ${ttlValue}
WAL
DEDUP UPSERT KEYS(timestamp);

Data Insertion Approach

  • Using the QuestDB Node.js client (@questdb/nodejs-client)
  • ILP (InfluxDB Line Protocol) over HTTP for data ingestion
  • We get an entire sampled report with all system metrics contained within around every 250ms
  • Each metric is inserted separately (table per metric)
  • We’re using auto_flush=off and handling flushes manually

Code Sample

Our insertion logic looks like:

async insertBite(timestamp, items) {
  // Create tables if needed (basically a no-op after first BITE report comes in)
  const tableCreateSuccess = await this.createBiteTables(items);

  if (tableCreateSuccess) {
    // Write each item to its own table
    for (const item of items) {
      this.sender.table(item.name);
      // Set column values based on data type
      switch (item.data_type) {
        case 0: this.sender.booleanColumn("value", Boolean(item.value)); break;
        case 3: this.sender.stringColumn("value", item.value); break;
        default: this.sender.floatColumn("value", item.value);
      }
      this.sender.intColumn("flags", item.flags);
      this.sender.booleanColumn("faultable", Boolean(item.faultable));
      this.sender.booleanColumn("faulted", Boolean(item.faulted));
      this.sender.stringColumn("interlock", item.interlock);
      await this.sender.at(moment(timestamp).valueOf(), "ms");
    }
    
    // Flush after the sender buffer is filled with all the items from the report
    try {
      await this.sender.flush();
      this.logger.log("log", "BITE report logged to QuestDB.");
    } catch (error) {
      this.logger.log(
        "error",
        "There was an error logging a BITE report to the database:\n" + error
      );
    }
  }
}

While this is running, the server basically becomes unusable during the inserts, which are constant. There are no quiet periods without incoming data, since the data is sampled at regular intervals.

Virtual Test Environment

OS: Ubuntu Server 24.04
CPU: 12-core
Memory: 32GB
Disk: QCOW2 image on ZFS with an Optane-backed ZFS intent log (sync enabled), served over NFS on a 10 Gbit/s link.

Questions

  1. Is our table-per-metric approach causing the excessive CPU usage? Is a single-table approach the expected pattern with QuestDB?

  2. Are the WAL and DEDUP UPSERT KEYS options particularly CPU-intensive?

  3. Are there server configuration parameters that we should be adjusting to reduce CPU usage?

  4. Is there a more efficient way to structure our insertion code for a table-per-metric approach?

  5. Are there any known issues or best practices when migrating from InfluxDB to QuestDB that we should be aware of?

Any guidance would be tremendously appreciated. We’ve invested significant development in our software for QuestDB as a potential time-series database solution, and we were hoping to commit to making it work efficiently in our environment as other offerings just don’t align with our use case as well.

Hi @kowen, thanks for your detailed post!

1

Table per metric is not the way to go here, especially since you receive all of the metrics in one batch every 250ms.

Use a single table, and differentiate the metrics using a SYMBOL column. Make sure you set the capacity appropriately: ALTER TABLE SYMBOL CAPACITY | QuestDB
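For illustration, a single-table schema along those lines could look like the sketch below. The table name, SYMBOL capacity, and DOUBLE value column are all assumptions for the example, not a prescribed layout:

CREATE TABLE IF NOT EXISTS metrics (
  timestamp TIMESTAMP,
  metric SYMBOL CAPACITY 1024, -- sized comfortably above the expected number of distinct metrics
  value DOUBLE,                -- assuming numeric values here
  flags INT,
  faultable BOOLEAN,
  faulted BOOLEAN,
  interlock STRING
) TIMESTAMP(timestamp)
PARTITION BY HOUR
WAL
DEDUP UPSERT KEYS(timestamp, metric);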

How many metrics do you have, out of interest?

Also, are you able to enforce an append-only model? If so, you can avoid write amplification and the more CPU-intensive out-of-order (O3) writes.

2

No, the WAL is simply necessary. DEDUP adds around an 8% overhead with the TSBS schema, so it is relatively light.

3

Not in this case, just a design change.

4

Send larger batches to a single table - for example, grouping four 250ms batches into one flush per second. With ILP/HTTP, autoflush defaults to 75k rows or a 1-second interval as the flush trigger - those defaults exist for a reason! ILP/TCP can be more CPU-efficient, but you give up error feedback and handling, and that trade-off shouldn't be necessary for your data volume.
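As a rough sketch, the client side against a single table could look something like this - the "metrics" table and column names are illustrative only, and it assumes all values are numeric:

async insertBite(timestamp, items) {
  const ts = moment(timestamp).valueOf();
  for (const item of items) {
    // One row per metric; the metric name becomes a SYMBOL instead of a table name
    this.sender
      .table("metrics")
      .symbol("metric", item.name)
      .floatColumn("value", Number(item.value))
      .intColumn("flags", item.flags)
      .booleanColumn("faultable", Boolean(item.faultable))
      .booleanColumn("faulted", Boolean(item.faulted))
      .stringColumn("interlock", item.interlock);
    await this.sender.at(ts, "ms");
  }
  // Flush once per report, or accumulate several reports before flushing;
  // if auto_flush is left on, the 75k rows / 1s defaults handle this for you.
  await this.sender.flush();
}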

5

It can be quite use case dependent, but in general, QuestDB is a bit more vertical - fewer tables. Influx <= 2 uses an index-per-metric approach, which works well if you write data on a per-metric basis, and not so well if you have many metrics written at once. We are the other way round, and can scale much higher with single tables.

Nulls also take space, so avoid sparse rows if you can.

Hi nwoolmer,

Thanks for your input! The system we are testing with currently outputs 151 unique metrics per report. However, this number can vary widely per system we build, as each product line and customer has different reporting requirements. We are doing append-only writes via ILP, and data does not need to be modified in any way after insertion. We would also likely be okay with a 1-second flush to the DB, though I would need to consult with one of our systems engineers to make sure that won't harm monitoring or real-time analytics of the system.

The single-table approach and use of symbols would be okay for us, except for one glaring issue: we have a variable data type on our “value” column. Not every metric’s value is a number - it could be a boolean sometimes, a string sometimes, etc. We set these types in our table-creation routine after we receive the first report. I’ve thought of one way that involves multiple tables for the different types, but that approach doesn’t quite sit well with me for some reason.

Hope this provides a little more insight for you. We are willing to try different things at this stage. We are also evaluating TimescaleDB; it doesn’t have any issues with the table-per-metric schema or CPU usage, but it does not seem very efficient at storing data of this frequency on disk. We were concerned to already have 100 GB of data after only about a week of writes. We are targeting ~3 weeks of data retention; InfluxDB v1 was able to store the same reports over 3 weeks in around 90 GB.

Thanks!

Flushing every 250ms or even quicker is fine; you have relatively low volumes. Larger batches only become essential for high-throughput deployments. You should be able to commit and apply those batches well within 250ms, assuming storage latency is sensible!

151 metrics is fine; SYMBOL is good up to around 100k distinct values or so. You can go further, but at that point you usually want to start splitting into smaller tables.

Make sure you set SYMBOL capacity for the customers who need it - it is only 256 by default. Exceeding this limit will start to slow ingestion. Luckily, you can now change it in place, if you forget.
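For example, resizing a symbol column in place could look like this (the table and column names are placeholders):

ALTER TABLE metrics ALTER COLUMN metric SYMBOL CAPACITY 4096;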

For the variable data type, it really depends on what you are doing. You can store everything as VARCHAR and cast it to numbers on demand, or you can have several value columns, one for each type. You will store some extra null values, but deploying on ZFS (with compression) will handle that for you.
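Two rough sketches of what those options could look like, reusing the hypothetical metrics table with illustrative column and metric names:

-- Option 1: one value column per type; ingestion fills whichever column
-- matches the metric's type and leaves the others unset.
--   value_double DOUBLE, value_bool BOOLEAN, value_str VARCHAR

-- Option 2: a single VARCHAR value column, cast to a number on demand
-- when querying numeric metrics.
SELECT timestamp, metric, CAST(value AS DOUBLE) AS value_num
FROM metrics
WHERE metric = 'pump_pressure';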

Re: Timescale, this option can make sense if you are an existing Postgres user (and not using RDS). Whether it is the right fit long term depends on a number of factors. We are shifting our default backing format to Parquet (you can do so already, but it is currently a beta feature), to avoid vendor lock-in. In the Enterprise version, this becomes a fully integrated cold storage solution.

Our business model is also not to drive users to a managed SaaS, but to stay open source first, and encourage self-hosting. We also have a semi-managed solution, BYOC, where we deploy infrastructure within your own cloud account. This keeps your costs under control (you own the hardware) and your data safe (you own the data).

The way you receive and ingest your data is a good fit. If you received it already on a per-metric basis, instead of in one group, then it would be a little more annoying, but your setup is the happy path for QuestDB.

We can help you with your PoC and see how you get on. I would suggest looking at our docs, particularly the concepts and SQL extensions. Also, take a look at our roadmap and recent talks, to see if our direction is the right one for you.

One feature that has a lot of potential is our incremental materialized views! Furthermore, we are shifting away from ILP to a binary protocol, adding n-dimensional arrays (like NumPy), and much more.

Make sure you deploy on ZFS until we make Parquet GA, and let me know if you want to discuss deployment options some more.

Thanks!

One more thing to explore - you can bring in data row-modelled, and then split it into other tables with more favourable querying layouts.

Materialised views will allow you to do this automatically - you define a simple downsampling query, e.g. "give me the average value for this symbol in 1-hour chunks". Then, any time new data is written, the view is incrementally updated based on the new data.

This happens eagerly, usually with sub-second latency. It also scales up well, so more cores and smaller sampling units will only make it quicker.

Better yet, the views have TTLs separate from their base tables. So you could keep the raw data for 3 weeks, while a downsampled view keeps only the latest 7 days for faster Grafana charts.
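A minimal sketch of such a view, assuming the single metrics table with a numeric value column (names, sampling interval, and TTL are illustrative):

CREATE MATERIALIZED VIEW metrics_1h AS (
  SELECT timestamp, metric, avg(value) AS avg_value
  FROM metrics
  SAMPLE BY 1h
) PARTITION BY DAY
TTL 7 DAYS;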

There is a guide here: Materialized views | QuestDB