Which strategy for this use case?

Hi all,
I’m a beginner with QuestDB, and I find it very interesting!

I would like to understand whether it can solve my problems with archiving huge volumes of data and with query performance.

I have a daily CSV data file with 230 million rows. Besides the timestamp, each row has a value and a Name composed like a path:
Name: MainFolder.SubFolder.SubSubFolder.Name

I’m using the structure below. I’m asking the Community whether I should add something (indexing or otherwise) to get better query performance.

 CREATE TABLE TableName (
   a_Name STRING,
   a_Value INT,
   a_Timestamp TIMESTAMP
 ) TIMESTAMP(a_Timestamp)
 PARTITION BY DAY;

Thanks in advance
Alb

If your paths have a limit to how nested they are, it might be a good idea to unroll them into separate columns, with the higher paths as SYMBOLs (if there aren’t too many distinct values). Filtering on SYMBOL columns is much faster than on strings.
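For example, here is a minimal sketch of an unrolled schema, assuming at most three folder levels plus a leaf name (column names are just illustrative):

 CREATE TABLE TableNameUnrolled (
   a_MainFolder SYMBOL,     -- few distinct values: good SYMBOL candidate
   a_SubFolder SYMBOL,
   a_SubSubFolder SYMBOL,
   a_Name VARCHAR,          -- leaf names may be high-cardinality
   a_Value INT,
   a_Timestamp TIMESTAMP
 ) TIMESTAMP(a_Timestamp)
 PARTITION BY DAY;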

You can also try searching using LIKE on the single path column, and see how fast it is.
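Something like this, keeping the single path column (the path values here are made up):

 SELECT * FROM TableName
 WHERE a_Name LIKE 'MainFolder.%.SensorName'
   AND a_Timestamp IN '2024-05-01';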

It is better to use VARCHAR rather than STRING if you are not using SYMBOL. This will halve the amount of storage you need if your text is mostly ASCII.

What kind of queries are you hoping to run?

Hi, thanks for your quick answer!
The most common queries are time-based (daily, hourly), filtering on the MainFolder and the Name (the first and last substrings of the path), and they are run from Grafana :slight_smile:
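A typical panel query would be something like this (names and dates are made up):

 SELECT a_Timestamp, avg(a_Value)
 FROM TableName
 WHERE a_Name LIKE 'Plant1.%.Temperature'
   AND a_Timestamp IN '2024-05-01'
 SAMPLE BY 1h;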

Best Regards
Alb

Some time ago, I had to change a field from SYMBOL to VARCHAR, because with bulk inserts QuestDB stopped inserting data after a number of rows that varied randomly from 150,000 to 250,000. Have you found anything about this?
PS: the field had all distinct values

@BepTheWolf You have to set SYMBOL capacities appropriately. If every entry is unique, it should not be a SYMBOL column.

Think of it more like an enum that can scale up to hundreds of thousands of entries… but not an infinite one!

If you dramatically exceed the symbol capacity, your ingestion will slow to a crawl.
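For example, you can declare the expected capacity up front when you create the table (the table name and figure here are just illustrative):

 CREATE TABLE example (
   tag SYMBOL CAPACITY 200000 CACHE,
   ts TIMESTAMP
 ) TIMESTAMP(ts) PARTITION BY DAY;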


@alb Do you have an estimate of the cardinality, i.e. how many distinct entries you expect in those columns?

If each is under perhaps 100,000 distinct values, symbols could be a good option.

Filtering on SYMBOL columns (or numbers) can be JITed, i.e. compiled to native code. Filtering against VARCHAR will use Java-side string comparisons.
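For instance, a filter like this one is a candidate for JIT compilation, since it only compares a SYMBOL and a number against constants (this assumes the unrolled schema sketched earlier; names are illustrative):

 SELECT * FROM TableNameUnrolled
 WHERE a_MainFolder = 'Plant1'
   AND a_Value > 100
   AND a_Timestamp IN '2024-05-01';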