Which strategy for this use case?

Hi all,
I’m a beginner with QuestDB, and I find it very interesting!

I would like to understand whether it can solve my problems with archiving huge volumes of data and with query performance.

I have a daily CSV data file with 230 million rows. Besides the timestamp, each row has a value and a Name composed like a path:
Name: MainFolder.SubFolder.SubSubFolder.Name

I’m using the structure below. I’m asking the Community whether I should add something (indexing or otherwise) to get better query performance.

 CREATE TABLE TableName (
   a_Name STRING,
   a_Value INT,
   a_Timestamp TIMESTAMP
 ) TIMESTAMP(a_Timestamp)
 PARTITION BY DAY;

Thanks in advance
Alb

If your paths have a limit to how nested they are, it might be a good idea to unroll them into separate columns, with the higher paths as SYMBOLs (if there aren’t too many distinct values). Filtering on SYMBOL columns is much faster than on strings.
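For example, here is a minimal sketch of an unrolled schema, assuming at most three folder levels plus a leaf name (column names are just illustrative):

 CREATE TABLE TableNameUnrolled (
   a_MainFolder SYMBOL,     -- few distinct values: good SYMBOL candidate
   a_SubFolder SYMBOL,
   a_SubSubFolder SYMBOL,
   a_Name VARCHAR,          -- leaf names may be high-cardinality
   a_Value INT,
   a_Timestamp TIMESTAMP
 ) TIMESTAMP(a_Timestamp)
 PARTITION BY DAY;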

You can also try searching using LIKE on the single path column, and see how fast it is.
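Something like this, keeping the single path column (the path values here are made up):

 SELECT * FROM TableName
 WHERE a_Name LIKE 'MainFolder.%.SensorName'
   AND a_Timestamp IN '2024-05-01';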

It is better to use VARCHAR rather than STRING if you are not using SYMBOL. This will halve the amount of storage you need if your text is mostly ASCII.

What kind of queries are you hoping to run?

Hi, thanks for your quick answer!
The most common queries are time-based (daily, hourly), filtering on the MainFolder and the Name (the first and last substrings of the path), and they are run from Grafana :slight_smile:
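A typical panel query would be something like this (names and dates are made up):

 SELECT a_Timestamp, avg(a_Value)
 FROM TableName
 WHERE a_Name LIKE 'Plant1.%.Temperature'
   AND a_Timestamp IN '2024-05-01'
 SAMPLE BY 1h;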

Best Regards
Alb

Some time ago, I had to change a field from SYMBOL to VARCHAR, because with bulk inserts QuestDB stopped inserting data after a number of rows that varied randomly from 150,000 to 250,000. Have you found anything about this?
PS: the field had all distinct values

@BepTheWolf You have to set SYMBOL capacities appropriately. If every entry is unique, it should not be a SYMBOL column.

Think of it more like an enum that can scale up to hundreds of thousands of entries… but not an infinite one!

If you dramatically exceed the symbol capacity, your ingestion will slow to a crawl.
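For example, you can declare the expected capacity up front when you create the table (the table name and figure here are just illustrative):

 CREATE TABLE example (
   tag SYMBOL CAPACITY 200000 CACHE,
   ts TIMESTAMP
 ) TIMESTAMP(ts) PARTITION BY DAY;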


@alb Do you have an estimate of the cardinality, i.e. how many distinct entries you expect in those columns?

If each is under perhaps 100,000 distinct values, symbols could be a good option.

Filtering on SYMBOL columns (or numbers) can be JITed, i.e. compiled to native code. Filtering against VARCHAR will use Java-side string comparisons.
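For instance, a filter like this one is a candidate for JIT compilation, since it only compares a SYMBOL and a number against constants (this assumes the unrolled schema sketched earlier; names are illustrative):

 SELECT * FROM TableNameUnrolled
 WHERE a_MainFolder = 'Plant1'
   AND a_Value > 100
   AND a_Timestamp IN '2024-05-01';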