I want to understand the INDEX CAPACITY setting better. When I first setup my tables, I just made the INDEX CAPACITY the same as the SYMBOL CAPACITY. That worked fine until I looked at the partition storage sizes. It seemed to grow exponentially. So I reduced the INDEX CAPACITY and that solved the storage problem, but I want to make sure that was the correct decision.
I’m assuming the INDEX CAPACITY refers to how often the data in that INDEX is expected to repeat within one partition? Is that correct? If so, I can tune it accordingly. I think I may have it set too low a the moment.
Also, is the CAPACITY automatically expanded if needed? I guess the only downside would be that the ingestion rate could suffer if that happens?
First things first, it is unlikely that you need indexes. It is rare for them to be better than parallel scans, and when they are, usually you want to index an additional symbol column so that you can still run queries with parallel scans when you choose.
It might be counterintuitive if you are used to other databases, but for QuestDB, without index is usually better than with index.
I want to understand the INDEX CAPACITY setting better.
INDEX CAPACITY
should ideally be set to a value close to this:
the number of rows per unique symbol within a single partition
.
Let’s say you are storing sensor data. You have 100 unique sensors, which send data every second. Then you use hourly partitions.
This means you have 100
unique symbols (so SYMBOL CAPACITY
should be >= 100) and 3600
rows per symbol. So that would make an index capacity of 3600
.
However, as you identify, this takes storage space. You probably won’t notice much difference with a smaller capacity, like 128
or 256
, and will save storage this way.
Also, is the CAPACITY automatically expanded if needed?
No, it will just be slower. The same is true for symbols.
1 Like