Python and QuestDB console timing discrepancy by a large amount

Hello,

I have what I would say is a standard QuestDB configuration, but reading from the database is incredibly slow. My setup is:

questdb-9.0.3 that I downloaded from https://github.com/questdb/questdb/releases/download/9.0.3/questdb-9.0.3-rt-linux-x86-64.tar.gz

Then, I run ./questdb.sh start.

I have 30 sensors recording data at 2^14 Hz, and I'm saving 2 minutes of data and then reading those 2 minutes back. I can save the data into the database following the Python dataframe example:

import pandas as pd
from questdb.ingress import Sender
import numpy as np
import datetime as dt
from time import time

conf = 'http::addr=127.0.0.1:9000;'
fs = 2**14
step = dt.timedelta(seconds=1)
Nchans = 30
i = 0

start = dt.datetime.now()

# Sweep data - make 2 minutes
for j in range(120):
    x = np.arange(Nchans * fs).reshape((fs, Nchans)) + i   # one second of samples per channel
    names = ['Chan' + str(c) for c in range(1, Nchans + 1)]
    df = pd.DataFrame(x, columns=names, copy=True)
    i += Nchans * fs

    tic = time()
    print(f'Saving second {j}...')
    with Sender.from_conf(conf) as sender:
        sender.dataframe(df, table_name='sensor',
                         at=start + j * step)  # advance the designated timestamp one second per batch
    toc = time()
    print(f'{toc-tic:.2f} seconds elapsed.')

This is fairly slow, taking a total of 15-20 seconds, but it's not unbearable.

When I try to read from questdb following the Python basic example:

import time
import psycopg

conn = psycopg.connect(
    host='127.0.0.1',
    port=8812,
    user='admin',
    password='quest',
    dbname='qdb',
    autocommit=True  # Important for QuestDB
)

tic = time.time()
with conn.cursor(binary=True) as cur:
    cur.execute("SELECT * FROM sensor;")
    data = cur.fetchall()
toc = time.time()
print(f'{toc-tic:.2f} seconds elapsed.')
conn.close()

It takes 19 seconds(!) to read 1.9M rows, i.e. 2 minutes of data, while the QuestDB console takes around 20-30 milliseconds.

What am I doing wrong in the Python example that the QuestDB console is doing differently? Or what can I do to speed things up? Thanks,

Hi,

On my machine the ingestion code executes in about 7 seconds, so one thing to consider is what type of hardware you are using, as a slow disk might cause issues. The code is not very efficient, since you open and close a new connection on every iteration of the loop; reusing a single sender would help, but that would only save a second or so in this case.

In real life, you could either use ILP directly for ingestion, or ingest in parallel to optimise further (see the sketch below for the connection-reuse idea).
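For illustration, a minimal sketch of reusing one sender across all batches, using the same constants and table name as your snippet (the per-batch data is just placeholder values):

import datetime as dt
import numpy as np
import pandas as pd
from questdb.ingress import Sender

conf = 'http::addr=127.0.0.1:9000;'
fs = 2**14
Nchans = 30
step = dt.timedelta(seconds=1)
names = ['Chan' + str(c) for c in range(1, Nchans + 1)]
start = dt.datetime.now()

# Open the sender once and reuse it for every batch instead of
# reconnecting inside the loop; the client buffers and flushes for you.
with Sender.from_conf(conf) as sender:
    for j in range(120):
        x = np.arange(Nchans * fs).reshape((fs, Nchans)) + j * Nchans * fs
        df = pd.DataFrame(x, columns=names)
        sender.dataframe(df, table_name='sensor', at=start + j * step)
    sender.flush()  # make sure the last buffered rows are sent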

Regarding read performance, what you see in the web console is only a subset of the data, which is paginated as you scroll. QuestDB is not a tool for doing SELECT * super fast, but a tool for running complex analytical queries very fast. In the common scenario, you SELECT from a dataset with billions of rows and return a few rows back, maybe a few thousand, but it is not that common to return millions of rows. In this case the query itself finds the rows to return very quickly, but those rows then need to be sent back to the client, and the client needs to deserialize them into Python objects, which is quite slow.
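For example, a downsampling aggregation scans the whole table but returns only a handful of rows, so the transfer and deserialization cost largely disappears. This is just a sketch assuming the sensor table and Chan1 column from your snippet, with the default ILP designated timestamp column named timestamp:

import psycopg

conn = psycopg.connect(
    host='127.0.0.1', port=8812,
    user='admin', password='quest',
    dbname='qdb', autocommit=True
)

# Downsample 2 minutes of 16 kHz data to one row per second:
# the engine scans ~1.9M rows, but only ~120 rows travel back to Python.
with conn.cursor() as cur:
    cur.execute("""
        SELECT timestamp, avg(Chan1), max(Chan1)
        FROM sensor
        SAMPLE BY 1s
    """)
    rows = cur.fetchall()
print(len(rows), 'rows returned')
conn.close()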

We have a client specifically for cases where you want to return large amounts of data and avoid some of the serialization overhead in Python: https://github.com/questdb/py-questdb-query (fast query over HTTP(S)/CSV for QuestDB). That should give you faster results, but the underlying issue remains that you are moving a lot of data in and out. For those cases you are probably better off doing an export than a normal query. We actually have a pull request, soon to be merged, that allows you to export directly in Parquet format; if you really need to extract millions of rows, that might be the most efficient option.
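In the meantime, a rough sketch of pulling the whole table through QuestDB's CSV export endpoint (/exp on the HTTP port, 9000 by default) and parsing it with pandas; the query and timeout are just illustrative:

import io
import pandas as pd
import requests

# /exp streams query results as CSV over the HTTP port.
resp = requests.get(
    'http://127.0.0.1:9000/exp',
    params={'query': 'SELECT * FROM sensor'},
    timeout=300,
)
resp.raise_for_status()

# Parse the CSV straight into a DataFrame.
df = pd.read_csv(io.StringIO(resp.text))
print(df.shape)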

I hope this helps 🙂

Javier

As an addendum to Javier’s post, keep in mind that the web console is not shipping 1.9M rows in one go - it pages the data in chunks of 10,000 or so, so you are only returning a subset.

For a select like this, try ConnectorX (see the Polars page in the QuestDB docs); a rough sketch follows.
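This sketch assumes the default PGWire credentials and the sensor table from the original post:

import polars as pl

# ConnectorX reads over the Postgres wire protocol (port 8812) straight into
# an Arrow-backed Polars DataFrame, skipping per-row Python-object deserialization.
uri = 'postgresql://admin:quest@127.0.0.1:8812/qdb'
df = pl.read_database_uri(
    query='SELECT * FROM sensor',
    uri=uri,
    engine='connectorx',
)
print(df.shape)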
