QuestDB Postgres interface not compatible with Tableau (Prep)

I use Tableau for data analysis and thought I'd give QuestDB a try, since it has a Postgres wire interface.

Well, long story short: nope. Tableau expects a real Postgres database with the Postgres catalog tables it reads its configuration from. Those tables don't exist in QuestDB.

Furthermore, Tableau seems to use LOCAL TEMPORARY TABLES, which QuestDB can't handle either.

Just a small list of the errors while trying to create a table in Tableau Prep:

2024-12-07T14:04:24.875374Z E i.q.g.e.QueryProgress err [id=-1, sql=`SELECT setting FROM pg_catalog.pg_settings WHERE name='max_index_keys'`, principal=admin, cache=false, jit=false, time=127860, msg=table does not exist [table=pg_catalog.pg_settings], errno=0, pos=20]

2024-12-07T14:04:24.879370Z E i.q.g.e.QueryProgress err [id=-1, sql=`CREATE LOCAL TEMPORARY TABLE "#Tableau_50_DDD30AD1-8C47-496F-86D6-7DEFB5F49395_3_prep" (
	"Field" VARCHAR(65000)
	) ON COMMIT PRESERVE ROWS`, principal=admin, cache=false, jit=false, time=15702, msg=expected 'atomic' or 'table' or 'batch', errno=0, pos=7]

FROM (SELECT 1 AS COL) AS CHECKTEMP`, principal=admin, cache=false, jit=false, time=23996, msg=table and column names that are SQL keywords have to be enclosed in double quotes, such as "INTO", errno=0, pos=43]

2024-12-07T14:14:03.327934Z E i.q.g.e.QueryProgress err [id=-1, sql=`show "lc_collate"`, principal=admin, cache=false, jit=false, time=91241, msg=expected 'TABLES', 'COLUMNS FROM <tab>', 'PARTITIONS FROM <tab>', 'TRANSACTION ISOLATION LEVEL', 'transaction_isolation', 'max_identifier_length', 'standard_conforming_strings', 'parameters', 'server_version', 'server_version_num', 'search_path', 'datestyle', or 'time zone', errno=0, pos=17]

I assume this can't easily be fixed on your side, i.e. using QuestDB as a fast database backend for Tableau is not feasible any time soon?

Unfortunately, some tools use features of Postgres we don't support, often unnecessarily.

Some incompatibilities are easy to fix; for example, we already support queries such as information_schema.columns().
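For illustration, a metadata query along these lines should work against QuestDB as-is (just an example; what it returns depends on your schema):

SELECT * FROM information_schema.columns();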

Others are not, such as tools that expect full transaction syntax, temp tables, etc.

We would have to write a specific connector for Tableau and upstream it, as we did for Grafana. We can certainly look into it.

In the meantime, you can pull data from /exp and import it into Tableau, or you may be able to use the web connector against this same endpoint.
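For example, a CSV export via /exp looks roughly like this (host, query and output file are placeholders):

curl -G "http://localhost:9000/exp" --data-urlencode "query=SELECT * FROM my_table" -o my_table.csv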

Ah yes. This is from Tableau Data Prep, i.e. outputting to QuestDB.

I can’t control the SQL queries used.

I assume that adding the pg_catalog tables would solve one of the problems, but if the other commands are not supported, it will be difficult for you. I also have no idea whether you can simply fake/ignore the LOCAL TEMPORARY stuff to stop the query from breaking.

I did test Postgres and got around 250,000 rows/minute. TimescaleDB does 2 million rows/minute. I was hoping that QuestDB would be much faster, as I'm trying to ingest 500 million rows :upside_down_face:

I haven't tried Tableau itself yet, i.e. reading data. Maybe that just works and could speed up dashboards when QuestDB is used.

Outputting .csv from Tableau Data Prep is just terribly slow for 500 million rows. The in-memory database (.hyper) is only 17 GB.

Getting QuestDB compatible with Tableau/Data Prep would be highly appreciated of course, but I have no idea what other queries Tableau uses that QuestDB currently does not support.

If you write to QuestDB through our regular means, i.e. ILP, you can certainly exceed those rates. A 4-core box with the TSBS schema supports 750k rows/second with the out-of-the-box config.
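As a rough illustration, ILP rows are just text lines of the form table,symbols columns timestamp, something like this (made-up names and values):

weather,station=ams wind=5.0,temperature=16.0 1733395522000000000
weather,station=ams wind=10.0,temperature=32.0 1733848220000000000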

You can also export Parquet files from Tableau, put them in QuestDB's copy root directory (configured in server.conf), and then write them into the native format using read_parquet and an INSERT INTO SELECT statement.
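A sketch of that flow, assuming a Parquet file called vos_all.parquet sitting in the configured copy root and a native table created beforehand with the matching schema:

INSERT INTO vos_all
SELECT * FROM read_parquet('vos_all.parquet');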

I'm still bound by the output options of Tableau Data Prep. I can choose between a few file formats and (cloud) databases. Parquet output is only supported for Amazon S3, which requires Amazon; you can't point it at a local MinIO cluster…

A workaround would be a Python script that reads the .hyper files and outputs rows over QuestDB's ILP. But that requires an extra step.

I can try that to check the read/query performance using Tableau, assuming Tableau won’t check if a real Postgres database is on the other side.

If I use a JDBC connection, I can choose between Postgres, MySQL, and SQL92 dialects.

In the meantime, I have upped the pace of TimescaleDB to 5 million rows/minute. That is around 80,000 rows/second, about 10x slower than your example.
Pfff. I just found out that my primary timestamp has NULL values, so I'm restarting the output and waiting around 100 minutes for the 500 million row export to complete :smile:

TimescaleDB does need indexes, and that is not very convenient with analytic tooling. QuestDB would be a much better fit, but if I can't get it to work, TimescaleDB is the only option I guess.

I will let you know if I have any good results with my quest!

I tested Data Prep and had success creating a connection to QuestDB as a source using the JDBC data source and outputting to a text file. Unfortunately, when outputting to QuestDB I ran into errors similar to the ones described above.

That sounds nice! If Prep can do that, Tableau (Server) would work too, I guess.

What SQL dialect did you choose for the JDBC connector? Postgres?

I went with SQL92. Didn’t try anything else

With the PostgreSQL connector (not JDBC, native PostgreSQL) I could connect and see table names, columns and rows, but when trying to export it complained about type metadata. With JDBC I could avoid that.

I see that QuestDB has more limitations on column names than Postgres has.

I can't use "Tag Name - Split 8", for instance, which is something Tableau Prep generates for automatic field splits.

The characters "()" are also not allowed for some reason. "[ ]" are OK.

I was trying to COPY a 200GB csv file with headers. Now I have to find out how to change the header of a 200GB csv file :smile:

The COPY went well using TimescaleDB. After setting indexes on some of the 100 fields per row, dashboards in Tableau are fairly responsive, depending on the dashboard. Some take only 1-5 seconds, others almost 10 minutes. That is an indexing problem I guess, as Tableau itself takes just seconds using the .hyper database.

I have a designated timestamp field…


But still the COPY complains:
COPY vos_all FROM 'vos_test_small.csv' WITH HEADER true DELIMITER ';' FORMAT 'dd-MM-yyyy HH:mm:ss' TIMESTAMP "Tijdstempel Event";

I’m a bit lost :confused:

Apart from the altered field names, this file was imported into TimescaleDB without any complaints. TimescaleDB uses the same timestamp field for partitioning…

Second part of message since I’m not allowed yet to put two images in a single post :cry:

As you can see, I'm still exploring what QuestDB can do combined with Tableau, but for that I need the COPY to work, of course!

Is your timestamp format correct for all rows?

The column name limitations are related to file names on disk; you also cannot use '.', for example.

Sorry to hear about your troubles!

If you could share just a small extract from the CSV file (feel free to anonymize anything, but please keep formats unchanged), I can try and provide the appropriate command for the COPY

It seems my problems are related to: Inconsisent use of format in csv import · Issue #4761 · questdb/questdb · GitHub

I created a very simple csv with only a few fields and a timestamp (of course).

I created the following table (Dutch names…):

DROP TABLE het_weer;
CREATE TABLE het_weer (
    "Tijdstempel Event" timestamp,
    "Wind" double precision,
    "Windrichting" double precision,
    "Temperatuur" double precision
) timestamp("Tijdstempel Event") PARTITION BY DAY WAL;

And created a CSV file with 2 lines of data to fill it:

Tijdstempel Event;Wind;Windrichting;Temperatuur
05-12-2024 10:45:22;5;100;16
10-12-2024 16:30:20;10;200;32

And the COPY:

COPY het_weer FROM 'het_weer_data.csv' WITH HEADER true DELIMITER ';' FORMAT 'dd-MM-yyyy HH:mm:ss' TIMESTAMP "Tijdstempel Event";

This import fails again with the same error: column is not a timestamp, although it is!

If I remove the space in "Tijdstempel Event", i.e. the column is renamed to "TijdstempelEvent", then the import finds the timestamp OK, but fails with the same error as in the GitHub issue:

I have no idea why, of course, but the GitHub issue is still open.

I hope you can fix this. I have tried 10 different existing csv files, and all fail with the same error.

BTW, most of the fields have a space in their name because they use human-readable formatting and are displayed that way on the dashboards in Tableau. As a Tableau user I don't write SQL directly, so I can't use the AS specifier to rename a field. The source fields already have the required names to make sure every dashboard uses the same names…

None of the timestamp formats seem to work in my case. Same io_uring error every time.

Only if I remove the "PARTITION BY…" part in the CREATE TABLE does the import work!

So even the below timestamp format does not work:

TijdstempelEvent;Wind;Windrichting;Temperatuur
2024-12-05T10:45:22.123;5;100;16
2024-12-10T16:30:20.456;10;200;32

I can't imagine that I'm the first user trying this, so I'm still very curious what the problem is. I'm using the standard Docker container. Might there be a permissions problem or something, i.e. something the container needs in order to make this call?

Hmmm:

According to that issue they won't fix it, as io_uring is considered a security risk!

It seems I have to find the flags that allow this call to be made again from within the container.
Or did I miss some documentation from QuestDB?

I disabled io_uring in the server.conf file and now I can import the two lines in the het_weer CSV file.
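For reference, the setting involved (assuming I have the key name right) is:

cairo.iouring.enabled=false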

Next is testing the date formats. The first tests fail, so I guess the timestamp format still does not work and I have to change the Data Prep flow that outputs the 500 million rows to get acceptable timestamp and number/float formatting.

That will be a 3-hour wait for the CSV to be created once I've modified the existing flow!

If you can get timestamps in either epoch micros or ISO UTC (yyyy-MM-ddTHH:mm:ss.ffffffZ), then you should be confident it will read consistently.
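If re-exporting turns out to be too painful, another option is to stage the data with the timestamp held as text and convert it inside QuestDB. A sketch, using a hypothetical staging table and your column names:

-- staging table that holds the timestamp as text
CREATE TABLE het_weer_staging (
    "Tijdstempel Event" varchar,
    "Wind" double,
    "Windrichting" double,
    "Temperatuur" double
);

COPY het_weer_staging FROM 'het_weer_data.csv' WITH HEADER true DELIMITER ';';

-- convert the text column into a real timestamp on the way into the WAL table
INSERT INTO het_weer
SELECT to_timestamp("Tijdstempel Event", 'dd-MM-yyyy HH:mm:ss'),
       "Wind", "Windrichting", "Temperatuur"
FROM het_weer_staging;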

Great. Then I will go that route and report back!

Well, I have made a small extract with 100,000 rows to test the connection to QuestDB from Tableau Desktop, using both a generic JDBC driver with the SQL92 dialect and the Postgres wire protocol.

Some things work, and some things just don't, as you can see below.

As soon as I want to count occurrences by day/week/month/year I get the error below. As long as I don't use day/week/month/year, counting/summing seems OK.

Remember that I don’t control the SQL query. I just drag-and-drop.

CASTing is supported by QuestDB, so what is going wrong here, @nwoolmer ?
Is the problem related to CASTing to an INTEGER instead of an INT? The error position (128 in the last example) is exactly the word “INTEGER” :confused:

Timestamps are also still a bit of a problem in QuestDB:

  • Timestamps are imported as Null values unless you use the dateTtimeZ format.
  • As Prep doesn't support date/time formatting, I went through a lot of hoops to get the dateTtimeZ format as output.
  • Null values in timestamps give errors if you want to extract the year from them, i.e. the query fails (other databases just ignore the Null values).
Bad Connection: Tableau could not connect to the data source.
Error Code: FAB9A2C5
ERROR: unsupported cast
  Position: 127
SELECT SUM(1) AS "cnt_vos_all_0311A9F8EF054839B9A9DFF1ADB5D50F_ok",
  CAST(EXTRACT(DAY FROM "vos_all"."Tijdstempel_Event") AS INTEGER) AS "dy_Tijdstempel_Event_ok",
  CAST(EXTRACT(MONTH FROM "vos_all"."Tijdstempel_Event") AS INTEGER) AS "mn_Tijdstempel_Event_ok",
  {fn QUARTER("vos_all"."Tijdstempel_Event")} AS "qr_Tijdstempel_Event_ok",
  CAST(EXTRACT(YEAR FROM "vos_all"."Tijdstempel_Event") AS INTEGER) AS "yr_Tijdstempel_Event_ok"
FROM "public"."vos_all" "vos_all"
GROUP BY 2,
  3,
  4,
  5

And a much simpler dashboard: counting events per year:

Unable to complete action
Bad Connection: Tableau could not connect to the data source.
Error Code: FAB9A2C5
ERROR: unsupported cast
  Position: 128
SELECT SUM(1) AS "cnt_vos_all_0311A9F8EF054839B9A9DFF1ADB5D50F_ok",
  CAST(EXTRACT(YEAR FROM "vos_all"."Tijdstempel_Event") AS INTEGER) AS "yr_Tijdstempel_Event_ok"
FROM "public"."vos_all" "vos_all"
GROUP BY 2

According to the QuestDB docs (Timestamp, date and time functions | QuestDB), extract is supported and returns an INT. So in this case an INT is CAST to an INTEGER. That doesn't seem impossible :smile:

AFAIK SQL92 should support both INT and INTEGER as data types.
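For comparison, a hand-written version of the simple per-year count using int and QuestDB's extract() (my own sketch, not what Tableau generates) would look like:

SELECT cast(extract(year FROM "Tijdstempel_Event") AS int) AS yr,
       count() AS cnt
FROM vos_all;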

Can this be fixed? It seems QuestDB is much faster than TimescaleDB in the dashboards/queries that do work :frowning_face: