ndarray is not C-contiguous

vtan · February 13, 2025, 8:25am

I'm doing some benchmarking with QuestDB using Python. I have the function below to generate a DataFrame.

import datetime as dt

import numpy as np
import pandas as pd

def generate_test_data(iteration, columns):
    # Create a list of generic column names ('col0', 'col1', ...)
    col_names = [f"col{i}" for i in range(columns)]

    # Initialize the start time as current time
    start_time = dt.datetime.now()

    # Generate index as time series starting from current time, incremented by 1 ms
    time_index = [start_time +
                  dt.timedelta(milliseconds=i) for i in range(iteration)]

    # Generate random data for the DataFrame (random floats in this case)
    data = np.random.rand(iteration, columns)
    data = np.asarray(data, order='C')
    # Create DataFrame from the generated data, then add the timestamp column
    df = pd.DataFrame(data, columns=col_names, copy=True)
    df['timestamp'] = time_index

    print(df)
    return df

The code to push it into QuestDB is pretty straightforward.

from questdb.ingress import Sender

def write_df(df: pd.DataFrame, table: str):
    # `conf` is the client configuration string, defined elsewhere
    with Sender.from_conf(conf) as sender:
        sender.dataframe(df, table_name=table, at="timestamp")

If the DataFrame has more than one data column, I get:

  File "src/questdb/ingress.pyx", line 2403, in questdb.ingress.Sender.dataframe
  File "src/questdb/dataframe.pxi", line 2396, in questdb.ingress._dataframe
  File "src/questdb/dataframe.pxi", line 2296, in questdb.ingress._dataframe
  File "src/questdb/dataframe.pxi", line 1177, in questdb.ingress._dataframe_resolve_args
  File "src/questdb/dataframe.pxi", line 1117, in questdb.ingress._dataframe_resolve_cols
  File "src/questdb/dataframe.pxi", line 1017, in questdb.ingress._dataframe_resolve_source_and_buffers
  File "src/questdb/dataframe.pxi", line 814, in questdb.ingress._dataframe_series_as_pybuf
questdb.ingress.IngressError: Bad column 'col0': ndarray is not C-contiguous

adamcimarosti · February 13, 2025, 4:41pm

vtan:

def generate_test_data(iteration, columns):
    # Create a list of generic column names ('col0', 'col1', ...)
    col_names = [f"col{i}" for i in range(columns)]

    # Initialize the start time as current time
    start_time = dt.datetime.now()

    # Generate index as time series starting from current time, incremented by 1 ms
    time_index = [start_time +
                  dt.timedelta(milliseconds=i) for i in range(iteration)]

    # Generate random data for the DataFrame (random floats in this case)
    data = np.random.rand(iteration, columns)
    data = np.asarray(data, order='C')
    # Create DataFrame with generated time index
    df = pd.DataFrame(data, columns=col_names, copy=True)
    df['timestamp'] = time_index

    print(df)
    return df

As far as I can tell, you’re just after a test function.

Your existing logic generates a matrix in numpy and then slices it into pandas columns.
Because the matrix is row-major, the memory of each column is not contiguous. We don't support this in the dataframe() method because, in practice, one would generally create the data for different columns independently.
As a result, the columns in your code are strided: each element sits one full row away from the next in memory, rather than directly after it.
For your specific code the fix is simple: generate the numpy array in column-major (Fortran-style) order rather than row-major (C-style) order.
In other words, change np.asarray(data, order='C') to np.asarray(data, order='F').
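To see why the memory order matters, here is a minimal numpy-only sketch (no QuestDB client involved) that inspects the contiguity flag and stride of one column taken from a row-major matrix and from a column-major copy of it. The array shape and variable names are illustrative and not taken from the code above.

```python
import numpy as np

rows, cols = 5, 3
c_matrix = np.random.rand(rows, cols)        # row-major (C order) by default
f_matrix = np.asarray(c_matrix, order='F')   # same values, column-major copy

c_col = c_matrix[:, 0]   # view that skips a whole row between elements
f_col = f_matrix[:, 0]   # view over adjacent float64 values

print(c_col.flags['C_CONTIGUOUS'], c_col.strides)   # False (24,)
print(f_col.flags['C_CONTIGUOUS'], f_col.strides)   # True (8,)
```

The values are identical in both columns; only the distance in bytes between consecutive elements differs.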

While you're at it, you might want to avoid allocating Python objects when generating the timestamp column, by building it with pd.date_range rather than a list of datetime objects:

start_time = pd.Timestamp.utcnow()
time_index = pd.date_range(start=start_time, periods=iteration, freq='1ms')
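As a small illustration of the difference, the sketch below builds the same five timestamps both ways. The fixed start time and the variable names are made up for the example so that the result is reproducible.

```python
import datetime as dt

import pandas as pd

periods = 5
# Hypothetical fixed start time, used only to keep the example reproducible.
start = pd.Timestamp("2025-02-13 08:25:00", tz="UTC")

# A Python list: one datetime object is allocated per row.
as_objects = [start.to_pydatetime() + dt.timedelta(milliseconds=i)
              for i in range(periods)]

# A DatetimeIndex: the timestamps live in a single datetime64 array.
as_index = pd.date_range(start=start, periods=periods, freq='1ms')

print(type(as_objects[0]))
print(as_index.dtype)
```

Both hold the same instants, one millisecond apart.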

If you end up in this edge case again for other reasons, you can also copy a strided numpy column into contiguous memory via np.ascontiguousarray(arr). Pandas operations themselves should never generate non-contiguous arrays.
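A minimal sketch of that fallback, using numpy only: the strided column is copied into a fresh contiguous buffer. The matrix here is just a stand-in for whatever produced the strided data.

```python
import numpy as np

matrix = np.arange(12, dtype=np.float64).reshape(4, 3)  # row-major 4x3
strided = matrix[:, 1]                 # non-contiguous view of the middle column
fixed = np.ascontiguousarray(strided)  # contiguous copy holding 1, 4, 7, 10

print(strided.flags['C_CONTIGUOUS'])   # False
print(fixed.flags['C_CONTIGUOUS'])     # True
```

Since this makes a copy, it costs extra memory per column; for data you generate yourself, creating it in the right order up front avoids that.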

Here is the updated code, populating a buffer:

#!/usr/bin/env -S uv run --no-project

# /// script
# dependencies = ["questdb", "pandas", "numpy", "pyarrow"]
# ///

import questdb.ingress as qi
import numpy as np
import pandas as pd

def generate_test_data(iteration, columns):
    col_names = [f"col{i}" for i in range(columns)]
    start_time = pd.Timestamp.utcnow()
    time_index = pd.date_range(start=start_time, periods=iteration, freq='1ms')
    data = np.random.rand(iteration, columns)
    data = np.asarray(data, order='F')
    df = pd.DataFrame(data, columns=col_names, copy=True)
    df['timestamp'] = time_index
    print(df)
    return df

def main():
    df = generate_test_data(1000, 20)
    buf = qi.Buffer()
    buf.dataframe(df, table_name='foo', at='timestamp')

if __name__ == '__main__':
    main()

I hope this helps.
