API Reference

Complete documentation of all classes, functions, and methods in the PardoX Python SDK v0.3.4.


Top-Level Functions (import pardox as px)

read_csv

Reads a CSV file into a DataFrame using the multi-threaded Rust parser.

def read_csv(path: str, schema: dict | None = None) -> DataFrame
| Parameter | Type | Description |
| --- | --- | --- |
| path | str | Path to the .csv file. |
| schema | dict or None | Optional column type overrides: {"col": "Float64", ...}. Supported types: Int64, Float64, Utf8. |

Returns: DataFrame

df = px.read_csv("sales.csv")
df = px.read_csv("sales.csv", schema={"price": "Float64", "id": "Int64"})

read_prdx

Loads a native PardoX binary file (.prdx).

def read_prdx(path: str) -> list[dict]
| Parameter | Type | Description |
| --- | --- | --- |
| path | str | Path to the .prdx file. |

Returns: list[dict] (preview rows)


from_arrow

Zero-copy conversion from a PyArrow Table or RecordBatch.

def from_arrow(data: pyarrow.Table | pyarrow.RecordBatch) -> DataFrame
import pyarrow as pa, pardox as px
arrow_table = pa.Table.from_pydict({"a": [1, 2, 3]})
df = px.from_arrow(arrow_table)

pardox.io — Database I/O

All database functions run entirely in the Rust core; the connection and data transfer never touch the Python runtime.

PostgreSQL

read_sql(connection_string, query) → DataFrame

from pardox.io import read_sql
df = read_sql("postgresql://user:pass@localhost:5432/db", "SELECT * FROM orders")

execute_sql(connection_string, query) → int

Executes DDL or DML. Returns rows affected (0 for DDL).

from pardox.io import execute_sql
execute_sql(CONN, "DROP TABLE IF EXISTS orders")
execute_sql(CONN, "CREATE TABLE orders (id BIGINT, amount FLOAT)")
n = execute_sql(CONN, "DELETE FROM orders WHERE status = 'cancelled'")

Raises: RuntimeError on connection or SQL failure.


MySQL

read_mysql(connection_string, query) → DataFrame

from pardox.io import read_mysql
df = read_mysql("mysql://user:pass@localhost:3306/db", "SELECT * FROM products")

execute_mysql(connection_string, query) → int

from pardox.io import execute_mysql
execute_mysql(CONN, "CREATE TABLE IF NOT EXISTS products (id BIGINT, price DOUBLE)")

SQL Server

read_sqlserver(connection_string, query) → DataFrame

from pardox.io import read_sqlserver
CONN = "Server=localhost,1433;Database=mydb;UID=sa;PWD=MyPwd;TrustServerCertificate=Yes"
df = read_sqlserver(CONN, "SELECT TOP 1000 * FROM dbo.orders")

execute_sqlserver(connection_string, query) → int

from pardox.io import execute_sqlserver
execute_sqlserver(CONN, "DROP TABLE IF EXISTS dbo.orders_bak")

!!! warning "Password special characters"
    Avoid ! in SQL Server passwords. This is a known tiberius v0.12 bug; a fix is tracked for v0.4.0.


MongoDB

read_mongodb(connection_string, db_dot_collection) → DataFrame

from pardox.io import read_mongodb
df = read_mongodb("mongodb://admin:pass@localhost:27017", "mydb.orders")

execute_mongodb(connection_string, database, command_json) → int

from pardox.io import execute_mongodb
execute_mongodb("mongodb://...", "mydb", '{"drop": "orders_archive"}')

Class: DataFrame

The main data structure. Holds an opaque pointer to a Rust HyperBlockManager.

Construction

# From CSV
df = px.read_csv("file.csv")

# From SQL
df = read_sql(conn, "SELECT …")

# From MySQL / SQL Server / MongoDB
df = read_mysql(conn, query)
df = read_sqlserver(conn, query)
df = read_mongodb(conn, "db.collection")

# From Arrow
df = px.from_arrow(arrow_table)

Properties

shape → tuple[int, int]

rows, cols = df.shape
print(f"{rows:,} rows × {cols} columns")

columns → list[str]

print(df.columns)   # ['id', 'price', 'quantity', ...]

dtypes → dict[str, str]

print(df.dtypes)    # {'id': 'Utf8', 'price': 'Float64', 'quantity': 'Int64'}

Inspection

show(n=10)

Prints the first n rows as an ASCII table to stdout.

df.show(5)

head(n=5) → DataFrame

Returns a new DataFrame with the first n rows.

top5 = df.head(5)

tail(n=5) → DataFrame

Returns a new DataFrame with the last n rows.

last5 = df.tail(5)

iloc(start, end) → DataFrame

Returns rows in the range [start, end).

subset = df.iloc(100, 200)   # rows 100–199

Type Operations

cast(col, target_type) → DataFrame

Converts a column to a new type in-place. Returns self.

df.cast("quantity", "Float64")
df.cast("id",       "Utf8")

Supported types: Int64, Float64, Utf8


Arithmetic Methods

All arithmetic methods return a new DataFrame with the result stored in a named column.

mul(col_a, col_b) → DataFrame

revenue_df = df.mul("price", "quantity")   # result column: 'result_mul'

add(col_a, col_b) → DataFrame

total_df = df.add("price", "tax")          # result column: 'result_add'

sub(col_a, col_b) → DataFrame

profit_df = df.sub("revenue", "cost")      # result column: 'result_sub'

std(col) → float

Sample standard deviation of a column. Pure Rust, no NumPy.

std_val = revenue_df.std("result_mul")
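"Sample" here means the Bessel-corrected formula, which divides by n - 1 rather than n. The SDK computes this in Rust; as a reference for the definition only, a minimal pure-Python sketch:

```python
import math

def sample_std(values: list[float]) -> float:
    """Sample standard deviation with Bessel's correction (n - 1 denominator)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample std requires at least two values")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance)

print(sample_std([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
# ~2.138 (the population std of the same data would be 2.0)
```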

min_max_scale(col) → DataFrame

Normalizes column values to [0, 1]. Returns new DataFrame with result_minmax.

normed_df = df.min_max_scale("price")
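The normalization applied is the standard (x - min) / (max - min) transform. As a sketch of that formula in pure Python (the all-values-equal fallback shown here is an assumption; the SDK's behavior in that degenerate case is not documented):

```python
def min_max_scale(values: list[float]) -> list[float]:
    """Scale values to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed fallback: all values equal, map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([10.0, 20.0, 40.0]))  # [0.0, 0.333..., 1.0]
```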

Sorting

sort_values(by, ascending=True, gpu=False) → DataFrame

Sorts the DataFrame by a Float64 column. Returns a new sorted DataFrame.

sorted_df = df.sort_values("price", ascending=True)
sorted_df = df.sort_values("price", ascending=False, gpu=True)
| Parameter | Type | Description |
| --- | --- | --- |
| by | str | Column name to sort by. Must be Float64. |
| ascending | bool | True = ascending (default). |
| gpu | bool | Use the GPU bitonic sort. Falls back to CPU if no GPU is available. |

Filtering

filter(mask: Series) → DataFrame

Applies a boolean Series as a row filter. Returns a new DataFrame.

mask   = df['price'] > 100.0
result = df.filter(mask)

Data Cleaning

fillna(value: float) → DataFrame

Fills NaN / null values in all numeric columns in-place.

df.fillna(0.0)

round(decimals: int) → DataFrame

Rounds all numeric columns in-place.

df.round(2)

Observer — Export & Inspection

to_dict() → list[dict]

Returns all rows as a list of dictionaries (records format).

records = df.to_dict()
# [{'price': 19.99, 'state': 'TX', ...}, ...]

Returns: list[dict]

to_json() → str

Returns all rows as a JSON string "[{...}, ...]".

json_str = df.to_json()

Returns: str

value_counts(col) → dict[str, int]

Frequency of each unique value in a column, sorted by count descending.

state_dist = df.value_counts("state")
# {'TX': 6345, 'CA': 6301, ...}

Returns: dict[str, int]
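The documented semantics (frequency per unique value, ordered by descending count) match what collections.Counter provides in plain Python. A sketch of the equivalent, for reference only:

```python
from collections import Counter

def value_counts(values: list[str]) -> dict[str, int]:
    """Frequency of each unique value, ordered by descending count,
    mirroring the semantics documented for df.value_counts(col)."""
    counts = Counter(values)
    # most_common() returns (value, count) pairs sorted by count
    # descending; dict() preserves that order.
    return dict(counts.most_common())

print(value_counts(["TX", "CA", "TX", "NY", "TX", "CA"]))
# {'TX': 3, 'CA': 2, 'NY': 1}
```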

unique(col) → list

Unique values in a column in insertion order.

cats = df.unique("category")
# ['Electronics', 'Books', ...]

Returns: list


Joins

join(other, on=None, left_on=None, right_on=None) → DataFrame

Hash-join two DataFrames on a key column.

result = orders.join(customers, on="customer_id")
result = orders.join(customers, left_on="cust_id", right_on="id")
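The join itself runs in Rust, but the general hash-join technique (build a hash table on one side's key, then probe it once per row of the other side) can be sketched in pure Python over record dicts. The inner-join semantics and the column-merge behavior shown here are illustrative assumptions, not documented SDK behavior:

```python
def hash_join(left: list[dict], right: list[dict],
              left_on: str, right_on: str) -> list[dict]:
    """Inner hash join: build a hash table on the right table's key,
    then probe it with each left row. O(len(left) + len(right)) expected."""
    index: dict = {}
    for row in right:
        index.setdefault(row[right_on], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[left_on], []):
            # Merge columns; on a name clash the right side wins here.
            joined.append({**row, **match})
    return joined

orders = [{"cust_id": 1, "amount": 9.5}, {"cust_id": 2, "amount": 3.0}]
customers = [{"id": 1, "name": "Ada"}, {"id": 3, "name": "Bob"}]
print(hash_join(orders, customers, "cust_id", "id"))
# [{'cust_id': 1, 'amount': 9.5, 'id': 1, 'name': 'Ada'}]
```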

Writers

to_prdx(path) → bool

Saves DataFrame to native binary format.

df.to_prdx("output.prdx")

to_csv(path) → bool

Exports DataFrame to a CSV file.

df.to_csv("output.csv")

to_sql(connection_string, table_name, mode="append", conflict_cols=[]) → int

Writes to PostgreSQL.

rows = df.to_sql(CONN, "orders", mode="append")
rows = df.to_sql(CONN, "orders", mode="upsert", conflict_cols=["id"])
| Parameter | Type | Values |
| --- | --- | --- |
| mode | str | "append", "upsert" |
| conflict_cols | list[str] | Columns for the ON CONFLICT clause (upsert only) |

Returns: int — rows written. Raises: RuntimeError on failure.


write_sql_prdx — PRDX Streaming to PostgreSQL

Added in v0.3.2

Stream a .prdx file directly to PostgreSQL via COPY FROM STDIN, using O(block) RAM regardless of file size. The schema is read from the PRDX footer; data is never fully loaded into memory.

from pardox import write_sql_prdx

rows = write_sql_prdx(
    prdx_path,        # str — path to .prdx file
    connection_string, # str — PostgreSQL connection string
    table_name,        # str — target table (must already exist)
    mode="append",     # str — only "append" supported
    conflict_cols=[],  # list[str] — reserved for future upsert support
    batch_rows=1000000 # int — rows per COPY batch
)
print(f"Streamed {rows:,} rows")
| Parameter | Type | Description |
| --- | --- | --- |
| prdx_path | str | Path to the .prdx file |
| connection_string | str | PostgreSQL connection string (postgresql://user:pass@host:port/db) |
| table_name | str | Target table name (must exist with a matching schema) |
| mode | str | Write mode; only "append" is supported in v0.3.2 |
| conflict_cols | list[str] | Reserved; pass [] |
| batch_rows | int | Rows per COPY batch (default: 1,000,000) |

Returns: int — total rows written. Raises: RuntimeError on failure.

Validated: 150M rows / 3.8 GB PRDX → PostgreSQL in ~490s at ~300,000 rows/s.

to_mysql(connection_string, table_name, mode="append", conflict_cols=[]) → int

Writes to MySQL.

rows = df.to_mysql(CONN, "products", mode="append")
rows = df.to_mysql(CONN, "products", mode="replace")
rows = df.to_mysql(CONN, "products", mode="upsert", conflict_cols=["id"])
| Parameter | Type | Values |
| --- | --- | --- |
| mode | str | "append", "replace", "upsert" |

to_sqlserver(connection_string, table_name, mode="append", conflict_cols=[]) → int

Writes to SQL Server using batched INSERT statements (500 rows per statement).

rows = df.to_sqlserver(CONN, "dbo.orders", mode="append")
rows = df.to_sqlserver(CONN, "dbo.orders", mode="upsert", conflict_cols=["id"])
| Parameter | Type | Values |
| --- | --- | --- |
| mode | str | "append", "replace", "upsert" |

to_mongodb(connection_string, db_dot_collection, mode="append") → int

Writes to MongoDB using unordered bulk inserts (10,000 documents per batch, ordered: false).

rows = df.to_mongodb(CONN, "mydb.orders", mode="append")
rows = df.to_mongodb(CONN, "mydb.orders", mode="replace")
| Parameter | Type | Values |
| --- | --- | --- |
| mode | str | "append", "replace" |

Class: Series

A single column view into a DataFrame. Returned by df['col_name']. Does not own the underlying memory — the parent DataFrame does.

Properties

name → str

Column name.

dtype → str

Column type ("Int64", "Float64", "Utf8").


Arithmetic Operators

Operations dispatch to SIMD-accelerated Rust kernels. All return a new Series.

total = df['price'] * df['quantity']
net   = df['total'] - df['discount']
tax   = df['total'] + df['tax_amount']
unit  = df['revenue'] / df['quantity']

Comparison Operators

Return a boolean Series usable as a filter mask.

| Method | Meaning |
| --- | --- |
| s.eq(val) | == |
| s.neq(val) | != |
| s.gt(val) | > |
| s.gte(val) | >= |
| s.lt(val) | < |
| s.lte(val) | <= |

mask = df['price'].gt(100.0)
df_filtered = df.filter(mask)

mask2 = df['state'].eq("TX")
df_tx = df.filter(mask2)

Aggregations

All aggregation methods return a Python scalar.

| Method | Returns | Description |
| --- | --- | --- |
| sum() | float | Sum of all non-null values |
| mean() | float | Arithmetic mean |
| min() | float | Minimum value |
| max() | float | Maximum value |
| std() | float | Sample standard deviation |
| count() | int | Count of non-null values |

total   = df['revenue'].sum()
average = df['revenue'].mean()
high    = df['revenue'].max()
low     = df['revenue'].min()
spread  = df['revenue'].std()
valid   = df['id'].count()

Transformations

fillna(value) → Series

df['price'].fillna(0.0)

round(decimals) → Series

df['price'].round(2)

NumPy Zero-Copy

import numpy as np

# Direct pointer into Rust buffer — no allocation
arr = np.array(df['price'])   # dtype: float64

Works on Float64 columns. Cast Int64 columns first:

df.cast("quantity", "Float64")
arr = np.array(df["quantity"])

Error Codes

All database functions raise RuntimeError with a descriptive message on failure. The underlying Rust function returns integer error codes:

| Code | Meaning |
| --- | --- |
| -1 | Invalid manager pointer (null) |
| -2 | Invalid connection string |
| -3 | Invalid table / query string |
| -4 | Invalid mode string |
| -5 | Invalid conflict columns JSON |
| -10 | File not found (write_sql_prdx only) |
| -20 | Empty connection string (write_sql_prdx only) |
| -100 | Operation failed; check stderr for Rust error details |

!!! tip "Stderr logging"
    When -100 is returned, the Rust core logs the actual database error to stderr before returning. Run with stderr visible to diagnose connection or schema issues.
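Only the RuntimeError message surfaces in Python; the integer codes appear in Rust-side logs. A hypothetical lookup helper (values taken from the table above; not part of the SDK) can make those logs easier to read:

```python
# Hypothetical helper: maps the Rust error codes listed above to text.
# The SDK itself raises RuntimeError and never exposes these integers.
RUST_ERROR_CODES = {
    -1: "Invalid manager pointer (null)",
    -2: "Invalid connection string",
    -3: "Invalid table / query string",
    -4: "Invalid mode string",
    -5: "Invalid conflict columns JSON",
    -10: "File not found (write_sql_prdx only)",
    -20: "Empty connection string (write_sql_prdx only)",
    -100: "Operation failed; check stderr for Rust error details",
}

def describe(code: int) -> str:
    return RUST_ERROR_CODES.get(code, f"Unknown error code: {code}")

print(describe(-2))  # Invalid connection string
```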


SQL Cursor API (Gap 30)

Added in v0.3.4

Streaming iterator over PostgreSQL query results. Each batch yields a DataFrame without loading the full result set into memory.

query_to_results

def query_to_results(connection_string: str, query: str, batch_size: int = 100_000) -> Generator[DataFrame, None, None]

Generator that opens a server-side PostgreSQL cursor and yields DataFrame objects one batch at a time. Uses DECLARE ... NO SCROLL CURSOR internally — the connection remains open for the duration of the iteration. Memory usage is O(batch_size rows).

| Parameter | Type | Description |
| --- | --- | --- |
| connection_string | str | PostgreSQL connection string (postgresql://user:pass@host:port/db) |
| query | str | SQL query to execute (SELECT statement) |
| batch_size | int | Rows per batch (default: 100,000) |

Yields: DataFrame — one batch per iteration. Columns match the query result schema.

Raises: RuntimeError if the cursor cannot be opened.

import pardox as px

CONN  = "postgresql://user:pass@localhost:5432/db"
QUERY = "SELECT * FROM sales ORDER BY date"

# Streaming iterator — exact pattern from GitHub issue @Prussian1870
for batch_df in px.query_to_results(CONN, QUERY, batch_size=50_000):
    records = batch_df.to_dict()     # list of dicts
    json_str = batch_df.to_json()    # JSON string
    rows, cols = batch_df.shape      # inspect shape
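The O(batch_size) memory property comes from the batching pattern itself: only one batch is materialized at a time. A pure-Python illustration of that pattern (this is not the SDK's Rust cursor, just the generator shape it exposes):

```python
from itertools import islice
from typing import Iterable, Iterator

def batched(rows: Iterable, batch_size: int) -> Iterator[list]:
    """Yield lists of at most batch_size rows; only one batch is ever
    held in memory, which is the property query_to_results relies on."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

print([len(b) for b in batched(range(10), 4)])  # [4, 4, 2]
```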

sql_to_parquet

def sql_to_parquet(connection_string: str, query: str, output_pattern: str, chunk_size: int = 100_000) -> int

Streams a SQL query result directly to PardoX binary files (.prdx) using a filename pattern. No full result set is loaded into RAM — memory usage is O(chunk_size rows).

| Parameter | Type | Description |
| --- | --- | --- |
| connection_string | str | PostgreSQL connection string |
| query | str | SQL query to execute |
| output_pattern | str | Output file path pattern. Use {i} as the chunk index placeholder, e.g. "/tmp/chunk_{i}.prdx" |
| chunk_size | int | Rows per output file (default: 100,000) |

Returns: int — total rows written across all files. Raises: RuntimeError on failure.

import pardox as px

total = px.sql_to_parquet(
    "postgresql://user:pass@localhost:5432/db",
    "SELECT * FROM sales",
    "/data/sales_chunk_{i}.prdx",
    chunk_size=100_000
)
print(f"Exported {total:,} rows")

# Read individual chunks back
df = px.read_prdx("/data/sales_chunk_0.prdx")

Validated: 250,000 rows streamed across 3 chunk files, with 11/11 tests passing in Python, JavaScript, and PHP.


Copyright © 2026 PardoX. Distributed under the MIT License.
