# API Reference

Complete documentation of all classes, functions, and methods in the PardoX Python SDK v0.3.4.

## Top-Level Functions (`import pardox as px`)
### read_csv

Reads a CSV file into a DataFrame using the multi-threaded Rust parser.

```python
def read_csv(path: str, schema: dict | None = None) -> DataFrame
```

| Parameter | Type | Description |
|---|---|---|
| `path` | `str` | Path to the `.csv` file. |
| `schema` | `dict` or `None` | Optional column type overrides: `{"col": "Float64", ...}`. Supported types: `Int64`, `Float64`, `Utf8`. |

Returns: `DataFrame`

```python
df = px.read_csv("sales.csv")
df = px.read_csv("sales.csv", schema={"price": "Float64", "id": "Int64"})
```
### read_prdx

Loads a native PardoX binary file (`.prdx`).

```python
def read_prdx(path: str) -> list[dict]
```

| Parameter | Type | Description |
|---|---|---|
| `path` | `str` | Path to the `.prdx` file. |

Returns: `list[dict]` (preview rows)
### from_arrow

Zero-copy conversion from a PyArrow Table or RecordBatch.

```python
def from_arrow(data: pyarrow.Table | pyarrow.RecordBatch) -> DataFrame
```

```python
import pyarrow as pa
import pardox as px

arrow_table = pa.Table.from_pydict({"a": [1, 2, 3]})
df = px.from_arrow(arrow_table)
```
## pardox.io — Database I/O

All database functions bypass the Python runtime — connection and data transfer happen entirely in the Rust core.
### PostgreSQL

#### read_sql(connection_string, query) → DataFrame

```python
from pardox.io import read_sql

df = read_sql("postgresql://user:pass@localhost:5432/db", "SELECT * FROM orders")
```

#### execute_sql(connection_string, query) → int

Executes DDL or DML. Returns rows affected (0 for DDL).

```python
from pardox.io import execute_sql

execute_sql(CONN, "DROP TABLE IF EXISTS orders")
execute_sql(CONN, "CREATE TABLE orders (id BIGINT, amount FLOAT)")
n = execute_sql(CONN, "DELETE FROM orders WHERE status = 'cancelled'")
```

Raises: `RuntimeError` on connection or SQL failure.
### MySQL

#### read_mysql(connection_string, query) → DataFrame

```python
from pardox.io import read_mysql

df = read_mysql("mysql://user:pass@localhost:3306/db", "SELECT * FROM products")
```

#### execute_mysql(connection_string, query) → int

```python
from pardox.io import execute_mysql

execute_mysql(CONN, "CREATE TABLE IF NOT EXISTS products (id BIGINT, price DOUBLE)")
```
### SQL Server

#### read_sqlserver(connection_string, query) → DataFrame

```python
from pardox.io import read_sqlserver

CONN = "Server=localhost,1433;Database=mydb;UID=sa;PWD=MyPwd;TrustServerCertificate=Yes"
df = read_sqlserver(CONN, "SELECT TOP 1000 * FROM dbo.orders")
```

#### execute_sqlserver(connection_string, query) → int

```python
from pardox.io import execute_sqlserver

execute_sqlserver(CONN, "DROP TABLE IF EXISTS dbo.orders_bak")
```

!!! warning "Password special characters"
    Avoid `!` in SQL Server passwords. Known tiberius v0.12 bug — fix tracked for v0.4.0.
### MongoDB

#### read_mongodb(connection_string, db_dot_collection) → DataFrame

```python
from pardox.io import read_mongodb

df = read_mongodb("mongodb://admin:pass@localhost:27017", "mydb.orders")
```

#### execute_mongodb(connection_string, database, command_json) → int

```python
from pardox.io import execute_mongodb

execute_mongodb("mongodb://...", "mydb", '{"drop": "orders_archive"}')
```
## Class: DataFrame

The main data structure. Holds an opaque pointer to a Rust `HyperBlockManager`.

### Construction

```python
# From CSV
df = px.read_csv("file.csv")

# From SQL
df = read_sql(conn, "SELECT …")

# From MySQL / SQL Server / MongoDB
df = read_mysql(conn, query)
df = read_sqlserver(conn, query)
df = read_mongodb(conn, "db.collection")

# From Arrow
df = px.from_arrow(arrow_table)
```
### Properties

#### shape → tuple[int, int]

```python
rows, cols = df.shape
print(f"{rows:,} rows × {cols} columns")
```

#### columns → list[str]

```python
print(df.columns)  # ['id', 'price', 'quantity', ...]
```

#### dtypes → dict[str, str]

```python
print(df.dtypes)  # {'id': 'Utf8', 'price': 'Float64', 'quantity': 'Int64'}
```
### Inspection

#### show(n=10)

Prints the first `n` rows as an ASCII table to stdout.

```python
df.show(5)
```

#### head(n=5) → DataFrame

Returns a new DataFrame with the first `n` rows.

```python
top5 = df.head(5)
```

#### tail(n=5) → DataFrame

Returns a new DataFrame with the last `n` rows.

```python
last5 = df.tail(5)
```

#### iloc(start, end) → DataFrame

Returns rows in the half-open range `[start, end)`.

```python
subset = df.iloc(100, 200)  # rows 100–199
```
### Type Operations

#### cast(col, target_type) → DataFrame

Converts a column to a new type in-place. Returns `self`.

```python
df.cast("quantity", "Float64")
df.cast("id", "Utf8")
```

Supported types: `Int64`, `Float64`, `Utf8`
### Arithmetic Methods

All arithmetic methods return a new DataFrame with the result stored in a named column.

#### mul(col_a, col_b) → DataFrame

```python
revenue_df = df.mul("price", "quantity")  # result column: 'result_mul'
```

#### add(col_a, col_b) → DataFrame

```python
total_df = df.add("price", "tax")  # result column: 'result_add'
```

#### sub(col_a, col_b) → DataFrame

```python
profit_df = df.sub("revenue", "cost")  # result column: 'result_sub'
```

#### std(col) → float

Sample standard deviation of a column. Pure Rust, no NumPy.

```python
std_val = revenue_df.std("result_mul")
```
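To make the statistic concrete, here is a minimal pure-Python equivalent of the *sample* standard deviation (squared deviations divided by n − 1, not n); `sample_std` is an illustrative helper, not an SDK function.

```python
import math

def sample_std(values: list[float]) -> float:
    # Sample standard deviation: sum of squared deviations over (n - 1)
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance)

result = sample_std([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # ≈ 2.138
```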
#### min_max_scale(col) → DataFrame

Normalizes column values to `[0, 1]`. Returns a new DataFrame with a `result_minmax` column.

```python
normed_df = df.min_max_scale("price")
```
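The normalization is the standard (x − min) / (max − min) mapping. A pure-Python sketch of the formula, for intuition only (the SDK method operates on columns in the Rust core; the constant-column fallback below is an assumption of this sketch):

```python
def min_max_scale(values: list[float]) -> list[float]:
    # Map each value onto [0, 1]; a constant column maps to all zeros here
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0] * len(values)
    return [(v - lo) / span for v in values]

scaled = min_max_scale([10.0, 25.0, 40.0])  # [0.0, 0.5, 1.0]
```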
### Sorting

#### sort_values(by, ascending=True, gpu=False) → DataFrame

Sorts the DataFrame by a `Float64` column. Returns a new sorted DataFrame.

```python
sorted_df = df.sort_values("price", ascending=True)
sorted_df = df.sort_values("price", ascending=False, gpu=True)
```

| Parameter | Type | Description |
|---|---|---|
| `by` | `str` | Column name to sort by. Must be `Float64`. |
| `ascending` | `bool` | `True` = ascending (default). |
| `gpu` | `bool` | Use GPU bitonic sort. Falls back to CPU if no GPU is available. |
### Filtering

#### filter(mask: Series) → DataFrame

Applies a boolean Series as a row filter. Returns a new DataFrame.

```python
mask = df['price'] > 100.0
result = df.filter(mask)
```
### Data Cleaning

#### fillna(value: float) → DataFrame

Fills NaN/null values in all numeric columns in-place.

```python
df.fillna(0.0)
```

#### round(decimals: int) → DataFrame

Rounds all numeric columns in-place.

```python
df.round(2)
```
### Observer — Export & Inspection

#### to_dict() → list[dict]

Returns all rows as a list of dictionaries (records format).

```python
records = df.to_dict()
# [{'price': 19.99, 'state': 'TX', ...}, ...]
```

Returns: `list[dict]`

#### to_json() → str

Returns all rows as a JSON string `"[{...}, ...]"`.

```python
json_str = df.to_json()
```

Returns: `str`

#### value_counts(col) → dict[str, int]

Frequency of each unique value in a column, sorted by count descending.

```python
state_dist = df.value_counts("state")
# {'TX': 6345, 'CA': 6301, ...}
```

Returns: `dict[str, int]`

#### unique(col) → list

Unique values in a column, in insertion order.

```python
cats = df.unique("category")
# ['Electronics', 'Books', ...]
```

Returns: `list`
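Both observers have short pure-Python analogues, shown here only to pin down the ordering semantics (counts sorted descending, uniques in first-appearance order). These helpers are illustrative, not SDK code:

```python
from collections import Counter

def value_counts(values: list) -> dict:
    # Counter.most_common() sorts by count descending
    return dict(Counter(values).most_common())

def unique(values: list) -> list:
    # dict.fromkeys preserves first-appearance (insertion) order
    return list(dict.fromkeys(values))

counts = value_counts(["TX", "CA", "TX", "NY", "TX"])  # {'TX': 3, 'CA': 1, 'NY': 1}
cats = unique(["Books", "Electronics", "Books"])       # ['Books', 'Electronics']
```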
### Joins

#### join(other, on=None, left_on=None, right_on=None) → DataFrame

Hash-joins two DataFrames on a key column.

```python
result = orders.join(customers, on="customer_id")
result = orders.join(customers, left_on="cust_id", right_on="id")
```
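A hash join builds an index over one side's key column, then probes it while scanning the other side. The sketch below shows the idea on plain lists of dicts; `hash_join` is an illustrative helper, not the SDK's implementation.

```python
def hash_join(left: list[dict], right: list[dict],
              left_on: str, right_on: str) -> list[dict]:
    # Build phase: index the right rows by their key value
    index: dict = {}
    for row in right:
        index.setdefault(row[right_on], []).append(row)
    # Probe phase: stream the left rows, emit one output row per match
    out = []
    for row in left:
        for match in index.get(row[left_on], []):
            out.append({**row, **match})
    return out

orders = [{"cust_id": 1, "amount": 50.0}, {"cust_id": 2, "amount": 75.0}]
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
joined = hash_join(orders, customers, "cust_id", "id")
```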
### Writers

#### to_prdx(path) → bool

Saves the DataFrame to the native binary format.

```python
df.to_prdx("output.prdx")
```

#### to_csv(path) → bool

Exports the DataFrame to a CSV file.

```python
df.to_csv("output.csv")
```

#### to_sql(connection_string, table_name, mode="append", conflict_cols=[]) → int

Writes to PostgreSQL.

```python
rows = df.to_sql(CONN, "orders", mode="append")
rows = df.to_sql(CONN, "orders", mode="upsert", conflict_cols=["id"])
```

| Parameter | Type | Values |
|---|---|---|
| `mode` | `str` | `"append"`, `"upsert"` |
| `conflict_cols` | `list[str]` | Columns for the `ON CONFLICT` clause (upsert only) |

Returns: `int` — rows written. Raises: `RuntimeError` on failure.
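For intuition, a PostgreSQL upsert is an `INSERT` with an `ON CONFLICT ... DO UPDATE` clause over the conflict columns. The statement builder below is a hypothetical sketch of that shape, not the SQL the Rust core actually emits:

```python
def build_upsert(table: str, columns: list[str], conflict_cols: list[str]) -> str:
    # INSERT ... ON CONFLICT (keys) DO UPDATE SET non-key cols from EXCLUDED
    col_list = ", ".join(columns)
    placeholders = ", ".join(f"${i}" for i in range(1, len(columns) + 1))
    updates = ", ".join(f"{c} = EXCLUDED.{c}"
                        for c in columns if c not in conflict_cols)
    return (f"INSERT INTO {table} ({col_list}) VALUES ({placeholders}) "
            f"ON CONFLICT ({', '.join(conflict_cols)}) DO UPDATE SET {updates}")

sql = build_upsert("orders", ["id", "amount"], ["id"])
# INSERT INTO orders (id, amount) VALUES ($1, $2)
#   ON CONFLICT (id) DO UPDATE SET amount = EXCLUDED.amount
```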
### write_sql_prdx — PRDX Streaming to PostgreSQL

*Added in v0.3.2*

Streams a `.prdx` file directly to PostgreSQL via `COPY FROM STDIN` — O(block) RAM regardless of file size. The schema is read from the PRDX footer; the data is never fully loaded into memory.

```python
from pardox import write_sql_prdx

rows = write_sql_prdx(
    prdx_path,           # str — path to .prdx file
    connection_string,   # str — PostgreSQL connection string
    table_name,          # str — target table (must already exist)
    mode="append",       # str — only "append" supported
    conflict_cols=[],    # list[str] — reserved for future upsert support
    batch_rows=1000000   # int — rows per COPY batch
)
print(f"Streamed {rows:,} rows")
```

| Parameter | Type | Description |
|---|---|---|
| `prdx_path` | `str` | Path to the `.prdx` file |
| `connection_string` | `str` | PostgreSQL connection string (`postgresql://user:pass@host:port/db`) |
| `table_name` | `str` | Target table name (must exist with a matching schema) |
| `mode` | `str` | Write mode — only `"append"` is supported in v0.3.2 |
| `conflict_cols` | `list[str]` | Reserved — pass `[]` |
| `batch_rows` | `int` | Rows per COPY batch (default: 1,000,000) |

Returns: `int` — total rows written. Raises: `RuntimeError` on failure.

Validated: 150M rows / 3.8 GB PRDX → PostgreSQL in ~490 s at ~300,000 rows/s.
#### to_mysql(connection_string, table_name, mode="append", conflict_cols=[]) → int

Writes to MySQL.

```python
rows = df.to_mysql(CONN, "products", mode="append")
rows = df.to_mysql(CONN, "products", mode="replace")
rows = df.to_mysql(CONN, "products", mode="upsert", conflict_cols=["id"])
```

| Parameter | Type | Values |
|---|---|---|
| `mode` | `str` | `"append"`, `"replace"`, `"upsert"` |
#### to_sqlserver(connection_string, table_name, mode="append", conflict_cols=[]) → int

Writes to SQL Server (batch INSERT, 500 rows per statement).

```python
rows = df.to_sqlserver(CONN, "dbo.orders", mode="append")
rows = df.to_sqlserver(CONN, "dbo.orders", mode="upsert", conflict_cols=["id"])
```

| Parameter | Type | Values |
|---|---|---|
| `mode` | `str` | `"append"`, `"replace"`, `"upsert"` |
#### to_mongodb(connection_string, db_dot_collection, mode="append") → int

Writes to MongoDB (10,000 docs per batch, `ordered: false`).

```python
rows = df.to_mongodb(CONN, "mydb.orders", mode="append")
rows = df.to_mongodb(CONN, "mydb.orders", mode="replace")
```

| Parameter | Type | Values |
|---|---|---|
| `mode` | `str` | `"append"`, `"replace"` |
## Class: Series

A single-column view into a DataFrame, returned by `df['col_name']`. It does not own the underlying memory — the parent DataFrame does.

### Properties

#### name → str

The column name.

#### dtype → str

The column type (`"Int64"`, `"Float64"`, `"Utf8"`).
### Arithmetic Operators

Operations dispatch to SIMD-accelerated Rust kernels. All return a new Series.

```python
total = df['price'] * df['quantity']
net = df['total'] - df['discount']
tax = df['total'] + df['tax_amount']
unit = df['revenue'] / df['quantity']
```
### Comparison Operators

Return a boolean Series usable as a filter mask.

| Method | Meaning |
|---|---|
| `s.eq(val)` | `==` |
| `s.neq(val)` | `!=` |
| `s.gt(val)` | `>` |
| `s.gte(val)` | `>=` |
| `s.lt(val)` | `<` |
| `s.lte(val)` | `<=` |

```python
mask = df['price'].gt(100.0)
df_filtered = df.filter(mask)

mask2 = df['state'].eq("TX")
df_tx = df.filter(mask2)
```
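Conceptually, a comparison produces one boolean per row and `filter` keeps the rows whose flag is `True`. A minimal pure-Python sketch of that mask-then-filter pattern (illustrative helpers, not SDK methods):

```python
def gt(values: list[float], threshold: float) -> list[bool]:
    # One boolean flag per row, like Series.gt
    return [v > threshold for v in values]

def filter_rows(rows: list[dict], mask: list[bool]) -> list[dict]:
    # Keep only the rows whose mask flag is True, like DataFrame.filter
    return [row for row, keep in zip(rows, mask) if keep]

rows = [{"price": 50.0}, {"price": 150.0}, {"price": 300.0}]
mask = gt([r["price"] for r in rows], 100.0)  # [False, True, True]
kept = filter_rows(rows, mask)                # prices 150.0 and 300.0
```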
### Aggregations

All aggregation methods return a Python scalar.

| Method | Returns | Description |
|---|---|---|
| `sum()` | `float` | Sum of all non-null values |
| `mean()` | `float` | Arithmetic mean |
| `min()` | `float` | Minimum value |
| `max()` | `float` | Maximum value |
| `std()` | `float` | Sample standard deviation |
| `count()` | `int` | Count of non-null values |

```python
total = df['revenue'].sum()
average = df['revenue'].mean()
high = df['revenue'].max()
low = df['revenue'].min()
spread = df['revenue'].std()
valid = df['id'].count()
```
### Transformations

#### fillna(value) → Series

```python
df['price'].fillna(0.0)
```

#### round(decimals) → Series

```python
df['price'].round(2)
```

### NumPy Zero-Copy

```python
import numpy as np

# Direct pointer into the Rust buffer — no allocation
arr = np.array(df['price'])  # dtype: float64
```

This works on `Float64` columns. Cast `Int64` columns first:

```python
df.cast("quantity", "Float64")
arr = np.array(df["quantity"])
```
## Error Codes

All database functions raise `RuntimeError` with a descriptive message on failure. The underlying Rust functions return integer error codes:

| Code | Meaning |
|---|---|
| -1 | Invalid manager pointer (null) |
| -2 | Invalid connection string |
| -3 | Invalid table / query string |
| -4 | Invalid mode string |
| -5 | Invalid conflict columns JSON |
| -10 | File not found (`write_sql_prdx` only) |
| -20 | Empty connection string (`write_sql_prdx` only) |
| -100 | Operation failed — check stderr for Rust error details |

!!! tip "Stderr logging"
    When -100 is returned, the Rust core logs the actual database error to stderr before returning. Run with stderr visible to diagnose connection or schema issues.
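The table above can be mirrored on the Python side for quick diagnostics. The mapping below is built from the documented codes; the `check` helper itself is hypothetical, not part of the SDK:

```python
# Illustrative mapping of the documented Rust error codes
ERROR_MESSAGES = {
    -1: "Invalid manager pointer (null)",
    -2: "Invalid connection string",
    -3: "Invalid table / query string",
    -4: "Invalid mode string",
    -5: "Invalid conflict columns JSON",
    -10: "File not found (write_sql_prdx only)",
    -20: "Empty connection string (write_sql_prdx only)",
    -100: "Operation failed, check stderr for Rust error details",
}

def check(code: int) -> int:
    # Translate a negative return code into a RuntimeError
    if code < 0:
        raise RuntimeError(ERROR_MESSAGES.get(code, f"unknown error code {code}"))
    return code
```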
## SQL Cursor API (Gap 30)

*Added in v0.3.4*

Streaming iterator over PostgreSQL query results. Each batch yields a DataFrame without loading the full result set into memory.

### query_to_results

```python
def query_to_results(connection_string: str, query: str, batch_size: int = 100_000) -> Generator[DataFrame, None, None]
```

A generator that opens a server-side PostgreSQL cursor and yields DataFrame objects one batch at a time. It uses `DECLARE ... NO SCROLL CURSOR` internally — the connection remains open for the duration of the iteration. Memory usage is O(batch_size) rows.
| Parameter | Type | Description |
|---|---|---|
| `connection_string` | `str` | PostgreSQL connection string (`postgresql://user:pass@host:port/db`) |
| `query` | `str` | SQL query to execute (SELECT statement) |
| `batch_size` | `int` | Rows per batch (default: 100,000) |

Yields: `DataFrame` — one batch per iteration. Columns match the query result schema.

Raises: `RuntimeError` if the cursor cannot be opened.

```python
import pardox as px

CONN = "postgresql://user:pass@localhost:5432/db"
QUERY = "SELECT * FROM sales ORDER BY date"

# Streaming iterator — exact pattern from GitHub issue @Prussian1870
for batch_df in px.query_to_results(CONN, QUERY, batch_size=50_000):
    records = batch_df.to_dict()   # list of dicts
    json_str = batch_df.to_json()  # JSON string
    rows, cols = batch_df.shape    # inspect shape
```
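The bounded memory use comes from fixed-size batching: only one batch of rows is materialized at a time. The generator below mirrors that shape over an in-memory list, purely for illustration (no database or SDK involved):

```python
from typing import Iterator

def iter_batches(rows: list, batch_size: int) -> Iterator[list]:
    # Yield consecutive fixed-size slices; the caller never holds
    # more than one batch at a time, mirroring query_to_results
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

sizes = [len(b) for b in iter_batches(list(range(250_000)), 100_000)]
# three batches: 100000, 100000, 50000
```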
### sql_to_parquet

```python
def sql_to_parquet(connection_string: str, query: str, output_pattern: str, chunk_size: int = 100_000) -> int
```

Streams a SQL query result directly to PardoX binary files (`.prdx`) using a filename pattern. The full result set is never loaded into RAM — memory usage is O(chunk_size) rows.

| Parameter | Type | Description |
|---|---|---|
| `connection_string` | `str` | PostgreSQL connection string |
| `query` | `str` | SQL query to execute |
| `output_pattern` | `str` | Output file path pattern. Use `{i}` as the chunk index placeholder, e.g. `"/tmp/chunk_{i}.prdx"` |
| `chunk_size` | `int` | Rows per output file (default: 100,000) |

Returns: `int` — total rows written across all files. Raises: `RuntimeError` on failure.
```python
import pardox as px

total = px.sql_to_parquet(
    "postgresql://user:pass@localhost:5432/db",
    "SELECT * FROM sales",
    "/data/sales_chunk_{i}.prdx",
    chunk_size=100_000
)
print(f"Exported {total:,} rows")

# Read individual chunks back
df = px.read_prdx("/data/sales_chunk_0.prdx")
```

Validated: 250,000 rows streamed across 3 chunk files — 11/11 tests passing in Python, JavaScript, and PHP.
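The `{i}` placeholder expands via ordinary `str.format`. The helper below sketches how many chunk files a given row count produces and what they are named; it is illustrative only, not SDK code:

```python
def chunk_paths(pattern: str, total_rows: int, chunk_size: int) -> list[str]:
    # Ceiling division gives the number of output files
    n_chunks = -(-total_rows // chunk_size)
    return [pattern.format(i=i) for i in range(n_chunks)]

paths = chunk_paths("/data/sales_chunk_{i}.prdx", 250_000, 100_000)
# ['/data/sales_chunk_0.prdx', '/data/sales_chunk_1.prdx', '/data/sales_chunk_2.prdx']
```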