Input / Output Operations
In most data pipelines the bottleneck is I/O, not computation. PardoX moves data ingestion and persistence into the Rust core, bypassing Python's file handling and object-creation overhead.
1. CSV Files
PardoX features a multi-threaded CSV reader. The file is memory-mapped and parallel workers parse chunks simultaneously. No Python objects are created during ingestion.
Basic usage
import pardox as px
# Automatic type inference (Int64, Float64, Utf8)
df = px.read_csv("dataset.csv")
print(df.shape) # (rows, cols)
print(df.columns) # ['col1', 'col2', ...]
Manual schema
df = px.read_csv("dataset.csv", schema={
"price": "Float64",
"quantity": "Float64",
"id": "Int64",
"name": "Utf8",
})
!!! info "Intelligent type inference"
    Without a schema, the engine scans the first rows of the file to detect Int64, Float64, or Utf8. Pass a schema to override the inferred type for specific columns.
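The inference rule described above can be illustrated with a minimal pure-Python sketch. It is not PardoX's implementation: `infer_type` and `infer_schema` are hypothetical helper names, and the sample size and exact parsing rules of the Rust core are not documented here. The sketch only assumes that the narrowest type that fits every sampled value wins, and that a user-supplied schema overrides the result.

```python
def infer_type(samples):
    """Return the narrowest of Int64, Float64, Utf8 that fits every sample."""
    def all_parse(cast):
        try:
            for value in samples:
                cast(value)
            return True
        except ValueError:
            return False

    if samples and all_parse(int):
        return "Int64"
    if samples and all_parse(float):
        return "Float64"
    return "Utf8"


def infer_schema(header, rows, overrides=None):
    """Infer one type per column from sampled rows, then apply overrides."""
    schema = {name: infer_type([row[i] for row in rows])
              for i, name in enumerate(header)}
    schema.update(overrides or {})  # explicit schema entries win
    return schema


header = ["id", "price", "name"]
sample = [["1", "9.5", "pen"], ["2", "12", "ink"]]
inferred = infer_schema(header, sample)
forced = infer_schema(header, sample, overrides={"id": "Float64"})
```

Here `price` becomes Float64 because one sampled value has a decimal point, even though the other would parse as an integer.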
!!! success "Parallel parsing"
    The file is split into logical blocks processed by multiple CPU cores concurrently.
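To make the block-splitting idea concrete, here is a rough standard-library sketch: memory-map the file, cut it into byte ranges that end on a newline, and parse each range in a worker. `block_bounds` and `read_csv_parallel` are hypothetical names, not PardoX functions. Python threads will not give the speedup of the Rust core because of the GIL, and the sketch assumes no quoted field contains an embedded newline.

```python
import csv
import mmap
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def block_bounds(buf, n_blocks, start=0):
    """Split buf[start:] into byte ranges that each end right after a newline."""
    size = len(buf)
    step = max(1, (size - start) // n_blocks)
    bounds, pos = [], start
    while pos < size:
        end = min(size, pos + step)
        if end < size:
            newline = buf.find(b"\n", end)  # extend to the end of the current row
            end = size if newline == -1 else newline + 1
        bounds.append((pos, end))
        pos = end
    return bounds


def read_csv_parallel(path, workers=4):
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        header_end = mm.find(b"\n") + 1
        header = next(csv.reader([mm[:header_end].decode("utf-8")]))

        def parse(bounds):
            lo, hi = bounds
            return list(csv.reader(mm[lo:hi].decode("utf-8").splitlines()))

        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map preserves block order, so row order is preserved too
            parts = list(pool.map(parse, block_bounds(mm, workers, start=header_end)))
    return header, [row for part in parts for row in part]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.csv")
    with open(path, "w", newline="") as f:
        f.write("id,price\n")
        f.writelines(f"{i},{i * 0.5}\n" for i in range(1000))
    header, rows = read_csv_parallel(path, workers=4)
```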
2. Native Format (.prdx)
The PRDX format is a custom binary layout designed for instant persistence. Reading is a near-direct memory copy — no parsing, no schema detection.
Save
df.to_prdx("output.prdx")
Load
df = px.read_prdx("output.prdx")
!!! success "Benchmark"
    In tests with 10 GB datasets, reading a .prdx file achieves ~4.6 GB/s throughput, limited only by NVMe SSD speed.

| Format | Read time (2 GB file) |
|---|---|
| CSV | ~8 s |
| Parquet | ~3 s |
| PRDX | ~0.5 s |
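Why a binary layout can be loaded without parsing is easier to see with a toy example. The sketch below is not the PRDX format, whose layout is not described here. It assumes a hypothetical file made of a 4-byte magic value, a row count, and the raw bytes of one Float64 column, so that loading is a single buffer copy.

```python
import array
import io
import struct

MAGIC = b"DEMO"  # hypothetical marker, not the PRDX magic number


def write_column(stream, values):
    """Write a tiny header followed by the column's raw float64 bytes."""
    column = array.array("d", values)
    stream.write(MAGIC + struct.pack("<Q", len(column)))
    stream.write(column.tobytes())


def read_column(stream):
    """Read the header, then copy the payload straight into an array."""
    if stream.read(4) != MAGIC:
        raise ValueError("not a demo column file")
    (n_rows,) = struct.unpack("<Q", stream.read(8))
    column = array.array("d")
    column.frombytes(stream.read(n_rows * column.itemsize))  # no text parsing
    return column


buffer = io.BytesIO()
write_column(buffer, [1.5, 2.0, 3.25])
buffer.seek(0)
restored = read_column(buffer)
```

The types and row count come from the header, so nothing has to be detected from the data. CSV, by contrast, must tokenize and convert every field.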
3. Apache Arrow Bridge
PardoX integrates with the Arrow ecosystem via the Arrow C Data Interface. Conversion passes memory pointers — no data is copied.
import pyarrow as pa
import pardox as px
arrow_table = pa.Table.from_pydict({"price": [1.5, 2.0], "qty": [10, 20]})
df = px.from_arrow(arrow_table)
!!! tip "Interoperability"
    The Arrow bridge connects PardoX to Polars, DuckDB, Apache Spark, and any other tool that consumes or produces Arrow data.
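The difference between handing over a pointer and copying data can be shown with Python's own buffer protocol. This is only an analogy for the zero-copy idea. It does not use the Arrow C Data Interface or any PardoX function.

```python
import array

prices = array.array("d", [1.5, 2.0])

view = memoryview(prices)          # shares the same memory, nothing is copied
copied = array.array("d", prices)  # independent copy of the values

prices[0] = 9.9                    # mutate the original buffer

shared_value = view[0]             # reflects the change
copied_value = copied[0]           # still the old value
```

A zero-copy bridge behaves like `view`: both sides see the same bytes, so the cost of the conversion does not grow with the size of the table.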
4. PostgreSQL
PardoX uses tokio-postgres inside the Rust core. The host language (Python / Node.js / PHP) never touches the database wire protocol.
Connection string
postgresql://user:password@host:port/database
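Connection strings of this shape are ordinary URLs, so the standard library can build and inspect them. The sketch below assumes that special characters in the user name or password should be percent-encoded, as is usual for URL-style connection strings. Whether PardoX decodes them exactly this way is an assumption, and `build_conn` is a hypothetical helper.

```python
from urllib.parse import quote, unquote, urlsplit


def build_conn(user, password, host, port, database, scheme="postgresql"):
    """Assemble a URL-style connection string, escaping the credentials."""
    return (f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{database}")


conn = build_conn("pardox", "p@ss/word", "localhost", 5432, "mydb")
parts = urlsplit(conn)

host = parts.hostname
port = parts.port
database = parts.path.lstrip("/")
password = unquote(parts.password)  # urlsplit leaves the password encoded
```

Without the escaping, the `@` and `/` in the password would be read as URL delimiters.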
Read
from pardox.io import read_sql
df = read_sql(
"postgresql://pardox:secret@localhost:5432/mydb",
"SELECT id, amount, region FROM orders WHERE status = 'complete'"
)
Execute DDL / DML
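from pardox.io import execute_sql
# Connection string in the format shown above
CONN = "postgresql://pardox:secret@localhost:5432/mydb"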
execute_sql(CONN, "DROP TABLE IF EXISTS orders_archive")
execute_sql(CONN, """
CREATE TABLE orders_archive (
id BIGINT,
amount DOUBLE PRECISION,
region TEXT
)
""")
# Returns 0 for DDL, affected rows for DML
n = execute_sql(CONN, "DELETE FROM orders WHERE status = 'cancelled'")
Write
# mode='append' — auto-activates COPY FROM STDIN for > 10,000 rows
rows = df.to_sql(CONN, "orders_archive", mode="append")
print(f"{rows:,} rows written")
# mode='upsert' — INSERT … ON CONFLICT (cols) DO UPDATE SET …
rows = df.to_sql(CONN, "orders_archive", mode="upsert", conflict_cols=["id"])
!!! success "Bulk path: COPY FROM STDIN"
    When writing more than 10,000 rows with mode="append", the Rust core automatically switches to PostgreSQL's COPY FROM STDIN protocol. The CSV payload is serialized in memory and streamed directly to the server, with no temp files. This is typically 10x–50x faster than multi-row INSERT.
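The bulk path can be pictured as two steps: decide by row count, then serialize the rows to CSV in memory. The sketch below is illustrative only. `choose_path`, `copy_statement`, and `csv_payload` are hypothetical names, identifiers are not quoted, and the exact COPY options PardoX sends are an assumption.

```python
import csv
import io

COPY_THRESHOLD = 10_000  # row count quoted in the text above


def choose_path(n_rows):
    """Pick the write path for mode='append'."""
    return "copy" if n_rows > COPY_THRESHOLD else "insert"


def copy_statement(table, columns):
    """Build a COPY statement that reads CSV from the client stream."""
    return f"COPY {table} ({', '.join(columns)}) FROM STDIN WITH (FORMAT csv)"


def csv_payload(rows):
    """Serialize rows to CSV bytes entirely in memory (no temp file)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue().encode("utf-8")


rows = [(1, 19.99, "EU"), (2, 5.0, "US, West")]
statement = copy_statement("orders_archive", ["id", "amount", "region"])
payload = csv_payload(rows)
```

The second row shows why a real CSV writer matters: the comma inside the region value has to be quoted, or the server would see four fields.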
5. MySQL
PardoX uses the mysql crate v25.
Connection string
mysql://user:password@host:port/database
Read
from pardox.io import read_mysql
df = read_mysql(
"mysql://pardox:secret@localhost:3306/mydb",
"SELECT * FROM products WHERE active = 1"
)
Execute DDL / DML
from pardox.io import execute_mysql
# Connection string in the format shown above
CONN = "mysql://pardox:secret@localhost:3306/mydb"
execute_mysql(CONN, "DROP TABLE IF EXISTS products_bak")
execute_mysql(CONN, """
CREATE TABLE products_bak (
id BIGINT,
name TEXT,
price DOUBLE,
quantity DOUBLE
)
""")
Write
# append — chunked INSERT 1,000 rows/stmt (auto LOAD DATA for > 10k if server allows)
rows = df.to_mysql(CONN, "products_bak", mode="append")
# replace — REPLACE INTO (delete + insert)
rows = df.to_mysql(CONN, "products_bak", mode="replace")
# upsert — INSERT … ON DUPLICATE KEY UPDATE
rows = df.to_mysql(CONN, "products_bak", mode="upsert", conflict_cols=["id"])
!!! info "LOAD DATA LOCAL INFILE"
    For append with more than 10,000 rows, PardoX attempts LOAD DATA LOCAL INFILE. If the server has local_infile=OFF (MySQL default), it falls back automatically to the 1,000-row chunked INSERT and logs a notice to stderr. To enable the fast path:

    ```sql
    SET GLOBAL local_infile = 1;
    ```
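The append strategy and its fallback can be summarized in a few lines of Python. This is a sketch of the decision logic as described above, not the code in the Rust core. `plan_append`, `chunks`, and `insert_statement` are hypothetical names, and the `?` placeholder style is an assumption.

```python
def plan_append(n_rows, local_infile_enabled):
    """Choose the MySQL write path for mode='append'."""
    if n_rows > 10_000 and local_infile_enabled:
        return "LOAD DATA LOCAL INFILE"
    return "chunked INSERT"  # also the fallback when local_infile is OFF


def chunks(rows, size=1_000):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_statement(table, columns, n_rows):
    """Build one multi-row INSERT with a placeholder group per row."""
    group = "(" + ", ".join("?" for _ in columns) + ")"
    values = ", ".join([group] * n_rows)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values}"


batch_sizes = [len(batch) for batch in chunks(list(range(2_500)))]
example = insert_statement("products_bak", ["id", "name"], 2)
```

With 2,500 rows the fallback issues three statements (1,000, 1,000, and 500 rows).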
6. SQL Server
PardoX uses the tiberius crate v0.12 with a single-thread Tokio runtime.
Connection string (ADO.NET format)
Server=host,port;Database=db;UID=user;PWD=password;TrustServerCertificate=Yes
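For readers unfamiliar with the ADO.NET format, the string is a semicolon-separated list of key=value pairs, with host and port joined by a comma. A minimal parser might look like the sketch below. `parse_ado` is a hypothetical helper, and it does not handle values that themselves contain semicolons.

```python
def parse_ado(conn):
    """Split an ADO.NET-style connection string into a lowercase-keyed dict."""
    pairs = (item.split("=", 1) for item in conn.split(";") if item.strip())
    return {key.strip().lower(): value.strip() for key, value in pairs}


config = parse_ado(
    "Server=localhost,1433;Database=mydb;UID=sa;PWD=MyPassword;"
    "TrustServerCertificate=Yes"
)
host, _, port = config["server"].partition(",")
```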
Read
from pardox.io import read_sqlserver
CONN = "Server=localhost,1433;Database=mydb;UID=sa;PWD=MyPassword;TrustServerCertificate=Yes"
df = read_sqlserver(CONN, "SELECT TOP 1000 * FROM dbo.orders")
Execute DDL / DML
from pardox.io import execute_sqlserver
execute_sqlserver(CONN, "DROP TABLE IF EXISTS dbo.orders_bak")
execute_sqlserver(CONN, """
CREATE TABLE dbo.orders_bak (
id BIGINT,
amount FLOAT,
region NVARCHAR(MAX)
)
""")
Write
# append — multi-row INSERT, 500 rows per statement
rows = df.to_sqlserver(CONN, "dbo.orders_bak", mode="append")
# upsert — MERGE INTO … WHEN MATCHED … WHEN NOT MATCHED
rows = df.to_sqlserver(CONN, "dbo.orders_bak", mode="upsert", conflict_cols=["id"])
!!! warning "Password special characters"
    Avoid ! in SQL Server passwords. A known issue in tiberius v0.12 causes authentication to fail when the password contains ! and the connection is made over TCP from an external host. Use only characters from [A-Za-z0-9_\-@#$]. A fix is tracked for v0.4.0.
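A password can be checked against the recommended character set before it is placed in a connection string. The helper below is a hypothetical pre-flight check, not part of PardoX. It simply applies the character class quoted above.

```python
import re

# Character class taken from the warning above; one or more characters required.
SAFE_PASSWORD = re.compile(r"[A-Za-z0-9_\-@#$]+")


def is_safe_password(password):
    """Return True if every character is in the recommended set."""
    return SAFE_PASSWORD.fullmatch(password) is not None


ok = is_safe_password("Str0ng_Pass#1")
rejected = is_safe_password("Hello!World")
```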
!!! info "Bulk performance"
    SQL Server writes use 500-row multi-value INSERT statements. For 50,000 rows this results in 100 round-trips instead of 50,000. A BULK INSERT / bcp path is planned for v0.4.0.
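The round-trip figure is a ceiling division of the row count by the batch size. The snippet below reproduces the arithmetic. `round_trips` is a hypothetical helper for illustration.

```python
import math


def round_trips(n_rows, rows_per_statement=500):
    """Number of INSERT statements needed to write n_rows."""
    return math.ceil(n_rows / rows_per_statement)


batched = round_trips(50_000)                           # 500 rows per statement
row_by_row = round_trips(50_000, rows_per_statement=1)  # one row per statement
```

A partial final batch still costs one statement, so 50,001 rows need 101 round-trips.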
7. MongoDB
PardoX uses the mongodb crate v2.8.
Connection string (MongoDB URI)
mongodb://user:password@host:port
Read
from pardox.io import read_mongodb
# Target format: "database.collection"
df = read_mongodb("mongodb://admin:secret@localhost:27017", "mydb.orders")
Execute commands
from pardox.io import execute_mongodb
# Drop a collection
execute_mongodb("mongodb://...", "mydb", '{"drop": "orders_bak"}')
Write
# append — insert_many in batches of 10,000 documents, ordered:false
rows = df.to_mongodb("mongodb://...", "mydb.orders_bak", mode="append")
# replace — drops collection, then inserts all documents
rows = df.to_mongodb("mongodb://...", "mydb.orders_bak", mode="replace")
!!! success "Batch behavior"
    ordered: false is used on every insert_many call. This means MongoDB continues inserting valid documents even if individual documents fail (e.g., duplicate key), rather than aborting the entire batch.
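The effect of unordered inserts can be simulated without a database. In the sketch below a plain dict keyed by `_id` stands in for a collection. `split_target` and `insert_many_unordered` are hypothetical names, and the error reporting is simplified compared with what a real driver returns.

```python
def split_target(target):
    """Split 'database.collection' at the first dot."""
    database, _, collection = target.partition(".")
    return database, collection


def insert_many_unordered(collection, docs, batch_size=10_000):
    """Insert in batches; record duplicate keys and keep going."""
    inserted, failed_ids = 0, []
    for start in range(0, len(docs), batch_size):
        for doc in docs[start:start + batch_size]:
            if doc["_id"] in collection:
                failed_ids.append(doc["_id"])  # an ordered insert would stop here
                continue
            collection[doc["_id"]] = doc
            inserted += 1
    return inserted, failed_ids


store = {}
docs = [{"_id": 1}, {"_id": 2}, {"_id": 1}, {"_id": 3}]
inserted, failed_ids = insert_many_unordered(store, docs)
```

The duplicate in third position is reported, but the document after it is still written.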
Write Modes Summary
| Database | append | replace | upsert |
|---|---|---|---|
| PostgreSQL | INSERT (COPY for >10k) | — | INSERT ON CONFLICT DO UPDATE |
| MySQL | INSERT 1k/stmt (LOAD DATA for >10k) | REPLACE INTO | INSERT ON DUPLICATE KEY UPDATE |
| SQL Server | Multi-row INSERT 500/stmt | Multi-row INSERT 500/stmt | MERGE INTO |
| MongoDB | insert_many 10k/batch | drop + insert_many | — |
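The upsert column of the table maps to different SQL in each dialect. As a rough illustration, the two helpers below generate the PostgreSQL and MySQL forms for a single row. They are hypothetical, do not quote identifiers, and assume at least one column outside `conflict_cols`. The SQL that PardoX actually emits may differ, and MySQL 8.0.20 and later deprecate the `VALUES(col)` form in favor of row aliases.

```python
def pg_upsert(table, columns, conflict_cols):
    """INSERT ... ON CONFLICT (...) DO UPDATE SET ... for PostgreSQL."""
    placeholders = ", ".join(f"${i}" for i in range(1, len(columns) + 1))
    updates = ", ".join(f"{c} = EXCLUDED.{c}"
                        for c in columns if c not in conflict_cols)
    return (f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders}) "
            f"ON CONFLICT ({', '.join(conflict_cols)}) DO UPDATE SET {updates}")


def mysql_upsert(table, columns, conflict_cols):
    """INSERT ... ON DUPLICATE KEY UPDATE ... for MySQL."""
    placeholders = ", ".join("?" for _ in columns)
    updates = ", ".join(f"{c} = VALUES({c})"
                        for c in columns if c not in conflict_cols)
    return (f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders}) "
            f"ON DUPLICATE KEY UPDATE {updates}")


pg_sql = pg_upsert("orders_archive", ["id", "amount", "region"], ["id"])
my_sql = mysql_upsert("products_bak", ["id", "name"], ["id"])
```

PostgreSQL names the conflict target explicitly, while MySQL relies on whichever unique key the row collides with, so the target table needs a primary or unique key on the conflict columns in both cases.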
Table Must Exist Before Writing
PardoX write methods write into an existing table; they do not create it. Create the table with the matching execute_* function before the first write:
from pardox.io import execute_sql
execute_sql(CONN, """
CREATE TABLE IF NOT EXISTS my_table (
id BIGINT,
amount DOUBLE PRECISION,
label TEXT
)
""")
rows = df.to_sql(CONN, "my_table", mode="append")
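The create-then-write order can be tried out locally with SQLite from the standard library, used here purely as a stand-in. How PardoX reports a missing table is not documented in this section, so the error type below is specific to `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = [(1, 10.5, "a"), (2, 20.0, "b")]

# Writing before the table exists fails.
try:
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    failed_without_table = False
except sqlite3.OperationalError:
    failed_without_table = True

# Create the table first, then the same write succeeds.
conn.execute(
    "CREATE TABLE IF NOT EXISTS my_table "
    "(id BIGINT, amount DOUBLE PRECISION, label TEXT)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
row_count = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
conn.close()
```

Because the statement uses IF NOT EXISTS, it is safe to run at the start of every job.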