Last Thursday I attended ThaiPy, a monthly Python meetup in Bangkok. There was an interesting talk about using DuckDB from Python.
I knew that DuckDB is the superglue of data wrangling, but I didn’t know it was that powerful.
DuckDB can attach to various databases directly - for example Postgres:
ATTACH '' AS postgres_db (TYPE postgres);
SELECT * FROM postgres_db.tbl;

And save data to Parquet (or load it back) from any source - not just Postgres:
COPY postgres_db.tbl TO 'data.parquet';
COPY postgres_db.tbl FROM 'data.parquet';

In your code you can transparently switch between Pandas DataFrames and DuckDB, thanks to Apache Arrow:
import duckdb
import pandas as pd

pandas_df = pd.DataFrame({"a": [42]})
duckdb.sql("SELECT * FROM pandas_df")

Actually, you can get your DuckDB results in the format of almost any popular data-crunching library in Python - Pandas, Polars, Arrow, NumPy:
duckdb.sql("SELECT 42").fetchall() # Python objects
duckdb.sql("SELECT 42").df() # Pandas DataFrame
duckdb.sql("SELECT 42").pl() # Polars DataFrame
duckdb.sql("SELECT 42").arrow() # Arrow Table
duckdb.sql("SELECT 42").fetchnumpy() # NumPy Arrays


