DuckDB

2024 Nov 29

Timeseries and ASOF Joins

Pairing up timeseries data when the timestamps don’t match exactly (in Pandas, Polars, DuckDB, Postgres & QuestDB)

2024 Oct 17

Graph Query Interfaces: A Comparison Between SQL and Cypher

Featuring DuckDB & KuzuDB

2024 Aug 06

Some Notes on Vector Indexing in DuckDB

Once you’ve indexed your vectors for similarity search, check your query plans in case the DB opts for a sequential scan anyway

2024 Aug 04

Combining Lexical and Semantic Search with Reciprocal Rank Fusion

Merging lexical and semantic search rankings to get the best of both worlds

2024 Aug 03

Vector Indexing and Search with DuckDB & FastEmbed

Using DuckDB for vector/semantic search

2024 Apr 04

DuckDB JIT Compiled UDFs with Numba

JIT compiling your vectorized UDFs with Numba. Plus, pure SQL is plenty fast if you can figure out how to write it.

2024 Jan 10

Vectorized DuckDB UDFs with Rust and Python FFI

Implementing vectorized UDFs in Rust that you can use in DuckDB, with a little help from Arrow

2023 Jul 08

Parquet + Zstd: Smaller faster data formats

Parquet files often have to be compressed. For fast compression, use LZ4 or Snappy. For the highest compression ratio, use Brotli. For a balance of both, use Zstd.

2023 Jun 22

SQL Grouping sets, Rollups & Cube

Computing multiple GROUP BYs in fewer steps

2023 Jun 16

Programmatically creating a DuckDB table from an Arrow schema

PyArrow lets you create an empty table. Use that instead of custom mappings to create a DuckDB schema.