DuckDB
2024 Nov 29
Pairing up timeseries data when the timestamps don’t match exactly (in Pandas, Polars, DuckDB, Postgres & QuestDB)
2024 Oct 17
Graph Query Interfaces: A Comparison Between SQL and Cypher
Featuring DuckDB & KuzuDB
2024 Aug 06
Some Notes on Vector Indexing in DuckDB
Once you’ve indexed your vectors for similarity search, be sure to check your query plans, just in case the DB opts for a sequential scan instead of using the index
2024 Aug 04
Combining Lexical and Semantic Search with Reciprocal Rank Fusion
Best of both worlds sort of thing
2024 Aug 03
Vector Indexing and Search with DuckDB & FastEmbed
Using DuckDB for vector/semantic search
2024 Apr 04
DuckDB JIT Compiled UDFs with Numba
JIT compiling your vectorized UDFs with Numba. Plus pure SQL is plenty fast if you can figure out how to write it
2024 Jan 10
Vectorized DuckDB UDFs with Rust and Python FFI
Implementing vectorized UDFs in Rust that you can use in DuckDB, with a little help from Arrow
2023 Jul 08
Parquet + Zstd: Smaller faster data formats
Often, Parquet files have to be compressed. For fast compression, use LZ4 or Snappy. For the highest compression ratio, use Brotli. For both, Zstd
2023 Jun 22
SQL Grouping sets, Rollups & Cube
Computing multiple Group-bys with fewer steps
2023 Jun 16
Programmatically creating a DuckDB table from an Arrow schema
PyArrow lets you create an empty table. Use that instead of custom mappings to create a DuckDB schema.