Enjoyer of Go, Rust, Python, Databases & SQL
Posts
2024 Nov 29
Pairing up timeseries data when the timestamps don’t match exactly (in Pandas, Polars, DuckDB, Postgres & QuestDB)
2024 Nov 15
Archiving Time-Series Data from PostgreSQL into Parquet
Keeping your database lean
2024 Oct 17
Graph Query Interfaces: A Comparison Between SQL and Cypher
Featuring DuckDB & KuzuDB
2024 Aug 06
Some Notes on Vector Indexing in DuckDB
Once you’ve indexed your vectors for similarity search, be sure to check your query plans, just in case the DB decides to opt for a sequential scan
2024 Aug 04
Combining Lexical and Semantic Search with Reciprocal Rank Fusion
Best of both worlds sort of thing
2024 Aug 03
Vector Indexing and Search with DuckDB & FastEmbed
Using DuckDB for vector/semantic search
2024 Jun 11
Leanstore: High Performance Low-Overhead Buffer Pool
A dash of pointer swizzling, a sprinkle of optimistic locking and a touch of lean eviction, that’s the secret to a high performance buffer pool!
2024 Jun 04
Larger-Than-Memory Data Management
For when the database exceeds the main memory size
2024 May 26
Hybrid Locking & Synchronization
Fast-path optimistic locking with fallback to pessimistic RW locks under contention
2024 Apr 16
Optimizing Data Placement for Distributed OLAP Systems
Using MIP solvers to model and optimize shard placement
2024 Apr 04
DuckDB JIT Compiled UDFs with Numba
JIT compiling your vectorized UDFs with Numba. Plus pure SQL is plenty fast if you can figure out how to write it
2024 Mar 21
Guided Local Search for the Capacitated Facility Location Problem
Overview of Guided Local Search plus how it can be applied to the capacitated facility location problem.
2024 Mar 11
Minizinc: Alternative Modeling Approaches for the Facility Location Problem
Multiple views and Channeling Constraints make for faster models (in some cases)
2024 Mar 06
Discrete Optimization: Where to construct facilities so as to minimize setup costs and customer servicing costs while ensuring each facility is able to meet customer demands.
2024 Jan 10
Vectorized DuckDB UDFs with Rust and Python FFI
Implementing vectorized UDFs in Rust that you can use in DuckDB, with a little help from Arrow
2024 Jan 07
Optimizing CPU & Memory Interaction: Matrix Multiplication
Same algo, different memory access patterns, what could go wrong (or right)!
2024 Jan 05
x86 Cache Control Instructions
Wherein the OS and user get more control over the L1,L2 and L3 caches, mostly for performance.
2024 Jan 04
Retrieving Memory and Cache Organization
Memory, Cache levels, Cache sizes, TLB, associativity and so on
2023 Nov 25
Microbenchmarking: Way more than I set out to know
RDTSC, Out-of-order execution, OS interrupts, cycles, frequency and more.
2023 Aug 22
Best practices, Logging levels, structured logging, Logging & Telemetry (Metrics, Tracing), Audit logs
2023 Jul 27
Handling Missing Values in Timeseries Datasets
Filling gaps using Last observation carried forward, next observation carried backwards, median & linear interpolation
2023 Jul 25
For when you need to impose some structure on semi-structured data
2023 Jul 08
Parquet + Zstd: Smaller faster data formats
Often, parquet files have to be compressed. For fast compression, use LZ4 or Snappy. For the highest data compression ratio, use brotli. For both, zstd
2023 Jun 24
Lateral Joins & Iterators in SQL
Sneaking for-loops into SQL without anyone noticing
2023 Jun 22
SQL Grouping sets, Rollups & Cube
Computing multiple Group-bys with less steps
2020 Jun 07
Generalized Range Difference, Recursion & Search Availability in PostgreSQL
We’re building a scheduling app. Users mark booked slots, represented in Postgres using time or date-range data types. Let’s see how we can find all the freely available slots efficiently
2020 Jan 06
Back To Basics: The foundation of Joins in SQL
Writing SQL joins without using joins at all. A quick history of Database Models, Schemas, Constraints, Cross-products and everything in between
2020 Jan 03
SQL joins as reduce/folds over relations
Of what to make of joins in sql, mental models and building better understanding