Posts

2024 Nov 29

Timeseries and ASOF Joins

Pairing up timeseries data when the timestamps don’t match exactly (in Pandas, Polars, DuckDB, Postgres & QuestDB)

2024 Nov 15

Archiving Time-Series Data from PostgreSQL into Parquet

Keeping your database lean

2024 Oct 17

Graph Query Interfaces: A Comparison Between SQL and Cypher

Featuring DuckDB & KuzuDB

2024 Aug 06

Some Notes on Vector Indexing in DuckDB

Once you’ve indexed your vectors for similarity search, be sure to check your query plans, just in case the DB decides to opt for a sequential scan

2024 Aug 04

Combining Lexical and Semantic Search with Reciprocal Rank Fusion

Best of both worlds sort of thing

2024 Aug 03

Vector Indexing and Search with DuckDB & FastEmbed

Using DuckDB for vector/semantic search

2024 Jun 11

Leanstore: High Performance Low-Overhead Buffer Pool

A dash of pointer swizzling, a sprinkle of optimistic locking and a touch of lean eviction, that’s the secret to a high performance buffer pool!

2024 Jun 04

Larger-Than-Memory Data Management

For when the database exceeds the main memory size

2024 May 26

Hybrid Locking & Synchronization

Fast-path optimistic locking with fallback to pessimistic RW locks under contention

2024 Apr 16

Optimizing Data Placement for Distributed OLAP Systems

Using MIP solvers to model and optimize shard placement

2024 Apr 04

DuckDB JIT Compiled UDFs with Numba

JIT compiling your vectorized UDFs with Numba. Plus pure SQL is plenty fast if you can figure out how to write it

2024 Mar 21

Guided Local Search for the Capacitated Facility Location Problem

Overview of Guided Local Search plus how it can be applied to the capacitated facility location problem.

2024 Mar 11

Minizinc: Alternative Modeling Approaches for the Facility Location Problem

Multiple views and Channeling Constraints make for faster models (in some cases)

2024 Mar 06

The Facility Location Problem

Discrete Optimization: Where to construct facilities so as to minimize setup costs and customer servicing costs while ensuring each facility is able to meet customer demands.

2024 Jan 10

Vectorized DuckDB UDFs with Rust and Python FFI

Implementing vectorized UDFs in Rust that you can use in DuckDB, with a little help from Arrow

2024 Jan 07

Optimizing CPU & Memory Interaction: Matrix Multiplication

Same algo, different memory access patterns, what could go wrong (or right)!

2024 Jan 05

x86 Cache Control Instructions

Wherein the OS and user get more control over the L1,L2 and L3 caches, mostly for performance.

2024 Jan 04

Retrieving Memory and Cache Organization

Memory, Cache levels, Cache sizes, TLB, associativity and so on

2023 Nov 25

Microbenchmarking: Way more than I set out to know

RDTSC, Out-of-order execution, OS interrupts, cycles, frequency and more.

2023 Aug 22

Logging in Go

Best practices, Logging levels, structured logging, Logging & Telemetry (Metrics, Tracing), Audit logs

2023 Jul 27

Handling Missing Values in Timeseries Datasets

Filling gaps using Last observation carried forward, next observation carried backwards, median & linear interpolation

2023 Jul 25

Wrangling JSON with DuckDB

For when you need to impose some structure on semi-structured data

2023 Jul 08

Parquet + Zstd: Smaller faster data formats

Often, parquet files have to be compressed. For fast compression, use LZ4 or Snappy. For the highest data compression ratio, use brotli. For both, zstd

2023 Jun 24

Lateral Joins & Iterators in SQL

Sneaking for-loops into SQL without anyone noticing

2023 Jun 22

SQL Grouping sets, Rollups & Cube

Computing multiple Group-bys with less steps

2020 Jun 07

Generalized Range Difference, Recursion & Search Availability in PostgreSQL

We’re building a scheduling app. Users mark booked slots, represented in Postgres using time or date-range data types. Let’s see how we can find all the freely available slots efficiently

2020 Jan 06

Back To Basics: The foundation of Joins in SQL

Writing SQL joins without using joins at all. A quick history of Database Models, Schemas, Constraints, Cross-products and everything in between

2020 Jan 03

SQL joins as reduce/folds over relations

Of what to make of joins in sql, mental models and building better understanding