Scaling Tastiqo: How I Used L1 In-Memory, L2 Redis, and Compiled-Template Caching to Speed Up Liquid Theme Rendering 15x

For my Final Year Project (FYP), I have been building Tastiqo (tastiqo.dukandar.app), a multi-tenant SaaS platform. One of its core features is fully customizable, dynamic Liquid-based storefronts (for example: tastiqo.api.tastiqo.dukandar.app).

However, as the system grew, I ran into a severe bottleneck. The Liquid rendering engine was absolutely choking under load.

This post details how I profiled the application, identified the bottlenecks, and implemented a multi-layered caching architecture (L1 in-memory caches, L2 Redis, and compiled AST caching). Based on per-stage estimates, the changes should deliver a ~15–25x speedup, cutting per-request render time from roughly 700–1500 ms to 20–60 ms (for reference, the measured median latency under load before the changes was 941 ms).

Why Caching Was Necessary

To benchmark the public storefront, I ran an autocannon test against the live server. The results were terrifying: the server was capping at a measly ~107 req/s with a 941 ms median latency (and a brutal 6.1s max latency) at 200 concurrent connections.

Latency  | 2.5%: 47ms | 50%: 941ms | 97.5%: 2580ms | 99%: 2942ms | Avg: 1026ms | Max: 6136ms
Req/Sec  | avg: 107   | min: 2     | max: 140

Profiling the request path revealed major bottlenecks in the Liquid renderer's hot path. While fixing those raw performance issues, I also had to close a series of cache-invalidation gaps so that the new in-RAM caches wouldn't turn into memory leaks or serve stale content.

Part 1 — Performance Fixes (The Speedup)

Identifying the Root Causes

New liquid.Engine built on every request. The system called liquid.NewEngine() and re-registered all custom filters and the {% render %} tag on every single request. There was no engine cache at all.
Zero compiled-template cache → 40+ Liquid parses per page. The layout, every section, and every snippet were re-parsed from source on every request, so a typical page ran ~40 full AST parses. The existing Redis TemplateCache only cached raw source strings, and it was inactive in production anyway because its Redis client was configured as nil.
Missing L1 caches for hot data. GetAccountBySubDomain did a full Redis round-trip to look up the account, even on a fully warm path, and GetPageByHandle hit Postgres directly on every request.
Undersized connection pools. The database pool was capped at 25 open / 10 idle connections, and Redis was running on go-redis's default pool settings.

The Fixes

To solve this, I completely re-architected the rendering engine lifecycle and caching layers:

engine.go: Introduced an engineEntry that is built once per (subdomain, themeID) and reused across requests. Compiled *liquid.Template ASTs are cached per entry, keyed by an FNV-64 hash of the template source. Live paths use this cache; editor/draft paths intentionally bypass it.
section.go: The SectionRenderer now holds an *engineEntry plus a noCache flag that separates live traffic from editor previews.
renderer.go: Switched the layout and section render paths to the cached engine entries.
repo.go: Added an L1 cache (15 min TTL) for GetAccountBySubDomain, so the warm path takes ~0 ms instead of a Redis round-trip. Added L1 (10 min) plus L2 Redis (7-day TTL) caching for GetPageByHandle.
main.go / redis.go: Raised the DB pool to 100 open / 25 idle connections and the Redis pool size to 50. Made the renderer a single, globally shared instance.
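To make the engine.go change concrete, here is a minimal, standard-library-only sketch of a per-(subdomain, themeID) entry that caches compiled templates by a 64-bit FNV hash of the source. It is not the project's actual code: Template and compileFunc are hypothetical stand-ins for *liquid.Template and the real parse call, and the FNV-1a variant (hash/fnv's New64a) is an assumption.

```go
package main

import (
	"hash/fnv"
	"sync"
)

// Template is a hypothetical stand-in for *liquid.Template, so this sketch
// builds with the standard library alone.
type Template struct{ Source string }

// compileFunc is a hypothetical stand-in for the real (expensive) Liquid
// parse step that the cache exists to avoid repeating.
type compileFunc func(src []byte) (*Template, error)

// engineKey identifies one storefront build: a tenant plus one of its themes.
type engineKey struct {
	Subdomain string
	ThemeID   string
}

// engineEntry is per-tenant state. In the real renderer it would also own the
// liquid engine, with filters and tags registered once instead of per request.
type engineEntry struct {
	mu       sync.RWMutex
	compile  compileFunc
	compiled map[uint64]*Template // keyed by a 64-bit FNV-1a hash of the source
}

func sourceHash(src []byte) uint64 {
	h := fnv.New64a()
	h.Write(src) // writes to a hash.Hash never return an error
	return h.Sum64()
}

// Compiled returns the cached AST for src, compiling and storing it on a miss.
// With noCache set (editor/draft renders) it compiles fresh and stores nothing.
func (e *engineEntry) Compiled(src []byte, noCache bool) (*Template, error) {
	if noCache {
		return e.compile(src)
	}
	key := sourceHash(src)
	e.mu.RLock()
	t, ok := e.compiled[key]
	e.mu.RUnlock()
	if ok {
		return t, nil
	}
	// Two concurrent misses may both compile; the last store wins, which is
	// harmless because both results come from identical source.
	t, err := e.compile(src)
	if err != nil {
		return nil, err
	}
	e.mu.Lock()
	e.compiled[key] = t
	e.mu.Unlock()
	return t, nil
}

// engineCache hands out exactly one engineEntry per (subdomain, themeID).
type engineCache struct {
	mu      sync.Mutex
	entries map[engineKey]*engineEntry
	compile compileFunc
}

func newEngineCache(compile compileFunc) *engineCache {
	return &engineCache{entries: make(map[engineKey]*engineEntry), compile: compile}
}

// Get returns the tenant's entry, creating it only on first use.
func (c *engineCache) Get(subdomain, themeID string) *engineEntry {
	c.mu.Lock()
	defer c.mu.Unlock()
	k := engineKey{Subdomain: subdomain, ThemeID: themeID}
	if e, ok := c.entries[k]; ok {
		return e
	}
	e := &engineEntry{compile: c.compile, compiled: make(map[uint64]*Template)}
	c.entries[k] = e
	return e
}
```

Because the key is a hash of the source, unchanged templates are never re-parsed, and edited ones simply get new keys. The superseded keys stay in memory until their entry is purged, which is why the invalidation work in Part 2 matters.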

Expected Speedup (Before vs After)

Engine build & filters: Dropped from 5–20 ms to ~0 ms (built once per tenant and reused until the theme is republished).
Layout parse: Dropped from 10–50 ms to ~0 ms (Compiled AST cached).
Section parse × N: Dropped from 5–30 ms × N to ~0 ms × N (Compiled AST cached).
Snippet parse × M: Dropped from 5–20 ms × M to ~0 ms × M (Compiled AST cached).
Account lookup: Dropped from 5–15 ms to ~0 ms (Now served from L1 Ristretto).
Page lookup: Dropped from 30–100 ms to ~0 ms (Now L1 + L2).
Total per request: Dropped from ~700–1500 ms to ~20–60 ms (~15–25× faster).

With warm caches on a single Go process, the system is now projected to hit 1500–3000 req/s at sub-100ms medians on the same hardware.

Part 2 — Correctness Fixes (Preventing Memory Leaks)

Adding aggressive RAM caches is dangerous if you don't invalidate them properly. Without fixing the invalidation lifecycle, the system would suffer from:

A slow RAM leak during theme editing (one orphan compiled AST per save).
Stale Liquid output on the live storefront after a merchant publishes a theme.

Bypassing Cache for the Theme Editor

Every time a user clicks "Save" in the Theme Editor, the source bytes change. This generates a new FNV hash and creates a new compiled AST. Over a long editing session, this would slowly leak RAM. To fix this, I set noCompiledCache := data.IsDraft || data.EditorMode. Live storefront traffic hits the compiled cache, while draft/editor renders intentionally bypass it.
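A small single-goroutine sketch of why the bypass matters: every editor save produces new source, so caching it would add one more hash key per save. renderData, compile, and the string standing in for a compiled AST are hypothetical; only the noCompiledCache expression is taken from the post.

```go
package main

import "hash/fnv"

// renderData is a hypothetical subset of the render input.
type renderData struct {
	Source     string
	IsDraft    bool
	EditorMode bool
}

// compiledCache maps a source hash to a compiled template; a string stands
// in for the AST. A plain map is fine for this single-goroutine sketch.
type compiledCache map[uint64]string

func hashSource(src string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(src))
	return h.Sum64()
}

// compile is a hypothetical stand-in for the real Liquid parse.
func compile(src string) string { return "AST(" + src + ")" }

func render(cache compiledCache, d renderData) string {
	noCompiledCache := d.IsDraft || d.EditorMode // expression from the post
	if noCompiledCache {
		return compile(d.Source) // compiled fresh and discarded after the render
	}
	key := hashSource(d.Source)
	if ast, ok := cache[key]; ok {
		return ast
	}
	ast := compile(d.Source)
	cache[key] = ast
	return ast
}
```

Without the bypass, each editor save would leave one more cached AST behind, and live traffic would never request the superseded ones.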

Proper Pub/Sub Invalidation

I wired the Renderer directly into our Pub/Sub listener.

Now, when a merchant publishes a theme, the existing subdomain_themes:{sub} broadcast also busts the engine and compiled-template caches on every node in the cluster.
Account renames and page edits now immediately trigger an L1 purge on every node via cache.BroadcastPurge().
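The listener side can be sketched as a loop that maps each subdomain_themes:{sub} broadcast to a purge of that tenant's engine and compiled templates. This is a hypothetical, in-process model: messages arrive on a Go channel rather than from Redis, and the cache is keyed by subdomain only. The account and page purges via cache.BroadcastPurge() would follow the same pattern, but that function's signature isn't shown in the post, so it is not modeled here.

```go
package main

import (
	"strings"
	"sync"
)

// themeChannelPrefix follows the subdomain_themes:{sub} naming from the post.
const themeChannelPrefix = "subdomain_themes:"

// rendererCache is a simplified, hypothetical view of the renderer's
// per-tenant state. The real engine cache is keyed by (subdomain, themeID);
// keying by subdomain alone keeps the sketch short.
type rendererCache struct {
	mu       sync.Mutex
	engines  map[string]struct{}          // stand-in for built engines
	compiled map[string]map[uint64]string // stand-in for compiled ASTs
}

// purgeSubdomain drops the engine and all compiled templates for one tenant,
// so the next live request rebuilds them from the newly published theme.
func (r *rendererCache) purgeSubdomain(sub string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.engines, sub)
	delete(r.compiled, sub)
}

// listen applies invalidation messages until the channel closes. Each node
// runs its own listener, so one publish invalidates every node's cache.
// Messages are modeled as channel names; with a real Redis client they
// could arrive from a pattern subscription on "subdomain_themes:*".
func listen(r *rendererCache, channels <-chan string) {
	for ch := range channels {
		if !strings.HasPrefix(ch, themeChannelPrefix) {
			continue // not a theme broadcast
		}
		if sub := strings.TrimPrefix(ch, themeChannelPrefix); sub != "" {
			r.purgeSubdomain(sub)
		}
	}
}
```

In a long-running process, listen would run in its own goroutine for the lifetime of the node.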

Part 3 — Memory Analysis at Scale

The beautiful thing about this caching architecture is that memory scales by the number of unique storefronts visited, not the number of concurrent users. 100,000 users hitting the same storefront costs the same RAM as 1 user hitting it.

Here is the per-storefront cost:

liquid.Engine: ~50–200 KB
1 Compiled Layout: ~150 KB
10–20 Compiled Sections: ~500 KB – 1 MB
Total per tenant (subdomain + themeID): ~1.5 – 3 MB

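For up to 5,000 highly active storefronts loaded in RAM simultaneously, the Go process would consume at most roughly 15 GB, which fits on a 24 GB server node with headroom to spare. The L1 Ristretto caches (for DB rows) are strictly bounded by their MaxCost limit, so they cannot grow without limit. The engine and compiled-template cache has no such cap: it grows with the number of distinct storefronts loaded, so that is the figure to watch for OOM risk.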
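The 15 GB figure is the top of the per-tenant range multiplied out (decimal units, 1 GB = 1000 MB):

```latex
\begin{aligned}
5000 \times 1.5\ \text{MB} &= 7500\ \text{MB} = 7.5\ \text{GB} \\
5000 \times 3\ \text{MB} &= 15000\ \text{MB} = 15\ \text{GB} \\
24\ \text{GB} - 15\ \text{GB} &= 9\ \text{GB of headroom}
\end{aligned}
```

That headroom also has to cover the Go runtime, the Ristretto caches, and in-flight request memory.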

Conclusion

By meticulously auditing the hot paths, eliminating redundant AST parsing, and layering L1 in-memory caches over L2 Redis, I was able to turn a highly unoptimized storefront generator into a renderer whose estimated per-request cost drops from ~700–1500 ms to ~20–60 ms.

This deep dive showed exactly why I chose Go for my FYP: give it the right architecture, and it screams.