AI, Inference, And Tokens, Oh My!

F5, Thursday, October 16th, 2025

Okay, kids, full stop right now. I'm hearing terms like 'tokens' and 'models' thrown around like Gen Z slang on a TikTok video, and most of the time they're used completely wrong.

I said what I said.

It's (past) time to lay out how inference-based applications are built, where tokens are created and consumed, and how the various pieces of the puzzle fit together. So grab some coffee and let's dive right in.

What is an inference server?

I shouldn't have to start here, but I will. When we say 'inference,' we're really talking about a trained large language model (LLM) working through what it learned from its voluminous training data to produce a response to a prompt. You don't call an LLM directly. There is really no such thing as an 'LLM API,' and I will give you 'the mom glare' if you say it.
