✉ Envelope #53: How AI Reasoning Models Work

... and why that matters for engineers like us

Andy Lin
May 23, 2025

Happy Friday! Andy from Back of the Envelope here.

Last week, we talked about how ChatGPT works at a fundamental level [link].

It's an extremely sophisticated pattern recognition system, constantly predicting the next word (or token) to generate human-like responses.

One of the limitations of this approach is that it's still making educated guesses: great for writing and reorganizing text, but not so great for math and engineering.

But that started to change over the past 6 months as researchers improved both the models and how we interact with them.

In this email, I want to pull back the curtain a little more… to explore what’s changed and what it means for us structural engineers.

Let’s go!
(Estimated read time: 2 minutes and 30 seconds)

Not long ago, if you asked ChatGPT to design a beam, you’d get something that sounded smart... but wasn’t quite right.

It might get the math wrong. Or it would pick a size using Fy × Z and forget to mention that it skipped over lateral bracing or deflection checks.
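
To make that shortcut concrete, here is a rough Python sketch of what a "Fy × Z only" answer checks and what it leaves out. Every number below is an illustrative assumption (the section properties, span, loads, and the L/360 live-load limit are not from a real project), and the strength check is only valid for a compact section with full lateral bracing.

```python
# Illustrative inputs only (assumptions, not from a real project).
FY_KSI = 50.0         # steel yield stress
E_KSI = 29000.0       # modulus of elasticity of steel
ZX_IN3 = 66.5         # plastic section modulus of a trial section
IX_IN4 = 510.0        # moment of inertia of the same trial section
SPAN_FT = 30.0        # simply supported span
WU_KIP_PER_FT = 1.6   # factored uniform load (strength check)
WL_KIP_PER_FT = 1.0   # service live load (deflection check)


def design_plastic_moment(fy_ksi, zx_in3, phi=0.90):
    """phi * Fy * Zx in kip-ft. Yielding only: assumes a compact,
    fully braced section, so lateral-torsional buckling is NOT checked."""
    return phi * fy_ksi * zx_in3 / 12.0


def required_moment(w_kip_per_ft, span_ft):
    """Maximum moment in kip-ft for a simply supported beam, uniform load."""
    return w_kip_per_ft * span_ft ** 2 / 8.0


def midspan_deflection(w_kip_per_ft, span_ft, e_ksi, ix_in4):
    """Midspan deflection in inches: 5 w L^4 / (384 E I)."""
    w_kip_per_in = w_kip_per_ft / 12.0
    span_in = span_ft * 12.0
    return 5.0 * w_kip_per_in * span_in ** 4 / (384.0 * e_ksi * ix_in4)


phi_mp = design_plastic_moment(FY_KSI, ZX_IN3)
mu = required_moment(WU_KIP_PER_FT, SPAN_FT)
delta = midspan_deflection(WL_KIP_PER_FT, SPAN_FT, E_KSI, IX_IN4)
limit = SPAN_FT * 12.0 / 360.0  # assumed L/360 live-load limit

print(f"Strength:   phi*Mp = {phi_mp:.1f} kip-ft vs Mu = {mu:.1f} kip-ft")
print(f"Deflection: {delta:.2f} in vs limit {limit:.2f} in")
print("Strength OK:", phi_mp >= mu, "| Deflection OK:", delta <= limit)
```

With these made-up inputs, the section passes the strength check (about 249 kip-ft against 180 kip-ft) but fails deflection (about 1.23 in against 1.00 in). That is exactly the kind of miss a "Fy × Z and done" answer hides.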

Why? Because it was trained to predict the next word, not to "reason."

Think of it like a student with a really, really good memory who's seen a lot of problems and solutions, but hasn’t actually learned to think through them step by step.

That was the case… until researchers came up with a clever workaround.

Before we continue, a quick note:

If you are working on a steel building and you haven’t heard of Durafuse, you might want to check it out.

It is a proprietary steel moment frame connection that offers high ductility and fast recovery after an earthquake.

And they’ve got your back with full design team support related to the connections: calcs, plan check comments, RFI responses, and shop drawing reviews.

Click here to learn more: https://go.sehq.co/durafuse

How Does "Reasoning" Work with Prediction?

In simple terms: the model first plans what it’s going to do, then walks through the problem step by step, checking itself along the way.

For reasoning-capable models like OpenAI’s o3 (in ChatGPT), Grok 3 (click on “Think”), or Gemini 2.5 Pro, this happens behind the scenes.

But you can catch a glimpse of the thinking process.

[Image: Grok 3 “Thinking”]

What’s happening is that the model is still making predictions — but now it’s doing it “out loud.” And by talking to itself, it builds a kind of short-term memory to keep track of what it’s already figured out.

By producing that ongoing internal dialogue, it gets better at structured, step-by-step logic.
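
Here is a toy Python sketch of that "talking to itself" loop. It is not how any real model is implemented: `toy_next_step` is a hypothetical, hard-coded stand-in for the model, and the beam numbers are made up. The only point is the mechanism, where every line the "model" writes goes back into the context it reads before writing the next one.

```python
def toy_next_step(context):
    """Hypothetical stand-in for a language model.

    It decides what to write next only by looking at what is already
    written in the context, which is the 'short-term memory' idea.
    """
    text = "\n".join(context)
    if "Mu =" not in text:
        return "Mu = wu * L^2 / 8 = 1.6 * 30^2 / 8 = 180 kip-ft"
    if "Zx_req =" not in text:
        return "Zx_req = Mu * 12 / (0.9 * Fy) = 180 * 12 / (0.9 * 50) = 48 in^3"
    if "Answer:" not in text:
        return "Answer: pick a section with Zx >= 48 in^3, then check bracing and deflection"
    return None  # nothing left to say


def generate(prompt, next_step, max_steps=10):
    """Append each new step to the context so later steps can build on it."""
    context = [prompt]
    for _ in range(max_steps):
        step = next_step(context)
        if step is None:
            break
        context.append(step)
    return context


prompt = "Size a simply supported beam: wu = 1.6 kip/ft, L = 30 ft, Fy = 50 ksi."
trace = generate(prompt, toy_next_step)
for line in trace:
    print(line)
```

Notice that the final answer only shows up after the intermediate results have been written down. Take away the earlier lines and the toy has nothing to build on.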

Try It Yourself

You can simulate this even in non-reasoning models by starting with a prompt like this:

“I want to simulate a reasoning model's thinking process via chain of thought. Help me develop a series of prompts that shows the process for designing and optimizing a steel beam per AISC 360-16.”

It’ll generate a series of guided prompts that you can use to simulate an AI thinking process.
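
If you'd rather script it, the same idea can be sketched in a few lines of Python. Treat this as a rough outline under assumptions: the five steps are my own illustrative breakdown (not an official AISC procedure and not what any particular model will hand you), and `ask` is a hypothetical placeholder for whatever function sends a prompt to your chatbot and returns its reply.

```python
# Illustrative step list (an assumption, not an official design procedure).
STEPS = [
    "List the given information and state any assumptions.",
    "Compute the factored loads and the required moment and shear.",
    "Select a trial section for flexure and name the limit states checked.",
    "Check lateral-torsional buckling, shear, and deflection.",
    "Summarize the design and flag anything that needs engineer review.",
]


def run_chain(task, ask, steps=STEPS):
    """Send one prompt per step, feeding all earlier answers back in.

    `ask` is a hypothetical callable: it takes a prompt string and
    returns the model's reply as a string.
    """
    transcript = []
    for number, step in enumerate(steps, start=1):
        history = "\n".join(transcript) if transcript else "(nothing yet)"
        prompt = (
            f"Task: {task}\n"
            f"Work so far:\n{history}\n"
            f"Step {number}: {step}"
        )
        answer = ask(prompt)
        transcript.append(f"Step {number}: {answer}")
    return transcript


def fake_ask(prompt):
    """Placeholder so the sketch runs without any AI service."""
    return f"(reply to a {len(prompt)}-character prompt)"


for entry in run_chain("Design a steel beam per AISC 360-16.", fake_ask):
    print(entry)
```

Swap `fake_ask` for a real call to your model of choice and each step will see everything that came before it, which is the whole trick behind chain of thought.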

(By the way, companies like OpenAI have already started integrating some thinking behaviors into non-reasoning models like GPT-4o. So you’ll likely see more and more of this.)

Limitations to Keep in Mind

One limitation is speed. Thinking takes time, because the model has to generate all of those intermediate steps before it gets to the answer.

Another is that it’s still probabilistic, not deterministic.

Ask the same question twice and you may get two different versions of a similar answer, because the model is still relying on statistical prediction. Vary the wording even slightly and the answers can drift further apart.

For example:

“design a steel beam”

vs.

“design a steel beam per AISC 360-16”

...will often give two very different outputs.
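
For the curious, here is a minimal Python sketch of where that randomness comes from. It assumes the common "temperature sampling" approach, and the candidate words and scores are invented for illustration. A real model scores tens of thousands of possible tokens at every step, not three.

```python
import math
import random

# Made-up scores for the next word (higher = the model likes it more).
SCORES = {"W18x35": 2.0, "W16x40": 1.6, "W21x44": 1.1}


def sample_next_word(scores, temperature, rng):
    """Pick the next word from the scores.

    temperature == 0 always takes the top-scoring word (deterministic).
    temperature > 0 draws at random, weighted by exp(score / temperature).
    """
    if temperature == 0:
        return max(scores, key=scores.get)
    words = list(scores)
    weights = [math.exp(scores[w] / temperature) for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the demo is repeatable
greedy = {sample_next_word(SCORES, 0, rng) for _ in range(200)}
sampled = {sample_next_word(SCORES, 1.0, rng) for _ in range(200)}

print("temperature 0 picks:", sorted(greedy))
print("temperature 1 picks:", sorted(sampled))
```

At temperature 0 the pick never changes. At temperature 1 the lower-scored options get chosen some of the time, and since every word becomes part of the context for the next one, a different early pick can send the whole answer down a different path.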

So in practice, it’s useful for checking your work — but harder to trust unless you know what to expect.

It’s kind of like working with a junior engineer. Over time, you build up trust based on consistency and quality.

With AI, though, your “junior” might change from prompt to prompt.

This is also where tools and agents come into play.
We’ll dive into that in future emails.

Until then, enjoy experimenting.

Have a great weekend!

Andy Lin
Back of the Envelope/Structural Engineer HQ

PS.

If you’ve got a minute, could you help me out by filling out this 1-minute survey about moment frames? Thanks! [link]
