Machine-Driven Code Review

Our commit messages, code reviews, and system diagrams are all automated.

Dec 26, 2025

A whimsical watercolor illustration shows a large man in a scarf using a wand to conjure a stream of paper notes from a wooden table, while two children with magnifying glasses examine notes on the table, and another child watches. Snowflakes fall in the background.

Across my 15+ year career, spanning mega corps like AWS and Salesforce and scrappy startups like Convoy, the code review process was more or less the same: You write code. You summarize it. You ask a colleague to critique it. You fix the errors. You repeat.

It relied entirely on human vigilance, human patience, and the hope that your colleague had had enough coffee that morning to spot a missing semicolon.

And then 2025 came and all of this changed.

At Logic, our six-person engineering team transformed how we run reviews. Each step now catches more errors with much less human intervention.

A whiteboard diagram titled "2025 Dev Flow (LLM-driven)" outlines three stages: "Automated Commit Messages" (using Logic API), "AI-assisted PR Reviews" (via GitHub Workflow and Claude), and "Diagram & PRD Generation" (using Logic API/Nano Banana Pro).

Summarizing The Work

The first change we made was automating commit messages, which also double as our pull request descriptions. LLMs are really good at taking a series of changes and following a set of guidelines to summarize them into something smaller.

We actually use Logic to power this. Our whole platform is built to give AI-powered features an API, so this is an easy use case for us. The spec documents our best practices for writing a commit message, and the platform gives us back an API we can call from anywhere. Including git hooks!

Since the best practices are defined in a shared commit spec that we all use, our commit messages are consistent across the team. Any change we make to the spec is picked up automatically and instantly by the entire engineering team.

Portion of our Commit Message Guidelines

Craft Subject

Start with verb that matches intent (Add, Fix, Update, Remove, Refactor, Optimize, Document)
Append concise object phrase
Prepend [TICKET] if ticket present (e.g. PROJ-123)
Ensure ≤ 50 chars

Draft Body

Include when diff ≥ 5 lines or intent not obvious from the subject alone.
Follow these guidelines
1. Line Wrap: Hard-wrap at 72 characters.
2. Content: Explain what and why; omit how (visible in diff).
3. Bullets: Start with - for multiple points.
4. References: Mention related commits, design docs, or rationale only if contained in the diff or branch name (no URLs).
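To make the guidelines above concrete, here is a minimal sketch of a checker for the mechanical rules (subject length, leading verb, 72-character wrap, no URLs). It is only an illustration: the function name `check_commit_message` and the bracketed ticket pattern are assumptions, and it is not how the spec is enforced on the Logic platform.

```python
import re

# Verbs listed in the "Craft Subject" guideline.
VERBS = {"Add", "Fix", "Update", "Remove", "Refactor", "Optimize", "Document"}

# Assumed ticket prefix format, e.g. "[PROJ-123] ".
TICKET_PREFIX = re.compile(r"^\[[A-Z]+-\d+\]\s+")


def check_commit_message(message: str) -> list[str]:
    """Return guideline violations; an empty list means the message passes."""
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""

    if len(subject) > 50:
        problems.append("subject longer than 50 chars")

    # Ignore an optional ticket prefix before checking the leading verb.
    first_word = TICKET_PREFIX.sub("", subject).split(" ", 1)[0]
    if first_word not in VERBS:
        problems.append("subject does not start with an approved verb")

    # Body rules: hard wrap at 72 characters, no URLs.
    for number, line in enumerate(lines[1:], start=2):
        if len(line) > 72:
            problems.append(f"line {number} exceeds 72 chars")
        if "http://" in line or "https://" in line:
            problems.append(f"line {number} contains a URL")
    return problems
```

A check like this could run on the generated message as a cheap sanity pass before the commit is created.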

We call the generated API from a prepare-commit-msg git hook. It sends the branch name and code diff as input and gets back a high-quality, consistently formatted commit message as the response.
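A hook along these lines might look like the sketch below. The endpoint URL, the JSON field names (`branch`, `diff`, `message`), and the helper names are all assumptions made for illustration; the real API shape is not shown in this post.

```python
import json
import subprocess
import urllib.request

# Hypothetical endpoint; substitute the URL your generated API exposes.
API_URL = "https://example.invalid/commit-message"


def git(*args: str) -> str:
    """Run a git command and return its stdout."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def build_payload(branch: str, diff: str) -> bytes:
    """Encode the two inputs the hook sends: branch name and code diff."""
    return json.dumps({"branch": branch.strip(), "diff": diff}).encode("utf-8")


def request_message(payload: bytes) -> str:
    """POST the payload; the 'message' response field is an assumption."""
    request = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["message"]


def main(argv: list[str], generate=request_message) -> int:
    """Entry point; git passes the message file path and an optional source."""
    message_file = argv[1]
    source = argv[2] if len(argv) > 2 else ""
    if source:
        # A message already exists (-m, merge, squash, amend): leave it alone.
        return 0
    diff = git("diff", "--cached")
    if not diff:
        return 0
    branch = git("rev-parse", "--abbrev-ref", "HEAD")
    with open(message_file, "w", encoding="utf-8") as handle:
        handle.write(generate(build_payload(branch, diff)) + "\n")
    return 0
```

To use it as a hook, the script would be saved as `.git/hooks/prepare-commit-msg`, made executable, and end with `raise SystemExit(main(sys.argv))` after an `import sys`. The `generate` parameter exists so the network call can be swapped out in tests.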

This simplifies the process and also provides a nice gut-check of the code change. Now, when I’m opening a pull request and I see that the commit message includes “- Add many verbose debug statements,” it probably means I forgot to remove some debugging code before pushing.

I can now catch myself before I annoy others and require them to call out my mistake.

While not foolproof, it provides a useful safety net.

Automating Taste

A flowchart illustrating a development workflow where a developer pushes a Pull Request (PR) to a repository. Claude, an AI reviewer, auto-reviews the PR, leaves comments, and makes code changes. Lessons are extracted from Claude's reviews and fed back into the repository's Claude.md file, leading to faster PRs and allowing human reviewers to focus on design and tradeoffs.

The next big change to our pull request flow came with the release of Claude Code, and specifically Anthropic’s Claude Code Action. It brings Claude into our GitHub workflows and improves the code review and iteration process in several ways.

We built out a code review prompt that defines what we want from a high-quality code review. It is detailed (~1,500 tokens) and covers several categories, including architecture, code standards, and security concerns. It also includes a final checklist that helps Claude think through each specific change.

Final Code Review Checklist

No functions over 50 lines
No hardcoded values that should be config
No commented-out code
No console.log statements
All TODOs have associated tickets
Database migrations tested up and down
API changes reflected in TypeBox schemas
Security considerations addressed
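In our flow, Claude applies this checklist through the review prompt. Purely to illustrate what the more mechanical items mean, here is a rough sketch that scans the added lines of a unified diff for two of them. The function name and the ticket pattern (`PROJ-123` style) are assumptions, and the heuristics are deliberately naive.

```python
import re

# Assumed ticket format, e.g. PROJ-123.
TICKET = re.compile(r"[A-Z]+-\d+")


def scan_added_lines(diff: str) -> list[str]:
    """Flag console.log calls and ticketless TODOs on lines a diff adds."""
    findings = []
    for line in diff.splitlines():
        # Only inspect added lines; skip the "+++ b/file" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        code = line[1:].strip()
        if "console.log(" in code:
            findings.append(f"console.log statement: {code}")
        if "TODO" in code and not TICKET.search(code):
            findings.append(f"TODO without ticket: {code}")
    return findings
```

Items such as "Security considerations addressed" cannot be reduced to a pattern match, which is where a semantic reviewer earns its keep.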

Claude will leave comments on the code, and can make code changes and respond to comments that I leave. So if Claude notices a function is longer than recommended, I can just ask for a refactor, and Claude will make the change. All inline from within the pull request without leaving GitHub.

Finally, once a PR is approved and ready to merge, we have a different prompt that analyzes all human reviewer comments and extracts lessons into our repo's CLAUDE.md file. This lets Claude learn from each PR and automatically improve over time, based on real feedback, leaving less and less for the humans to comment on.
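The extraction itself is done by the LLM prompt. The file update that follows could be as simple as the sketch below, which appends new lessons as bullets and skips ones already present. The section heading, the function name, and the assumption that the lessons section sits at the end of the file are all hypothetical.

```python
# Hypothetical section name; assumed to be the last section of CLAUDE.md.
HEADING = "## Lessons from code review"


def merge_lessons(claude_md: str, lessons: list[str]) -> str:
    """Append lessons not already listed as bullets; return the new text."""
    # Compare case-insensitively against every existing "- " bullet.
    existing = {
        line[2:].strip().lower()
        for line in claude_md.splitlines()
        if line.startswith("- ")
    }
    new_bullets = []
    for lesson in lessons:
        key = lesson.strip().lower()
        if key and key not in existing:
            existing.add(key)
            new_bullets.append(f"- {lesson.strip()}")
    if not new_bullets:
        return claude_md

    text = claude_md.rstrip("\n")
    if HEADING not in text:
        text = (text + "\n\n" if text else "") + HEADING + "\n"
    return text + "\n" + "\n".join(new_bullets) + "\n"
```

Deduplicating before appending keeps the file from growing with repeated lessons as similar review comments recur across PRs.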

Just as linters and formatters long ago removed the need for nitpicky comments about whitespace and formatting, Claude’s semantic review and ability to auto-fix pre-empt another large class of common comments: code complexity, architecture patterns, entity naming, and so on.

Human reviewers are freed up to approach the PR with a focus on bigger decisions and tradeoffs.

The net effect is that our codebase stays consistent and feels unified, and our PRs are assembled faster, reviewed faster, and improved faster.

And the entire process itself is self-improving.

Making Human Review Easier

Another big update occurred with the release of Google’s latest image model. Within a few hours, our CEO noticed how good it was at generating whiteboard diagrams.

This gave us one last improvement to our pull-request process in 2025.

Since Logic supports generating images, I wrote up a new doc for generating whiteboard diagrams from a code diff or list of product requirements. We use that API to generate visual summaries of our changes.

It adds yet another safety net. A human can glance at the generated diagram and, if it looks visually complex, conclude that the change may be too big for a single pull request, or that the architecture itself may be wrong.

A complex system that works is invariably found to have evolved from a simple system that worked.
Gall’s Law

The process is also very useful for summarizing complex technical work and discussions. When testing this API, I even fed it a Slack thread from a DNS debugging session.

A screenshot of a Slack conversation between Steve and Mark discussing issues with cert validation, DNS records, and AWS, specifically regarding delegating control to AWS.

What I got back was very close to the diagram I would have drawn myself.

A whiteboard diagram titled "BEFORE" and "AFTER" illustrates DNS resolution for logic.new. The "BEFORE" side shows queries failing to find TXT records due to DNS servers, causing cert validation to fail. The "AFTER" side shows DNS delegating to AWS Route53, allowing TXT records to be found and cert validation to succeed.

Code review went largely unchanged for 15+ years. But over the last 12 months we have gained commits that write themselves, an LLM that catches and fixes issues before humans even see them, programmatic enforcement of style and taste, and now diagrams that set context for human reviewers. Our small engineering team moves faster, and that’s not just something we say; it’s what our customers say too.

This is incredible! I don’t think I’ve ever seen engineers inject new capabilities quite so quickly. Kudos to you and your team.
Quote from a customer doing industry research with Logic

This is our new baseline for review. Let’s see what 2026 brings.

Rainbow Roxy

Jan 2

Wow, the part about automating commit messages with LLMs really stood out to me. It's so cool how you're using your own 'Logic' platform for that. What kind of unexpected benefits or challenges have you definitely encountered with this shift to AI-driven reviews? This whole approach sounds incredibly insightful!

Bits of Logic
