How to baseline an engineering team in 30 days.

Most companies under 100 people have no idea how their engineering team is actually performing. Here's how to find out — without slowing the team down.

If you're a founder or COO running a company with an engineering team, here's a question worth sitting with: how would you know, today, whether your team is doing well?

Most of the answers we hear are circular. "They seem busy." "We're shipping things." "The senior engineer says we're on track." Those are vibes, not data. And vibes are what get you to the eighteenth month of a slow rebuild before realizing the team plateaued in month four.

You can't improve what you can't see. The first 30 days of a Fractional CTO engagement is mostly about installing visibility — not as a surveillance project, but as a way to tell the difference between a team that's getting better and a team that's just getting busier. Here's how that work runs.

Why the question is hard

Engineering performance is genuinely difficult to measure. Lines of code is a vanity metric. Story points are made up. Velocity moves with team composition. Tickets closed depends on how tickets are sized. Even commits per week can be gamed by anyone who knows you're watching commits.

The trap most founders fall into is picking a single metric and tracking it weekly. The metric becomes the goal, the team optimizes for it, and the number stops mapping to outcomes. Goodhart's Law lives in every engineering org: as soon as a measure becomes a target, it stops being a useful measure.

The fix isn't to find a better single metric. It's to triangulate across four lenses simultaneously, and to baseline once before deciding which to optimize.

The four lenses

The four-lens approach is what we run on our own systems through Concordance, and it's the same approach we use to baseline a new engagement's team:

1. Overall health

Is the team capable of producing reliable, maintainable software at a sustainable pace? Indicators include: deployment frequency (without incident), test coverage trajectory, code review cycle time, the ratio of feature work to maintenance, and whether the team can take a week off without the system falling over.

Health is the most important lens because everything else depends on it. A high-velocity team that's cooking the books on test coverage isn't healthy — they're burning down the future to look productive in the present.
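Two of the health indicators above are simple arithmetic once the pull-request data is exported. The sketch below is illustrative only: the records, the field names, and the choice to measure review cycle time from "opened" to "merged" are assumptions, not a description of any particular tool. It reports feature work as a share of all merged work, which is one way to express the feature-to-maintenance ratio.

```python
from datetime import datetime
from statistics import median

# Hypothetical pull-request records. In practice these would come from an
# export of whatever source-control host the team uses.
pull_requests = [
    {"opened": "2024-03-01T09:00", "merged": "2024-03-01T15:00", "kind": "feature"},
    {"opened": "2024-03-02T10:00", "merged": "2024-03-04T10:00", "kind": "maintenance"},
    {"opened": "2024-03-05T08:00", "merged": "2024-03-05T20:00", "kind": "feature"},
    {"opened": "2024-03-06T09:00", "merged": "2024-03-07T09:00", "kind": "feature"},
]


def review_cycle_hours(pr):
    """Hours between a pull request being opened and being merged."""
    opened = datetime.fromisoformat(pr["opened"])
    merged = datetime.fromisoformat(pr["merged"])
    return (merged - opened).total_seconds() / 3600


def health_snapshot(prs):
    """Median review cycle time and the share of work that is feature work."""
    hours = [review_cycle_hours(pr) for pr in prs]
    feature_count = sum(1 for pr in prs if pr["kind"] == "feature")
    return {
        "median_review_hours": median(hours),
        "feature_share": feature_count / len(prs),
    }


snapshot = health_snapshot(pull_requests)
print(snapshot)
```

The median is used rather than the mean so that one pull request left open over a holiday doesn't dominate the number.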

2. Compliance evidence

Can the team demonstrate, in writing, that they're following the practices the business needs them to follow? This is less about regulatory compliance (though it covers SOX, HIPAA, SOC 2 where applicable) and more about: is there an audit trail of decisions, code reviews, deployment approvals, security reviews, incident responses?

If something goes wrong — a breach, a bad deploy, a customer complaint — can you reconstruct what happened and who decided what? At SMB scale, this isn't usually formal. The lens still applies. The teams that can answer "yes, here's the trail" are usually also the healthier teams.
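One lightweight way to test "can you reconstruct who decided what" is to list the changes whose trail has holes. The following is a minimal sketch under stated assumptions: the change records, the IDs, and the two required fields are hypothetical, and a real team would decide for itself which records count as evidence.

```python
# Hypothetical change records: one entry per change that reached production.
changes = [
    {"id": "chg-101", "reviewed_by": "dana", "deploy_approved_by": "lee"},
    {"id": "chg-102", "reviewed_by": None, "deploy_approved_by": "lee"},
    {"id": "chg-103", "reviewed_by": "sam", "deploy_approved_by": None},
]

# Assumed minimum trail: every change has a reviewer and a deploy approver.
REQUIRED_FIELDS = ("reviewed_by", "deploy_approved_by")


def audit_gaps(records):
    """Map each change ID to the required fields it is missing."""
    gaps = {}
    for record in records:
        missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
        if missing:
            gaps[record["id"]] = missing
    return gaps


gaps = audit_gaps(changes)
print(gaps)
```

An empty result doesn't prove the reviews were good. It only shows the trail exists, which is the question this lens asks.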

3. AI risk posture

Two years ago this lens didn't exist. Today it's mandatory. Nearly every engineering team is now using AI tools to write code, generate content, draft emails, and process data. The question is whether they're doing it with discipline.

What we look at: which AI tools the team has approved (and which they're using anyway), what data is being pasted into AI tools, how AI-generated code is reviewed before being committed, whether there's a clear position on AI in customer-facing communications. Most teams have an implicit AI policy. The lens makes it explicit.
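The gap between "approved" and "using anyway" is a set difference, and writing it down is often the first step in making the implicit policy explicit. This is a sketch with made-up tool names; how you gather the in-use list (survey, expense reports, browser extension inventory) is an assumption left open here.

```python
# Hypothetical tool names, for illustration only.
approved_tools = {"assistant-a", "assistant-b"}
tools_in_use = {"assistant-a", "assistant-c", "assistant-d"}


def ai_tool_gaps(approved, in_use):
    """Compare the approved list against what the team reports using."""
    return {
        "unapproved_in_use": sorted(in_use - approved),
        "approved_but_unused": sorted(approved - in_use),
    }


tool_gaps = ai_tool_gaps(approved_tools, tools_in_use)
print(tool_gaps)
```

Both lists are useful: the first shows where policy lags behind practice, the second shows approvals nobody needed.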

4. Deployment risk

How risky is each deployment? Indicators: rollback rate, time to recover from incidents, the percentage of deployments that involve manual steps, whether deployments require senior engineers to be present, the on-call rotation's actual workload.

A team with high deployment risk has a hidden cost on every release — and the cost compounds as the system grows. Naming the risk is the first step to addressing it.
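Several of the deployment risk indicators in this section reduce to counting over a deployment log. The sketch below assumes a hypothetical log format in which each deployment records whether it was rolled back, how many manual steps it needed, and how long recovery took if something broke. None of these field names come from a specific tool.

```python
# Hypothetical deployment log entries.
deployments = [
    {"id": 1, "rolled_back": False, "manual_steps": 0, "recovery_minutes": None},
    {"id": 2, "rolled_back": True, "manual_steps": 2, "recovery_minutes": 45},
    {"id": 3, "rolled_back": False, "manual_steps": 1, "recovery_minutes": None},
    {"id": 4, "rolled_back": False, "manual_steps": 0, "recovery_minutes": None},
    {"id": 5, "rolled_back": True, "manual_steps": 0, "recovery_minutes": 15},
]


def deployment_risk(deploys):
    """Rollback rate, share of deploys with manual steps, and mean recovery time."""
    total = len(deploys)
    rollbacks = sum(1 for d in deploys if d["rolled_back"])
    manual = sum(1 for d in deploys if d["manual_steps"] > 0)
    recoveries = [
        d["recovery_minutes"] for d in deploys if d["recovery_minutes"] is not None
    ]
    return {
        "rollback_rate": rollbacks / total,
        "manual_share": manual / total,
        # None when no deployment in the log needed recovery.
        "mean_recovery_minutes": (
            sum(recoveries) / len(recoveries) if recoveries else None
        ),
    }


risk = deployment_risk(deployments)
print(risk)
```

Whether senior engineers had to be present and how heavy the on-call load is usually can't be read from a log, so those two indicators still come from the conversations.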

The 30-day baseline, week by week

Across the four lenses, here's what the actual baseline work looks like:

Week 1: Read the system

The Fractional CTO spends the first week reading. Not interviewing — reading. Source control, deployment logs, the last six months of incident reports, the team's tickets, the architecture diagrams (if they exist), the build pipeline, the test suite. The output is a written summary of what the system looks like from the inside.

This step is the one most often skipped, and it's the one that surfaces the most. Talking to engineers gives you their version. Reading the system gives you the truth.

Week 2: Talk to the team

One-on-ones with each engineer (45 minutes each), the senior engineering lead, and any product or design partners they work with. The questions are diagnostic, not evaluative: where do you spend most of your time? What slows you down? What's broken that we haven't talked about? What would you change if you could?

The honest answers come in week two, when the trust is fresh and there's no political stake yet. By month three, you'd hear the rehearsed version. In week two, you hear what they actually think.

Week 3: Run the numbers

Pull the actual data: deployment frequency, lead time, change failure rate, recovery time, code review cycle, test coverage trends, incident frequency. For each, plot the last six months. Look for trends, not snapshots. A team that's deploying twice a week is doing fine; a team that was deploying twice a week six months ago and now deploys once every two weeks is in trouble.
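"Trends, not snapshots" can be made concrete by comparing the earlier half of the window against the recent half. This is a rough sketch, not a statistical method: the monthly counts are invented, and splitting six months into two halves is an assumption chosen for simplicity. The same function works for any of the metrics listed above once they are bucketed by month.

```python
# Hypothetical monthly deployment counts for the last six months.
monthly_deploys = {
    "2024-01": 8, "2024-02": 9, "2024-03": 7,
    "2024-04": 4, "2024-05": 3, "2024-06": 2,
}


def half_over_half(counts_by_month):
    """Compare the average of the earlier half of the window with the recent half."""
    months = sorted(counts_by_month)  # "YYYY-MM" keys sort chronologically
    half = len(months) // 2
    earlier = [counts_by_month[m] for m in months[:half]]
    recent = [counts_by_month[m] for m in months[half:]]
    earlier_avg = sum(earlier) / len(earlier)
    recent_avg = sum(recent) / len(recent)
    # Relative change; negative means the metric is falling.
    change = (recent_avg - earlier_avg) / earlier_avg
    return earlier_avg, recent_avg, change


earlier_avg, recent_avg, change = half_over_half(monthly_deploys)
print(f"earlier: {earlier_avg:.1f}/month, recent: {recent_avg:.1f}/month, change: {change:+.0%}")
```

A snapshot of the last month alone would show "two deploys" with no context. The comparison shows the team is deploying well under half as often as it was, which is the finding that matters.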

If the data isn't readily available — which is often the case at SMB scale — that's itself a finding. The fix is usually 90% existing tools and 10% setup, not a tools-buying spree.

Week 4: Write it up

The output of the baseline is a single document, eight to twelve pages, written for the founder and the senior leadership team. Structure:

The document isn't the product. The conversation it triggers is the product. By the end of week four, leadership has a shared, evidence-based view of how their engineering team is actually performing — probably for the first time.

The most valuable thing the baseline produces isn't the score. It's the moment when the founder, the COO, and the senior engineer all look at the same data and agree on what's true.

The trap to avoid

The single biggest failure mode after a baseline is over-correcting on whichever metric scored worst. If deployment risk is high, the team installs ten new processes and slows everything down. If AI posture is weak, leadership bans AI tools company-wide. If test coverage is low, the team spends a quarter writing tests for code that's about to be replaced.

The point of the baseline isn't to fix everything that scored poorly. It's to pick two or three things to actually move on, and to leave the rest visible but unaddressed for now. A baseline you don't act on is a waste; a baseline you over-react to is worse.

What changes after the baseline

Three things, reliably:

First, leadership stops asking "are we doing well?" and starts asking "are these three numbers moving?" That's a calmer conversation, and a more productive one.

Second, the engineering team stops feeling like their work is invisible. Most engineers want to be measured on something real; they hate being measured on vibes. The baseline gives them something concrete.

Third, when something does go wrong — a bad deploy, a security incident, a senior engineer leaving — you have a real reference point. "Are things worse than they were six months ago, or is this just a bad week?" becomes answerable. That changes how you respond.

If you don't have a clear, evidence-based view of how your engineering team is actually performing, the free 30-minute discovery call is the right starting point. We'll walk through what a baseline would look like for your specific team and tell you honestly whether it's worth doing now or worth waiting.
