Let's start with an uncomfortable truth: most of what we call "software testing" today isn't actually testing. It's checking. And if we're being honest, a large part of the testing tool industry has been selling solutions to a problem it doesn't really understand.
But before we get there, we need to rewind.
First Principles: What Is Testing Actually For?
Strip away the frameworks, the methodologies, and the conference booths promising “AI-powered test automation” (whatever that means this quarter) and ask a simpler question: Why do we test software?
The answer seems obvious until you try to articulate it. Most people answer with something like “to make sure the software works” or “to verify requirements are met.” But these answers are symptoms of decades of groupthink. They describe what we've been doing, not what we're actually trying to accomplish.
Here's the first principle: The purpose of software testing is to find problems in software applications.
Not to prove they work. Not to execute predetermined scripts. Not to hit 80% coverage numbers so a pipeline turns green.
To find problems. Real, meaningful problems that would impact users, damage business value, or undermine the system's purpose.
Once you accept this definition, something uncomfortable happens: a lot of what we label as “testing” no longer qualifies.
Automated Testing Isn't Testing
I can already hear the objections. "But our automated test suite catches bugs all the time!"
Does it? Or does it catch regressions: instances where something that previously worked now doesn't?
There's a critical difference.
Automated checks (let's call them what they are) execute a predetermined sequence of actions and compare actual results against expected results. They’re binary. They’re deterministic. They’re assertions wrapped in orchestration.
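To make the distinction concrete, here is a minimal sketch of what such a check looks like, written as a Python test in the pytest style; `apply_discount` is a hypothetical function invented purely for illustration.

```python
# A check: one predetermined action, one predetermined expectation,
# and a binary pass/fail outcome. All names here are illustrative.

def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under check."""
    return price * (1 - percent / 100)

def test_twenty_five_percent_off_of_100():
    result = apply_discount(100.0, 25)   # the scripted action
    assert result == 75.0                # the scripted expectation
```

Nothing in this check will ever wonder what happens with a negative percentage, a price of None, or a discount applied twice. It answers exactly one question, the one someone already thought to ask.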
And they’re incredibly valuable. This isn’t a dismissal of their importance. Automated checks are essential for catching regressions, ensuring baseline functionality, and creating confidence in continuous deployment pipelines.
But they're not testing. They're checking.
Testing—real testing—is an investigative process. It's forming hypotheses about how a system might fail. It's following hunches down rabbit holes. It's designing experiments that stress the system in unexpected ways. It's the moment when an experienced tester tilts their head and thinks, "Wait, what happens if I do this?"
Automated checks mostly find problems you already thought to look for. Testing finds the problems you didn't know existed.
The Tool Vendor Problem
This leads to an uncomfortable irony in the testing tools market: most vendors don't understand the problem they're solving.
In recent years the pitch has been some variation of: "Our AI can write/maintain/execute your test scripts better than humans can!"
This is like optimizing a process that shouldn’t exist in the first place. The vendors have looked at the current state of testing, assumed it represents the platonic ideal of testing, and focused on making that specific thing more efficient.
But if automated checking isn't actually testing, then making it faster, smarter, or more "autonomous" doesn't solve the fundamental problem. It just scales up the wrong solution.
The result is teams getting better and faster at something that isn’t adding the value they think it is.
AI and the Future of Testing: A Different Perspective
So where does AI actually fit into the future of testing?
Not as a replacement for test automation engineers who write coded scripts. Not as a tool that "auto-generates" test cases from requirements documents (which, let's be honest, are often incomplete, ambiguous, or wrong anyway).
Instead, think about AI as a partner in the investigative work testers already do.
The real power of AI in testing isn't in executing predetermined checks—it's in exploration. In pattern recognition across vast state spaces. In identifying anomalies that don't trigger explicit failures but represent subtle deviations from expected behavior.
Think about what a skilled tester does: They build a mental model of how the system should work, then actively try to find gaps between that model and reality. They notice things that are technically "working" but feel wrong. They identify risks based on context, user expectations, and domain knowledge.
Now imagine applying AI to support that work (a rough sketch follows this list):
- Explore large interaction spaces that humans wouldn’t realistically cover by hand
- Recognize subtle behavioral patterns across millions of operations that might indicate deeper systemic issues
- Correlate seemingly unrelated anomalies across different system components
- Build and refine models of system behavior dynamically, identifying deviations worth investigating
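Here is a deliberately simplified sketch of that kind of exploration in Python. Every name in it is hypothetical, and a real system would build a far richer model of expected behavior than a running mean and standard deviation; the point is the shape of the loop, not the statistics.

```python
import random
from statistics import mean, stdev

def explore(perform, actions, steps=1000, threshold=3.0, warmup=30):
    """Randomly walk an interaction space and flag statistical outliers.

    `perform` is a hypothetical callable that executes one action against
    the system under test and returns a numeric signal we care about
    (latency, response size, queue depth, and so on).
    """
    history = []
    anomalies = []
    for step in range(steps):
        action = random.choice(actions)
        signal = perform(action)
        # Only judge against the baseline once there is enough history.
        if len(history) >= warmup:
            baseline, spread = mean(history), stdev(history)
            if spread and abs(signal - baseline) > threshold * spread:
                # Nothing "failed" in the assertion sense; this is simply
                # behavior that deviates from the model built so far.
                anomalies.append((step, action, signal))
        history.append(signal)
    return anomalies
```

Nothing in that loop asserts a predetermined outcome. It surfaces candidates for investigation, and a human (or a more capable model) decides which of them actually matter.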
This isn’t about replacing testers with automated script factories.
The human brings domain knowledge, contextual understanding, and intuition about what matters. The AI brings tireless exploration, pattern recognition at scale, and the ability to notice the anomalous in vast seas of data.
Together, they might actually test instead of just check.
The Ephemeral Future: When Software Becomes Fluid
But let's go deeper. Let's question an assumption so fundamental we rarely articulate it: that software is a discrete, static artifact we can "test" at all.
Today, we think of software as something that's written, compiled, deployed, and then validated. It's a thing—an entity with defined boundaries that we can probe and measure.
But what if that changes?
We're already seeing hints of this evolution. Systems that dynamically rewrite their own behavior based on context. Applications that compose themselves on-demand from microservices. LLM-powered features where the actual behavior is emergent rather than explicitly programmed.
Imagine a future where software behaves less like a finished artifact and more like a continuously evolving system. Where the application you're using right now is subtly different from the one you used five minutes ago because it's constantly adapting, learning, and evolving based on usage patterns, user feedback, and environmental context.
In that world, traditional regression testing starts to break down. There's no static baseline to regress from. The system is inherently ephemeral.
How do you test something that's constantly becoming?
Testing the Ephemeral
One possible answer is that you stop testing specific implementations and focus instead on invariants.
Instead of verifying that specific interactions produce specific outputs, you define and continuously validate the fundamental properties the system must maintain regardless of how it evolves:
- Value invariants: Does the system continue to deliver core value to users?
- Safety invariants: Are critical constraints (security, data integrity, business rules) upheld?
- Behavioral invariants: Does the system's behavior remain within acceptable boundaries of predictability and reliability?
This isn’t hypothetical. We're already moving in this direction with property-based testing, chaos engineering, and observability practices that treat production as a living laboratory.
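For a small, concrete taste of invariant-style thinking, here is a property-based check using Python's Hypothesis library. The `apply_discount` function is hypothetical; the shift worth noticing is in the assertion, which states a property that must hold for every input rather than one scripted input/output pair.

```python
from hypothesis import given, strategies as st

def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    return price * (1 - percent / 100)

# An invariant, not a script: for any price and any discount percentage,
# the result never exceeds the original price and never goes negative.
@given(
    price=st.floats(min_value=0, max_value=1_000_000),
    percent=st.floats(min_value=0, max_value=100),
)
def test_discount_stays_within_bounds(price, percent):
    result = apply_discount(price, percent)
    assert 0 <= result <= price
```

Run under pytest, Hypothesis generates a spread of inputs on every run and shrinks any failure it finds to a minimal counterexample, which already feels closer to investigation than to script execution.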
But most organizations are still stuck thinking about testing as a pre-deployment gate check. "We test in staging, then deploy to production." In practice, this is why teams with “mature” test automation still rely on last-minute exploratory testing to catch the issues that actually matter.
In an ephemeral software world, there is no staging. There's only production. And testing becomes less about validation before deployment and more about continuous experimentation and observation during operation.
The role of AI here becomes even more critical. Humans can't continuously monitor and validate system invariants across thousands of dynamically generated service compositions. But AI systems designed to understand and validate behavioral boundaries at scale? That's a manageable problem.
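As a toy illustration only (the event shape and both invariants are invented, and a real system would learn and refine its boundaries rather than hard-code them), an invariant monitor might look something like this:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

@dataclass
class Event:
    """Hypothetical telemetry event emitted by a production service."""
    service: str
    latency_ms: float
    order_total: float

def within_latency_budget(e: Event) -> bool:
    return e.latency_ms < 2000            # a behavioral invariant

def never_negative_totals(e: Event) -> bool:
    return e.order_total >= 0             # a safety invariant

INVARIANTS: list[Callable[[Event], bool]] = [
    within_latency_budget,
    never_negative_totals,
]

def monitor(events: Iterable[Event]) -> Iterator[tuple[Event, str]]:
    """Yield every event that violates an invariant, with the invariant's name."""
    for event in events:
        for invariant in INVARIANTS:
            if not invariant(event):
                yield event, invariant.__name__
```

The hard and interesting part, and the place where AI plausibly earns its keep, is not this loop. It is discovering, weighting, and updating the invariants themselves as the system keeps changing underneath them.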
The Real Paradigm Shift
Here's where we land: The future of testing isn't about making our current practices faster or more automated. It's about fundamentally rethinking what testing means in a world where:
- The purpose is finding problems, not executing scripts
- Software itself may become fluid and ephemeral
- AI is a collaborative investigator, not a replacement test engineer
This requires letting go of assumptions many organizations are deeply invested in. That testing happens before deployment. That test cases are discrete, reusable artifacts. That coverage numbers, on their own, tell us something meaningful about risk. That we can enumerate all the ways a system should behave.
The vendors selling "AI test automation" aren't wrong to focus on AI. They're just pointing it at the wrong problem. Instead of using AI to do what humans currently do (but faster!), we should be exploring what becomes possible when we partner human insight with AI's unique capabilities.
Real testing—the kind that finds actual problems—has always been about curiosity, skepticism, and investigative thinking. It's about asking "what if?" and following the evidence wherever it leads.
The better question isn’t “How do we use AI to automate testing?” but “How do we use AI to help us find problems we didn’t even know to look for?”
And that question is far more interesting than generating yet another layer of automation on top of an already brittle approach.
The irony, of course, is that writing a blog post about the future of testing that challenges current orthodoxy while standing on first principles is itself a kind of test. Will it find problems in our collective thinking? Will it reveal gaps between what we say testing is and what we actually do?
I guess we'll find out.