April 6, 2026 · 5 min read

Regression Testing: Why It Becomes a Nightmare (And How to Fix It)

Regression testing starts manageable and slowly becomes the thing everyone dreads. Here's why it degrades, what smart regression selection looks like, and how to keep your regression suite useful over time.

Testing, Automation, Best Practices

Regression testing is one of those things that feels solved until it isn't.

You start with 20 test cases. They run in 45 minutes. Everyone's happy. A year later you have 400 test cases, they take 6 hours to run, half of them are outdated, and your team is debating whether to run them at all before a release.

This is the regression testing lifecycle in most projects. Here's how to break the pattern.

Why Regression Suites Degrade

1. Tests Only Get Added, Never Removed

Every bug that makes it to production becomes a new regression test: "We need to make sure this never happens again." Over time, the suite grows without bound.

Nobody removes tests for features that were deprecated. Nobody removes duplicate tests that cover the same path. Nobody questions whether a test that was added 3 years ago for a one-time edge case still needs to be in the nightly run.

The result: a suite that contains hundreds of tests, many of which are testing behavior that no longer exists.

2. Tests Become Disconnected From Current Behavior

The product evolves. The test suite doesn't. A test written when the checkout flow had 3 steps now runs against a checkout flow with 5 steps. The test still passes (it checks an intermediate state that still works), but it no longer tests what you think it tests.

[!WARNING] A passing test that doesn't cover the current behavior is worse than no test. It gives you false confidence. Audit your test suite regularly — not just for failures, but for relevance.

3. "Regression" Means Everything

In some teams, "regression testing" becomes shorthand for "everything we test." New feature? That's regression now. Bug fix? Regression. UI tweak? Add it to regression.

When regression means everything, it loses meaning. You can no longer tell at a glance what your regression suite is protecting.

4. No Tiered Strategy

All tests get treated equally. The test that validates the entire payment flow runs at the same frequency as the test that checks whether the settings icon has the right color. One is critical. One is not. But they both sit in the same queue.

Smart Regression Selection

Not every test needs to run for every change. The key is matching the regression scope to the risk of the change.

Tier 1: Always Run (Every Build / PR)

Critical paths that, if broken, are showstoppers:

Login and authentication
Core purchase / transaction flow
Data save and load (any feature where data loss = user complaint)
Crash-free startup

These run fast (keep them under 5 minutes total) and block merging if they fail.
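One way to keep that 5-minute budget honest is to enforce it in CI rather than trust it. A minimal sketch, assuming a POSIX-style shell; `run_with_budget` is a hypothetical helper, not part of Gradle or any CI product:

```shell
# Hypothetical helper: runs a command and succeeds only if the command
# passes AND finishes within the given time budget (in seconds).
run_with_budget() {
  local budget="$1" start elapsed status=0
  shift
  start="$(date +%s)"
  "$@" || status=$?
  elapsed=$(( $(date +%s) - start ))
  echo "elapsed: ${elapsed}s (budget: ${budget}s)" >&2
  [ "$status" -eq 0 ] && [ "$elapsed" -le "$budget" ]
}
```

For example, `run_with_budget 300 ./gradlew testDebugUnitTest` would fail the step if the tests fail or take longer than 300 seconds. Which Gradle task actually makes up your Tier 1 depends on your project.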

Tier 2: Run Nightly or on Release Branches

The broader feature set that should work but isn't an immediate blocker if it needs investigation:

Full user journeys (onboarding, settings changes, feature flows)
Multi-platform smoke tests
API contract tests

Tier 3: Run Before Release Only

Deep edge cases, performance checks, full multi-device matrix. These take time and are only worth running when you're close to shipping.

```
Change → Trigger Tier 1 immediately
         Trigger Tier 2 on schedule
         Trigger Tier 3 manually for releases
```

[!TIP] This is called "risk-based test selection." You run more tests for riskier changes. A one-line copy change doesn't need a 6-hour full regression run. A payment flow refactor does.
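In a CI script, the tier selection can be as small as a lookup from trigger to tiers. A rough sketch: the trigger names (`pr`, `nightly`, `release`) are made up for illustration, and having each later stage include the earlier tiers is an assumption, not a rule.

```shell
# Hypothetical helper: maps a CI trigger to the regression tiers to run.
# Trigger names are illustrative; adapt them to your CI system's events.
tiers_for_trigger() {
  case "$1" in
    pr)      echo "tier1" ;;                 # every build / PR
    nightly) echo "tier1 tier2" ;;           # scheduled run
    release) echo "tier1 tier2 tier3" ;;     # manual, before shipping
    *)       echo "unknown trigger: $1" >&2
             return 1 ;;
  esac
}
```

A pipeline step could then loop over `$(tiers_for_trigger "$TRIGGER")` and run whatever test task each tier maps to in your build.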

Change-Based Regression Selection

Teams with good code coverage and impact analysis tools can go further: run only the tests that cover the code that changed.

```bash
# Example: list the files changed by the last commit
git diff --name-only HEAD~1

# Run only the tests that cover those files
# (here, assuming the change touched CheckoutViewModel)
./gradlew testDebugUnitTest --tests "*.CheckoutViewModelTest"
```

This requires investing in a map of which tests cover which code. For large test suites, though, it can cut CI time substantially while still running the tests relevant to the change.
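If you don't have coverage-based impact analysis yet, a naming convention can serve as a crude stand-in. The sketch below assumes every `Foo.kt` is covered by a class named `FooTest`. That is a project convention, not something git or Gradle knows about, and `test_filters_for_changes` is a hypothetical helper.

```shell
# Hypothetical helper: reads changed file paths on stdin and prints one
# Gradle "--tests" filter per changed Kotlin/Java file, assuming that
# Foo.kt is covered by a test class named FooTest.
test_filters_for_changes() {
  local path name
  while IFS= read -r path; do
    case "$path" in
      *Test.kt|*Test.java) name="$(basename "${path%.*}")" ;;      # a test itself changed
      *.kt|*.java)         name="$(basename "${path%.*}")Test" ;;  # source file changed
      *)                   continue ;;                             # not a source file
    esac
    printf '%s\n' "--tests *.$name"
  done | sort -u
}
```

Piping `git diff --name-only HEAD~1` into this function prints filters you could append to the `./gradlew testDebugUnitTest` call. It will miss tests that exercise a file indirectly, which is exactly the gap real coverage data closes.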

Keeping the Suite Healthy: Ongoing Practices

Regular Pruning

Every quarter, go through the regression suite and ask for each test:

Is this testing current behavior?
Is it covered by another test?
Is it testing something that's actually risky to regress?
When was this last updated?

Tests that fail these questions get removed or updated. This is not optional — it's maintenance, same as pruning dead code.
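Only the last question ("When was this last updated?") is easy to script; the others need a human. A minimal sketch for producing the review list, assuming your tests live in a git repository. Both function names and the one-year threshold in the usage note are illustrative assumptions.

```shell
# Days since the last commit that touched a file (run inside a git checkout).
days_since_last_commit() {
  local last_commit now
  last_commit="$(git log -1 --format=%ct -- "$1")"
  [ -n "$last_commit" ] || return 1   # file has no history
  now="$(date +%s)"
  echo $(( (now - last_commit) / 86400 ))
}

# Reads "<age_in_days> <test_name>" lines on stdin and prints the names
# of tests older than the given threshold. These are candidates for
# review, not for automatic deletion.
stale_tests() {
  local max_age_days="$1"
  awk -v max="$max_age_days" '$1 > max { print $2 }'
}
```

Feeding it one line per test file, built from `days_since_last_commit`, and calling `stale_tests 365` gives you the tests nobody has touched in a year. An old test is not necessarily a bad one, so treat the output as a starting point for the quarterly review.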

Flakiness Zero Tolerance

A flaky regression test is a liability. The team learns to ignore failures in that test, and a real regression will slip through alongside the flakes.

Policy: any test that fails without a code change gets investigated immediately. If it can't be made deterministic, it gets moved out of the automated suite.
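"Fails without a code change" is something you can check mechanically: run the same test several times on the same commit and see whether the outcomes differ. A minimal sketch, with `is_flaky` as a hypothetical helper and the number of runs left up to you:

```shell
# Hypothetical helper: runs a test command N times on unchanged code.
# Returns 0 (flaky) if it saw both passes and failures, 1 otherwise.
is_flaky() {
  local runs="$1"
  shift
  local passes=0 failures=0 i=0
  while [ "$i" -lt "$runs" ]; do
    if "$@" >/dev/null 2>&1; then
      passes=$(( passes + 1 ))
    else
      failures=$(( failures + 1 ))
    fi
    i=$(( i + 1 ))
  done
  [ "$passes" -gt 0 ] && [ "$failures" -gt 0 ]
}
```

Something like `is_flaky 10 ./gradlew testDebugUnitTest --tests "*.CheckoutViewModelTest"` would flag a test that passes on some runs and fails on others. A test that fails every time is not flaky by this definition; it is just broken, and that is a different conversation.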

Automation Balance

Not everything in regression needs to be automated. The 80/20 rule applies:

Automate the 80% that is repetitive, deterministic, and high-value
Keep 20% manual for the flows that require judgment, visual validation, or are too complex to automate reliably

Manual regression tests that run consistently and reliably are more valuable than automated tests that flake.

The Execution Strategy

For a typical Android app with a mix of features, our regression execution looks like this:

Per-PR (automated, ~3 minutes):

- Unit tests for changed modules
- API contract tests
- Tier 1 critical path smoke (login, core flow)

Nightly (automated, ~30 minutes):

- Full Tier 2 on 2 priority devices
- API integration tests
- Performance benchmark baseline check

Pre-release (mix of automated + manual, ~3 hours):

- Full suite automated on Tier 1 + Tier 2 devices
- Manual exploratory on new features
- Manual regression on recent bug-fix areas
- Full Tier 3 device matrix spot-check

The total time investment is manageable because we're not running everything all the time. We're running the right tests at the right moments.

The Mindset Shift

Regression testing fails when teams treat it as a fixed artifact — "the regression suite" that you run as-is forever.

It works when teams treat it as a living system — maintained, curated, tiered, and continuously evaluated for relevance.

The goal is a regression suite that gives you confidence in 30 minutes, not anxiety over 6 hours. That requires discipline: adding tests thoughtfully, removing them when they no longer serve a purpose, and being honest about the difference between test coverage and test value.

A smaller, trusted regression suite is better than a large, ignored one. Always.

Sudarshan Chaudhari

AI Systems Builder / Product Engineer

Bangkok, Thailand

Solo Android developer with 13+ years in QA, building Android apps, AI automation systems, and developer tools at SudarshanTechLabs.
