Regression Testing: Why It Becomes a Nightmare (And How to Fix It)
Regression testing starts manageable and slowly becomes the thing everyone dreads. Here's why it degrades, what smart regression selection looks like, and how to keep your regression suite useful over time.
Regression testing is one of those things that feels solved until it isn't.
You start with 20 test cases. They run in 45 minutes. Everyone's happy. A year later you have 400 test cases, they take 6 hours to run, half of them are outdated, and your team is debating whether to run them at all before a release.
This is the regression testing lifecycle in most projects. Here's how to break the pattern.
Why Regression Suites Degrade
1. Tests Only Get Added, Never Removed
Every bug that makes it to production becomes a new regression test: "We need to make sure this never happens again." Over time, the suite grows without bound.
Nobody removes tests for features that were deprecated. Nobody removes duplicate tests that cover the same path. Nobody questions whether a test that was added 3 years ago for a one-time edge case still needs to be in the nightly run.
The result: a suite that contains hundreds of tests, many of which are testing behavior that no longer exists.
2. Tests Become Disconnected From Current Behavior
Product evolves. Test suite doesn't. A test written when the checkout flow had 3 steps now runs against a checkout flow with 5 steps. The test still passes (it's checking an intermediate state that still works), but it's no longer testing what you think it's testing.
[!WARNING] A passing test that doesn't cover the current behavior is worse than no test. It gives you false confidence. Audit your test suite regularly — not just for failures, but for relevance.
3. "Regression" Means Everything
In some teams, "regression testing" becomes shorthand for "everything we test." New feature? That's regression now. Bug fix? Regression. UI tweak? Add it to regression.
When regression means everything, it loses meaning. You can no longer tell at a glance what your regression suite is protecting.
4. No Tiered Strategy
All tests get treated equally. The test that validates the entire payment flow runs at the same frequency as the test that checks whether the settings icon has the right color. One is critical. One is not. But they both sit in the same queue.
Smart Regression Selection
Not every test needs to run for every change. The key is matching the regression scope to the risk of the change.
Tier 1: Always Run (Every Build / PR)
Critical paths that, if broken, are showstoppers:
- Login and authentication
- Core purchase / transaction flow
- Data save and load (any feature where data loss = user complaint)
- Crash-free startup
These run fast (keep them under 5 minutes total) and block merging if they fail.
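One way to keep that budget honest is to check it in CI. The sketch below is only an illustration: the `check_budget` helper and the per-test durations are made up, and in practice the numbers would come from your test runner's timing report.

```python
# Hypothetical budget guard for the Tier 1 suite (5 minutes, as above).
TIER1_BUDGET_SECONDS = 5 * 60


def check_budget(durations, budget=TIER1_BUDGET_SECONDS):
    """Return (within_budget, total_seconds, up to three slowest test names).

    durations: dict mapping test name -> wall-clock seconds.
    """
    total = sum(durations.values())
    slowest = sorted(durations, key=durations.get, reverse=True)
    return total <= budget, total, slowest[:3]


# Example with invented timings
ok, total, slowest = check_budget(
    {"login": 40.0, "checkout": 95.0, "save_load": 30.0, "startup": 20.0}
)
```

If the check fails, the slowest tests are the first candidates to speed up or move to Tier 2.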
Tier 2: Run Nightly or on Release Branches
The broader feature set that should work but isn't an immediate blocker if it needs investigation:
- Full user journeys (onboarding, settings changes, feature flows)
- Multi-platform smoke tests
- API contract tests
Tier 3: Run Before Release Only
Deep edge cases, performance checks, full multi-device matrix. These take time and are only worth running when you're close to shipping.
Change → Trigger Tier 1 immediately
Trigger Tier 2 on schedule
Trigger Tier 3 manually for releases
[!TIP] This is called "risk-based test selection." You run more tests for riskier changes. A one-line copy change doesn't need a 6-hour full regression run. A payment flow refactor does.
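The tiering can be expressed as data rather than tribal knowledge. Here is a minimal sketch, assuming each test carries a tier tag and that higher-tier runs also include the lower tiers. The trigger names, test names, and `select_tests` helper are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    ALWAYS = 1       # every build / PR
    NIGHTLY = 2      # nightly or on release branches
    PRE_RELEASE = 3  # before release only


@dataclass(frozen=True)
class RegressionTest:
    name: str
    tier: Tier


# Hypothetical trigger names, each mapped to the highest tier it runs.
# Assumption: a nightly run also includes Tier 1, a release run includes all.
MAX_TIER_FOR_TRIGGER = {
    "pull_request": Tier.ALWAYS,
    "nightly": Tier.NIGHTLY,
    "release": Tier.PRE_RELEASE,
}


def select_tests(tests, trigger):
    """Return the tests whose tier is at or below the trigger's ceiling."""
    max_tier = MAX_TIER_FOR_TRIGGER[trigger]
    return [t for t in tests if t.tier <= max_tier]


# Invented example suite
SUITE = [
    RegressionTest("login_succeeds", Tier.ALWAYS),
    RegressionTest("checkout_completes", Tier.ALWAYS),
    RegressionTest("onboarding_journey", Tier.NIGHTLY),
    RegressionTest("low_memory_device_matrix", Tier.PRE_RELEASE),
]
```

With this suite, `select_tests(SUITE, "pull_request")` returns only the two Tier 1 tests, while `"release"` returns all four.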
Change-Based Regression Selection
For teams with good code coverage and impact analysis tools, you can go further: run only the tests that cover the code that changed.
# Example: get changed files in the last commit
git diff HEAD~1 --name-only
# Run only tests that cover those files
./gradlew testDebugUnitTest --tests "*.CheckoutViewModelTest"
This requires investment in understanding which tests cover which code. But for large test suites, it dramatically reduces CI time while maintaining relevant coverage.
Keeping the Suite Healthy: Ongoing Practices
Regular Pruning
Every quarter, go through the regression suite and ask for each test:
- Is this testing current behavior?
- Is it covered by another test?
- Is it testing something that's actually risky to regress?
- When was this last updated?
Tests that fail these questions get removed or updated. This is not optional — it's maintenance, same as pruning dead code.
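Part of that quarterly review can be pre-screened by a script. The following is a rough sketch, assuming you keep simple metadata per test (the feature it belongs to, the path it exercises, and when it was last updated). The `SuiteEntry` fields and the one-year staleness threshold are illustrative choices, and the "is it actually risky?" question still needs a human.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SuiteEntry:
    name: str
    feature: str        # feature the test belongs to
    covered_path: str   # user path or code path it exercises
    last_updated: date


def audit(entries, live_features, today, stale_after_days=365):
    """Return {test name: [reasons]} for tests that deserve a manual review."""
    findings = {}
    first_for_path = {}
    for e in entries:
        reasons = []
        if e.feature not in live_features:
            reasons.append("feature no longer exists")
        if e.covered_path in first_for_path:
            reasons.append(f"same path as {first_for_path[e.covered_path]}")
        else:
            first_for_path[e.covered_path] = e.name
        if today - e.last_updated > timedelta(days=stale_after_days):
            reasons.append("not updated recently")
        if reasons:
            findings[e.name] = reasons
    return findings
```

The output is a review list, not a delete list: a test flagged as old may still be guarding something important.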
Flakiness Zero Tolerance
A flaky regression test is a liability. The team learns to ignore failures in that test, and a real regression will slip through alongside the flakes.
Policy: any test that fails without a code change gets investigated immediately. If it can't be made deterministic, it gets moved out of the automated suite.
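"Fails without a code change" is something you can detect mechanically if your CI keeps run history. Here is a minimal sketch, assuming each recorded run includes the test name, the commit it ran against, and the result. The tuple format and the `find_flaky` name are assumptions for illustration.

```python
from collections import defaultdict


def find_flaky(runs):
    """Flag tests whose outcome changed without a code change.

    runs: iterable of (test_name, commit_sha, passed) tuples.
    A test is flagged if it both passed and failed on the same commit.
    """
    outcomes = defaultdict(set)
    for test_name, commit_sha, passed in runs:
        outcomes[(test_name, commit_sha)].add(passed)
    return sorted(
        {name for (name, _sha), seen in outcomes.items() if len(seen) == 2}
    )
```

A test that fails on one commit and passes on the next is not flagged, since that can be a genuine fix. Only mixed results on the same commit count.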
Automation Balance
Not everything in regression needs to be automated. A rough 80/20 split works well:
- Automate the 80% that is repetitive, deterministic, and high-value
- Keep 20% manual for the flows that require judgment, visual validation, or are too complex to automate reliably
Manual regression tests that run consistently and reliably are more valuable than automated tests that flake.
The Execution Strategy
For a typical Android app with a mix of features, our regression execution looks like this:
Per-PR (automated, ~3 minutes):
- Unit tests for changed modules
- API contract tests
- Tier 1 critical path smoke (login, core flow)
Nightly (automated, ~30 minutes):
- Full Tier 2 on 2 priority devices
- API integration tests
- Performance benchmark baseline check
Pre-release (mix of automated + manual, ~3 hours):
- Full suite automated on Tier 1 + Tier 2 devices
- Manual exploratory on new features
- Manual regression on recent bug-fix areas
- Full Tier 3 device matrix spot-check
The total time investment is manageable because we're not running everything all the time. We're running the right tests at the right moments.
The Mindset Shift
Regression testing fails when teams treat it as a fixed artifact — "the regression suite" that you run as-is forever.
It works when teams treat it as a living system — maintained, curated, tiered, and continuously evaluated for relevance.
The goal is a regression suite that gives you confidence in 30 minutes, not anxiety over 6 hours. That requires discipline: adding tests thoughtfully, removing them when they no longer serve a purpose, and being honest about the difference between test coverage and test value.
A smaller, trusted regression suite is better than a large, ignored one. Always.
Sudarshan Chaudhari
AI Systems Builder / Product Engineer
Bangkok, Thailand
Solo Android developer with 13+ years in QA, building Android apps, AI automation systems, and developer tools at SudarshanTechLabs.
