Flaky Tests: The Silent Killer of Your CI Pipeline
A test that fails randomly is worse than no test at all. It trains your team to ignore failures, destroys trust in the suite, and hides real bugs. Here's how to identify, fix, and prevent flakiness.
One flaky test can destroy a test suite.
Not because it's one test. Because the moment engineers start ignoring failures — "oh that's just the flaky Selenium test" — every failure becomes ignorable. Real bugs slip through. The suite becomes theater.
Flakiness is a trust problem. Fix it or remove the test.
What Makes a Test Flaky
A flaky test passes sometimes and fails other times without any code change. The causes fall into a few categories:
1. Timing and Async Issues
The most common cause. The test assumes an element is present before it actually loads, or checks a value before an async operation completes.
// Flaky — assumes the list is populated immediately
val items = viewModel.items.value
assertEquals(3, items.size)
// Better: suspend until the list is actually populated (call from a coroutine, e.g. inside runTest).
// drop(1).first() can hang if the data loaded before collection started.
val items = viewModel.items.first { it.isNotEmpty() }
assertEquals(3, items.size)
In UI tests, clicking an element before it's fully rendered causes failures that pass on retry.
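When no framework-provided wait is available, one option is to poll for the condition with a deadline instead of guessing a fixed delay. The sketch below is a minimal illustration; `waitUntil` and `demo` are hypothetical names, not part of any test library.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean
import kotlin.concurrent.thread

// Polls [condition] until it returns true or [timeoutMs] elapses.
// Returns true if the condition was met, false on timeout.
fun waitUntil(timeoutMs: Long = 5_000, pollMs: Long = 50, condition: () -> Boolean): Boolean {
    val deadline = System.nanoTime() + timeoutMs * 1_000_000
    while (System.nanoTime() < deadline) {
        if (condition()) return true
        Thread.sleep(pollMs) // short poll interval, not a guess at the total duration
    }
    return condition() // one last check at the deadline
}

// Usage: a background thread finishes "loading" after about 100 ms.
fun demo(): Boolean {
    val loaded = AtomicBoolean(false)
    thread { Thread.sleep(100); loaded.set(true) }
    return waitUntil(timeoutMs = 2_000) { loaded.get() }
}
```

The test proceeds as soon as the condition holds and fails with a clear timeout when it never does, rather than passing or failing depending on machine speed.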
2. Test Ordering Dependencies
Tests that rely on shared state set up by a previous test. If the previous test fails or runs in a different order, this test fails.
Test A: Creates a user (passes)
Test B: Expects that user to exist (passes if A ran first, fails otherwise)
Every test must be independent. Set up what you need, tear down after.
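As a rough illustration of independent tests, each function below builds its own store and its own data, so the outcome does not depend on which one runs first. `UserStore` is a hypothetical in-memory stand-in for whatever storage your tests actually touch.

```kotlin
// Hypothetical in-memory store used only for this illustration.
class UserStore {
    private val users = mutableMapOf<String, String>()
    fun create(id: String, name: String) { users[id] = name }
    fun find(id: String): String? = users[id]
}

// Creates its own store, so nothing leaks to other tests.
fun testCreateUser() {
    val store = UserStore()
    store.create("u1", "Ada")
    check(store.find("u1") == "Ada")
}

// Does not rely on testCreateUser having run: it arranges its own precondition.
fun testFindUser() {
    val store = UserStore()
    store.create("u1", "Ada")
    check(store.find("u1") == "Ada")
    check(store.find("missing") == null)
}
```

Running `testFindUser()` before, after, or without `testCreateUser()` gives the same result.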
3. Environmental Differences
- Tests pass locally, fail in CI (different timezone, locale, screen resolution)
- Tests fail on CI agent 1 but pass on agent 2 (different OS version, dependencies)
- Tests fail when run in parallel but pass sequentially (shared database, shared files)
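Locale and timezone differences are often the easiest of these to remove: pass them explicitly instead of relying on machine defaults. The following is a small sketch with hypothetical function names, using only the JDK.

```kotlin
import java.time.Instant
import java.time.ZoneId
import java.time.ZoneOffset
import java.util.Locale

// Depends on the machine's default locale, so the decimal separator can differ between machines.
fun formatAmountImplicit(amount: Double): String = "%.2f".format(amount)

// Locale is explicit, so the result is the same on every machine.
fun formatAmount(amount: Double, locale: Locale = Locale.US): String =
    "%.2f".format(locale, amount)

// Zone is explicit, so the calendar day does not shift between a laptop and a CI agent.
fun formatDay(instant: Instant, zone: ZoneId = ZoneOffset.UTC): String =
    instant.atZone(zone).toLocalDate().toString()
```

The same instant can fall on different calendar days in different zones, which is exactly the kind of difference that only shows up on a CI agent configured for UTC.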
4. Network and External Dependencies
Tests that call real APIs, real databases, or real external services fail when those services are slow or unavailable. Mock external dependencies in unit tests. Use dedicated test environments for integration tests.
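One lightweight way to mock an external dependency is to hide it behind an interface and hand the test a fake. All names below (`RateProvider`, `PriceConverter`, `FakeRateProvider`) are hypothetical and only illustrate the shape.

```kotlin
// Hypothetical boundary around an external exchange-rate API.
interface RateProvider {
    fun rate(from: String, to: String): Double
}

// Code under test depends on the interface, not on the network.
class PriceConverter(private val rates: RateProvider) {
    fun convert(amount: Double, from: String, to: String): Double =
        amount * rates.rate(from, to)
}

// Test double: returns a canned value instantly and never touches the network.
class FakeRateProvider(private val fixedRate: Double) : RateProvider {
    override fun rate(from: String, to: String): Double = fixedRate
}
```

A unit test can then construct `PriceConverter(FakeRateProvider(2.0))` and assert on the result without caring whether the real service is up.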
5. Date and Time Dependencies
// Flaky — what happens at 11:59:58 PM?
val today = LocalDate.now()
assertEquals(today, schedule.nextRunDate)
Tests that depend on the current time fail at boundary conditions. Inject time as a dependency and control it in tests.
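The JDK's own `java.time.Clock` is one way to do this: production code receives a real clock, and a test freezes it right at the boundary. `Scheduler` and its "next run is tomorrow" rule are assumptions made up for this sketch.

```kotlin
import java.time.Clock
import java.time.Instant
import java.time.LocalDate
import java.time.ZoneOffset

// Hypothetical scheduler: the next run is the day after "today" according to the injected clock.
class Scheduler(private val clock: Clock) {
    fun nextRunDate(): LocalDate = LocalDate.now(clock).plusDays(1)
}

// Time frozen two seconds before midnight UTC, so the boundary is tested on purpose.
val almostMidnight: Clock =
    Clock.fixed(Instant.parse("2024-01-31T23:59:58Z"), ZoneOffset.UTC)
```

With the clock frozen, `Scheduler(almostMidnight).nextRunDate()` is always 2024-02-01, no matter when or where the test runs.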
The Cost of Ignoring Flakiness
Teams often accept flakiness because "it usually passes on retry." This is how the problem compounds:
- One flaky test → engineers learn to retry failures
- More flaky tests → retry becomes the default response to all failures
- Real failure occurs → team assumes it's flaky, retries → bug ships
- Trust in suite collapses → team stops caring about test results entirely
[!WARNING] In a test suite with 10% flakiness, red builds are routinely retried until they turn green, so a green run no longer proves the change is safe. At that point, your CI pipeline is not a safety net.
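To put a rough number on how quickly this adds up, assume each flaky test fails independently with the same probability. That is a simplification, and the function name below is hypothetical, but it shows the shape of the problem.

```kotlin
import kotlin.math.pow

// Probability that at least one of [flakyTests] independent flaky tests fails
// spuriously in a single run, when each fails with probability [failureRate].
fun spuriousRedProbability(flakyTests: Int, failureRate: Double): Double =
    1.0 - (1.0 - failureRate).pow(flakyTests)
```

Under these assumptions, 20 flaky tests that each fail 5% of the time give `spuriousRedProbability(20, 0.05)` of roughly 0.64, meaning about two runs in three go red for no real reason.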
How to Find Flaky Tests
Run tests multiple times in a row. A test that fails 1-in-10 runs will show up quickly if you run the suite 20 times.
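# Run the suite 20 times and count how often each test fails
for i in {1..20}; do
  # cleanTest forces a real re-run; otherwise Gradle treats the test task as up to date
  # Gradle prints failed tests as "Class > test FAILED" by default
  ./gradlew cleanTest test --continue 2>&1 | grep -E " > .+ FAILED$" >> results.txt
done
sort results.txt | uniq -c | sort -rn
Track failure rates in CI. Most modern CI systems let you export test results as JUnit XML. Build a dashboard that shows which tests fail most often.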
Look at retry patterns. If your suite has test retries enabled and certain tests always retry, those are flaky tests in disguise.
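Once results are exported, ranking tests by failure rate takes only a few lines. The sketch below assumes the results have already been parsed into a simple record and that they all come from runs of the same commit, so mixed outcomes point to flakiness. `TestResult` and `flakyTests` are hypothetical names.

```kotlin
// One recorded outcome of one test in one CI run (hypothetical shape).
data class TestResult(val testName: String, val passed: Boolean)

// Failure rate per test, keeping only tests that both passed and failed,
// sorted with the highest failure rate first.
fun flakyTests(results: List<TestResult>): List<Pair<String, Double>> =
    results.groupBy { it.testName }
        .filterValues { runs -> runs.any { it.passed } && runs.any { !it.passed } }
        .map { (name, runs) -> name to runs.count { !it.passed }.toDouble() / runs.size }
        .sortedByDescending { it.second }
```

Tests that always fail are left out on purpose: they are broken, not flaky, and need a different kind of attention.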
Fixing Flaky Tests
For Timing Issues
Replace arbitrary sleeps with proper waits:
// Bad
Thread.sleep(2000)
checkElement.click()
// Better — explicit wait with timeout
waitUntilVisible(checkElement, timeout = 5.seconds)
checkElement.click()
For State Dependencies
Each test sets up its own data and cleans up after:
@Before
fun setup() {
db.insertTestUser(userId = "test-user-123")
}
@After
fun teardown() {
db.deleteUser(userId = "test-user-123")
}
For Parallel Execution Issues
Isolate database state per test using transactions that roll back:
@Transactional
@Test
fun `should update user profile`() {
// All DB changes are rolled back after this test
}
For Time-Dependent Tests
Inject a clock interface:
interface Clock {
fun now(): LocalDateTime
}
// In production
class SystemClock : Clock {
override fun now() = LocalDateTime.now()
}
// In tests
class FixedClock(private val fixedTime: LocalDateTime) : Clock {
override fun now() = fixedTime
}
When You Can't Fix It Immediately
If a test is flaky and you don't have bandwidth to fix it now:
- Quarantine it — move it to a separate suite that doesn't block CI
- Track it — create a bug/ticket with reproduction steps
- Set a deadline — flaky tests in quarantine for more than 2 weeks get deleted
- Delete it — a quarantined test that never gets fixed is dead weight
Never leave a known flaky test in the main test suite. It will corrupt the team's trust in everything else.
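The deadline is easier to keep when something checks it for you. As a sketch, assuming you keep a simple list of quarantined tests with the date each one was moved, a small function can report which ones are overdue. `QuarantinedTest` and `overdue` are hypothetical names.

```kotlin
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Hypothetical record of a quarantined test; ticket refers to the tracking bug.
data class QuarantinedTest(val name: String, val ticket: String, val quarantinedOn: LocalDate)

// Tests that have been in quarantine for more than [maxDays] days as of [today].
fun overdue(
    tests: List<QuarantinedTest>,
    today: LocalDate,
    maxDays: Long = 14
): List<QuarantinedTest> =
    tests.filter { ChronoUnit.DAYS.between(it.quarantinedOn, today) > maxDays }
```

`today` is passed in rather than read from the system clock, so the check itself stays deterministic.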
Prevention: Write Less Flaky Tests From the Start
- Never use Thread.sleep(); use explicit waits instead
- Never share state between tests; each test is isolated
- Never call real external services in unit tests — mock them
- Inject time, random number generators, and file system as dependencies
- Run tests in parallel from day one — surface parallelism issues early
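Injecting randomness works the same way as injecting time. In this minimal sketch, the hypothetical `pickSample` function takes its random source as a parameter, so a test can pass a seeded generator and get the same answer every run.

```kotlin
import kotlin.random.Random

// Production callers use the default source; tests pass a seeded one.
fun pickSample(
    items: List<String>,
    count: Int,
    random: Random = Random.Default
): List<String> = items.shuffled(random).take(count)
```

Two calls with `Random(42)` return the same sample, which makes a failure reproducible instead of a one-off.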
Takeaways
- Flakiness is a trust problem — one ignored failure teaches the team to ignore all failures
- Timing issues, state dependencies, and environment differences are the top causes
- Track failure rates — you can't fix what you can't measure
- Quarantine flaky tests immediately, then fix or delete
- Write isolated, deterministic tests from the start — easier than fixing flakiness later
Sudarshan Chaudhari
AI Systems Builder / Product Engineer
Bangkok, Thailand
Solo Android developer with 13+ years in QA, building Android apps, AI automation systems, and developer tools at SudarshanTechLabs.
