February 21, 2026 · 5 min read

Flaky Tests: The Silent Killer of Your CI Pipeline

A test that fails randomly is worse than no test at all. It trains your team to ignore failures, destroys trust in the suite, and hides real bugs. Here's how to identify, fix, and prevent flakiness.

Testing, Automation, CI/CD

One flaky test can destroy a test suite.

Not because it's one test. Because the moment engineers start ignoring failures — "oh that's just the flaky Selenium test" — every failure becomes ignorable. Real bugs slip through. The suite becomes theater.

Flakiness is a trust problem. Fix it or remove the test.

What Makes a Test Flaky

A flaky test passes sometimes and fails other times without any code change. The causes fall into a few categories:

1. Timing and Async Issues

The most common cause. The test assumes an element is present before it actually loads, or checks a value before an async operation completes.

kotlin

// Flaky: assumes the list is populated immediately
val items = viewModel.items.value
assertEquals(3, items.size)

// Better: suspend until the state holds data (call from a coroutine test such as runTest)
val loadedItems = viewModel.items.first { it.isNotEmpty() }
assertEquals(3, loadedItems.size)

In UI tests, clicking an element before it's fully rendered causes failures that pass on retry.
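The same principle applies outside of flows: block on the result of the asynchronous work, with a timeout, instead of reading shared state right away. Below is a minimal sketch that uses only the JDK's `CompletableFuture`. `loadItemsAsync` and `awaitItems` are hypothetical names standing in for whatever produces the data in your code.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.TimeUnit

// Hypothetical async loader: completes on another thread after a short delay.
fun loadItemsAsync(): CompletableFuture<List<String>> =
    CompletableFuture.supplyAsync(
        { listOf("a", "b", "c") },
        CompletableFuture.delayedExecutor(50, TimeUnit.MILLISECONDS)
    )

// Waits for the result and throws TimeoutException if it takes too long,
// so a slow run fails loudly instead of asserting on half-loaded state.
fun awaitItems(timeoutMs: Long = 2_000): List<String> =
    loadItemsAsync().get(timeoutMs, TimeUnit.MILLISECONDS)
```

A test would then assert on `awaitItems()` rather than on a value read immediately after triggering the load.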

2. Test Ordering Dependencies

Some tests rely on shared state set up by an earlier test. If that earlier test fails, is skipped, or runs later because the order changed, the dependent test fails even though nothing is wrong with the code it covers.

code

Test A: Creates a user (passes)
Test B: Expects that user to exist (passes if A ran first, fails otherwise)

Every test must be independent. Set up what you need, tear down after.

3. Environmental Differences

Tests pass locally, fail in CI (different timezone, locale, screen resolution)
Tests fail on CI agent 1 but pass on agent 2 (different OS version, dependencies)
Tests fail when run in parallel but pass sequentially (shared database, shared files)
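Timezone and locale differences are among the easier ones to remove, because both can be passed explicitly instead of being read from the machine. The sketch below uses only `java.time`; the helper names `dateIn` and `formatDate` are illustrative assumptions, not part of any library.

```kotlin
import java.time.Instant
import java.time.LocalDate
import java.time.ZoneId
import java.time.format.DateTimeFormatter
import java.util.Locale

// Converts an instant to a calendar date in an explicitly chosen zone,
// instead of relying on the default zone of the machine running the test.
fun dateIn(instant: Instant, zone: ZoneId): LocalDate =
    instant.atZone(zone).toLocalDate()

// Formats with an explicit locale so month names do not follow the machine's settings.
fun formatDate(date: LocalDate, locale: Locale): String =
    date.format(DateTimeFormatter.ofPattern("d MMMM yyyy", locale))

// 23:30 UTC is still February 21 in UTC but already February 22 in Tokyo.
val lateEvening: Instant = Instant.parse("2026-02-21T23:30:00Z")
```

With the zone and locale fixed in the test, a laptop in one timezone and a CI agent in another should agree on the result.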

4. Network and External Dependencies

Tests that call real APIs, real databases, or real external services fail when those services are slow or unavailable. Mock external dependencies in unit tests. Use dedicated test environments for integration tests.
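One common way to keep unit tests off the network is to put the external call behind an interface and hand the test a fake. The sketch below assumes a hypothetical exchange-rate service; `ExchangeRateClient`, `PriceConverter`, and `FakeExchangeRateClient` are invented for illustration.

```kotlin
// Hypothetical boundary around an external HTTP API.
interface ExchangeRateClient {
    fun rate(from: String, to: String): Double
}

// Code under test depends on the interface, not on the network.
class PriceConverter(private val rates: ExchangeRateClient) {
    fun convert(amount: Double, from: String, to: String): Double =
        amount * rates.rate(from, to)
}

// Test double: returns canned rates and never touches the network.
class FakeExchangeRateClient(
    private val canned: Map<Pair<String, String>, Double>
) : ExchangeRateClient {
    override fun rate(from: String, to: String): Double =
        canned[from to to] ?: error("No canned rate for $from->$to")
}
```

A unit test constructs `PriceConverter` with the fake, so its result depends only on the canned data. The real client is exercised separately in integration tests against a dedicated environment.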

5. Date and Time Dependencies

kotlin

// Flaky — what happens at 11:59:58 PM?
val today = LocalDate.now()
assertEquals(today, schedule.nextRunDate)

Tests that depend on the current time fail at boundary conditions. Inject time as a dependency and control it in tests.

The Cost of Ignoring Flakiness

Teams often accept flakiness because "it usually passes on retry." This is how the problem compounds:

One flaky test → engineers learn to retry failures
More flaky tests → retry becomes the default response to all failures
Real failure occurs → team assumes it's flaky, retries → bug ships
Trust in suite collapses → team stops caring about test results entirely

[!WARNING] Once roughly 1 run in 10 fails for reasons unrelated to the change, retrying until green becomes routine, and a real failure is likely to be retried away with the rest. At that point, your CI pipeline is not a safety net.
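How quickly spurious failures add up can be estimated with a short calculation. The sketch below assumes, as a simplification, that flaky tests fail independently of each other and all at the same rate. The function name is illustrative.

```kotlin
import kotlin.math.pow

// Probability that at least one of `flakyTests` independent tests fails spuriously
// in a single run, when each one fails with probability `failureRate`.
fun spuriousRedProbability(flakyTests: Int, failureRate: Double): Double =
    1.0 - (1.0 - failureRate).pow(flakyTests)
```

Under these assumptions, 10 flaky tests that each fail 1% of the time turn about 1 run in 10 red (roughly 0.096), and 50 tests at 2% push that to roughly 0.64.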

How to Find Flaky Tests

Run tests multiple times in a row. A test that fails 1-in-10 runs will show up quickly if you run the suite 20 times.

bash

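# Run the suite 20 times and count failures per test
# (assumes Gradle's default console output, where a failing test prints "Class > test FAILED")
: > results.txt
for i in {1..20}; do
  # cleanTest forces a re-run; otherwise Gradle reports a passing test task as up-to-date
  ./gradlew cleanTest test --continue 2>&1 | grep -E '^[^>]+ > .+ FAILED$' >> results.txt
done
sort results.txt | uniq -c | sort -rn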

Track failure rates in CI. Most modern CI systems let you export test results as JUnit XML. Build a dashboard that shows which tests fail most often.

Look at retry patterns. If your suite has test retries enabled and certain tests always retry, those are flaky tests in disguise.
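Once results are exported, the core of such a dashboard is a small aggregation. The sketch below assumes the JUnit XML has already been parsed into a simple record per test per run; `TestOutcome` and both functions are hypothetical names.

```kotlin
// One recorded outcome of one test in one CI run (hypothetical shape).
data class TestOutcome(val testName: String, val passed: Boolean)

// Failure rate per test across all recorded runs.
fun failureRates(outcomes: List<TestOutcome>): Map<String, Double> =
    outcomes.groupBy { it.testName }
        .mapValues { (_, runs) -> runs.count { !it.passed }.toDouble() / runs.size }

// A test is a flakiness suspect if it both passed and failed across the recorded runs.
// Tests that always fail are excluded: those are broken, not flaky.
fun flakySuspects(outcomes: List<TestOutcome>): List<String> =
    failureRates(outcomes)
        .filterValues { it > 0.0 && it < 1.0 }
        .entries.sortedByDescending { it.value }
        .map { it.key }
```

The comparison is only meaningful when the outcomes come from runs of the same commit. Otherwise a genuine regression would look like flakiness.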

Fixing Flaky Tests

For Timing Issues

Replace arbitrary sleeps with proper waits:

kotlin

// Bad
Thread.sleep(2000)
checkElement.click()

// Better — explicit wait with timeout
waitUntilVisible(checkElement, timeout = 5.seconds)
checkElement.click()
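`waitUntilVisible` is not defined in this article; UI test frameworks usually ship an equivalent. Where none is available, a generic polling helper is one way to get the same behavior. The version below is a hypothetical sketch, not a library API.

```kotlin
// Polls `condition` until it returns true or `timeoutMs` elapses.
// Returns true on success and false on timeout.
fun waitUntil(timeoutMs: Long = 5_000, pollMs: Long = 50, condition: () -> Boolean): Boolean {
    val deadline = System.nanoTime() + timeoutMs * 1_000_000
    while (System.nanoTime() - deadline < 0) {
        if (condition()) return true
        Thread.sleep(pollMs) // short pause between checks, bounded by the deadline
    }
    return condition() // one last check at the deadline
}
```

Unlike a fixed `Thread.sleep(2000)`, this returns as soon as the condition holds, so fast runs stay fast and slow runs get the full timeout before failing.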

For State Dependencies

Each test sets up its own data and cleans up after:

kotlin

@Before
fun setup() {
    db.insertTestUser(userId = "test-user-123")
}

@After
fun teardown() {
    db.deleteUser(userId = "test-user-123")
}
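A fixed ID such as "test-user-123" can still collide when two tests, or two CI jobs, share a database. One possible refinement is to generate a unique ID per test and clean up in a `finally` block. In the sketch below, `UserStore` is a hypothetical in-memory stand-in for `db`, and `withTestUser` is an invented helper.

```kotlin
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

// Hypothetical in-memory stand-in for the `db` object used above.
class UserStore {
    private val users = ConcurrentHashMap.newKeySet<String>()
    fun insertTestUser(userId: String) { users.add(userId) }
    fun deleteUser(userId: String) { users.remove(userId) }
    fun exists(userId: String): Boolean = userId in users
    fun count(): Int = users.size
}

// Creates a uniquely named user, runs the test body, and always cleans up,
// even when the body throws.
fun <T> withTestUser(db: UserStore, body: (userId: String) -> T): T {
    val userId = "test-user-${UUID.randomUUID()}"
    db.insertTestUser(userId)
    try {
        return body(userId)
    } finally {
        db.deleteUser(userId)
    }
}
```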

For Parallel Execution Issues

Isolate database state per test using transactions that roll back:

kotlin

// Assumes Spring's test support: a test-managed transaction is rolled back by default
@Transactional
@Test
fun `should update user profile`() {
    // All DB changes are rolled back after this test
}
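Shared files cause the same kind of interference as a shared database. A comparable isolation technique is to give every test its own temporary directory. The helper below is a hypothetical sketch built on the JDK's `Files.createTempDirectory` and Kotlin's `deleteRecursively`.

```kotlin
import java.nio.file.Files
import java.nio.file.Path

// Runs `body` with a fresh temporary directory and deletes it afterwards,
// so parallel tests never read or write each other's files.
fun <T> withTempDir(body: (dir: Path) -> T): T {
    val dir = Files.createTempDirectory("test-")
    try {
        return body(dir)
    } finally {
        dir.toFile().deleteRecursively()
    }
}
```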

For Time-Dependent Tests

Inject a clock interface:

kotlin

import java.time.LocalDateTime

interface Clock {
    fun now(): LocalDateTime
}

// In production
class SystemClock : Clock {
    override fun now() = LocalDateTime.now()
}

// In tests
class FixedClock(private val fixedTime: LocalDateTime) : Clock {
    override fun now() = fixedTime
}
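To show how the injected clock removes the midnight problem, here is a sketch of a class that takes the clock as a constructor parameter. `Schedule` and its "next run is tomorrow" rule are assumptions made for illustration. The interface and `FixedClock` are repeated so the example stands on its own.

```kotlin
import java.time.LocalDate
import java.time.LocalDateTime

interface Clock {
    fun now(): LocalDateTime
}

class FixedClock(private val fixedTime: LocalDateTime) : Clock {
    override fun now() = fixedTime
}

// Hypothetical class under test: the next run is scheduled for the following day.
class Schedule(private val clock: Clock) {
    val nextRunDate: LocalDate
        get() = clock.now().toLocalDate().plusDays(1)
}
```

A test can now pin the time to 23:59:58 and assert the exact expected date, without racing the real clock. The JDK also ships `java.time.Clock` with `Clock.fixed(...)`, which could replace the hand-written interface.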

When You Can't Fix It Immediately

If a test is flaky and you don't have bandwidth to fix it now:

Quarantine it — move it to a separate suite that doesn't block CI
Track it — create a bug/ticket with reproduction steps
Set a deadline — flaky tests in quarantine for more than 2 weeks get deleted
Delete it — a quarantined test that never gets fixed is dead weight

Never leave a known flaky test in the main test suite. It will corrupt the team's trust in everything else.
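The two-week deadline is easier to enforce when something checks it automatically. The sketch below assumes the quarantine list is kept as data with the date each test entered quarantine. `QuarantinedTest`, `overdue`, and the ticket IDs are hypothetical.

```kotlin
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Hypothetical record of a quarantined test, its ticket, and the day it was quarantined.
data class QuarantinedTest(val name: String, val ticket: String, val since: LocalDate)

// Tests that have been in quarantine for more than `maxDays` as of `today`.
// `today` is a parameter so the check itself stays deterministic.
fun overdue(
    tests: List<QuarantinedTest>,
    today: LocalDate,
    maxDays: Long = 14
): List<QuarantinedTest> =
    tests.filter { ChronoUnit.DAYS.between(it.since, today) > maxDays }
```

A scheduled CI job could run this against the quarantine list and fail, or open a reminder, when the result is non-empty.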

Prevention: Write Fewer Flaky Tests From the Start

Never use `Thread.sleep()`; use explicit waits instead
Never share state between tests — each test is isolated
Never call real external services in unit tests — mock them
Inject time, random number generators, and file system as dependencies
Run tests in parallel from day one — surface parallelism issues early
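Injecting a random number generator works the same way as injecting time. In this sketch the function name `pickSample` is illustrative, while `kotlin.random.Random` and `shuffled(random)` come from the standard library.

```kotlin
import kotlin.random.Random

// Takes the random source as a parameter instead of using the global default,
// so a test can pass a seeded generator and get the same result every run.
fun pickSample(items: List<String>, count: Int, random: Random): List<String> =
    items.shuffled(random).take(count)
```

Production code passes `Random.Default`. A test passes `Random(seed)` with a fixed seed and gets a repeatable result. Logging the seed makes an occasional failure reproducible.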

Takeaways

Flakiness is a trust problem — one ignored failure teaches the team to ignore all failures
Timing issues, state dependencies, and environment differences are the top causes
Track failure rates — you can't fix what you can't measure
Quarantine flaky tests immediately, then fix or delete
Write isolated, deterministic tests from the start — easier than fixing flakiness later