April 8, 2026 · 5 min read

AI in QA: What Actually Works (and What Is Just Hype)

AI in testing is real and useful — but not in the ways most vendors claim. Here's an honest breakdown of where AI genuinely improves QA workflows and where it still falls short.

Testing · AI · Automation

Every testing tool vendor now claims their product is AI-powered. AI generates your test cases. AI fixes your flaky tests. AI replaces your QA team.

The reality is more nuanced and more interesting. AI has made specific parts of QA significantly better. It has changed almost nothing about other parts. Here's the honest breakdown.

What Actually Works

1. Test Case Generation From Requirements

Give an LLM a well-written user story or acceptance criteria and ask it to generate test cases. This works well.

Why it works: LLMs are good at generating variations, edge cases, and negative cases from a description. A human might write 10 test cases for a login flow; an LLM will generate 30, including cases the human didn't think of (empty username, password with special characters, network timeout during auth call, session expiry handling).

Practical workflow:

User story:
"As a user, I want to log in with email and password. 
If credentials are invalid, show an error message. 
If the account is locked after 5 failures, show the locked message."

Prompt to LLM:
"Generate a comprehensive test case list for this user story.
Include positive, negative, boundary, and edge cases.
Format: Test ID | Description | Steps | Expected Result"

The output isn't perfect — some cases are redundant, some miss business context. But it's a strong starting point that takes 2 minutes instead of 45.

[!TIP] Use AI-generated test cases as a first draft, not a final artifact. Have a QA engineer review, prune duplicates, and add context the AI couldn't know (like internal business rules or known historical bugs).
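
Part of that review can be mechanized. As a minimal sketch, assuming the LLM returns one pipe-delimited row per case in the format requested above, the Kotlin below parses the rows and drops near-identical descriptions before a QA engineer reads them. `TestCase`, `parseTestCases`, and `pruneDuplicates` are hypothetical names, and the duplicate rule (same description after lowercasing and stripping punctuation) is an assumption; it will not catch duplicates that are reworded.

```kotlin
// Hypothetical model for one row of LLM output; the fields mirror the
// "Test ID | Description | Steps | Expected Result" format in the prompt.
data class TestCase(
    val id: String,
    val description: String,
    val steps: String,
    val expected: String,
)

// Parses pipe-delimited rows. Skips the header row, markdown separator
// rows (---|---), and any line that does not have exactly four columns.
fun parseTestCases(llmOutput: String): List<TestCase> =
    llmOutput.lineSequence()
        .map { line -> line.trim().trim('|').split("|").map { it.trim() } }
        .filter { it.size == 4 }
        .filterNot { it[0].equals("Test ID", ignoreCase = true) }
        .filterNot { cols ->
            cols.all { c -> c.isNotEmpty() && c.all { ch -> ch == '-' || ch == ':' } }
        }
        .map { TestCase(it[0], it[1], it[2], it[3]) }
        .toList()

// Lowercases and collapses punctuation/whitespace so trivially different
// spellings of the same description compare equal.
fun normalize(text: String): String =
    text.lowercase().replace(Regex("[^a-z0-9]+"), " ").trim()

// Keeps the first case for each normalized description (assumed rule).
fun pruneDuplicates(cases: List<TestCase>): List<TestCase> =
    cases.distinctBy { normalize(it.description) }
```

Calling `pruneDuplicates(parseTestCases(response))` gives the list to hand to a reviewer. Adding business context and known historical bugs stays a manual step.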

2. Log Analysis and Pattern Detection

Production logs are noisy. Finding the signal — the repeating error pattern, the cascade failure, the slow query — takes time when done manually.

AI-assisted log analysis tools (and even raw LLMs given log excerpts) are good at:

Identifying repeating error signatures
Correlating errors with deployment times
Summarizing what a 10,000-line crash log means in plain English

Input to LLM:
[Paste 200 lines of crash log]

"Summarize the root cause of this crash, which line is failing, 
and what conditions likely triggered it."

For production incidents, this cuts the time-to-understanding significantly.
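
Before pasting an excerpt into an LLM, it can help to collapse repeats so the limited input is spent on distinct errors. The sketch below is one way to do that in Kotlin, assuming plain-text logs with ISO-style timestamps and error lines that contain `ERROR`, `FATAL`, or `Exception`. The function names and placeholder tokens are hypothetical, and the regexes would need adjusting to your log format.

```kotlin
// Replaces the parts of a log line that vary between occurrences
// (timestamps, hex addresses, numbers) so repeats share one signature.
// Hex is replaced before plain digits so "0x7fa3" is not split up.
fun signatureOf(line: String): String =
    line
        .replace(Regex("""\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?Z?"""), "<ts>")
        .replace(Regex("""0x[0-9a-fA-F]+"""), "<hex>")
        .replace(Regex("""\d+"""), "<n>")
        .trim()

// Counts signatures for lines containing any marker and returns the
// most frequent ones first, as (signature, count) pairs.
fun topErrorSignatures(
    logLines: List<String>,
    markers: List<String> = listOf("ERROR", "FATAL", "Exception"),
    limit: Int = 10,
): List<Pair<String, Int>> =
    logLines
        .filter { line -> markers.any { it in line } }
        .groupingBy(::signatureOf)
        .eachCount()
        .toList()
        .sortedByDescending { it.second }
        .take(limit)
```

The resulting list doubles as a quick view of repeating error signatures and as a way to pick which raw lines to include in the prompt.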

3. Writing Boilerplate Test Code

Given an API specification or a function signature, an LLM generates solid test code scaffolding. The setup/teardown, the mock configuration, the assertion structure — all boilerplate that a developer can fill in with actual test logic.

kotlin

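// Prompt: "Write a unit test for this ViewModel function that
// calls the repository and emits a Success state"

// LLM output (good starting point). The calls imply MockK (coEvery),
// kotlinx-coroutines-test (runTest), and an assertThat from Truth or AssertJ.
// `repository` and `viewModel` are assumed to be created in the test class setup.
@Test
fun `fetchDepartures emits Success state when repository returns data`() = runTest {
    // Arrange: stub the repository to return one departure
    val mockData = listOf(Departure("T001", "10:30", "Platform 3"))
    coEvery { repository.getDepartures(any()) } returns Result.success(mockData)

    // Act
    viewModel.fetchDepartures("BKK001")

    // Assert: the UI state is Success and carries the stubbed data
    assertThat(viewModel.uiState.value).isInstanceOf(UiState.Success::class.java)
    assertThat((viewModel.uiState.value as UiState.Success).departures).isEqualTo(mockData)
}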

The generated test may have minor issues (wrong mock syntax, incorrect assertion class), but the structure is right and saves 10-15 minutes of setup per test.

4. Exploratory Testing Guidance

Stuck on what to test next? LLMs make surprisingly good brainstorming partners for exploratory testing. Describe the feature and ask: "What are the high-risk areas? What would you test first?" The responses are often useful prompts for exploratory sessions.

What Doesn't Work (Yet)

Full Autonomous Testing

The vision: an AI agent tests your app end-to-end, finds bugs, and files reports — no human involved.

The reality: autonomous testing agents exist (Appium AI drivers, visual testing tools), but they're still brittle. They fail on dynamic content, non-standard UI patterns, and anything requiring contextual judgment about whether something feels right.

They're useful for specific, narrow, well-defined flows. They're not a replacement for exploratory testing or complex regression scenarios.

Zero-Human Test Maintenance

AI can suggest fixes for broken tests. It can explain why a test is failing. It cannot decide whether a broken test indicates a bug in the code or an outdated test that needs to be updated — that requires understanding the intended behavior, which requires human judgment.

[!WARNING] "AI-maintained test suites" is still mostly marketing. Test maintenance requires product context that AI tools don't have. The decisions — fix the code vs update the test — are judgment calls that humans still need to make.

Replacing QA Thinking

The hardest part of QA is not executing tests. It's deciding what to test, how much coverage is enough, and which risks are acceptable. These are strategic decisions that require understanding the product, the users, and the business.

AI is a tool that makes execution faster. It doesn't replace the strategic thinking behind a test plan.

The Practical AI+QA Toolkit in 2026

| Task | AI Tool / Approach | Maturity |
|---|---|---|
| Test case generation | LLM (ChatGPT, Claude) | ✅ Production-ready |
| Log analysis | LLM + log export | ✅ Production-ready |
| Test code scaffolding | LLM + GitHub Copilot | ✅ Production-ready |
| Visual regression | Percy, Applitools AI | ✅ Stable but needs tuning |
| Flaky test detection | ML-based CI tools | 🔶 Useful but imperfect |
| Autonomous E2E testing | Various AI agents | 🔶 Narrow use cases only |
| Full test suite generation | LLM agents | ❌ Not reliable yet |
| Zero-maintenance automation | All vendors claim this | ❌ Not real |

How to Start Using AI in Your QA Workflow

If you're not already using AI in QA, start with the two highest-value, lowest-risk applications:

Week 1: Use an LLM to generate test cases for your next feature. Compare them to what your team would have written. Note what the AI caught that you would have missed.

Week 2: Next time you're investigating a production issue, paste the crash log or error log into an LLM and ask for a plain-English explanation. Measure how much faster you get to the root cause.

These two changes cost nothing and require no new tooling. The value is immediate and measurable.
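
For the Week 1 comparison, a rough automated first pass can flag AI-generated cases that have no close match in the team's list. The Kotlin sketch below scores word overlap (Jaccard similarity) between descriptions. The function names and the 0.5 threshold are assumptions chosen for illustration, and word overlap will miss cases that describe the same check in different vocabulary, so treat the output as a reading list rather than a verdict.

```kotlin
// Splits a description into a set of lowercase alphanumeric words.
fun tokens(text: String): Set<String> =
    text.lowercase().split(Regex("[^a-z0-9]+")).filter { it.isNotEmpty() }.toSet()

// Jaccard similarity: shared words divided by all distinct words.
fun jaccard(a: Set<String>, b: Set<String>): Double {
    if (a.isEmpty() && b.isEmpty()) return 1.0
    return (a intersect b).size.toDouble() / (a union b).size
}

// Returns the AI-generated descriptions for which no team-written
// description reaches the similarity threshold (assumed default: 0.5).
fun onlyInAi(
    ai: List<String>,
    team: List<String>,
    threshold: Double = 0.5,
): List<String> {
    val teamTokens = team.map(::tokens)
    return ai.filter { candidate ->
        val candidateTokens = tokens(candidate)
        teamTokens.none { jaccard(candidateTokens, it) >= threshold }
    }
}
```

Whatever `onlyInAi` returns is the shortlist of cases the team may not have written, which a reviewer can then confirm or discard.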

From there, evaluate more specialized tools based on your specific pain points. But start with what you can do today.

AI doesn't change what good QA looks like. It changes how fast you can get there.

Sudarshan Chaudhari

AI Systems Builder / Product Engineer

Bangkok, Thailand

Solo Android developer with 13+ years in QA, building Android apps, AI automation systems, and developer tools at SudarshanTechLabs.
