AI in QA: What Actually Works (and What Is Just Hype)
AI in testing is real and useful — but not in the ways most vendors claim. Here's an honest breakdown of where AI genuinely improves QA workflows and where it still falls short.
Every testing tool vendor now claims their product is AI-powered. AI generates your test cases. AI fixes your flaky tests. AI replaces your QA team.
The reality is more nuanced and more interesting. AI has made specific parts of QA significantly better. It has changed almost nothing about other parts. Here's the honest breakdown.
What Actually Works
1. Test Case Generation From Requirements
Give an LLM a well-written user story or acceptance criteria and ask it to generate test cases. This works well.
Why it works: LLMs are good at generating variations, edge cases, and negative cases from a description. A human might write 10 test cases for a login flow; an LLM can produce 30, including cases the human didn't think of (empty username, password with special characters, network timeout during the auth call, session expiry handling).
Practical workflow:
```text
User story:
"As a user, I want to log in with email and password.
If credentials are invalid, show an error message.
If the account is locked after 5 failures, show the locked message."

Prompt to LLM:
"Generate a comprehensive test case list for this user story.
Include positive, negative, boundary, and edge cases.
Format: Test ID | Description | Steps | Expected Result"
```

The output isn't perfect: some cases are redundant, and some miss business context. But it's a strong starting point that takes 2 minutes instead of 45.
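Because the prompt asks for a fixed `Test ID | Description | Steps | Expected Result` layout, the reply can be parsed into structured rows so it is easier to sort and review. Below is a minimal Kotlin sketch, assuming the model returns one test case per line with exactly four `|`-separated cells; `TestCase` and `parseTestCases` are hypothetical names, not part of any tool.

```kotlin
// Hypothetical shape of one row in the "Test ID | Description | Steps | Expected Result" format.
data class TestCase(
    val id: String,
    val description: String,
    val steps: String,
    val expected: String,
)

// Parses pipe-delimited rows from an LLM reply. Lines that do not split into exactly
// four cells (prose, blank lines) are skipped, as are the header and "---" separator rows.
fun parseTestCases(llmOutput: String): List<TestCase> =
    llmOutput.lineSequence()
        .map { line -> line.trim().trim('|').split('|').map { it.trim() } }
        .filter { cells -> cells.size == 4 }
        .filterNot { cells -> cells[0].equals("Test ID", ignoreCase = true) }
        .filterNot { cells -> cells.all { cell -> cell.isNotEmpty() && cell.all { it == '-' || it == ':' } } }
        .map { (id, description, steps, expected) -> TestCase(id, description, steps, expected) }
        .toList()
```

Rows whose cells themselves contain a `|` are silently skipped, so it is worth comparing the parsed count against the raw reply.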
[!TIP] Use AI-generated test cases as a first draft, not a final artifact. Have a QA engineer review, prune duplicates, and add context the AI couldn't know (like internal business rules or known historical bugs).
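A small part of that pruning can be automated. A minimal Kotlin sketch, assuming duplicates often differ only in case, punctuation, or spacing; `GeneratedCase`, `normalize`, and `pruneDuplicates` are hypothetical names. It deliberately catches only near-verbatim repeats: two cases that test the same thing in different words still need the reviewer.

```kotlin
// Hypothetical minimal record for an AI-generated test case.
data class GeneratedCase(val id: String, val description: String)

// Lowercases, replaces anything that is not a-z, 0-9, or a space with a space,
// then collapses runs of whitespace.
fun normalize(text: String): String =
    text.lowercase()
        .replace(Regex("[^a-z0-9 ]"), " ")
        .replace(Regex("\\s+"), " ")
        .trim()

// Keeps the first case for each normalized description and drops later repeats.
fun pruneDuplicates(cases: List<GeneratedCase>): List<GeneratedCase> =
    cases.distinctBy { normalize(it.description) }
```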
2. Log Analysis and Pattern Detection
Production logs are noisy. Finding the signal — the repeating error pattern, the cascade failure, the slow query — takes time when done manually.
AI-assisted log analysis tools (and even raw LLMs given log excerpts) are good at:
- Identifying repeating error signatures
- Correlating errors with deployment times
- Summarizing what a 10,000-line crash log means in plain English
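To make "repeating error signature" concrete: before anyone (human or LLM) reads the log, the parts of each line that vary between occurrences, such as numbers and hex addresses, can be masked so that repeats collapse into one entry. This is a plain deterministic Kotlin sketch, not an AI technique; the `ERROR`/`Exception` filter and the masking patterns are assumptions about your log format, and `signatureOf`/`topErrorSignatures` are hypothetical names.

```kotlin
// Masks parts of a log line that usually differ between repeats of the same error.
fun signatureOf(line: String): String =
    line
        .replace(Regex("0x[0-9a-fA-F]+"), "<hex>")
        .replace(Regex("[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"), "<uuid>")
        .replace(Regex("\\d+"), "<n>")

// Counts signatures among lines that look like errors, most frequent first.
fun topErrorSignatures(logLines: List<String>, limit: Int = 10): List<Pair<String, Int>> =
    logLines
        .filter { "ERROR" in it || "Exception" in it }
        .groupingBy(::signatureOf)
        .eachCount()
        .entries
        .sortedByDescending { it.value }
        .take(limit)
        .map { it.key to it.value }
```

Pasting the top few signatures with their counts, rather than the raw log, is one way to keep an LLM prompt short.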
```text
Input to LLM:
[Paste 200 lines of crash log]
"Summarize the root cause of this crash, which line is failing,
and what conditions likely triggered it."
```

For production incidents, this cuts the time-to-understanding significantly.
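Which 200 lines you paste matters. A minimal Kotlin sketch, assuming an Android logcat-style log where the crash begins at a line containing `FATAL EXCEPTION`; the marker, the window sizes, and the names `crashExcerpt` and `crashPrompt` are illustrative assumptions. It keeps some context before the crash marker, then wraps the excerpt in the same question as the prompt above.

```kotlin
// Returns up to maxLines lines starting a little before the first crash marker.
// If no marker is found, falls back to the last maxLines lines of the log.
fun crashExcerpt(
    logLines: List<String>,
    marker: String = "FATAL EXCEPTION",
    before: Int = 20,
    maxLines: Int = 200,
): List<String> {
    val index = logLines.indexOfFirst { marker in it }
    if (index < 0) return logLines.takeLast(maxLines)
    val start = maxOf(0, index - before)
    return logLines.subList(start, minOf(logLines.size, start + maxLines))
}

// Builds the prompt: log excerpt first, then the question.
fun crashPrompt(excerpt: List<String>): String = buildString {
    excerpt.forEach { appendLine(it) }
    appendLine()
    appendLine("Summarize the root cause of this crash, which line is failing,")
    appendLine("and what conditions likely triggered it.")
}
```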
3. Writing Boilerplate Test Code
Given an API specification or a function signature, an LLM generates solid test code scaffolding. The setup/teardown, the mock configuration, the assertion structure — all boilerplate that a developer can fill in with actual test logic.
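```kotlin
// Prompt: "Write a unit test for this ViewModel function that
// calls the repository and emits a Success state"
// LLM output (good starting point):
import com.google.common.truth.Truth.assertThat
import io.mockk.coEvery
import kotlinx.coroutines.test.advanceUntilIdle
import kotlinx.coroutines.test.runTest
import org.junit.Test

// Assumes `repository` (a MockK mock) and `viewModel` are set up in @Before,
// with Dispatchers.Main swapped for a test dispatcher.
@Test
fun `fetchDepartures emits Success state when repository returns data`() = runTest {
    val mockData = listOf(Departure("T001", "10:30", "Platform 3"))
    coEvery { repository.getDepartures(any()) } returns Result.success(mockData)
    viewModel.fetchDepartures("BKK001")
    advanceUntilIdle() // let the coroutine launched by fetchDepartures finish
    assertThat(viewModel.uiState.value).isInstanceOf(UiState.Success::class.java)
    assertThat((viewModel.uiState.value as UiState.Success).departures).isEqualTo(mockData)
}
```

The generated test may have minor issues (wrong mock syntax, incorrect assertion class), but the structure is right and saves 10-15 minutes of setup per test.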
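For orientation, here is a sketch of the kind of code such a test targets, using a hand-written fake instead of a MockK mock. Everything in it is hypothetical: the `Departure` field names, `UiState`, `DepartureRepository`, and a deliberately simplified, synchronous `DeparturesViewModel` with no coroutines or AndroidX, so it runs with the Kotlin standard library alone.

```kotlin
// Hypothetical types mirroring the names used in the generated test.
data class Departure(val trainId: String, val time: String, val platform: String)

sealed class UiState {
    object Loading : UiState()
    data class Success(val departures: List<Departure>) : UiState()
    data class Error(val message: String) : UiState()
}

// Simplified, non-suspending repository contract.
interface DepartureRepository {
    fun getDepartures(stationId: String): Result<List<Departure>>
}

// Hand-written fake: returns whatever result the test configures and records calls.
class FakeDepartureRepository(var result: Result<List<Departure>>) : DepartureRepository {
    val requestedStations = mutableListOf<String>()
    override fun getDepartures(stationId: String): Result<List<Departure>> {
        requestedStations += stationId
        return result
    }
}

// Synchronous stand-in for the real ViewModel: maps the repository result to UI state.
class DeparturesViewModel(private val repository: DepartureRepository) {
    var uiState: UiState = UiState.Loading
        private set

    fun fetchDepartures(stationId: String) {
        uiState = repository.getDepartures(stationId).fold(
            onSuccess = { UiState.Success(it) },
            onFailure = { UiState.Error(it.message ?: "Unknown error") },
        )
    }
}
```

A fake trades MockK's flexibility for plain, readable setup; the test keeps the same arrange/act/assert shape: set `result`, call `fetchDepartures`, check `uiState`.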
4. Exploratory Testing Guidance
Stuck on what to test next? LLMs make surprisingly good brainstorming partners. Describe the feature and ask: "What are the high-risk areas? What would you test first?" The responses are often useful prompts for exploratory sessions.
What Doesn't Work (Yet)
Full Autonomous Testing
The vision: an AI agent tests your app end-to-end, finds bugs, and files reports — no human involved.
The reality: autonomous testing agents exist (Appium AI drivers, visual testing tools), but they're still brittle. They fail on dynamic content, non-standard UI patterns, and anything requiring contextual judgment about whether something feels right.
They're useful for specific, narrow, well-defined flows. They're not a replacement for exploratory testing or complex regression scenarios.
Zero-Human Test Maintenance
AI can suggest fixes for broken tests. It can explain why a test is failing. It cannot decide whether a broken test indicates a bug in the code or an outdated test that needs to be updated — that requires understanding the intended behavior, which requires human judgment.
[!WARNING] "AI-maintained test suites" is still mostly marketing. Test maintenance requires product context that AI tools don't have. The decisions — fix the code vs update the test — are judgment calls that humans still need to make.
Replacing QA Thinking
The hardest part of QA is not executing tests. It's deciding what to test, how much coverage is enough, and which risks are acceptable. These are strategic decisions that require understanding the product, the users, and the business.
AI is a tool that makes execution faster. It doesn't replace the strategic thinking behind a test plan.
The Practical AI+QA Toolkit in 2026
| Task | AI Tool / Approach | Maturity |
|---|---|---|
| Test case generation | LLM (ChatGPT, Claude) | ✅ Production-ready |
| Log analysis | LLM + log export | ✅ Production-ready |
| Test code scaffolding | LLM + GitHub Copilot | ✅ Production-ready |
| Visual regression | Percy, Applitools AI | ✅ Stable but needs tuning |
| Flaky test detection | ML-based CI tools | 🔶 Useful but imperfect |
| Autonomous E2E testing | Various AI agents | 🔶 Narrow use cases only |
| Full test suite generation | LLM agents | ❌ Not reliable yet |
| Zero-maintenance automation | All vendors claim this | ❌ Not real |
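The ML-based flaky-test tools in the table draw on far more history, but the core signal is easy to state: the same test both passing and failing on the same commit. A minimal, non-ML Kotlin sketch of that baseline heuristic, assuming you can export CI results as (test, commit, outcome) records; `TestRun` and `flakyTests` are hypothetical names.

```kotlin
// One CI execution of one test.
data class TestRun(val testName: String, val commitSha: String, val passed: Boolean)

// Flags a test as flaky when it has both a pass and a fail on the same commit:
// the code did not change, so the differing outcome points at the test or its environment.
fun flakyTests(runs: List<TestRun>): Set<String> =
    runs.groupBy { it.testName to it.commitSha }
        .filterValues { group -> group.any { it.passed } && group.any { !it.passed } }
        .keys
        .map { (testName, _) -> testName }
        .toSet()
```

A test that failed on one commit and passed on the next is not flagged here, since a real code change could explain it.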
How to Start Using AI in Your QA Workflow
If you're not already using AI in QA, start with the two highest-value, lowest-risk applications:
Week 1: Use an LLM to generate test cases for your next feature. Compare them to what your team would have written. Note what the AI caught that you would have missed.
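The Week 1 comparison can start as a simple set difference. A minimal Kotlin sketch, assuming both lists are plain test-case descriptions; `compareCoverage` and `CoverageComparison` are hypothetical names. Matching only treats near-identical wording (ignoring case and spacing) as the same case, so `aiOnly` is a reading list for the team, not a verdict.

```kotlin
data class CoverageComparison(
    val aiOnly: Set<String>,
    val teamOnly: Set<String>,
    val both: Set<String>,
)

// Lowercases and collapses whitespace, then compares the two sets of descriptions.
fun compareCoverage(aiCases: List<String>, teamCases: List<String>): CoverageComparison {
    fun norm(s: String) = s.lowercase().split(Regex("\\s+")).filter { it.isNotEmpty() }.joinToString(" ")
    val ai = aiCases.map(::norm).toSet()
    val team = teamCases.map(::norm).toSet()
    return CoverageComparison(aiOnly = ai - team, teamOnly = team - ai, both = ai intersect team)
}
```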
Week 2: Next time you're investigating a production issue, paste the crash log or error log into an LLM and ask for a plain-English explanation. Measure how much faster you get to root cause.
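To make "measure how much faster" concrete, note when each investigation started and when the root cause was found. A minimal Kotlin sketch, assuming you record those two timestamps plus whether an LLM was used; `Investigation`, `medianMinutes`, and `compareTimeToRootCause` are hypothetical names. Medians are used so that one unusually long incident does not dominate a small sample.

```kotlin
import java.time.Duration
import java.time.Instant

// One investigation: start time, time the root cause was identified, and whether an LLM helped.
data class Investigation(val startedAt: Instant, val rootCauseAt: Instant, val usedLlm: Boolean)

// Median time-to-root-cause in whole minutes, or null for an empty list.
fun medianMinutes(investigations: List<Investigation>): Double? {
    if (investigations.isEmpty()) return null
    val minutes = investigations
        .map { Duration.between(it.startedAt, it.rootCauseAt).toMinutes() }
        .sorted()
    val mid = minutes.size / 2
    return if (minutes.size % 2 == 1) minutes[mid].toDouble()
    else (minutes[mid - 1] + minutes[mid]) / 2.0
}

// Returns (median with LLM, median without LLM).
fun compareTimeToRootCause(investigations: List<Investigation>): Pair<Double?, Double?> =
    investigations.partition { it.usedLlm }
        .let { (withLlm, without) -> medianMinutes(withLlm) to medianMinutes(without) }
```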
These two changes cost nothing and require no new tooling. The value is immediate and measurable.
From there, evaluate more specialized tools based on your specific pain points. But start with what you can do today.
AI doesn't change what good QA looks like. It changes how fast you can get there.
Sudarshan Chaudhari
AI Systems Builder / Product Engineer
Bangkok, Thailand
Solo Android developer with 13+ years in QA, building Android apps, AI automation systems, and developer tools at SudarshanTechLabs.
