April 9, 20266 min read

How to Use AI to Generate Test Cases (Practical Workflow)

AI can generate a comprehensive set of test cases in minutes — if you prompt it correctly. Here's the exact workflow, prompt templates, and refinement process that produces test cases you can actually use.

TestingAIAutomation

On this page

The Core Prompt Structure
Step-by-Step Workflow
Step 1: Prepare the Feature Input
Step 2: Generate the Initial Set
Step 3: Refine With Follow-Up Prompts
Step 4: Human Review and Pruning
Step 5: Add Execution Context
Real Example: Departure Board Feature
Prompts for Specific QA Areas
What AI Can't Generate (You Still Need Humans)

Writing test cases is one of the most time-consuming parts of QA — and one of the most mechanical. Given a feature description, you're following a pattern: positive cases, negative cases, boundary values, edge cases.

LLMs are very good at pattern-following. With the right prompts, they generate test cases faster than any human and catch edge cases that even experienced QA engineers miss.

Here's the exact workflow.

The Core Prompt Structure

A bare-bones prompt gets bare-bones output. A structured prompt with context gets usable output.

Minimal prompt (low quality):

code

Generate test cases for a login screen.

Structured prompt (high quality):

code

You are a senior QA engineer. Generate a comprehensive test case list for the following feature.

FEATURE: User Login
DESCRIPTION: Users can log in with email and password. Failed attempts are counted. 
After 5 failed attempts, the account is locked for 30 minutes.

ENVIRONMENT: Android app, targets Android 10+

INCLUDE:
- Positive test cases (valid inputs, happy path)
- Negative test cases (invalid inputs, error states)
- Boundary value cases (empty fields, max-length inputs)
- Edge cases (network failure, session timeout, concurrent login)
- Platform-specific cases (system keyboard behavior, accessibility)

FORMAT:
| TC-ID | Category | Description | Preconditions | Steps | Expected Result |

The difference in output quality is significant. The structured prompt gives the LLM the scope, the context, and the output format. You get a table you can paste directly into your test management tool.

Step-by-Step Workflow

Step 1: Prepare the Feature Input

The better your input, the better the output. Gather before prompting:

User story or acceptance criteria
API contract (if there's a backend component)
UI mockup description or link
Known business rules and constraints
Any historical bugs in similar features

The more context you give the LLM, the more relevant the test cases.

Step 2: Generate the Initial Set

Run the structured prompt. Expect 30-60 test cases for a medium-complexity feature. Don't be surprised by the volume — you'll prune it in the next step.

Step 3: Refine With Follow-Up Prompts

First output is rarely complete. Use follow-up prompts to go deeper:

code

"Focus on the account lockout behavior specifically. 
Generate test cases for the lockout counter reset, 
the 30-minute unlock behavior, and admin unlock scenarios."

code

"What are the security-related test cases I'm missing? 
Consider SQL injection, XSS in the email field, and session fixation."

code

"Generate test cases specific to Android — 
focus on keyboard behavior, app backgrounding during login, 
and behavior when the device loses connectivity mid-request."

Each follow-up adds test cases the initial run missed. Run 3-5 follow-ups on any important feature.

Step 4: Human Review and Pruning

The AI-generated set needs a QA engineer review. Look for:

Duplicates: LLMs repeat themselves. Remove tests that cover the same scenario in slightly different wording.
Irrelevant cases: Some generated tests don't apply to your specific implementation.
Missing business context: "Enter an invalid email" — but what counts as invalid in your system?
code
```
user@
```
— invalid?
code
```
user@domain
```
— valid or not? The LLM doesn't know your validation logic.
Priority gaps: Not all tests are equal. Flag which ones are critical path and which are edge cases.

[!TIP] A good rule of thumb: after pruning, you should have kept 60-70% of the AI-generated cases. If you're keeping 95%, you're not reviewing critically enough. If you're keeping 30%, your input context wasn't specific enough.

Step 5: Add Execution Context

AI generates the what — what to test, what to expect. Your team fills in the how — the exact steps, test data, environment setup.

code

AI Output:
| TC-042 | Negative | Login with expired account | Account is expired | Enter valid credentials | Error message shown |

Your addition:
Preconditions: Use test account test-expired@example.com (expired 2025-01-01)
Steps: 
  1. Launch app on clean install
  2. Tap "Sign In"
  3. Enter email: test-expired@example.com
  4. Enter password: Test@1234
  5. Tap "Log In"
Expected: Toast/dialog: "Your account has expired. Contact support at support@example.com"

Real Example: Departure Board Feature

Here's a real prompt and output excerpt for a digital signage departure board feature:

Prompt:

code

You are a senior QA engineer testing a digital signage app.

FEATURE: Real-time Departure Board
DESCRIPTION: Displays live train departure information. Refreshes every 60 seconds.
Shows train ID, destination, scheduled time, platform, and status (On Time / Delayed / Cancelled).
Data comes from a REST API. If the API fails, show last successful data with a "Data as of [timestamp]" indicator.

INCLUDE: functional, API failure, data edge cases, display rendering, timing behavior
FORMAT: | TC-ID | Category | Test Case | Expected Result |

Excerpt of output:

TC-ID	Category	Test Case	Expected Result
TC-001	Positive	Display loads with valid API response	All departure rows visible with correct data
TC-002	Positive	Data refreshes automatically at 60s interval	New data shown, timestamp updates
TC-003	Negative	API returns 500 error	Last valid data shown with "Data as of [time]" banner
TC-004	Negative	API returns empty departures array	"No departures scheduled" message shown
TC-005	Edge	API response has departure with null platform	Row displayed with platform field showing "TBC"
TC-006	Edge	Device loses Wi-Fi mid-session	Last cached data shown, reconnection attempted automatically
TC-007	Edge	60s refresh triggers while API is already in-flight	No duplicate requests, existing data remains until response received
TC-008	Display	Destination text exceeds display width	Text truncated with ellipsis, full text not cut off mid-character
TC-009	Display	"Cancelled" status displayed	Row shown in red with strikethrough on departure time

That's 9 useful test cases generated in under 30 seconds. A human writing these from scratch takes 15-20 minutes.

Prompts for Specific QA Areas

Security test cases:

code

"Generate security-focused test cases for [feature]. 
Focus on: input validation, authentication bypass, data exposure, 
injection attacks, and session handling."

Performance test cases:

code

"Generate performance test cases for [feature].
Scenarios: slow network (3G), large data sets (1000+ items), 
concurrent users, device with low RAM, battery saver mode active."

Accessibility test cases:

code

"Generate accessibility test cases for [feature] on Android.
Cover: TalkBack navigation order, content descriptions, 
touch target sizes, color contrast, and keyboard navigation."

Regression risk cases:

code

"Given that we changed [specific component], 
what are the regression risk areas? Generate test cases 
for the most likely regression scenarios."

What AI Can't Generate (You Still Need Humans)

Contextual judgment: "Is this the correct behavior for our business?" — requires product context
Historical knowledge: "This broke before because of X" — requires institutional memory
UX evaluation: "Does this feel right to a user?" — requires human judgment
Risk prioritization: "Which of these 50 test cases matter most for this release?" — requires product understanding

AI generates the raw material. QA engineers turn it into a test strategy. Both are necessary. Neither replaces the other.

The workflow above cuts test case writing time by 60-70% in practice. That time goes back to actual testing — exploratory sessions, deeper investigation, better coverage of the cases that matter.

That's the real value: not replacing QA work, but making QA engineers faster at the part that's mechanical so they can spend more time on the part that requires judgment.

Sudarshan Chaudhari

AI Systems Builder / Product Engineer

Bangkok, Thailand

Solo Android developer with 13+ years in QA, building Android apps, AI automation systems, and developer tools at SudarshanTechLabs.

GitHub Play Store

Stay updated

Get new posts on Android, Kotlin, and solo dev straight to your inbox.

RSS Feed Telegram