How to Use AI to Generate Test Cases (Practical Workflow)
AI can generate a comprehensive set of test cases in minutes — if you prompt it correctly. Here's the exact workflow, prompt templates, and refinement process that produces test cases you can actually use.
On this page
- The Core Prompt Structure
- Step-by-Step Workflow
- Step 1: Prepare the Feature Input
- Step 2: Generate the Initial Set
- Step 3: Refine With Follow-Up Prompts
- Step 4: Human Review and Pruning
- Step 5: Add Execution Context
- Real Example: Departure Board Feature
- Prompts for Specific QA Areas
- What AI Can't Generate (You Still Need Humans)
Writing test cases is one of the most time-consuming parts of QA — and one of the most mechanical. Given a feature description, you're following a pattern: positive cases, negative cases, boundary values, edge cases.
LLMs are very good at pattern-following. With the right prompts, they generate test cases faster than any human and catch edge cases that even experienced QA engineers miss.
Here's the exact workflow.
The Core Prompt Structure
A bare-bones prompt gets bare-bones output. A structured prompt with context gets usable output.
Minimal prompt (low quality):
Generate test cases for a login screen.Structured prompt (high quality):
You are a senior QA engineer. Generate a comprehensive test case list for the following feature.
FEATURE: User Login
DESCRIPTION: Users can log in with email and password. Failed attempts are counted.
After 5 failed attempts, the account is locked for 30 minutes.
ENVIRONMENT: Android app, targets Android 10+
INCLUDE:
- Positive test cases (valid inputs, happy path)
- Negative test cases (invalid inputs, error states)
- Boundary value cases (empty fields, max-length inputs)
- Edge cases (network failure, session timeout, concurrent login)
- Platform-specific cases (system keyboard behavior, accessibility)
FORMAT:
| TC-ID | Category | Description | Preconditions | Steps | Expected Result |The difference in output quality is significant. The structured prompt gives the LLM the scope, the context, and the output format. You get a table you can paste directly into your test management tool.
Step-by-Step Workflow
Step 1: Prepare the Feature Input
The better your input, the better the output. Gather before prompting:
- User story or acceptance criteria
- API contract (if there's a backend component)
- UI mockup description or link
- Known business rules and constraints
- Any historical bugs in similar features
The more context you give the LLM, the more relevant the test cases.
Step 2: Generate the Initial Set
Run the structured prompt. Expect 30-60 test cases for a medium-complexity feature. Don't be surprised by the volume — you'll prune it in the next step.
Step 3: Refine With Follow-Up Prompts
First output is rarely complete. Use follow-up prompts to go deeper:
"Focus on the account lockout behavior specifically.
Generate test cases for the lockout counter reset,
the 30-minute unlock behavior, and admin unlock scenarios.""What are the security-related test cases I'm missing?
Consider SQL injection, XSS in the email field, and session fixation.""Generate test cases specific to Android —
focus on keyboard behavior, app backgrounding during login,
and behavior when the device loses connectivity mid-request."Each follow-up adds test cases the initial run missed. Run 3-5 follow-ups on any important feature.
Step 4: Human Review and Pruning
The AI-generated set needs a QA engineer review. Look for:
- Duplicates: LLMs repeat themselves. Remove tests that cover the same scenario in slightly different wording.
- Irrelevant cases: Some generated tests don't apply to your specific implementation.
- Missing business context: "Enter an invalid email" — but what counts as invalid in your system? — invalid?code
user@— valid or not? The LLM doesn't know your validation logic.codeuser@domain - Priority gaps: Not all tests are equal. Flag which ones are critical path and which are edge cases.
[!TIP] A good rule of thumb: after pruning, you should have kept 60-70% of the AI-generated cases. If you're keeping 95%, you're not reviewing critically enough. If you're keeping 30%, your input context wasn't specific enough.
Step 5: Add Execution Context
AI generates the what — what to test, what to expect. Your team fills in the how — the exact steps, test data, environment setup.
AI Output:
| TC-042 | Negative | Login with expired account | Account is expired | Enter valid credentials | Error message shown |
Your addition:
Preconditions: Use test account test-expired@example.com (expired 2025-01-01)
Steps:
1. Launch app on clean install
2. Tap "Sign In"
3. Enter email: test-expired@example.com
4. Enter password: Test@1234
5. Tap "Log In"
Expected: Toast/dialog: "Your account has expired. Contact support at support@example.com"Real Example: Departure Board Feature
Here's a real prompt and output excerpt for a digital signage departure board feature:
Prompt:
You are a senior QA engineer testing a digital signage app.
FEATURE: Real-time Departure Board
DESCRIPTION: Displays live train departure information. Refreshes every 60 seconds.
Shows train ID, destination, scheduled time, platform, and status (On Time / Delayed / Cancelled).
Data comes from a REST API. If the API fails, show last successful data with a "Data as of [timestamp]" indicator.
INCLUDE: functional, API failure, data edge cases, display rendering, timing behavior
FORMAT: | TC-ID | Category | Test Case | Expected Result |Excerpt of output:
| TC-ID | Category | Test Case | Expected Result |
|---|---|---|---|
| TC-001 | Positive | Display loads with valid API response | All departure rows visible with correct data |
| TC-002 | Positive | Data refreshes automatically at 60s interval | New data shown, timestamp updates |
| TC-003 | Negative | API returns 500 error | Last valid data shown with "Data as of [time]" banner |
| TC-004 | Negative | API returns empty departures array | "No departures scheduled" message shown |
| TC-005 | Edge | API response has departure with null platform | Row displayed with platform field showing "TBC" |
| TC-006 | Edge | Device loses Wi-Fi mid-session | Last cached data shown, reconnection attempted automatically |
| TC-007 | Edge | 60s refresh triggers while API is already in-flight | No duplicate requests, existing data remains until response received |
| TC-008 | Display | Destination text exceeds display width | Text truncated with ellipsis, full text not cut off mid-character |
| TC-009 | Display | "Cancelled" status displayed | Row shown in red with strikethrough on departure time |
That's 9 useful test cases generated in under 30 seconds. A human writing these from scratch takes 15-20 minutes.
Prompts for Specific QA Areas
Security test cases:
"Generate security-focused test cases for [feature].
Focus on: input validation, authentication bypass, data exposure,
injection attacks, and session handling."Performance test cases:
"Generate performance test cases for [feature].
Scenarios: slow network (3G), large data sets (1000+ items),
concurrent users, device with low RAM, battery saver mode active."Accessibility test cases:
"Generate accessibility test cases for [feature] on Android.
Cover: TalkBack navigation order, content descriptions,
touch target sizes, color contrast, and keyboard navigation."Regression risk cases:
"Given that we changed [specific component],
what are the regression risk areas? Generate test cases
for the most likely regression scenarios."What AI Can't Generate (You Still Need Humans)
- Contextual judgment: "Is this the correct behavior for our business?" — requires product context
- Historical knowledge: "This broke before because of X" — requires institutional memory
- UX evaluation: "Does this feel right to a user?" — requires human judgment
- Risk prioritization: "Which of these 50 test cases matter most for this release?" — requires product understanding
AI generates the raw material. QA engineers turn it into a test strategy. Both are necessary. Neither replaces the other.
The workflow above cuts test case writing time by 60-70% in practice. That time goes back to actual testing — exploratory sessions, deeper investigation, better coverage of the cases that matter.
That's the real value: not replacing QA work, but making QA engineers faster at the part that's mechanical so they can spend more time on the part that requires judgment.
Sudarshan Chaudhari
AI Systems Builder / Product Engineer
Bangkok, Thailand
Solo Android developer with 13+ years in QA, building Android apps, AI automation systems, and developer tools at SudarshanTechLabs.
Related Posts
Building something? Available for Android dev and QA consulting.
Work with meComments — powered by Giscus
