April 20, 2026 · 5 min read

'Works on My Machine' Is Still a Real Problem in 2026

Containers, CI/CD, and device farms were supposed to kill the 'works on my machine' problem. They didn't. Here's why environment mismatch still causes failures and what actually fixes it.

Engineering · Android · Mobile · CI/CD

"Works on my machine" became a joke. Then we got Docker, CI pipelines, and device clouds. The joke was supposed to die.

It didn't.

In 2026, environment mismatch is still one of the most common causes of production failures. The tools changed. The problem didn't. Here's why.

The New Forms of the Old Problem

Form 1: OEM Customization

You test on a Pixel 7. Your user is on a Redmi Note 12.

Xiaomi's MIUI ships with aggressive battery optimization that kills background services after 10 minutes of screen-off time. Your background sync job that runs every 15 minutes works perfectly in your test. It silently stops working for 40% of your user base.

This isn't a container problem. CI can't catch it. Only testing on MIUI hardware — with the default battery settings that real users have — catches it.
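One way to notice this in the field, sketched under assumptions: record when the sync job last completed, then check on the next app launch whether it is overdue. `checkSyncHealth`, `SyncHealth`, and the grace factor of 2 are hypothetical names and values, not part of any Android API. The timestamps are assumed to come from wherever your job persists its last successful run.

```kotlin
/** Result of comparing the last successful sync against its expected schedule. */
data class SyncHealth(val minutesSinceLastRun: Long, val overdue: Boolean)

/**
 * Flags a periodic job as overdue when more than graceFactor intervals have
 * passed since it last ran. Periodic background work is inexact, so a single
 * late run is tolerated; several missed runs suggest the OS is stopping it.
 */
fun checkSyncHealth(
    lastRunMs: Long,
    nowMs: Long,
    intervalMs: Long,
    graceFactor: Int = 2,
): SyncHealth {
    require(intervalMs > 0) { "intervalMs must be positive" }
    val elapsedMs = (nowMs - lastRunMs).coerceAtLeast(0)
    return SyncHealth(
        minutesSinceLastRun = elapsedMs / 60_000,
        overdue = elapsedMs > intervalMs * graceFactor,
    )
}
```

Reporting an overdue result together with the device manufacturer shows which OEM builds are stopping the job, even when no crash ever occurs.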

Form 2: OS Version Behavior Differences

Android 12 introduced the exact alarm restriction. Apps targeting Android 12+ need the `SCHEDULE_EXACT_ALARM` permission (or, from Android 13, `USE_EXACT_ALARM` for eligible alarm and calendar apps) to schedule exact alarms. Apps that relied on `AlarmManager.setExact()` without the permission either hit a `SecurityException` or fell back to inexact alarms, which the system batches and delays.
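A minimal sketch of the version-gated decision, assuming the app targets API 31 or higher. `AlarmPlan` and `planAlarm` are hypothetical names. On a device, the `canScheduleExact` argument would come from `AlarmManager.canScheduleExactAlarms()`. The function only models the decision, so the Android 12 branch can be unit-tested on a plain JVM without an emulator.

```kotlin
enum class AlarmPlan { EXACT, INEXACT_FALLBACK }

/**
 * Chooses a scheduling strategy from the OS level and the permission state.
 * Keeping this free of Android classes lets every branch run in a JVM unit test.
 */
fun planAlarm(sdkInt: Int, canScheduleExact: Boolean): AlarmPlan = when {
    sdkInt < 31 -> AlarmPlan.EXACT        // before Android 12, setExact() needs no special permission
    canScheduleExact -> AlarmPlan.EXACT   // permission is granted on Android 12+
    else -> AlarmPlan.INEXACT_FALLBACK    // fall back to an inexact alarm and ask the user for access
}
```

With the decision isolated like this, a test can exercise the Android 12 path even when no device in the pipeline runs that OS version.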

Your CI runs on Android 11. Your developers test on Android 14 emulators. The Android 12 behavior is tested by... nobody.

[!WARNING] The "works on my machine" problem in Android is frequently "works on the Android version I happen to have." Maintaining a test matrix that covers the last 3 major Android versions is not optional — it's the minimum.
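A small sketch of how a matrix gap like this could be made visible in a build check. `untestedApiLevels` is a hypothetical helper, and the set of covered levels is assumed to be maintained by hand or read from your CI configuration.

```kotlin
/** API levels between minSdk and maxSdk (inclusive) that no test environment covers. */
fun untestedApiLevels(minSdk: Int, maxSdk: Int, covered: Set<Int>): List<Int> {
    require(minSdk <= maxSdk) { "minSdk must not exceed maxSdk" }
    return (minSdk..maxSdk).filter { it !in covered }
}
```

For the scenario above (CI on API 30, developers on API 34), `untestedApiLevels(30, 34, setOf(30, 34))` returns `[31, 32, 33]`, which are Android 12, 12L, and 13.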

Form 3: Network Environment Mismatch

Your test environment has a clean 100Mbps connection. Your production users are on:

Cellular data with intermittent signal
Hotel Wi-Fi shared with 200 devices
Corporate networks with transparent proxies that modify or cache HTTP responses
Networks with MTU mismatches that fragment large packets

An HTTP client that never retries on failure works fine in development. It silently fails for a significant slice of production users.
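As an illustration of the missing piece, here is a minimal retry helper with exponential backoff. It is a sketch, not a drop-in for any particular HTTP library: `retryWithBackoff` and its parameters are hypothetical, it retries only on `IOException`, and it assumes the wrapped request is safe to repeat (idempotent).

```kotlin
import java.io.IOException

/**
 * Runs block up to maxAttempts times, waiting between failed attempts.
 * The wait starts at initialDelayMs and is multiplied by factor each time.
 * The sleep function is injectable so tests do not have to wait for real.
 */
fun <T> retryWithBackoff(
    maxAttempts: Int = 3,
    initialDelayMs: Long = 500,
    factor: Double = 2.0,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    block: (attempt: Int) -> T,
): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var delayMs = initialDelayMs
    repeat(maxAttempts - 1) { index ->
        try {
            return block(index + 1)
        } catch (e: IOException) {
            // Transient network failure: wait, then try again with a longer delay
            sleep(delayMs)
            delayMs = (delayMs * factor).toLong()
        }
    }
    // Final attempt: any exception propagates to the caller
    return block(maxAttempts)
}
```

A production version would usually add jitter and a cap on the delay. Both are left out here to keep the sketch short.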

Form 4: Data State Differences

Your test starts with a clean database. Your user has 3 years of accumulated data, legacy records from a data migration, and an account that was manually modified by support after a previous incident.

The query that runs in 200ms on clean data runs in 8 seconds on the production user's data. Your performance tests didn't catch it because they didn't match production data volume and shape.

Form 5: Third-Party Service Versions

Your app integrates with a third-party SDK. You test against SDK version 3.2.1. Some users are still running your app from 6 months ago with SDK version 2.8.0 embedded. The SDK made a breaking API change in 3.0.0 that you didn't account for in your migration path.
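A hedged sketch of one guard against this, assuming the SDK follows semantic versioning and that the app persists the SDK version that last wrote its local data. Both are assumptions, and `majorVersion` and `crossesBreakingChange` are hypothetical helpers.

```kotlin
/** Extracts the major component of a version string such as "3.2.1". */
fun majorVersion(version: String): Int =
    version.substringBefore('.').toIntOrNull()
        ?: throw IllegalArgumentException("Not a semantic version: $version")

/** True when an upgrade moves across a major version, where breaking changes are expected. */
fun crossesBreakingChange(storedVersion: String, currentVersion: String): Boolean =
    majorVersion(storedVersion) < majorVersion(currentVersion)
```

For the versions above, `crossesBreakingChange("2.8.0", "3.2.1")` is true, which is the signal to run a migration step before the new SDK touches old data.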

Why Traditional Fixes Don't Fully Solve It

Docker/containers: Great for backend services. No help for mobile apps or hardware-dependent software.

CI/CD pipelines: Consistent environments for your build and test process. Still limited by which devices and OS versions you include in the pipeline.

Device farms (Firebase Test Lab, BrowserStack): Real progress. But:

Default farm configurations don't match your users' device settings: battery optimization, accessibility settings, and locale are all left at their defaults
Limited ability to simulate specific network conditions
No way to test with your users' actual data state

Feature flags: Control rollout, enable quick rollback. Don't prevent the environment mismatch — they just limit blast radius when it occurs.

What Actually Reduces It

1. Instrument Your Production Environment

The only environment that exactly matches production is production. Instrument it:

Log OS version, device model, and OEM build for every crash
Track slow operations by device category (budget vs flagship)
Tag errors with network type (Wi-Fi, cellular, unknown)

When "works on my machine" happens in production, this data tells you whose machine it doesn't work on. That's the starting point for reproduction.

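```kotlin
import android.os.Build
import com.google.firebase.crashlytics.FirebaseCrashlytics

// Add to your crash reporting initialization
FirebaseCrashlytics.getInstance().apply {
    setCustomKey("device_model", Build.MODEL)
    setCustomKey("os_version", Build.VERSION.RELEASE)
    setCustomKey("oem", Build.MANUFACTURER)
    setCustomKey("oem_build", Build.DISPLAY) // OEM build string, e.g. the vendor firmware version
    setCustomKey("sdk_int", Build.VERSION.SDK_INT)
}
```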
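Once these keys are attached, finding "whose machine" can be a simple grouping. The sketch below assumes crash records have been exported into a list. `CrashReport` and `crashesByEnvironment` are hypothetical names, not part of the Crashlytics API.

```kotlin
/** The environment fields attached to each crash by the instrumentation. */
data class CrashReport(val oem: String, val model: String, val sdkInt: Int)

/** Counts crashes per device and OS combination, most frequent first. */
fun crashesByEnvironment(reports: List<CrashReport>): List<Pair<String, Int>> =
    reports
        .groupingBy { "${it.oem} ${it.model} / API ${it.sdkInt}" }
        .eachCount()
        .toList()
        .sortedByDescending { it.second }
```

The first entry of the result is the environment to reproduce on first.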

2. Maintain a Representative Device Lab

Not a comprehensive device lab — a representative one. Cover:

The most common device in your crash reports (usually a budget Android)
The OEM with most reported issues (often Samsung or Xiaomi)
The oldest OS version you support
The newest OS version released in the last 6 months

Test these devices with realistic settings: battery optimization enabled, default storage with some apps installed, mobile data instead of Wi-Fi.

3. Test With Production-Like Data

For any feature that involves data queries, run performance tests against a production data snapshot (anonymized). This catches the queries that are fast on 100 rows and slow on 10,000.
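A minimal sketch of a time budget check that such a performance test could wrap around a query. `runWithinBudget` and `BudgetExceeded` are hypothetical names, and the clock is injectable so the helper itself can be tested deterministically.

```kotlin
/** Thrown when a measured operation takes longer than its budget. */
class BudgetExceeded(message: String) : AssertionError(message)

/**
 * Runs block, measures its duration with nowMs, and fails if it exceeds budgetMs.
 * Point the block at a query over the anonymized production snapshot.
 */
fun <T> runWithinBudget(
    label: String,
    budgetMs: Long,
    nowMs: () -> Long = { System.currentTimeMillis() },
    block: () -> T,
): T {
    val startMs = nowMs()
    val result = block()
    val elapsedMs = nowMs() - startMs
    if (elapsedMs > budgetMs) {
        throw BudgetExceeded("$label took $elapsedMs ms, budget is $budgetMs ms")
    }
    return result
}
```

The same budget applied to a clean database and to the snapshot makes the difference between the two data shapes explicit: the first passes, the second fails the build.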

4. Chaos Engineering for Mobile

Simulate the network conditions your users actually experience:

Android Emulator: Extended controls → Cellular (network type and signal strength), or the `-netspeed` and `-netdelay` startup options
Charles Proxy / mitmproxy: Throttle bandwidth, inject latency, simulate packet loss
Network emulation in tests: Use OkHttp's MockWebServer with configurable delays

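```kotlin
import okhttp3.mockwebserver.MockResponse
import okhttp3.mockwebserver.MockWebServer
import okhttp3.mockwebserver.SocketPolicy
import java.util.concurrent.TimeUnit

// Simulate a slow network in tests (departuresJson is your JSON fixture string)
val server = MockWebServer()
// First call: the body arrives after a 3 second delay
server.enqueue(MockResponse().setBody(departuresJson).setBodyDelay(3, TimeUnit.SECONDS))
// Second call: the socket stalls and never responds, so the client must time out
server.enqueue(MockResponse().setSocketPolicy(SocketPolicy.STALL_SOCKET_AT_START))
```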

The Honest Bottom Line

"Works on my machine" can be reduced but not eliminated. The diversity of environments where real users run your software is too large to fully replicate in any test setup.

The goal isn't zero environment mismatch failures. It's:

Fast detection when environment-specific failures occur in production
Enough diversity in your test environments to catch the most common mismatches before production
Fast root-cause identification when users report failures (because you have the right instrumentation)

Teams that achieve this don't use "works on my machine" as an excuse. They treat it as diagnostic information: "It works here, fails there. What's different between these environments?" That question, answered systematically, is how you close the gap.

Sudarshan Chaudhari

AI Systems Builder / Product Engineer

Bangkok, Thailand

Solo Android developer with 13+ years in QA, building Android apps, AI automation systems, and developer tools at SudarshanTechLabs.
