April 20, 2026 · 5 min read

'Works on My Machine' Is Still a Real Problem in 2026

Containers, CI/CD, and device farms were supposed to kill the 'works on my machine' problem. They didn't. Here's why environment mismatch still causes failures and what actually fixes it.

Engineering · Android · Mobile · CI/CD

"Works on my machine" became a joke. Then we got Docker, CI pipelines, and device clouds. The joke was supposed to die.

It didn't.

In 2026, environment mismatch is still one of the most common causes of production failures. The tools changed. The problem didn't. Here's why.


The New Forms of the Old Problem

Form 1: OEM Customization

You test on a Pixel 7. Your user is on a Redmi Note 12.

Xiaomi's MIUI ships with aggressive battery optimization that kills background services after 10 minutes of screen-off time. Your background sync job that runs every 15 minutes works perfectly in your test. It silently stops working for 40% of your user base.

This isn't a container problem. CI can't catch it. Only testing on MIUI hardware — with the default battery settings that real users have — catches it.
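
One way to make this failure less silent is to record when the sync job actually ran and look for gaps. The sketch below is a minimal illustration in plain Kotlin, not a drop-in solution: `SyncGap`, `findSyncGaps`, and the default tolerance are hypothetical names and values, and it assumes you already persist a timestamp on each run.

```kotlin
// Hypothetical helper: a stretch of time in which the periodic job did not run.
data class SyncGap(val fromMillis: Long, val toMillis: Long) {
    val lengthMillis: Long get() = toMillis - fromMillis
}

// Returns every pause between consecutive runs that is longer than the expected
// interval plus a tolerance. Timestamps are epoch milliseconds in any order.
fun findSyncGaps(
    runTimestamps: List<Long>,
    expectedIntervalMillis: Long,
    toleranceMillis: Long = expectedIntervalMillis / 2 // assumed default, tune per job
): List<SyncGap> =
    runTimestamps.sorted()
        .zipWithNext()
        .filter { (previous, next) -> next - previous > expectedIntervalMillis + toleranceMillis }
        .map { (previous, next) -> SyncGap(previous, next) }
```

Reporting the detected gaps alongside the device manufacturer is what would turn "sync sometimes stops" into "sync stops on this OEM".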

Form 2: OS Version Behavior Differences

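Android 12 introduced the exact alarm restriction. Apps targeting Android 12+ need the `SCHEDULE_EXACT_ALARM` permission to schedule exact alarms; `USE_EXACT_ALARM`, added in Android 13, is an alternative for apps whose core function depends on them. Apps that relied on `AlarmManager.setExact()` without either permission started throwing `SecurityException` once they targeted Android 12, and those that fell back to inexact alarms had them batched and delayed.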

Your CI runs on Android 11. Your developers test on Android 14 emulators. The Android 12 behavior is tested by... nobody.

[!WARNING] The "works on my machine" problem in Android is frequently "works on the Android version I happen to have." Maintaining a test matrix that covers the last 3 major Android versions is not optional — it's the minimum.
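
To make the version-dependent branch testable without a device, the decision can be pulled into a pure function. This is a sketch under assumptions: `AlarmStrategy`, `chooseAlarmStrategy`, and `strategiesWhenDenied` are hypothetical names, and on a real device the inputs would come from `Build.VERSION.SDK_INT` and `AlarmManager.canScheduleExactAlarms()`.

```kotlin
enum class AlarmStrategy { EXACT, INEXACT_FALLBACK }

// API level of Android 12, where the exact alarm restriction starts.
val ANDROID_12_API_LEVEL = 31

/**
 * sdkInt: the device's API level.
 * canScheduleExact: on API 31+, the result of AlarmManager.canScheduleExactAlarms();
 * ignored on older versions, where exact alarms need no special access.
 */
fun chooseAlarmStrategy(sdkInt: Int, canScheduleExact: Boolean): AlarmStrategy =
    if (sdkInt < ANDROID_12_API_LEVEL || canScheduleExact) AlarmStrategy.EXACT
    else AlarmStrategy.INEXACT_FALLBACK

// Expected strategy per API level when exact alarm access is denied,
// so a unit test can assert the whole version matrix in one place.
fun strategiesWhenDenied(apiLevels: List<Int>): Map<Int, AlarmStrategy> =
    apiLevels.associateWith { chooseAlarmStrategy(it, canScheduleExact = false) }
```

A function like this does not replace running on each OS version, but it keeps the version-specific logic from hiding inside code that only one emulator ever exercises.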

Form 3: Network Environment Mismatch

Your test environment has a clean 100Mbps connection. Your production users are on:

  • Cellular data with intermittent signal
  • Hotel Wi-Fi shared with 200 devices
  • Corporate networks with transparent proxies that modify or cache HTTP responses
  • Networks with MTU mismatches that fragment large packets

An HTTP client that never retries on failure works fine in development. It silently fails for a significant slice of production users.
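
For illustration, here is a minimal retry-with-exponential-backoff wrapper in plain Kotlin. `retryWithBackoff` and its defaults are assumptions made for this sketch, not part of any HTTP library, and it retries only on `IOException`; a production version would likely add jitter and a cap on the delay.

```kotlin
import java.io.IOException

// Runs block up to maxAttempts times, doubling the wait after each IOException.
// sleep is injectable so tests can record delays instead of actually waiting.
fun <T> retryWithBackoff(
    maxAttempts: Int = 3,
    initialDelayMillis: Long = 500,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    block: () -> T
): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var delayMillis = initialDelayMillis
    repeat(maxAttempts - 1) {
        try {
            return block() // success: stop retrying
        } catch (e: IOException) {
            sleep(delayMillis)
            delayMillis *= 2
        }
    }
    return block() // final attempt: let any IOException propagate to the caller
}
```

Retrying is only safe for idempotent requests, so a wrapper like this belongs around reads or around writes the server can deduplicate.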

Form 4: Data State Differences

Your test starts with a clean database. Your user has 3 years of accumulated data, legacy records from a data migration, and an account that was manually modified by support after a previous incident.

The query that runs in 200ms on clean data runs in 8 seconds on the production user's data. Your performance tests didn't catch it because they didn't match production data volume and shape.

Form 5: Third-Party Service Versions

Your app integrates with a third-party SDK. You test against SDK version 3.2.1. Some users are still running your app from 6 months ago with SDK version 2.8.0 embedded. The SDK made a breaking API change in 3.0.0 that you didn't account for in your migration path.
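
Accounting for old clients usually starts with an explicit version comparison somewhere in the migration path. The sketch below assumes plain `MAJOR.MINOR.PATCH` version strings; `SemVer`, `BREAKING_SDK_VERSION`, and `needsLegacyPath` are hypothetical names, and the version numbers are the ones from the example above.

```kotlin
data class SemVer(val major: Int, val minor: Int, val patch: Int) : Comparable<SemVer> {
    override fun compareTo(other: SemVer): Int =
        compareValuesBy(this, other, SemVer::major, SemVer::minor, SemVer::patch)

    companion object {
        // Accepts "MAJOR.MINOR.PATCH" only; returns null for anything else.
        fun parseOrNull(text: String): SemVer? {
            val parts = text.split(".").map { it.toIntOrNull() ?: return null }
            return if (parts.size == 3) SemVer(parts[0], parts[1], parts[2]) else null
        }
    }
}

// First SDK version containing the breaking change in the example above.
val BREAKING_SDK_VERSION = SemVer(3, 0, 0)

// True when a client reports an SDK version from before the breaking change.
// Unparseable versions are treated as legacy, which is the conservative choice.
fun needsLegacyPath(embeddedSdkVersion: String): Boolean {
    val version = SemVer.parseOrNull(embeddedSdkVersion) ?: return true
    return version < BREAKING_SDK_VERSION
}
```

Comparing parsed numbers rather than raw strings matters here: as text, "2.10.0" sorts before "2.8.0".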


Why Traditional Fixes Don't Fully Solve It

Docker/containers: Great for backend services. No help for mobile apps or hardware-dependent software.

CI/CD pipelines: Consistent environments for your build and test process. Still limited by which devices and OS versions you include in the pipeline.

Device farms (Firebase Test Lab, BrowserStack): Real progress. But:

  • Default farm configurations don't match your users' device settings
  • Battery optimization, accessibility settings, and locale are all default — not representative of real user configurations
  • Limited ability to simulate specific network conditions
  • No way to test with your users' actual data state

Feature flags: Control rollout, enable quick rollback. Don't prevent the environment mismatch — they just limit blast radius when it occurs.


What Actually Reduces It

1. Instrument Your Production Environment

The only environment that exactly matches production is production. Instrument it:

  • Log OS version, device model, and OEM build for every crash
  • Track slow operations by device category (budget vs flagship)
  • Tag errors with network type (Wi-Fi, cellular, unknown)

When "works on my machine" happens in production, this data tells you whose machine it doesn't work on. That's the starting point for reproduction.

```kotlin
import android.os.Build
import com.google.firebase.crashlytics.FirebaseCrashlytics

// Add to your crash reporting initialization
FirebaseCrashlytics.getInstance().apply {
    setCustomKey("device_model", Build.MODEL)
    setCustomKey("os_version", Build.VERSION.RELEASE)
    setCustomKey("oem", Build.MANUFACTURER)
    setCustomKey("sdk_int", Build.VERSION.SDK_INT)
}
```
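
Once those keys are attached, finding "whose machine" is mostly a grouping exercise. As a rough sketch, assuming crash reports have been exported into a simple list, the hypothetical `CrashReport` and `crashCountsByEnvironment` below rank OEM and OS version combinations by crash count; a crash reporting dashboard may already offer an equivalent filter.

```kotlin
// Hypothetical shape of an exported crash record.
data class CrashReport(val oem: String, val deviceModel: String, val osVersion: String)

// The environment dimensions we group by.
data class Environment(val oem: String, val osVersion: String)

// Counts crashes per (OEM, OS version) pair, most affected environment first.
fun crashCountsByEnvironment(reports: List<CrashReport>): List<Pair<Environment, Int>> =
    reports
        .groupingBy { Environment(it.oem, it.osVersion) }
        .eachCount()
        .toList()
        .sortedByDescending { it.second }
```

The top entries of that list are the first candidates for reproduction on real hardware.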

2. Maintain a Representative Device Lab

Not a comprehensive device lab — a representative one. Cover:

  • The most common device in your crash reports (usually a budget Android)
  • The OEM with most reported issues (often Samsung or Xiaomi)
  • The oldest OS version you support
  • The newest OS version, especially if it was released in the last 6 months

Test these devices with realistic settings: battery optimization enabled, default storage with some apps installed, mobile data instead of Wi-Fi.

3. Test With Production-Like Data

For any feature that involves data queries, run performance tests against a production data snapshot (anonymized). This catches the queries that are fast on 100 rows and slow on 10,000.
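
As a sketch of the anonymization step, the snippet below replaces identifying fields with salted SHA-256 hashes while leaving row counts and non-identifying columns untouched, so the snapshot keeps its volume and shape. `UserRow`, `pseudonymize`, and `anonymize` are hypothetical names. Strictly speaking this is pseudonymization, so whether it satisfies your privacy requirements is an assumption to check, not a given.

```kotlin
import java.security.MessageDigest

// Maps a value to a stable pseudonym: the same input and salt always give the
// same output, which keeps joins between tables intact.
fun pseudonymize(value: String, salt: String): String {
    val digest = MessageDigest.getInstance("SHA-256")
        .digest((salt + value).toByteArray(Charsets.UTF_8))
    return digest.joinToString("") { "%02x".format(it) }
}

// Hypothetical row type standing in for a production table.
data class UserRow(val userId: String, val email: String, val orderCount: Int)

// Replaces identifying fields and keeps everything else, including row count.
fun anonymize(rows: List<UserRow>, salt: String): List<UserRow> =
    rows.map { row ->
        row.copy(
            userId = pseudonymize(row.userId, salt),
            email = pseudonymize(row.email, salt) + "@example.invalid"
        )
    }
```

Keep the salt out of the test environment; anyone who has it can confirm guesses about the original values.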

4. Chaos Engineering for Mobile

Simulate the network conditions your users actually experience:

  • Android Emulator: Extended controls → Cellular to change network type and signal strength, or the -netspeed and -netdelay startup flags
  • Charles Proxy / mitmproxy: Throttle bandwidth, inject latency, simulate packet loss
  • Network emulation in tests: Use OkHttp's MockWebServer with configurable delays
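
```kotlin
import java.util.concurrent.TimeUnit
import okhttp3.mockwebserver.MockResponse
import okhttp3.mockwebserver.MockWebServer
import okhttp3.mockwebserver.SocketPolicy

// Simulate slow and stalled networks in tests (departuresJson is your fixture string)
val server = MockWebServer()
// Slow response: the body is held back for 3 seconds before it is sent
server.enqueue(MockResponse().setBody(departuresJson).setBodyDelay(3, TimeUnit.SECONDS))
// Stalled response: the socket stays open but nothing is sent, so the client must time out
server.enqueue(MockResponse().setSocketPolicy(SocketPolicy.STALL_SOCKET_AT_START))
```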

The Honest Bottom Line

"Works on my machine" can be reduced but not eliminated. The diversity of environments where real users run your software is too large to fully replicate in any test setup.

The goal isn't zero environment mismatch failures. It's:

  1. Fast detection when environment-specific failures occur in production
  2. Enough diversity in your test environments to catch the most common mismatches before production
  3. Fast root-cause identification when users report failures (because you have the right instrumentation)

Teams that achieve this don't use "works on my machine" as an excuse. They treat it as diagnostic information: "It works here, fails there. What's different between these environments?" That question, answered systematically, is how you close the gap.


Sudarshan Chaudhari

AI Systems Builder / Product Engineer

Bangkok, Thailand

Solo Android developer with 13+ years in QA, building Android apps, AI automation systems, and developer tools at SudarshanTechLabs.
