I spent a good three weeks troubleshooting this issue and finally found the solution. It started when I decided to do several updates at once: upgrade ColdFusion from 8.0.0 to 8.0.1, apply the CFIMAGE hotfix, install a plethora of Windows updates (90+), and switch to the latest version of Java 6.
After several reboots, the ColdFusion Application Server became unstable. If I was lucky, it would take 3-4 service start attempts before it would run successfully. And on occasion it wouldn’t stay up for long. The error was always the same:
“EXCEPTION_ACCESS_VIOLATION Java HotSpot(TM) Server VM problematic frame in ntdll.dll”
But what does this mean?! I exhausted just about every effort. Google searches kept returning posts from people with unrelated memory errors. I tried 5 different JVMs, including the one that came with CF8. I tried different JVM startup arguments. I uninstalled FusionReactor. I reinstalled CF8. I rebuilt the VM from scratch.
Reinstalling CF8 actually worked, but as soon as I changed the Maximum JVM Heap Size to anything other than the default 512m, I was back to receiving the same error. Unfortunately I didn’t realize that changing the memory setting was the problem, because CF still started just fine some of the time, as it had before.
On another server, I didn’t want to mess with ColdFusion at all and decided to only install Windows Updates. That server became unstable too. Awesome. I wonder which of the 90 Windows Updates caused this?
I’m usually pretty good at finding answers to my problems, but this one really took me down a notch and made me feel less of a person. Weeks had gone by and I kept ending up back at the beginning. I tried everything. Everything except for the one simple thing that would fix it. Maybe I wasn’t good enough at my job, the one where I’m solving problems every day.
A few days later I knocked the JVM max memory back down to 512m. CF started. I restarted it 20 times, even rebooted the machine, and it stayed up. Nice! Even though 512m isn’t enough, at least I knew it was memory related. I tried 640m and 768m. They worked more often than 1024m, but not reliably enough for production. I assumed it wasn’t a physical memory issue because I was getting the same error from a VM on a different server.
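When juggling heap sizes like this, it can help to confirm what maximum heap a JVM actually ends up with for a given setting. Here is a minimal sketch, not something I ran at the time: the class name `HeapCheck` and its helper methods are made up for illustration, and it only inspects a standalone JVM you launch yourself, not the ColdFusion service process.

```java
// Hypothetical helper: reports the maximum heap the running JVM will try to use.
public class HeapCheck {

    // Convert a byte count to whole mebibytes (rounds down).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Maximum heap of the current JVM in MiB, as reported by the Runtime API.
    static long maxHeapMiB() {
        return toMiB(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        System.out.println("JVM: " + System.getProperty("java.vm.name")
                + " " + System.getProperty("java.version"));
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

Run it with the same JVM and the heap flag you are testing, for example `java -Xmx768m HeapCheck`. The reported number can come out a little lower than the `-Xmx` value depending on the JVM and garbage collector, so treat it as a sanity check rather than an exact readout.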
Late in the evening, before I would call it quits by hitting the bottle, I did one last Google search, which led me to this Server Fault link. It mentions a specific Windows Update and a subsequent hotfix for that update. The Knowledge Base article mentions an issue with the Microsoft ISA Server Control service failing to start on servers where multiple cores were registering as multiple CPUs. My VM is a dual core running on an 8-core host. After installing the hotfix and rebooting, all my problems went away.
The good thing about this whole mess is that, while I felt dumber the longer it went on, I learned a good amount from reading up on JVMs, error logging on the CF and Java side, various JVM arguments and what they can do for the server, Citrix XenServer tidbits, and installing and configuring our web servers from scratch.