The Problem
We've been working on an application that needed significant integration with an external system, accessed via web services.
During our load testing we hit a major issue: under load, the web service calls caused the ColdFusion application server to crash very badly.
The problem was that web service calls typically took about 1 second to complete, whereas pages that didn't use a web service completed in about 100ms.
When we put the application under load and requests needing web services arrived at a sustained high rate, very soon every running request would be making a web service call. All 25 Java threads were being swallowed up by the "long running" web service calls.
This caused all the other requests to queue up, and very shortly the application fell over in a heap.
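Little's Law (average busy threads = arrival rate times the time each request holds a thread) makes the failure concrete. The following is a back-of-envelope sketch using the figures above: 25 threads, roughly 1 s per web service call and roughly 100 ms per ordinary page. The class and method names are illustrative. The last line also assumes that the `15` passed to `init()` later in the post is the cap on concurrent web service threads.

```java
// Back-of-envelope capacity check for a fixed-size request thread pool.
// Little's Law: busy threads = arrival rate x time each request holds a thread.
class CapacityMath {
    // Highest request rate (per second) that `threads` can sustain when
    // every request holds a thread for `serviceMillis` milliseconds.
    static int maxRatePerSecond(int threads, int serviceMillis) {
        return threads * 1000 / serviceMillis;
    }

    // Average number of threads busy at a given arrival rate.
    static int busyThreads(int arrivalsPerSecond, int serviceMillis) {
        return arrivalsPerSecond * serviceMillis / 1000;
    }

    public static void main(String[] args) {
        System.out.println(maxRatePerSecond(25, 1000)); // 25: web service requests/s that saturate the pool
        System.out.println(maxRatePerSecond(25, 100));  // 250: ordinary pages/s the same pool can serve
        System.out.println(busyThreads(30, 1000));      // 30: threads needed, but only 25 exist
        // Assumption: capping web services at 15 threads leaves 10 for ordinary pages
        System.out.println(maxRatePerSecond(25 - 15, 100)); // 100 pages/s still guaranteed
    }
}
```

A sustained rate of just 26 web service requests per second is therefore enough to starve every other page, even though the same pool could serve 250 ordinary pages per second.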
The Solution
Initially we looked at using cflock tags to handle it, but this would essentially serialize all web service requests, so only one web service call could run at any one time. The application would then not be able to handle the required load.
After a brainstorming session we came up with the idea of a semaphore-type object that would limit the number of threads that could get tied up in long running web service requests.
Ideally it works like this:
- A web-service request comes in
- It requests a thread from the Semaphore object
- If it gets one, it runs the web service call and then releases the thread
If the system is busy, however, it works like this:
- A web service request comes in
- It requests a thread from the Semaphore object
- It doesn't get a thread, so it skips the web service call, returns quickly and lets the user know the app is busy
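ColdFusion runs on the JVM, and the two flows above map closely onto `java.util.concurrent.Semaphore` with its non-blocking `tryAcquire()`. The following is a minimal sketch of that idea, not the post's Semaphore CFC. `WebServiceGate` and `callOrReject` are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Caps how many requests may be inside a slow call at once.
// Callers that cannot get a permit are turned away immediately instead of queueing.
class WebServiceGate {
    private final Semaphore permits;

    WebServiceGate(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs `call` if a permit is free; otherwise returns `busyResult` at once.
    <T> T callOrReject(Supplier<T> call, T busyResult) {
        if (!permits.tryAcquire()) {   // non-blocking: never waits for a permit
            return busyResult;         // system busy: tell the user
        }
        try {
            return call.get();         // the long running web service call
        } finally {
            permits.release();         // always hand the permit back, even on exceptions
        }
    }

    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        WebServiceGate gate = new WebServiceGate(15);
        String result = gate.callOrReject(() -> "web service result", "Sorry, the system is busy");
        System.out.println(result);           // web service result
        System.out.println(gate.available()); // 15
    }
}
```

The quick rejection comes from `tryAcquire()` returning `false` immediately. A blocking `acquire()` would bring back the queueing described in The Problem.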
Usage
To use the semaphore you need to set it up in a shared scope, typically the application or server scope. We use it in our web service wrapper CFC, which is stored in the application scope.
<cfset variables.instance.Semaphore = createObject("component", "Semaphore").init("unique_name", 15,"logfilename")>
Here is a code example of how this looks in practice:
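<cfset threadID = variables.instance.Semaphore.acquireThread()>
<cfif threadID GT 0>
    <cftry>
        <!--- We handle all errors in here to ensure we release the thread afterwards --->
        <!--- Placeholder call: wsdlURL, someMethod and result are illustrative names --->
        <cfinvoke webservice="#variables.instance.wsdlURL#" method="someMethod" returnvariable="result">
        <cfcatch type="any">
            <!--- Log errors but continue so we release the thread --->
            <cflog file="webservice_errors" type="error" text="#cfcatch.message#">
        </cfcatch>
    </cftry>
    <!--- Make sure we release the thread - even if everything above explodes --->
    <cfset variables.instance.Semaphore.releaseThread(threadID)>
<cfelse>
    <!--- No thread available: skip the web service call and tell the user the app is busy --->
</cfif>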
Note: One issue we came across during implementation was that when long running requests were killed by the request timeout, releaseThread was never called. The web service code timed out, and the request was terminated before it reached the releaseThread line. To work around this we implemented an internal garbage collection mechanism so that we didn't leak threads, or, when we did, we could recover them.
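The post doesn't describe how its garbage collection works. One common way to get the same effect is to treat each acquired thread as a timed lease and reclaim leases held for longer than a maximum age. The following is a hypothetical sketch of that technique in Java, not the library's implementation. `LeasingSemaphore`, `maxHoldMillis` and `inUse()` are invented names. `acquireThread`/`releaseThread` mirror the CFC's method names only for readability. The current time is passed in explicitly to keep the example deterministic; real code would use `System.currentTimeMillis()`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lease-based semaphore: each acquired slot records when it was
// taken, and slots held longer than maxHoldMillis are reclaimed, so a request
// killed before it calls releaseThread() cannot leak its slot forever.
class LeasingSemaphore {
    private final int maxThreads;
    private final long maxHoldMillis;
    private final Map<Integer, Long> leases = new HashMap<>(); // threadID -> acquire time
    private int nextId = 1;

    LeasingSemaphore(int maxThreads, long maxHoldMillis) {
        this.maxThreads = maxThreads;
        this.maxHoldMillis = maxHoldMillis;
    }

    // Returns a positive thread ID, or 0 if no slot is free (matching the threadID GT 0 check).
    synchronized int acquireThread(long nowMillis) {
        reclaimExpired(nowMillis);
        if (leases.size() >= maxThreads) {
            return 0;
        }
        int id = nextId++;
        leases.put(id, nowMillis);
        return id;
    }

    // Releasing an unknown or already reclaimed ID is a harmless no-op.
    synchronized void releaseThread(int id) {
        leases.remove(id);
    }

    synchronized int inUse() {
        return leases.size();
    }

    // The "garbage collection" step: drop leases older than maxHoldMillis.
    private void reclaimExpired(long nowMillis) {
        leases.values().removeIf(start -> nowMillis - start > maxHoldMillis);
    }

    public static void main(String[] args) {
        LeasingSemaphore sem = new LeasingSemaphore(2, 60_000);
        int a = sem.acquireThread(0);       // 1
        int b = sem.acquireThread(30_000);  // 2
        int c = sem.acquireThread(40_000);  // 0: both slots held
        // Request `a` is killed by a timeout and never calls releaseThread(a).
        int d = sem.acquireThread(70_000);  // 3: a's 70 s old lease is reclaimed
        System.out.println(a + " " + b + " " + c + " " + d + " " + sem.inUse()); // 1 2 0 3 2
    }
}
```

`maxHoldMillis` should sit comfortably above the request timeout. If a lease is reclaimed while its request is still running, the concurrency cap can briefly be exceeded.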
This has allowed us to manage thread usage by limiting the number of long running requests, so as the server gets busy it now rejects web service calls before it runs out of memory.
You can get the full code for the Semaphore object from our Opensource CF Library.