100% CPU Usage

Wednesday, November 19, 2014

We have been trying for several weeks to resolve an issue where the Mirth Connect CPU usage jumps to 100% and stays there until we kill the mcservice.exe process. We are running out of ideas about what could be causing the problem; any help would be greatly appreciated.



Here's the situation:

  • Our database.max-connections is set to 40.

  • We have roughly fifty Channels processing messages in which reports are embedded, so the messages are about 80K each. We do not use attachments because they are not included in the archiving process. We will use them once we have found a way to include them in the archiving process, but for now this is the situation.

  • All the Channels have their Storage Mode set to Production.

  • We have 50+ connectors with queuing enabled (1 thread, 10000 retry interval, rotate unchecked) and with "Wait for previous destination" unchecked.

  • Every time the problem occurs, the number of queued messages keeps increasing on some destinations, and no attempt is made to process the message that is supposed to be processed next (according to the dashboard), as if the destinations were stuck on something. Once the Mirth Connect service is killed and then restarted, all the queued messages are processed successfully.

  • We were unable to pinpoint a specific set of destinations that could cause the problem. Each time the problem occurs, a different set of channels and/or destinations starts queuing messages for no apparent reason.

  • We are unable to stop or redeploy a channel that has a destination showing the symptoms described above. The channel (destination) is stuck in the STOPPING state. Trying to halt the channel doesn't help; it throws an exception. All the other channels can be stopped or redeployed successfully.

  • We were able to get a graph (using Zabbix) that shows how many users were connected to the Mirth database. You can clearly observe that the problem occurred 3 times in this graph (the number of connected users grows significantly). Each time we had to kill the mcservice.exe process.

  • You can see (using Process Explorer from Sysinternals) that more than one thread is using all the CPU.

  • We were also able to get some information from VisualVM, but it doesn't help much.
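
One possible way to connect the Process Explorer observation to Java-level information is to match the busy thread IDs against a thread dump taken with the JDK's jstack tool while the CPU is pegged. The sketch below assumes a HotSpot-style dump in which each thread header carries its native ID as a hexadecimal "nid=0x..." field, whereas Process Explorer lists thread IDs in decimal. The class name HotThreadFinder, the thread names, and the IDs in the sample dump are all made up for illustration; none of them come from Mirth Connect.

```java
import java.util.ArrayList;
import java.util.List;

public class HotThreadFinder {

    // Process Explorer shows thread IDs in decimal; HotSpot thread dumps
    // are assumed to show the same native ID in hex as "nid=0x...".
    static String toNid(long decimalTid) {
        return "nid=0x" + Long.toHexString(decimalTid);
    }

    // Returns the thread header lines of a dump whose native ID matches the TID.
    static List<String> findThread(String threadDump, long decimalTid) {
        String needle = toNid(decimalTid);
        List<String> matches = new ArrayList<>();
        for (String line : threadDump.split("\\R")) {
            // The trailing space avoids matching a longer ID with the same prefix.
            if (line.contains(needle + " ")) {
                matches.add(line.trim());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical dump excerpt; real input would be the saved jstack output.
        String dump = String.join("\n",
            "\"Example Queue Thread\" #41 prio=5 tid=0x000000001a2b3c00 nid=0x1a2c runnable [0x000000001f00e000]",
            "   java.lang.Thread.State: RUNNABLE",
            "\"Example Idle Thread\" #42 prio=5 tid=0x000000001a2b4000 nid=0x1a2 waiting on condition [0x000000001f10e000]");
        // 6700 is the decimal form of 0x1a2c.
        for (String header : findThread(dump, 6700)) {
            System.out.println(header);
        }
    }
}
```

Taking two or three dumps a few seconds apart and looking up the same thread IDs in each should show whether the busy threads stay in the same stack frames, which is usually more telling than a single snapshot.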




Do you have any clue or idea that could help us with this problem? A lot of people are involved when the problem occurs and we are running out of excuses for them...



How is it possible to have a destination with hundreds of queued messages (no errors) and no message marked as QUEUED with a "send attempts" value greater than 0?



Many thanks.







