Dictation Problems
Incident Report for nVoq Public

At nVoq, we understand that SayIt service availability is critical to your daily workflow, and we would like to apologize for the healthcare.nvoq.com service interruption on Friday May 31 and the service slowdowns on Monday June 3 and Tuesday June 4. An update of the system on May 30 introduced a bug that caused SayIt servers to run out of memory and crash under certain conditions. We have deployed a new version of the SayIt software containing a fix, and have expanded our testing to try to expose this type of bug before it reaches production in the future. Below is our root cause analysis of the incident. If you have any questions or comments, please contact us at support@nvoq.com.

Root Cause Analysis

nVoq’s deployment of version 14.1 of the nVoq platform was completed at 4:34 PM MDT, Thursday May 30, 2019. This version of the software contained a previously undetected defect that degraded application server performance. The defect caused excessive memory consumption, leading to slow performance and occasional failed dictations and user-administration transactions. Signs of performance degradation began Friday May 31 at 1:53 PM MDT. The nVoq DevOps team began investigating the reports of performance problems and reviewing system monitoring data. Normal system performance was restored by 4:00 PM MDT.

There were no issues on Saturday June 1 or Sunday June 2. Investigation by the DevOps team continued over the weekend.

By Monday morning, it was known that memory usage on the application servers was part of the problem and that restarting the application servers before they reached their memory limit resolved the issue. However, the root cause was still unknown. As a temporary workaround, the DevOps team added monitoring and lowered alert thresholds to catch this new set of conditions. A pool of spare application servers was created that could immediately replace the production servers before they reached their memory limit. To minimize user impact, production servers were manually replaced with servers from this pool when programmatic alerts indicated high memory use. This continued until the patch was deployed.
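
The workaround described above can be sketched roughly as follows. This is an illustration only, not nVoq's actual tooling: the memory limit, the alert fraction, the server names, and the functions `needs_replacement` and `rotate_servers` are all hypothetical, and the real replacement was performed manually by the DevOps team in response to alerts.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical values for illustration; the report does not state actual limits.
MEMORY_LIMIT_MB = 8192
ALERT_FRACTION = 0.80  # alert well before the memory limit is reached


@dataclass
class Server:
    name: str
    memory_used_mb: float


def needs_replacement(server, limit_mb=MEMORY_LIMIT_MB, fraction=ALERT_FRACTION):
    """Return True once memory use crosses the lowered alert threshold."""
    return server.memory_used_mb >= limit_mb * fraction


def rotate_servers(production, spares):
    """Swap each high-memory production server for a spare, if one is available.

    Returns the new production list and the servers taken out of service.
    The input lists are not modified.
    """
    available = deque(spares)
    new_production, retired = [], []
    for server in production:
        if needs_replacement(server) and available:
            new_production.append(available.popleft())
            retired.append(server)
        else:
            new_production.append(server)
    return new_production, retired


# Example: one server is near its limit and is swapped for a spare.
production = [Server("app-1", 7000.0), Server("app-2", 3000.0)]
spares = [Server("spare-1", 500.0)]
production, retired = rotate_servers(production, spares)
print([s.name for s in production])
print([s.name for s in retired])
```

Alerting at a fraction of the limit, rather than at the limit itself, leaves time to bring a spare into service before the original server degrades.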

Memory captured from the application servers Tuesday morning led to identification of the root cause. The defect was then reproduced on nVoq’s test systems while a solution was developed, and a patch was built. The build was tested internally to verify the fix. Tuesday evening, the patch was applied to eval.nvoq.com. Wednesday morning, further testing was performed on eval.nvoq.com by both nVoq employees and some ISV partners.

The patch was deployed on healthcare.nvoq.com Wednesday June 5 at 1:12 PM MDT. Monitoring has validated that the defect and resulting performance degradation were resolved by the patch. It is estimated that up to 10% of dictations were impacted by slowness or failure conditions, with the greatest impact occurring Friday afternoon. Canadian and Agent Assist customers were not impacted.

Posted Jun 06, 2019 - 13:58 MDT

This problem appears to be resolved; there have been no additional inquiries regarding slow performance. We are still actively monitoring for signs of system slowness.

We sincerely apologize for any inconvenience that this issue has caused you and your customers. Please continue to reach out to nVoq Support at support@nvoq.com if you experience any further issues.
Posted Jun 04, 2019 - 11:58 MDT
Performance issues that affected the SayIt Client, API users, and the SayIt Admin console appear to be resolved at this time. We are able to detect the degraded performance as soon as it occurs and have a remedy in place that should resolve the issue very quickly. We are still investigating the root cause of this issue. Please continue to reach out to nVoq Support at support@nvoq.com if you experience any further issues.
Posted Jun 04, 2019 - 10:28 MDT
We are aware of an issue with transcripts returning slowly or not at all. We are currently working to address the issue and apologize for any inconvenience it is causing.
Posted Jun 04, 2019 - 08:46 MDT
This incident affected: SayIt at healthcare.nvoq.com (Healthcare: SayIt Administration Portal, Healthcare: Voice Shortcuts, Healthcare: HTTPS Dictations, Healthcare: Websocket Dictations).