Dictation Problems
Incident Report for nVoq Public
Postmortem

At nVoq, we understand that SayIt service availability is critical to your daily workflow, and we apologize for the healthcare.nvoq.com service interruption on Friday May 31 and the service slowdowns on Monday June 3 and Tuesday June 4. A system update on May 30 introduced a bug that caused SayIt servers to run out of memory and crash under certain conditions. We have deployed a new version of the SayIt software containing a fix, and have expanded our testing to try to expose this type of bug before it reaches production in the future. Below is our root cause analysis of the incident. If you have any questions or comments, please contact us at support@nvoq.com.

Root Cause Analysis

nVoq’s deployment of version 14.1 of the nVoq platform was completed at 4:34 PM MDT, Thursday May 30, 2019. This version of the software contained a previously unknown defect that degraded application server performance: excessive memory consumption led to slow performance and occasional failed dictations and user-administration transactions. Signs of performance degradation began Friday May 31 at 1:53 PM MDT. The nVoq DevOps team began investigating reports of performance problems and reviewing system monitoring data. Normal system performance was restored by 4:00 PM MDT.

There were no issues on Saturday June 1 or Sunday June 2. Investigation by the DevOps team continued over the weekend.

By Monday morning, it was known that memory usage on the application servers was part of the problem and that restarting the application servers before they reached their memory limit resolved the issue. However, the root cause was still unknown. As a temporary workaround, the DevOps team added monitoring and lowered alert thresholds to catch this new set of conditions. A pool of spare application servers was created that could immediately replace production servers before they reached their memory limit. To minimize user impact, production servers were manually replaced with servers from this pool when programmatic alerts indicated high memory use. This continued until the patch was deployed.
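The workaround described above (alert on high memory use, then swap in a spare server before the limit is reached) can be illustrated with a minimal sketch. This is not nVoq's tooling: in the incident the replacement was performed manually, and the `Server` type, the `plan_replacements` function, the server names, and the 0.8 threshold are all hypothetical, chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    memory_used_fraction: float  # share of the server's memory limit, 0.0 to 1.0


def plan_replacements(production, spare_pool, alert_threshold=0.8):
    """Pair each production server at or above the threshold with a spare.

    Returns a list of (server_to_retire, replacement) name pairs.
    Servers with the highest memory use are handled first, and planning
    stops once the spare pool is exhausted.
    The default threshold is an assumed value, not one from the report.
    """
    spares = deque(spare_pool)
    over_threshold = sorted(
        (s for s in production if s.memory_used_fraction >= alert_threshold),
        key=lambda s: s.memory_used_fraction,
        reverse=True,
    )
    plan = []
    for server in over_threshold:
        if not spares:
            break  # no spare left; remaining servers need another remedy
        plan.append((server.name, spares.popleft()))
    return plan


if __name__ == "__main__":
    production = [
        Server("app-1", 0.55),
        Server("app-2", 0.91),
        Server("app-3", 0.84),
    ]
    for retire, replacement in plan_replacements(production, ["spare-1", "spare-2"]):
        print(f"replace {retire} with {replacement}")
```

Handling the highest-usage servers first means that if the pool runs short, the spares go to the servers closest to their memory limit.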

Memory captured Tuesday morning from the application servers led to the identification of the root cause. The defect was then reproduced on nVoq’s test systems while a solution was developed, and a patch was built. The build was tested internally to verify the fix. On Tuesday evening, the patch was applied to eval.nvoq.com. On Wednesday morning, further testing was performed on eval.nvoq.com by both nVoq employees and some ISV partners.

The patch was deployed on healthcare.nvoq.com Wednesday June 5 at 1:12 PM MDT. Monitoring has validated that the defect and resulting performance degradation were resolved by the patch. It is estimated that up to 10% of dictations were impacted by slowness or failure conditions, with the greatest impact occurring Friday afternoon. Canadian and Agent Assist customers were not impacted.

Posted Jun 06, 2019 - 13:58 MDT

Resolved
This problem appears to be resolved. There have been no additional inquiries regarding this issue in the last 4 hours. If you experience any further issues, please reach out to nVoq Support at support@nvoq.com.
Posted Jun 03, 2019 - 15:42 MDT
Update
We have not received any further reports of this issue. Everything seems to be operational at this time. We are still closely monitoring the system. If you experience any additional issues, please reach out to nVoq Support at support@nvoq.com.
Posted Jun 03, 2019 - 11:17 MDT
Monitoring
Our testing indicates that dictations are returning more quickly and the latency has subsided on the SayIt Administrator console. We are monitoring this issue. If you experience any additional issues please reach out to nVoq Support at support@nvoq.com.
Posted Jun 03, 2019 - 09:55 MDT
Update
Transcripts are not returning when users dictate, and users are receiving the error message: Dictation failed due to server session timeout.
Posted Jun 03, 2019 - 09:34 MDT
Identified
We are aware that transcripts are not returning and that the SayIt Administrator console is slow to access. We apologize for the inconvenience and are working to resolve the issue as quickly as possible.
Posted Jun 03, 2019 - 09:20 MDT
This incident affected: SayIt at healthcare.nvoq.com (Healthcare: SayIt Administration Portal, Healthcare: Voice Shortcuts, Healthcare: HTTPS Dictations, Healthcare: Websocket Dictations).