System Issues - What happened and what we've done (9th and 10th March 2020)
On the 9th and 10th of March we experienced two separate system issues that affected the availability and performance of Power Diary.
Firstly, I want to apologise for the disruption caused to you, your team and your patients. Power Diary is designed to be the central operating hub of health practices, and we understand the significant impact any system disruption can have. Thank you for your patience and support whilst we identified and resolved these issues. We really appreciate it.
Two issues in two days is not good. We are disappointed this occurred, and I want to take this opportunity to provide some more information about what happened in each case and what we have done to prevent a recurrence. Although they occurred one day apart, the two issues were unrelated.
Issue 1: Users unable to access Power Diary (Monday, 9th March)
Approx Duration: 25 mins
Users were unexpectedly unable to access Power Diary. This was caused by one of our security systems, which ensures that only validated, non-suspicious traffic is able to reach our servers. Instead of blocking only suspicious traffic, it started blocking all traffic. Although the Power Diary application itself was operating normally behind the scenes, the security system was blocking users' attempts to log in or navigate, so it appeared to users as if the system was down. Once we identified the source of the issue, we recalibrated the security setting, which restored access. We initially thought the blocking behaviour was an automated system response; however, we subsequently identified that it was human error: one of our system support engineers had inadvertently changed a setting that blocked all incoming traffic. We have identified how this setting was accidentally changed, and are updating some system labels to minimise the risk of this occurring again.
Issue 2: System became progressively slower to the point that it appeared to freeze (Tuesday, 10th March)
Approx Duration: 2 hours
Power Diary became progressively slower to navigate and use, to the point that it appeared to freeze. We began investigating at the first sign of reduced performance; however, the system diagnostics showed some increased load, but not enough to explain the slowness being experienced. This made the cause difficult to isolate. We then turned off some non-essential processes and reset several key areas. This improved performance enough to restore user access, but the profile of resource usage was still outside our normal operating range.
We then identified that one of the code updates released overnight in preparation for our new Custom Forms feature contained a configuration that had not been optimised. Put simply, it allowed an autosave feature to activate excessively, which consumed an extraordinary amount of system resources and in turn affected the rest of the system. Once we identified this, we reverted the problematic update and the system returned to normal.
This issue was not detected in our testing environment before release because it only had the potential to noticeably affect system performance when the additional autosaves occurred at scale, i.e. in the live environment when everyone is using Power Diary. That said, we should have identified the configuration issue at the code level, and this incident is being reviewed through our QA processes to ensure we are better able to detect and prevent this type of issue in the future.
We have continued to monitor the system closely, and everything is operating normally. We have also not had any further reports of issues from users.
Once again we're sorry for the impact these issues caused. Please let us know if you have any questions, or if there is any further information we can provide.
- Damien, on behalf of the Power Diary Team.