Written By: Kevin Willms, Senior Cloud Operations Reliability Engineering Manager and Matthew Campbell, VP Cloud Platform
At D2L, we’re committed to ensuring your users have the best possible experience, and that starts with availability. By availability, we mean how reliably our products and services are there for customers to use whenever they need them. A common saying at D2L is, “The most important feature is availability.” While we work hard to provide exciting new functionality and features, the fact remains that if the system is down, none of these capabilities matter.
That is why the Cloud Operations Reliability Engineering (CORE) team exists. Operating out of Kitchener, Canada and Melbourne, Australia, the team is constantly looking for opportunities to improve the stability of the Brightspace platform and to increase our visibility into the health of every part of the system. This commitment has resulted in us consistently surpassing our monthly 99.9% availability commitment, and often even reaching five 9s (99.999%).
Boring and uneventful
That is not usually how we want you to think about the Brightspace platform, but when it comes to availability, especially during your busiest seasons, this is exactly how we want you to describe your experience with us! We aim to consistently demonstrate that we can support you and your learners, so that the availability and performance of your LMS are not things that occupy your time or keep you up at night.
Pulling back the curtain… a little
While we cannot disclose all our secrets, we can share that we use AWS auto-scaling technologies, which allow us to quickly provision additional resources when usage increases. We have also designed our systems to be redundant and resilient, so that when unpredictable hardware or software failures occur, we can continue to provide a seamless experience to your users by failing over immediately to healthy resources.
The CORE team is also constantly monitoring and reviewing alerts to validate that our systems are scaling appropriately. If needed, we can perform adjustments to ensure that Brightspace continues to operate optimally.
While auto-scaling and good architectural design provide a strong foundation, unexpected issues can still arise. When they do, our CORE team is ready to respond 24/7 and will work with the required teams until the issue is resolved, keeping you and your users informed throughout via the D2L Status Page.
We have also been on a journey to standardize observability and monitoring across all of the infrastructure and services D2L provides, so that everything D2L builds meets the same consistent level of excellence.
More improvements are on their way in the coming months to help us respond even faster to changes in your usage patterns and keep your system running smoothly. These improvements will push our availability from excellent to incredible and make it even more boring and uneventful. This means the educators, learners, and administrators using the Brightspace platform can focus on what really matters: teaching and learning.