System Error Message

Incident Report for resero CampusCloud & SmartLockers

Resolved

The issues caused by the global AWS outage have been resolved, and AWS has updated the status of its incident to resolved; below is the last update provided by AWS.

[RESOLVED] Increased Error Rates and Latencies
Oct 20 3:53 PM PDT Between 11:49 PM PDT on October 19 and 2:24 AM PDT on October 20, we experienced increased error rates and latencies for AWS Services in the US-EAST-1 Region. Additionally, services or features that rely on US-EAST-1 endpoints such as IAM and DynamoDB Global Tables also experienced issues during this time. At 12:26 AM on October 20, we identified the trigger of the event as DNS resolution issues for the regional DynamoDB service endpoints. After resolving the DynamoDB DNS issue at 2:24 AM, services began recovering but we had a subsequent impairment in the internal subsystem of EC2 that is responsible for launching EC2 instances due to its dependency on DynamoDB. As we continued to work through EC2 instance launch impairments, Network Load Balancer health checks also became impaired, resulting in network connectivity issues in multiple services such as Lambda, DynamoDB, and CloudWatch. We recovered the Network Load Balancer health checks at 9:38 AM. As part of the recovery effort, we temporarily throttled some operations such as EC2 instance launches, processing of SQS queues via Lambda Event Source Mappings, and asynchronous Lambda invocations. Over time we reduced throttling of operations and worked in parallel to resolve network connectivity issues until the services fully recovered. By 3:01 PM, all AWS services returned to normal operations. Some services such as AWS Config, Redshift, and Connect continue to have a backlog of messages that they will finish processing over the next few hours. We will share a detailed AWS post-event summary.
Posted Oct 20, 2025 - 22:00 EDT

Update

We have seen steady transaction processing and communications with our partners over the last 45 minutes; while AWS has not fully recovered all of their systems, it appears that they are proceeding towards a full resolution. We will continue to monitor for any additional updates or interruptions.
Posted Oct 20, 2025 - 17:10 EDT

Update

AWS has downgraded the severity of the outage as services continue to recover; below is the latest update from them. We continue to see successful communications with our partners, and transactions are processing as expected.

Oct 20 1:03 PM PDT Service recovery across all AWS services continues to improve. We continue to reduce throttles for new EC2 Instance launches in the US-EAST-1 Region that were put in place to help mitigate impact. Lambda invocation errors have fully recovered and function errors continue to improve. We have scaled up the rate of polling SQS queues via Lambda Event Source Mappings to pre-event levels. We will provide another update by 1:45 PM PDT.
Posted Oct 20, 2025 - 16:08 EDT

Update

AWS has provided the update below; we continue to see successful recovery of all communications with partner systems and successful transaction processing.

[12:15 PM PDT] We continue to observe recovery across all AWS services, and instance launches are succeeding across multiple Availability Zones in the US-EAST-1 Regions. For Lambda, customers may face intermittent function errors for functions making network requests to other services or systems as we work to address residual network connectivity issues. To recover Lambda’s invocation errors, we slowed down the rate of SQS polling via Lambda Event Source Mappings. We are now increasing the rate of SQS polling as we experience more successful invocations and reduced function errors. We will provide another update by 1:00 PM PDT.
Posted Oct 20, 2025 - 15:27 EDT

Update

We are continuing to see an increase in successful communications with our partners, and transactions are processing as expected; we are continuing to monitor intermittent receipt failures that appear to be a function of the slow recovery of AWS systems.
Posted Oct 20, 2025 - 14:47 EDT

Update

We are beginning to see signs of recovery with more transactions processing successfully; however, this may only be temporary as AWS continues to work to fully resolve all issues. We will continue to monitor and provide updates as they become available.


AWS has provided the update below:

Oct 20 10:38 AM PDT Our mitigations to resolve launch failures for new EC2 instances are progressing and the internal subsystems of EC2 are now showing early signs of recovering in a few Availability Zones (AZs) in the US-EAST-1 Region. We are applying mitigations to the remaining AZs at which point we expect launch errors and network connectivity issues to subside.
Posted Oct 20, 2025 - 13:47 EDT

Update

AWS has provided the update below; it appears that resolving one issue is triggering a separate one. We will continue to monitor and provide updates based on the information we receive and our own observations of the CampusCloud environment.


Oct 20 8:43 AM PDT We have narrowed down the source of the network connectivity issues that impacted AWS Services. The root cause is an underlying internal subsystem responsible for monitoring the health of our network load balancers. We are throttling requests for new EC2 instance launches to aid recovery and actively working on mitigations.
Posted Oct 20, 2025 - 12:04 EDT

Update

The global AWS outage continues to impact our hosting environment and those of our partners; we are now seeing a drastic increase in System Errors returned by our communications.
Posted Oct 20, 2025 - 11:40 EDT

Monitoring

The root cause of the System Error is the global AWS outage, which is affecting the routing of our communications to external partners such as TaxJar, WPS, Clover, CardConnect, and Calcurates; we will continue to monitor the situation with AWS throughout the day, but we are seeing a decrease in errors in our communications at the moment.
Posted Oct 20, 2025 - 10:20 EDT

Investigating

We are currently investigating reports of a System Error message when attempting to process transactions; we believe this is related to the ongoing global AWS outage.
Posted Oct 20, 2025 - 10:07 EDT
This incident affected: CampusCloud Core Services (eCommerce Platform, AWS West Coast Environment, AWS East Coast Environment).