Dwolla.com and APIs are unavailable
Incident Report for Dwolla
Postmortem

Summary

On Friday, January 6th, 2017, we experienced intermittent outages that began at approximately 6:30 a.m. CST and persisted throughout the morning, with final resolution of the issue at 11:16 a.m. CST. The outages affected our APIs and dwolla.com, as well as the Dashboard and Admin, which are powered by our APIs.

An outage at AWS caused a database instance in Dwolla’s environment to fail over to a backup instance. After the failover, some services were unable to connect to the backup database.

On behalf of the Dwolla team, we sincerely apologize for the occurrence of this outage, and any inconvenience it may have caused to you and your customers.

Outage Timeline and Resolution

All times CST.

Approximately 6:30 a.m.: API monitoring alerted the team that issues were occurring.
7:08 a.m.: Issues reaching dwolla.com were observed, and a status page incident was created. The team identified the database failure and began researching a resolution.
7:46 a.m.: A support ticket was opened with AWS to bring the failed database back online. Failing services were reconfigured to connect to the failover database.
11:07 a.m.: Services were back online, and we updated our status page.
11:16 a.m.: After a period of successful monitoring, the issue was closed.

During the outages:

  • Many customers were not able to reach dwolla.com
  • Dwolla’s customers’ applications were intermittently unable to reach api.dwolla.com to send or retrieve data throughout the morning hours.

What We Learned

  • When a database instance failed completely, some Dwolla services were unable to connect to the backup database.
  • Monitoring quickly alerted us to outages in the API and other areas of the platform.

Next steps

In response to Friday's events, we've initiated efforts to fix services so they are able to reconnect to backup databases in the case of complete failure.
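As an illustration only, and not Dwolla's actual implementation, the kind of reconnect behavior described above can be sketched in Python. The sketch assumes a service is given an ordered list of database endpoints (primary first, then backup) and a `connect` callable that raises `ConnectionError` on failure; the function name `connect_with_failover`, its parameters, and the example hostnames are all hypothetical.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def connect_with_failover(
    endpoints: Sequence[str],
    connect: Callable[[str], T],
    rounds: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each endpoint in order; back off and retry if all of them fail."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    last_error = None
    for attempt in range(rounds):
        for endpoint in endpoints:
            try:
                # The first endpoint that accepts a connection wins.
                return connect(endpoint)
            except ConnectionError as exc:
                last_error = exc
        # Every endpoint failed this round: wait with exponential backoff,
        # except after the final round.
        if attempt < rounds - 1:
            sleep(base_delay * 2 ** attempt)
    raise ConnectionError(
        f"no database endpoint reachable after {rounds} rounds"
    ) from last_error


# Hypothetical stand-in for a real database driver: the primary is down,
# so only the backup endpoint accepts connections.
def fake_connect(endpoint: str) -> str:
    if endpoint == "primary.db.example.internal":
        raise ConnectionError("primary is down")
    return f"connected to {endpoint}"
```

Calling `connect_with_failover(["primary.db.example.internal", "backup.db.example.internal"], fake_connect)` falls through to the backup endpoint instead of failing outright. A real service would also need to detect a dropped connection at runtime and run the same logic again, which this sketch does not cover.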

We sincerely apologize for any inconvenience this caused our customers. Please feel free to reach out on our developer forum if you have any concerns or questions about the outage that occurred on January 6th.

Posted Feb 06, 2017 - 15:28 CST

Resolved
This incident has been resolved.
Posted Jan 06, 2017 - 11:16 CST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jan 06, 2017 - 11:06 CST
Identified
The issue has been identified and a fix is being implemented.
Posted Jan 06, 2017 - 10:53 CST
Investigating
We are currently investigating this issue.
Posted Jan 06, 2017 - 07:08 CST