Incident Report: September 30 Failed Orders

On September 30, fewer than 10 web stores were affected by an issue that caused orders to fail. The issue occurred after the latest release of the commercebuild platform, version 4.10.0. Because the issue was not identified immediately, affected web stores were impacted from approximately 01:00 UTC until 18:30 UTC.

We apologize for the inconvenience that this caused affected customers. The reliability and stability of our platform are of paramount importance to us, and we take every incident seriously. As such, we would like to inform you about what happened and the steps we are taking to prevent similar issues from recurring.

What Happened

The latest release of the commercebuild platform solved an issue where default payment terms of B2B and B2C web store customers were not saved. However, the release included no validation check against current payment settings to determine whether the payment terms stored in a database record were still valid. As a result, incorrect payment terms were inserted into orders, which caused those orders to fail.
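For readers who want a concrete picture of the missing check, the following is a minimal sketch of this kind of validation. It is an illustration only and not commercebuild code: the names CustomerRecord, resolve_payment_terms, valid_terms, and fallback_terms are hypothetical, and it assumes that a store exposes its currently valid payment terms as a simple set of codes.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class CustomerRecord:
    """Hypothetical stand-in for a customer's database record."""
    customer_id: str
    default_payment_terms: Optional[str]


def resolve_payment_terms(
    record: CustomerRecord,
    valid_terms: Set[str],
    fallback_terms: str,
) -> str:
    """Return the saved terms only if they are still valid for the store.

    Assumption: valid_terms holds the payment term codes currently
    configured for the web store. A stale or missing saved value falls
    back to fallback_terms instead of being inserted into the order.
    """
    saved = record.default_payment_terms
    if saved is not None and saved in valid_terms:
        return saved
    return fallback_terms
```

In this sketch, a record whose saved terms no longer appear in the store's current settings yields the fallback value, so an outdated value never reaches the order.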

Impact

Affected web stores experienced delays in the posting of orders to their ERPs until approximately 18:30 UTC. From the web store user's perspective, no functionality was impacted, and orders were recorded normally in the web store database.

Timeline

  • September 30 at 01:00 UTC: Version 4.10.0 of the commercebuild platform is released.
  • 02:49 UTC: commercebuild receives the first ticket regarding a failed order and incorrect payment terms. Investigation begins.
  • 10:30 UTC: The issue is still assumed to be isolated to a single web store.
  • 14:26 UTC: A second ticket is received regarding a failed order and incorrect payment terms.
  • 15:51 UTC: The investigation turns to the latest release as a potential cause.
  • ~16:15 UTC: The issue is escalated to Development.
  • 18:05 UTC: Development confirms that the issue is related to the release and a resolution is forthcoming.
  • 18:30 UTC: A fix is applied to mitigate the issue for subsequent orders.
  • 20:33 UTC: Long-term resolution is under code review.
  • 21:00 UTC: commercebuild begins manually reposting failed orders.
  • 23:29 UTC: Development releases an update into production to resolve the issue.

Future Prevention

With every new release of the commercebuild platform, several measures are taken to prevent a negative impact on your business. Our Development team scrutinizes and tests every update to the commercebuild platform before it is released. This is achieved by testing changes in an isolated staging environment. Furthermore, immediately after every release, we test web stores to ensure their functionality. These measures will continue.

In this case specifically, not all team members had visibility into failed orders and their causes. Had we seen an uptick in failed orders after the release, we might have been able to act faster. We are looking into methods that would give us broader awareness of failed orders at the platform level. This would not only allow us to act faster if this issue happens again, but would also allow us to respond more quickly to failed orders on commercebuild web stores in general.
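One possible shape for such platform-level awareness is a comparison of the failed-order rate before and after a release. The sketch below is illustrative only and does not describe a tool that commercebuild has built: OrderEvent, failure_rate, failed_order_uptick, the two-hour window, and the 5 percentage point threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass(frozen=True)
class OrderEvent:
    """Hypothetical record of one attempt to post an order to an ERP."""
    store_id: str
    posted_at: datetime
    failed: bool


def failure_rate(events: Sequence[OrderEvent]) -> float:
    """Fraction of events that failed; 0.0 when there are no events."""
    if not events:
        return 0.0
    return sum(1 for e in events if e.failed) / len(events)


def failed_order_uptick(
    events: Sequence[OrderEvent],
    release_time: datetime,
    window: timedelta = timedelta(hours=2),   # assumed comparison window
    min_increase: float = 0.05,               # assumed alert threshold
) -> bool:
    """Report whether the failure rate rose after the release.

    Compares the window before release_time with the window after it,
    across all stores, and returns True when the rate increased by at
    least min_increase.
    """
    before = [e for e in events
              if release_time - window <= e.posted_at < release_time]
    after = [e for e in events
             if release_time <= e.posted_at < release_time + window]
    if not after:
        return False
    return failure_rate(after) - failure_rate(before) >= min_increase
```

Because the comparison runs across all stores rather than per ticket, a check of this kind could surface a release-related pattern even when each affected store has only a few failed orders.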

We again apologize for the inconvenience this issue caused, and we appreciate the trust you place in us every day as your ecommerce partner. If you have any questions or concerns, please reach out to me at bhale@commercebuild.com, and I will be happy to assist you.