06 Oct 2020 15:14 UTC
Our logs show intermittent errors beginning at 0648 Pacific. We’ve localized the problem to GAE networking/datastore; while GCP had no publicly disclosed incident as of 0800, their support portal shows a known issue with GAE since 0645. This is presumably the cause of the errors we are seeing.
06 Oct 2020 15:25 UTC
Update: We’ve confirmed that webhook ingestion processing is slowed, but we continue to receive incoming webhooks.
06 Oct 2020 15:30 UTC
Update: Our logs suggest that the issue may have been resolved at ~0820 Pacific. We’ll continue to investigate.
06 Oct 2020 16:05 UTC
Update: We’re largely confident that the issue has been resolved as of 0820 Pacific. Google has posted a publicly visible incident report. Per that report, the incident was limited to the US, so: 1) our EU customers should be unaffected, and 2) no data from incoming webhooks will have been lost, even from services that don’t re-transmit failed webhooks.
06 Oct 2020 16:20 UTC
Update: Google’s incident report now states the issue was resolved as of 0913 Pacific. Per our logs, Worklytics was not impacted after 0820. Our investigations have also shown that impact was intermittent, with only some requests to the Worklytics web app affected, and most acute between 0710 and 0745. We don’t anticipate providing further updates on this incident. Please contact Worklytics support if you have any further questions.
Affected Systems: Worklytics Web App, Webhook Ingestion