Metakgp:Incident Reports/2019-07-17 Visual Editor stopped working for a few hours

Impact

Visual Editor was unusable for about 15 hours on 2019-07-17, from approximately 01:15 to 16:12 IST.

Trigger

01:10 Release of PR 49

Detection

06:45 Detected during post-release testing of the MediaWiki upgrade to v1.33

> visual editor still not running

Slack message

Timeline

Notes:

  • Dates and times must always be entered in India Standard Time (UTC +5:30)
  • Event (column 3) must be written in the present tense
| Date | Time | Event | Notes |
|---|---|---|---|
| 2019-07-17 | 01:10 | PR 49 is released | links was a deprecated Docker feature, and we wanted to move to its recommended replacement, networks. However, the networks were not configured correctly: the parsoid container couldn't connect to the nginx container. This connection is essential for parsoid to work. |
| 2019-07-17 | 01:15 | [INCIDENT BEGINS] Visual editor becomes unusable | |
| 2019-07-17 | 06:45 | [INCIDENT DETECTED] Visual editor errors are noticed for the first time | The error log from the parsoid container clearly says that the nginx container is not accessible |
| 2019-07-17 | 16:10 | [INCIDENT MITIGATED] PR 64 is released | This PR puts the parsoid and nginx containers in the same network, enabling communication between them |
| 2019-07-17 | 16:12 | [INCIDENT ENDS] Visual editor is usable again | Verified as both an anonymous user and a logged-in user, with and without captcha |
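
The failure recorded in the timeline comes down to the hostname nginx not resolving from inside the parsoid container. The following is a minimal sketch of a name-resolution check that could be run from inside a container after a networking change. It assumes a Python interpreter is available there, which was not part of the original setup, and the helper name can_resolve is hypothetical.

```python
import socket


def can_resolve(hostname: str, port: int = 80) -> bool:
    """Return True if `hostname` resolves to at least one address.

    This mirrors the lookup that failed in the incident: a getaddrinfo
    call for a service name that is only resolvable when both containers
    share a Docker network.
    """
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        # Raised when the name cannot be resolved (the ENOTFOUND case).
        return False
```

Calling can_resolve("nginx") from inside the parsoid container would be expected to return False while the two containers are on separate networks and True once they share one. It only checks name resolution, not whether nginx actually answers requests.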

Incident Analysis

| What went well? | What went wrong? | Where did we get lucky? |
|---|---|---|
| Incident was detected during routine post-release testing for a subsequent PR | Several PRs were released together or in quick succession, which made it hard to detect this problem right after PR 49 was released | Parsoid's error message made the reason for the error apparent |
| Bug fix PR 64 was approved quickly and released the same day | | |

Notes / Discussion

Error message inside the parsoid container

{
  "name":"parsoid",
  "hostname":"a815b3504bee",
  "pid":36,
  "level":60,
  "err":{
          "message":"Config Request failure for \"http://nginx/api.php\": Error: getaddrinfo ENOTFOUND nginx nginx:80",
          "name":"lib/index.js"