
Architecture#

This page explains the technical details of the Okami architecture.

Process#

Initialising#

  1. The Spider object is created and its settings are loaded
  2. Pipelines and middlewares are loaded and initialised
  3. The startup pipeline is executed and runs to completion
  4. The start page from the spider is queued

The scraping process then starts.
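The four initialisation steps can be sketched in plain Python. This is only an illustration of the ordering, not Okami's actual API: `Task`, `Component`, `Spider`, and `initialise` are all hypothetical names, and the startup pipeline is modelled as a list of callables.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical unit of work: one page to fetch."""
    url: str


@dataclass
class Component:
    """Hypothetical stand-in for a pipeline or a middleware."""
    name: str
    initialised: bool = False

    def initialise(self):
        self.initialised = True


@dataclass
class Spider:
    """Hypothetical spider holding its settings and its start page."""
    name: str
    start_url: str
    settings: dict = field(default_factory=dict)


def initialise(name, start_url, settings, component_names, startup_pipeline):
    # 1. Create the spider object and load its settings.
    spider = Spider(name=name, start_url=start_url, settings=dict(settings))
    # 2. Load and initialise pipelines and middlewares.
    components = [Component(component_name) for component_name in component_names]
    for component in components:
        component.initialise()
    # 3. Run every startup pipeline stage to completion before queueing work.
    for stage in startup_pipeline:
        stage(spider)
    # 4. Queue the spider's start page as the first task.
    queue = deque([Task(url=spider.start_url)])
    return spider, components, queue


def mark_started(spider):
    """Hypothetical startup stage: records that startup has run."""
    spider.settings["started"] = True


spider, components, queue = initialise(
    name="example",
    start_url="https://example.com/",
    settings={"delay": 1},
    component_names=["http-middleware", "items-pipeline"],
    startup_pipeline=[mark_started],
)
```

Once `initialise` returns, the queue holds exactly one task (the start page), which is what the processing loop consumes first.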

Processing#

  1. A Request is created from a Task object
  2. The Request is passed through the HTTP middleware before cycle
  3. The Downloader processes the Request and creates a Response
  4. The Response is passed through the HTTP middleware after cycle
  5. Errors from the request/response cycle and throttling are handled
  6. The Task and Response are passed through the spider middleware before cycle
  7. The Spider processes the Task and Response and creates a list of new Task and Item objects
  8. The Task, the Response, and the list of new Task and Item objects are passed through the spider middleware after cycle
  9. The list of Item objects is passed through the items pipeline cycle
  10. The list of Task objects is passed through the tasks pipeline cycle

The processing steps are repeated for every task (page) until no tasks remain.
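The ten steps above can be sketched as a single loop over a task queue. This is a simplified, synchronous illustration under stated assumptions, not Okami's implementation: every name (`SITE`, `http_before`, `download`, `parse`, `run`, and so on) is hypothetical, the downloader reads from an in-memory dictionary instead of the network, the middlewares are no-ops, throttling is omitted, and the tasks pipeline is assumed to drop URLs that were already queued.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    url: str


@dataclass(frozen=True)
class Request:
    task: Task


@dataclass(frozen=True)
class Response:
    url: str
    text: str


@dataclass(frozen=True)
class Item:
    url: str
    text: str


# Hypothetical in-memory site: URL -> (page text, linked URLs).
SITE = {
    "/": ("home", ["/a", "/b"]),
    "/a": ("page a", ["/b", "/missing"]),
    "/b": ("page b", []),
}


def http_before(request):
    return request  # a real middleware might set headers here


def download(request):
    text, _links = SITE[request.task.url]  # KeyError for unknown pages
    return Response(url=request.task.url, text=text)


def http_after(response):
    return response  # a real middleware might decode or validate here


def spider_before(task, response):
    return task, response


def parse(task, response):
    """Fake spider: one Item per page and one Task per linked URL."""
    _text, links = SITE[task.url]
    tasks = [Task(url=link) for link in links]
    items = [Item(url=task.url, text=response.text)]
    return tasks, items


def spider_after(task, response, tasks, items):
    return tasks, items


def run(start_url):
    queue = deque([Task(start_url)])
    seen = {start_url}
    stored, failed = [], []
    while queue:  # repeat until no tasks remain
        task = queue.popleft()
        request = http_before(Request(task))            # steps 1-2
        try:
            response = http_after(download(request))    # steps 3-4
        except KeyError:                                # step 5 (no throttling)
            failed.append(task.url)
            continue
        task, response = spider_before(task, response)  # step 6
        tasks, items = parse(task, response)            # step 7
        tasks, items = spider_after(task, response, tasks, items)  # step 8
        stored.extend(items)                            # step 9: items pipeline
        for new_task in tasks:                          # step 10: tasks pipeline
            if new_task.url not in seen:
                seen.add(new_task.url)
                queue.append(new_task)
    return stored, failed


stored, failed = run("/")
```

Here `stored` ends up with one item per reachable page, and `failed` records the one link that could not be downloaded. A real crawler would run these stages asynchronously and apply throttling in step 5.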

Finalising#

  1. The Session object is closed
  2. Pipelines and middlewares are finalised

Okami terminates.
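A minimal sketch of the shutdown order, again with hypothetical names (`Session`, `Component`, and `finalise` are illustrative and not taken from Okami): the session is closed first, then each pipeline and middleware is finalised.

```python
class Session:
    """Hypothetical session; real code would wrap a network client."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class Component:
    """Hypothetical stand-in for a pipeline or a middleware."""

    def __init__(self, name):
        self.name = name
        self.finalised = False

    def finalise(self):
        self.finalised = True


def finalise(session, components):
    order = []
    # 1. Close the session so no further requests can be sent.
    session.close()
    order.append("session")
    # 2. Finalise pipelines and middlewares.
    for component in components:
        component.finalise()
        order.append(component.name)
    return order


session = Session()
components = [Component("http-middleware"), Component("items-pipeline")]
order = finalise(session, components)
```

Closing the session before finalising the components means that no component can issue new requests while it is being torn down.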

Schema#

[Image: Okami architecture schema]