Stateful testing

By default, Schemathesis takes all operations from your API and tests them separately by passing random input data and validating responses. It works great when you need to quickly verify that your operations properly validate input and respond in conformance with the API schema.

With stateful testing, Schemathesis combines multiple API calls into a single test scenario and tries to find call sequences that fail.

Stateful tests in Schemathesis rely on Open API links, which describe transitions between API operations. Unlike stateless tests, which verify individual operations in isolation, stateful tests need these links to sequence API calls logically, so make sure your schema defines them.

Why is it useful?

This approach allows your tests to reach deeper into your application logic and cover scenarios that are impossible to cover with independent tests. You may compare Schemathesis’s stateful and non-stateful testing the same way you would compare integration and unit tests. Stateful testing checks how multiple API operations work in combination.

It also addresses the problem of your application producing a high number of “404 Not Found” responses during testing due to randomness in the input data.

NOTE. The number of received “404 Not Found” responses depends on the number of connections between different operations defined in the schema. The more connections you have, the deeper tests can reach.

How to specify connections?

To specify how different operations depend on each other, we use a special syntax from the Open API specification - Open API links. It describes how the output from one operation can be used as input for other operations. To define such connections, you need to extend your API schema with the links keyword:

 paths:
   /users:
     post:
       summary: Creates a user and returns the user ID
       operationId: createUser
       requestBody:
         required: true
         description: User object
         content:
           application/json:
             schema:
               $ref: '#/components/schemas/User'
       responses:
         '201':
           ...
           links:
             GetUserByUserId:
               operationId: getUser  # The target operation
               parameters:
                 userId: '$response.body#/id'
   /users/{userId}:
     get:
       summary: Gets a user by ID
       operationId: getUser
       parameters:
         - in: path
           name: userId
           required: true
           schema:
             type: integer
             format: int64

In this schema, you define that the id value returned by the POST /users call can be used as a path parameter in the GET /users/{userId} call.

Schemathesis will use this connection when generating parameters for the GET /users/{userId} call - everything that is not defined by links will be generated randomly.

If you don’t want to modify your schema source, add_link allows you to define links between a pair of operations programmatically.

For CLI, you can use the after_load_schema hook to attach links before tests run (see the sketch after the add_link example below).

add_link adds a new Open API link to the schema definition.

Parameters:
  • source (APIOperation) – This operation is the source of data

  • target – This operation will receive the data from this link. Can be an APIOperation instance or a reference like this - #/paths/~1users~1{userId}/get

  • status_code (str) – The link is triggered when the source API operation responds with this status code.

  • parameters – A dictionary that describes how parameters should be extracted from the matched response. The key represents the parameter name in the target API operation, and the value is a runtime expression string.

  • request_body – A literal value or runtime expression to use as a request body when calling the target operation.

  • name (str) – Explicit link name.

schema = schemathesis.from_uri("http://0.0.0.0/schema.yaml")

schema.add_link(
    source=schema["/users/"]["POST"],
    target=schema["/users/{userId}"]["GET"],
    status_code="201",
    parameters={"userId": "$response.body#/id"},
)
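If you run tests via the CLI, the same link can be attached inside the after_load_schema hook. Below is a minimal sketch; it assumes the @schemathesis.hook registration style and that this hooks module is loaded by the CLI:

import schemathesis


@schemathesis.hook
def after_load_schema(context, schema):
    # Attach the link right after the schema is loaded, before tests start
    schema.add_link(
        source=schema["/users/"]["POST"],
        target=schema["/users/{userId}"]["GET"],
        status_code="201",
        parameters={"userId": "$response.body#/id"},
    )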

Schemathesis supports Open API links, including the runtime expressions syntax, with only minor limitations.

Minimal example

Stateful tests can be added to your test suite by defining a test class:

import schemathesis

schema = schemathesis.from_uri("http://0.0.0.0/schema.yaml")

APIWorkflow = schema.as_state_machine()
TestAPI = APIWorkflow.TestCase

Besides loading an API schema, the example above contains two basic components:

  • APIWorkflow. A state machine that allows you to customize behavior on each test scenario.

  • TestAPI. A unittest-style test case where you can add your pytest fixtures that will be applied to the whole set of scenarios.

Stateful tests work seamlessly with WSGI / ASGI applications - the state machine will automatically pick up the right way to make an API call.
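For instance, a state machine can be built straight from an ASGI application via from_asgi. A minimal sketch, assuming a hypothetical myapp module whose app serves its schema at /openapi.json:

import schemathesis
from myapp import app  # hypothetical ASGI application

# The schema is loaded directly from the app - no running server required
schema = schemathesis.from_asgi("/openapi.json", app)
TestAPI = schema.as_state_machine().TestCase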

The implementation is based on Hypothesis’s Rule-based state machines, and you can apply its features if you want to extend the default behavior.

Note

Schemathesis’s stateful testing uses Swarm testing (via Hypothesis), which makes defect discovery much more effective.

Lazy schema loading

It is also possible to use stateful testing without loading the API schema during test collection. For example, if your application depends on some test fixtures, you might want to avoid loading the schema too early.

To do so, you need to create the state machine inside a pytest fixture and run it via run() inside a test function:

import pytest
import schemathesis


@pytest.fixture
def state_machine():
    # You may use any schema loader here
    # or use any pytest fixtures
    schema = schemathesis.from_uri("https://example.schemathesis.io/openapi.json")
    return schema.as_state_machine()


def test_statefully(state_machine):
    state_machine.run()

How does it work behind the scenes?

The process consists of two stages:

  • State machine creation:
    • Each API operation has a separate bundle where Schemathesis puts all responses received from that operation;

    • All links represent transitions of the state machine. Each one has a pre-condition - there should already be a response with the proper status code;

    • If an operation has no links, then Schemathesis creates a transition without a pre-condition and generates random data as input.

  • Running scenarios:
    • Each scenario step accepts a freshly generated random test case and randomly chosen data from the dependent operation. This data might be missing if there are no links to the current operation;

    • If there is data, then the generated case is updated according to the defined link rules;

    • The resulting test case is sent to the current operation; its response is then validated and stored for future use.

As a result, Schemathesis can run arbitrary API call sequences and combine data generation with reusing responses.
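Conceptually, the generated machine resembles a hand-written Hypothesis rule-based state machine like the sketch below. It is illustrative only; Schemathesis derives the equivalent bundles and rules from your schema automatically:

from hypothesis import strategies as st
from hypothesis.stateful import Bundle, RuleBasedStateMachine, rule


class UsersMachine(RuleBasedStateMachine):
    # One bundle per API operation, storing data extracted from its responses
    users = Bundle("users")

    @rule(target=users, username=st.text(min_size=1))
    def create_user(self, username):
        # Plays the role of `POST /users`: no pre-condition, random input.
        # The returned value is stored in the `users` bundle
        return {"id": 1, "username": username}  # stand-in for a real API call

    @rule(user=users)
    def get_user(self, user):
        # Plays the role of `GET /users/{userId}`: its pre-condition is that
        # the `users` bundle already contains at least one stored response
        assert "id" in user


TestUsers = UsersMachine.TestCase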

How to customize tests

If you want to change a single scenario’s behavior, you need to extend the state machine. Each scenario gets a freshly created state machine instance that runs a sequence of steps.

class schemathesis.stateful.state_machine.APIStateMachine

The base class for state machines generated from API schemas.

Exposes additional extension points in the testing process.

The following methods are executed only once per test scenario.

setup() -> None

Hook method that runs unconditionally at the beginning of each test scenario.

Does nothing by default.

teardown() -> None

Called after a run has finished executing to clean up any necessary state.

Does nothing by default.

These methods might be called multiple times per test scenario.

before_call(case: Case) -> None

Hook method for modifying the case data before making a request.

Parameters:

case (Case) – Generated test case data that should be sent in an API call to the tested API operation.

Use it if you want to inject static data, for example, a query parameter that should always be used in API calls:

class APIWorkflow(schema.as_state_machine()):
    def before_call(self, case):
        case.query = case.query or {}
        case.query["test"] = "true"

You can also modify data only for some operations:

class APIWorkflow(schema.as_state_machine()):
    def before_call(self, case):
        if case.method == "PUT" and case.path == "/items":
            case.body["is_fake"] = True

get_call_kwargs(case: Case) -> dict[str, Any]

Create custom keyword arguments that will be passed to the Case.call() method.

Mostly they are proxied to the requests.request() call.

Parameters:

case (Case) – Generated test case data that should be sent in an API call to the tested API operation.

class APIWorkflow(schema.as_state_machine()):
    def get_call_kwargs(self, case):
        return {"verify": False}

The above example disables the server’s TLS certificate verification.

call(case: Case, **kwargs: Any) -> GenericResponse

Make a request to the API.

Parameters:
  • case (Case) – Generated test case data that should be sent in an API call to the tested API operation.

  • kwargs – Keyword arguments that will be passed to the appropriate case.call_* method.

Returns:

Response from the application under test.

Note that WSGI/ASGI applications are detected automatically in this method. Depending on the result of this detection, the state machine will call the appropriate case.call_* method.

Usually, you don’t need to override this method unless you are building a different state machine on top of this one and want to customize the transport layer itself.
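For example, here is a sketch that sets a default timeout for every request; timeout is one of the keyword arguments proxied to requests for network-based calls:

class APIWorkflow(schema.as_state_machine()):
    def call(self, case, **kwargs):
        # Apply a 5-second timeout unless the caller provided one
        kwargs.setdefault("timeout", 5)
        return super().call(case, **kwargs)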

after_call(response: GenericResponse, case: Case) -> None

Hook method for additional actions with case or response instances.

Parameters:
  • response – Response from the application under test.

  • case (Case) – Generated test case data that should be sent in an API call to the tested API operation.

For example, you can log all response statuses by using this hook:

import logging

logger = logging.getLogger(__file__)
logger.setLevel(logging.INFO)


class APIWorkflow(schema.as_state_machine()):
    def after_call(self, response, case):
        logger.info(
            "%s %s -> %d",
            case.method,
            case.path,
            response.status_code,
        )


# POST /users/ -> 201
# GET /users/{user_id} -> 200
# PATCH /users/{user_id} -> 200
# GET /users/{user_id} -> 200
# PATCH /users/{user_id} -> 500

validate_response(response: GenericResponse, case: Case, additional_checks: tuple[CheckFunction, ...] = ()) -> None

Validate an API response.

Parameters:
  • response – Response from the application under test.

  • case (Case) – Generated test case data that should be sent in an API call to the tested API operation.

  • additional_checks – A tuple of checks that will be run together with the default ones.

Raises:

CheckFailed – If any of the supplied checks failed.

If you need to change the default checks or provide custom validation rules, you can do it here.

def my_check(response, case):
    ...  # some assertions


class APIWorkflow(schema.as_state_machine()):
    def validate_response(self, response, case):
        case.validate_response(response, checks=(my_check,))

The state machine from the example above will execute only the my_check check instead of all available checks.

Each check function should accept response as the first argument and case as the second one and raise AssertionError if the check fails.

Note that it is preferred to pass check functions as an argument to case.validate_response. In this case, all checks will be executed, and you’ll receive a grouped exception that contains results from all provided checks rather than only the first encountered exception.
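For example, this sketch keeps all default checks and adds my_check on top of them, so the failure report covers every check that failed:

class APIWorkflow(schema.as_state_machine()):
    def validate_response(self, response, case):
        # `additional_checks` runs `my_check` alongside the default checks
        super().validate_response(response, case, additional_checks=(my_check,))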

If you load your schema lazily, you can extend the state machine inside the pytest fixture:

import pytest


@pytest.fixture
def state_machine():
    schema = schemathesis.from_uri("https://example.schemathesis.io/openapi.json")

    class APIWorkflow(schema.as_state_machine()):
        def setup(self):
            ...  # your scenario setup

    return APIWorkflow

Using pytest fixtures

If you need to customize the whole test run, you can extend the test class:

schema = ...  # Load the API schema here

APIWorkflow = schema.as_state_machine()


class TestAPI(APIWorkflow.TestCase):
    def setUp(self):
        ...  # create a database

    def tearDown(self):
        ...  # drop the database

Or with explicit fixtures:

import pytest

APIWorkflow = schema.as_state_machine()


@pytest.fixture()
def database():
    # create tables & data
    yield
    # drop tables


@pytest.mark.usefixtures("database")
class TestAPI(APIWorkflow.TestCase):
    pass

Note that from pytest's or unittest's perspective there is a single test case, which Hypothesis parametrizes internally. Therefore, such fixtures run only once, not once per test scenario.

Hypothesis configuration

Hypothesis settings can be changed via the settings object on the TestCase class:

from hypothesis import settings

schema = ...  # Load the API schema here

TestCase = schema.as_state_machine().TestCase
TestCase.settings = settings(max_examples=200, stateful_step_count=5)

If you load your schema lazily:

from hypothesis import settings
import pytest


@pytest.fixture
def state_machine():
    ...


def test_statefully(state_machine):
    state_machine.run(
        settings=settings(
            max_examples=200,
            stateful_step_count=5,
        )
    )

With this configuration, there will be twice as many test scenarios as the Hypothesis default (max_examples defaults to 100), each with at most five steps.

How to provide initial data for test scenarios?

Often you might want to make some API calls at the start of every test as preparation, for example, to create test data like users in the system or items in the e-shop stock. Such calls provide good starting points for scenarios, which is especially useful if your API expects specific input that is hard to generate randomly.

The best way to do so is by using Hypothesis's initialize decorator:

from hypothesis.stateful import initialize

schema = ...  # Load the API schema here

BaseAPIWorkflow = schema.as_state_machine()


class APIWorkflow(BaseAPIWorkflow):
    @initialize(
        target=BaseAPIWorkflow.bundles["/users/"]["POST"],
        case=schema["/users/"]["POST"].as_strategy(),
    )
    def init_user(self, case):
        return self.step(case)

This rule uses the strategy for the POST /users/ operation to generate random input data and stores the result in a special bundle, where it becomes available for dependent API calls. The state machine runs this rule at the beginning of every test scenario.

Important

If you have multiple rules, they will run in arbitrary order, which may not be desired. If you need to run initialization code always at the beginning of each test scenario, use the setup() hook instead.

If you need more control and you’d like to provide the whole payload to your API operation, then you can do it either by modifying the generated case manually or by creating a new one via the APIOperation.make_case() function:

from hypothesis.stateful import initialize

schema = ...  # Load the API schema here

BaseAPIWorkflow = schema.as_state_machine()


class APIWorkflow(BaseAPIWorkflow):
    @initialize(
        target=BaseAPIWorkflow.bundles["/users/"]["POST"],
    )
    def init_user(self):
        case = schema["/users/"]["POST"].make_case(body={"username": "Test"})
        return self.step(case)

Loading multiple entries of the same type is more verbose but still possible:

from hypothesis.stateful import initialize, multiple

schema = ...  # Load the API schema here

BaseAPIWorkflow = schema.as_state_machine()
# These users will be created at the beginning of each scenario
USERS = [
    {"is_admin": True, "username": "Admin"},
    {"is_admin": False, "username": "Customer"},
]


class APIWorkflow(BaseAPIWorkflow):
    @initialize(
        target=BaseAPIWorkflow.bundles["/users/"]["POST"],
    )
    def init_users(self):
        result = []
        # Create each user via the API
        for user in USERS:
            case = schema["/users/"]["POST"].make_case(body=user)
            result.append(self.step(case))
        # Store them in the `POST /users/` bundle
        return multiple(*result)

Examples

Here are more detailed examples of how you can adapt Schemathesis’s stateful testing to some typical workflows.

API authorization

Login to an app and use its API token with each call:

import requests


class APIWorkflow(schema.as_state_machine()):
    headers: dict

    def setup(self):
        # Make a login request
        response = requests.post(
            "http://0.0.0.0/api/login", json={"login": "test", "password": "password"}
        )
        # Parse the response and store the token in headers
        token = response.json()["auth_token"]
        self.headers = {"Authorization": f"Bearer {token}"}

    def get_call_kwargs(self, case):
        # Use stored headers
        return {"headers": self.headers}

Note that this example uses the setup hook. A similar effect could be achieved with the initialize decorator, but there is a caveat.

You can define multiple initialization rules with the initialize decorator, but they run in an arbitrary order. That behavior is undesirable here: the login request must run first so that all following requests can use the received token. If the login were an initialize rule alongside other initialize rules that depend on the API token, it would break because of the random execution order. The setup method fits better since it always runs when the state machine starts.

Conditional validation

Run different checks, depending on the result of the previous call:

def check_condition(response, case):
    if case.source is not None:
        # Run this check only for `GET /items/{id}`
        if case.method == "GET" and case.path == "/items/{id}":
            value = response.json()
            if case.source.response.status_code == 201:
                assert value in ("IN_PROGRESS", "COMPLETE")
            if case.source.response.status_code == 400:
                assert value == "REJECTED"


class APIWorkflow(schema.as_state_machine()):
    def validate_response(self, response, case):
        # Run all default checks together with the new one
        super().validate_response(response, case, additional_checks=(check_condition,))

Reproducing failures

When Schemathesis finds an erroneous API call sequence, it will provide executable Python code that reproduces the error. It might look like this:

state = APIWorkflow()
v1 = state.step(
    case=state.schema["/users/"]["POST"].make_case(body={"username": "000"}),
    previous=None,
)
state.step(
    case=state.schema["/users/{user_id}"]["PATCH"].make_case(
        path_parameters={"user_id": 0},
        query={"common": 0},
        body={"username": ""},
    ),
    previous=(
        v1,
        schema["/users/"]["POST"].links["201"]["UpdateUserById"],
    ),
)
state.teardown()

The APIWorkflow class in the example is your state machine class - rename it accordingly if your state machine class has a different name, or replace it with state = schema.as_state_machine()(). Besides the class name, this code is supposed to run without changes.

Corner cases

Sometimes the API under test behaves in ways that make errors hard to reproduce. For example, if a caching bug occurs only on the first call and your test app is not entirely restarted on each run, Schemathesis will report that the error is flaky and can’t be reliably reproduced.

If your stateful tests report an Unsatisfiable error, it means that Schemathesis can’t make any API calls that satisfy the rules of your state machine. In most cases, it comes from custom pre-conditions and the underlying API schema, but if you get this error, I suggest reporting it so we can confirm the root cause.

Command Line Interface

By default, stateful testing is enabled. You can disable it via the --stateful=none CLI option. Please note that we plan to implement more algorithms for stateful testing in the future.

st run http://0.0.0.0/schema.yaml

...

POST /api/users/ .                                     [ 33%]
    -> GET /api/users/{user_id} .                      [ 50%]
        -> PATCH /api/users/{user_id} .                [ 60%]
    -> PATCH /api/users/{user_id} .                    [ 66%]
GET /api/users/{user_id} .                             [ 83%]
    -> PATCH /api/users/{user_id} .                    [ 85%]
PATCH /api/users/{user_id} .                           [100%]

...

Each additional test will be indented and prefixed with -> in the CLI output. You can specify recursive links if you want. The default recursion depth limit is 5 and can be changed with the --stateful-recursion-limit=<N> CLI option.
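For example:

st run --stateful-recursion-limit=10 http://0.0.0.0/schema.yaml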

Schemathesis’s CLI now supports the new approach to stateful testing based on state machines. It is available as an experimental feature and can be enabled using the --experimental=stateful-test-runner CLI option or by setting the SCHEMATHESIS_EXPERIMENTAL_STATEFUL_TEST_RUNNER=true environment variable. For more information, refer to the New Stateful Test Runner section. Its output includes per-link statistics, for example:

Links                                                  2xx    4xx    5xx    Total

POST /api/users/
└── 201
    ├── GET /api/users/{user_id}                       765      0    101      866
    └── PATCH /api/users/{user_id}                     765      0      0      765

GET /api/users/{user_id}
└── 200
    └── PATCH /api/users/{user_id}                     513      0      0      513

The old approach to stateful testing, not based on state machines, is still the default in the CLI. However, we recommend using the new approach as it offers more effective testing. In the future, the new approach will become the default in the CLI, and the old approach will be removed.

Please note that the visual appearance and configuration options for stateful testing in the CLI may differ slightly from the in-code approach. We are continuously working on improving the CLI experience and aligning it with the in-code approach.

Extracting data from headers and query parameters

By default, Schemathesis allows you to extract data from the response body of an API endpoint, based on the provided schema. However, sometimes you might need to extract data from other parts of the API response, such as headers, path or query parameters.

Schemathesis provides an additional feature that allows you to use regular expressions to extract data from the string values of headers and query parameters. This can be particularly useful when the API response includes important information in these locations, and you need to use that data for further processing.

Here’s an example of how to extract the user ID from the Location header of a 201 Created response:

 paths:
   /users:
     post:
       ...
       responses:
         '201':
           ...
           links:
             GetUserByUserId:
               operationId: getUser
               parameters:
                 userId: '$response.header.Location#regex:/users/(.+)'

For example, if the Location header is /users/42, the userId parameter will be set to 42. The regular expression should be a valid Python regular expression and should contain a single capturing group.

If the regular expression does not match the value, the parameter will be set to empty.
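The extraction is roughly equivalent to applying the pattern to the raw string value and taking the first capturing group, as in this plain-Python sketch of the semantics:

import re

location = "/users/42"  # value of the `Location` response header
match = re.search(r"/users/(.+)", location)
user_id = match.group(1) if match else None  # "42"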

State Machine Test Runner

If you need to run stateful tests without using pytest, you can use the Schemathesis state machine test runner. Similar to the default Schemathesis test runner, it allows for running state machines and reacting to events from them.

A test run is the entire process of running the state machine and consists of multiple test suites. It starts by generating a new test suite and executing it. If the test suite finishes with any new failures, Schemathesis generates another test suite and runs it. This process continues until a generated test suite finishes without finding any new failures.

Each test suite contains multiple test scenarios. Each test scenario is a sequence of steps generated by the state machine, where each step typically represents an API call.

Important

Each test scenario may include multiple API calls but is considered as a single test case by Hypothesis. Therefore, the max_examples setting controls the number of test scenarios, not the number of API calls.

The available events are:

  • RunStarted - triggered before the entire test run starts.

  • RunFinished - triggered after the entire test run finishes.

  • SuiteStarted - triggered before each test suite starts.

  • SuiteFinished - triggered after each test suite finishes, providing information about the failed checks.

  • ScenarioStarted - triggered before each test scenario starts.

  • ScenarioFinished - triggered after each test scenario finishes.

  • StepStarted - triggered before each step in a test scenario is executed.

  • StepFinished - triggered after each step in a test scenario is executed.

  • Interrupted - triggered when the test run is interrupted by the user (e.g., via Ctrl+C).

  • Errored - triggered when an unexpected error occurs during the test run.

These events are primarily used for monitoring and reporting purposes, allowing you to track the progress of the state machine test runner. They provide information about the current state of the test run but do not offer any control over the test execution.

To collect the events, you can use a “sink” that consumes them and collects statistics about the test run:

import schemathesis
from schemathesis.stateful import events

schema = schemathesis.from_uri("http://127.0.0.1:8080/swagger.json")
state_machine = schema.as_state_machine()
sink = state_machine.sink()

runner = state_machine.runner()
for event in runner.execute():
    sink.consume(event)
    if isinstance(event, events.RunFinished):
        print("Test run finished")
print("Duration:", sink.duration)
for failure in sink.failures:
    print(failure)