Metamorphic Testing

Michael Seifert, 2022-01-03, Updated 2022-01-03

Metamorphic tests are a special case of property-based tests. They allow developers to describe richer properties, such as symmetry or idempotency. This article explains what metamorphic tests are and how they differ from other property-based tests. The post also shows how metamorphic tests are relevant for testing web APIs. The provided code examples use pytest, Hypothesis, and FastAPI.

A cocoon on a grass straw — Photo by Walter del Aguila on Unsplash

Traditional tests are example-based. The programmer comes up with an arbitrary input to the system under test and the corresponding expected output, and creates as many test cases as they see fit. Property-based testing, on the other hand, asserts that logical properties of the system hold. It's generally impossible to exhaustively test those properties. Therefore, we rely on property-based testing frameworks to intelligently select samples of the input space. If we run the framework long enough or often enough, we have tested enough samples to be confident that the logical property holds. This article is about metamorphic tests, a special class of property-based tests. If you want to learn more about property-based testing in general, you can find a more extensive introduction in a previous article.

Let's assume we've come up with an optimized algorithm for computing the sine function. We implemented the function sine(x: float) -> float and want to ensure its correctness using property-based tests. One of the sine function's properties is that its output values are in the range -1 to +1. We can formulate a corresponding test case:

import hypothesis.strategies as st
from hypothesis import given

@given(x=st.floats(allow_infinity=False, allow_nan=False))
def test_sine_result_within_minus_one_and_plus_one(x: float):
    assert -1 <= sine(x) <= 1

The test simply chooses a floating point number that's not infinity or NaN and checks that the result of sine is between -1 and +1. We've now made sure that our sine implementation does not violate the upper or lower bounds.

A graph of a sine function in a coordinate system. The upper and lower bounds are marked by horizontal lines.

But what about the x-axis? Is it not possible that our implementation of the sine function is stretched out so that one period of the sine graph ends much later on the x-axis like in the following graph?

A graph of a sine function in a coordinate system. The value 2π is marked on the x-axis, but the sine wave stretches beyond that, showing that the graph is somehow stretched.

We know that the sine function is periodic with a period of 2π, so let's write a corresponding test to ensure our implementation upholds the property.

from math import pi

from pytest import approx

@given(x=st.floats(allow_infinity=False, allow_nan=False))
def test_sine_is_periodic(x: float):
    # approx tolerates small floating-point rounding differences
    assert sine(x) == approx(sine(x + 2*pi))

The test checks that for any floating point number the result of sine(x) and sine(x + 2*pi) are the same. Note that this property puts two calls of the sine function in relation to each other, whereas the first test case was only concerned with individual calls to the sine function.
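One caveat is worth keeping in mind, shown here as a minimal sketch using only the standard library. For very large inputs, adding 2π in floating-point arithmetic no longer shifts x by exactly one period, so even a correct sine implementation may fail the periodicity test. The helper name effective_shift is an illustrative assumption and not part of the original code.

```python
from math import pi

def effective_shift(x: float) -> float:
    """Return how far x actually moves when 2*pi is added in floating point."""
    return (x + 2 * pi) - x

# Near 1.0 the shift equals 2*pi up to a tiny rounding error.
small = effective_shift(1.0)

# Near 1e15 adjacent floats are 0.125 apart, so the sum is rounded
# and the shift is no longer one full period.
large = effective_shift(1e15)
```

If this matters for the function under test, one option is to restrict the strategy with min_value and max_value, as the monotonicity test further down does.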

At this point, our test suite consists of test cases for the upper and lower bounds of our sine implementation and its periodicity. We know that the sine function is strictly monotonic in every "quarter" of the wave. But who guarantees that our implementation of the sine function doesn't return declining values in between?

A graph of a sine function in a coordinate system. The first ascent of the graph has been replaced with a graph resembling a third degree polynomial. The graph is no longer monotonic in this area.

Let's add another test to address the issue.

from math import pi

@given(
    x=st.floats(
        allow_infinity=False,
        allow_nan=False,
        min_value=0.0,
        max_value=pi/2.0 - 0.01,
    )
)
def test_sine_is_monotonic_from_zero_to_half_pi(x: float):
    assert sine(x) < sine(x + 0.01)

The test asserts that the result of sine is larger for larger input values when looking at numbers between 0 and 0.5π. We could come up with similar monotonicity tests for other sections of the function graph, of course.

Notably, the test property compares multiple invocations of the sine function with each other. A property that describes how two or more different inputs to the same function relate to their respective outputs is called a metamorphic relation. ^[1] All metamorphic relations are properties in the sense of property-based testing, but not all properties are metamorphic relations. A property-based test using a metamorphic relation is a metamorphic test.

The first test describing the upper and lower bounds of the sine function is not a metamorphic test, because it only concerns itself with a single input and output of the sine function. The second and third test cases are metamorphic tests, because they make assertions on how the sine function behaves when the inputs change. Metamorphic tests allow us to describe richer properties of our software systems in the same way that differentiation opens up additional possibilities to describe a mathematical function.
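The definition of a metamorphic relation can be made concrete with a small helper. The following is a minimal sketch using only the standard library. The names holds_for_samples, periodic, and odd are illustrative assumptions, and math.sin stands in for the sine implementation under test. Each relation receives the function and one input, derives a second input from it, and compares the two outputs.

```python
import math
import random

def periodic(f, x: float) -> bool:
    """f(x) and f(x + 2*pi) agree up to a small tolerance."""
    return math.isclose(f(x), f(x + 2 * math.pi), abs_tol=1e-9)

def odd(f, x: float) -> bool:
    """f(-x) equals -f(x) up to a small tolerance (point symmetry)."""
    return math.isclose(f(-x), -f(x), abs_tol=1e-9)

def holds_for_samples(relation, f, samples: int = 1000, seed: int = 0) -> bool:
    """Check a relation on pseudo-random inputs from a bounded range."""
    rng = random.Random(seed)
    return all(relation(f, rng.uniform(-100.0, 100.0)) for _ in range(samples))

sine_is_periodic = holds_for_samples(periodic, math.sin)
sine_is_odd = holds_for_samples(odd, math.sin)
# Cosine is an even function, so the "odd" relation should be rejected.
cosine_is_odd = holds_for_samples(odd, math.cos)
```

With Hypothesis, the sampling loop would be replaced by @given and the body of each relation would become the test body.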

Metamorphic testing of Web APIs

"That's all well and good", I hear you say. "I don't usually implement sine functions or mathematical algorithms. I'm a web developer. How can I use this in my every day work?" No worries, I got you covered. In fact, Web APIs have lots of interesting properties to test.

Consider a webshop with a Buy now button that concludes a purchase. A press of the button calls an API to a backend system that maintains a list of purchases. The API supports three operations: adding a purchase, listing all purchases, and deleting a purchase. Its implementation could look as follows:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AnonymousBuyer(BaseModel):
    name: str
    address: str
    credit_card_number: str

class Purchase(BaseModel):
    id: str
    buyer: AnonymousBuyer
    product_id: str
    price: int

class PurchaseConfirmation(BaseModel):
    product_id: str

purchases: list[Purchase] = []

@app.post("/purchase")
async def purchase(purchase: Purchase) -> PurchaseConfirmation:
    await debit_credit_card(
        card_number=purchase.buyer.credit_card_number,
        amount=purchase.price,
    )
    purchases.append(purchase)
    return PurchaseConfirmation(
        product_id=purchase.product_id
    )

@app.get("/purchase")
async def list_purchases() -> list[Purchase]:
    return purchases

@app.delete("/purchase")
async def cancel_purchase(purchase_id: str) -> list[Purchase]:
    purchase = next(
        (p for p in purchases if p.id == purchase_id)
    )
    await refund_credit_card(
        card_number=purchase.buyer.credit_card_number,
        amount=purchase.price,
    )
    purchases.remove(purchase)
    return purchases

When a new purchase is made, the purchase triggers a financial transaction. Therefore, it's very important the purchase is triggered exactly once. The user must not be able to purchase the item twice by accident. We want to test that two calls to POST /purchase with the exact same arguments have the same effect as a single call. In other words, we want to make sure the purchase endpoint is idempotent.

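from fastapi.testclient import TestClient

http = TestClient(app)

# purchases() is assumed to be a Hypothesis strategy that generates Purchase objects
@given(purchase=purchases())
def test_purchase_is_idempotent(purchase: Purchase):
    http.post("/purchase", json=purchase.dict())
    after_first_purchase = http.get("/purchase").json()

    http.post("/purchase", json=purchase.dict())

    after_second_purchase = http.get("/purchase").json()
    # Compare the response bodies; Response objects themselves compare by identity
    assert after_second_purchase == after_first_purchase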

Our test triggers a new purchase and retrieves the recorded purchases. It triggers the same purchase again and asserts that the list of purchases has not changed. This test uncovers that purchases are recorded twice, because the author of the API did not make POST /purchase idempotent.
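One way to make the endpoint pass this test is to treat the purchase id as an idempotency key. The following is a minimal sketch in plain Python under that assumption, not the API author's implementation. PurchaseStore, record, all_purchases, and the charges list are hypothetical names, and the list stands in for calls to debit_credit_card. The Purchase class is a simplified stand-in for the pydantic model.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Purchase:
    id: str
    product_id: str
    price: int

@dataclass
class PurchaseStore:
    purchases: dict[str, Purchase] = field(default_factory=dict)
    charges: list[int] = field(default_factory=list)

    def record(self, purchase: Purchase) -> None:
        """Record a purchase once; repeated calls with the same id are no-ops."""
        if purchase.id in self.purchases:
            return
        self.charges.append(purchase.price)  # stand-in for debit_credit_card
        self.purchases[purchase.id] = purchase

    def all_purchases(self) -> list[Purchase]:
        return list(self.purchases.values())

store = PurchaseStore()
order = Purchase(id="p-1", product_id="book", price=20)

store.record(order)
after_first = store.all_purchases()

store.record(order)  # same purchase submitted a second time
after_second = store.all_purchases()
```

The same metamorphic relation as in the test applies here: the state after two identical calls equals the state after one call, and the card is charged only once.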

Another metamorphic test could assert that purchasing and cancelling a purchase is a no-op:

@given(purchase=purchases())
def test_post_and_delete_do_not_alter_purchases(
    purchase: Purchase
):
    status_before_purchase = http.get("/purchase").json()

    http.post("/purchase", json=purchase.dict())
    # cancel_purchase expects purchase_id as a query parameter
    http.delete("/purchase", params={"purchase_id": purchase.id})

    status_after_purchase = http.get("/purchase").json()
    assert status_after_purchase == status_before_purchase

We retrieve the initial list of purchases and trigger a POST immediately followed by a DELETE request. We retrieve the list of purchases again and assert that nothing has changed. This makes sure that the API correctly manages a list of purchases. We can easily come up with assertions about the financial transactions triggered in the background as well.
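An assertion about the financial transactions could look like the following minimal sketch. FakePaymentGateway and its methods are hypothetical test doubles standing in for debit_credit_card and refund_credit_card. The sketch only assumes that both receive a card number and an amount, as in the API code above.

```python
from collections import defaultdict

class FakePaymentGateway:
    """Test double that tracks the net amount charged per card."""

    def __init__(self) -> None:
        self.net_charged = defaultdict(int)

    def debit(self, card_number: str, amount: int) -> None:
        self.net_charged[card_number] += amount

    def refund(self, card_number: str, amount: int) -> None:
        self.net_charged[card_number] -= amount

def net_after_purchase_and_cancel(card_number: str, amount: int) -> int:
    """Return the change in net charges caused by a purchase and its cancellation."""
    gateway = FakePaymentGateway()
    before = gateway.net_charged[card_number]
    gateway.debit(card_number, amount)   # what POST /purchase would trigger
    gateway.refund(card_number, amount)  # what DELETE /purchase would trigger
    return gateway.net_charged[card_number] - before

net_change = net_after_purchase_and_cancel("4111-0000", 20)
```

In a real test, the gateway double would be injected into the API and the relation would be checked for every generated purchase.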

As you can see, metamorphic testing is not just a sophisticated research topic. It has practical applications in everyday development work. Like other property-based tests, metamorphic tests can be applied in situations where the exact result of a program is not known, thus avoiding the test oracle problem. Metamorphic tests can also result in more concise test code. Fewer lines of code, in turn, mean less maintenance effort. In summary, I encourage you to employ property-based testing in general and metamorphic testing in particular in your current work.

If you want to learn more about property-based testing, check out my book Bulletproof Python – Property-Based Testing with Hypothesis.

[1] Tsong Yueh Chen, Fei-Ching Kuo, Huai Liu, Pak-Lok Poon, Dave Towey, T. H. Tse, and Zhi Quan Zhou. 2018. Metamorphic Testing: A Review of Challenges and Opportunities. 1, 1 (January 2018), 26 pages.