Advanced PHPUnit shenanigans

In my last post, we talked about PHPUnit's data providers, and how to leverage them to write more maintainable tests. Today, I want to talk about two more test-writing techniques that I've found to be very helpful.

Testdox

First, a brief aside:

After my last post, PHPUnit maintainer Sebastian Bergmann responded on Twitter with another advantage of data providers, one that applies if you're using the --testdox switch. That switch reports an English-language description of each test in the test output, rather than a simple "." per test. The output message for a test is controlled by the @testdox annotation, and that annotation accepts the names of the test method's parameters as placeholders. That allows you to do stuff like this:

/**
 * @test
 * @dataProvider pythagorasProvider
 * @testdox The square of $a plus the square of $b is the square of $c.
 */
public function validate_triangles(float $a, float $b, float $c): void
{
    $this->assertSame($c * $c, $a * $a + $b * $b);
}

(OK, it's a stupid test, but that's not the point.) Now, if you run the test against a large number of data sets, each one gets testdox output that includes the specific data for that run. Neat! I may have to use this feature in the future.
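
The provider behind that test isn't shown here. As a minimal sketch, a hypothetical pythagorasProvider could yield named data sets whose keys match the test method's parameter names. It is written as a free function so it runs on its own; in a real test class it would be a public method (static on newer PHPUnit versions), and the triples are only illustrative values.

```php
<?php
declare(strict_types=1);

// Hypothetical provider for the validate_triangles() test above.
// Each data set is named, and each argument is keyed by parameter name.
function pythagorasProvider(): iterable
{
    yield '3-4-5 triangle' => ['a' => 3.0, 'b' => 4.0, 'c' => 5.0];
    yield '5-12-13 triangle' => ['a' => 5.0, 'b' => 12.0, 'c' => 13.0];
    yield '8-15-17 triangle' => ['a' => 8.0, 'b' => 15.0, 'c' => 17.0];
}
```

Whole-number floats keep the squares exact, which matters because the test compares them with assertSame().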

Dynamic assertions

Last time around, we talked about using data providers with the yield statement to make testing datasets more maintainable. One limitation of data providers, though, is that the test logic is the same for every data set. Often that's fine, but other times you want to verify different things for each data set: that a value is true or false, that a collection contains a specific set of objects or values, and so on. That could be quite complicated to capture in an abstract, generic fashion.

Fortunately, we don't have to. You can pass any value as an argument from a data provider to a test method, including a callable. And that callable can assert things.

Let's start with the second example we were working with before. Its test code looked like this:

class ComplexObjectTest extends TestCase
{
    /**
     * @test
     * @dataProvider configurationProvider()
     */
    public function verifyProcessing(
        string $yamlConfig,
        array $setupObjects,
        object $expected): void
    {
        $config = Yaml::parse($yamlConfig) ?? [];
        
        $subject = new ClassBeingTested($config);
        
        $subject->populate($setupObjects);
        
        $result = $subject->process();
        
        self::assertEquals($expected, $result);
    }
    
    public function configurationProvider(): iterable
    {
        $defaultObjects = [new A(), new B()];
    
        $defaultConfig = '...';
    
        yield 'default behavior' => [
            'yamlConfig' => $defaultConfig,
            'setupObjects' => $defaultObjects,
            'expected' => new C('args', 'here'),
        ];
        
        // ...
    }
}

All we can do to verify each test case is to do an object equality comparison. That may be OK, but often it's very much not, especially if we're returning or manipulating some more complex object tree. For example, the code this example is based on is testing a Compiler Pass for the Symfony Dependency Injection Container (which TYPO3 uses); in that case, we need to use the configuration to create several additional services, the names of which are predictable based on the configuration but not easy to hard code.

Instead, what we can do is make one of the arguments to the test method a callable that takes the output to analyze. That can be whatever we want, based on the context of our tests.

To continue our example, let's try this:

class ComplexObjectTest extends TestCase
{
    /**
     * @test
     * @dataProvider configurationProvider()
     */
    public function verifyProcessing(
        string $yamlConfig,
        array $setupObjects,
        callable $test): void
    {
        $config = Yaml::parse($yamlConfig) ?? [];
        
        $subject = new ClassBeingTested($config);
        
        $subject->populate($setupObjects);
        
        $result = $subject->process();
        
        $test($result);
    }
    
    public function configurationProvider(): iterable
    {
        $defaultObjects = [new A(), new B()];
    
        $defaultConfig = '...';
    
        yield 'default behavior' => [
            'yamlConfig' => $defaultConfig,
            'setupObjects' => $defaultObjects,
            'test' => function(C $result) {
                self::assertEquals('value', $result->prop1);            
                self::assertEquals('other', $result->prop2);            
            },
        ];
        
        // ...
    }
}

There are three things this approach buys us.

First, we can type the callable's parameter (C in this case) to get an automatic instanceof check. If for whatever reason the wrong type of object is returned, a TypeError is thrown.
Second, because we've now implicitly asserted the type of $result, our IDE can auto-complete properties and methods of the object for us.
Third, we can run any tests we want on the result in a way that is specific to this particular test case. We can assert different properties, we can call methods, we can dig into deeper objects in a tree of value objects, etc.

Of course, nothing precludes us from doing both. If there are tests that are always run against the result, we can put those in the main test method and then also pass in a callable for additional test assertions.

I find I do this on about half of my data provider methods these days. Maybe slightly more. It's such a simple but powerful pattern that makes tests much more robust.

Custom assertions

This second trick is one that seems so obvious I don't know why I only started using it recently, and why I haven't seen it done more often. (Or maybe people are already doing it, and I'm just late to the party; that's also possible.)

PHPUnit comes with a bunch of assertion methods out of the box. You're already familiar with many of them, although self::assertEquals(), self::assertTrue(), and self::assertArrayHasKey() probably make up the lion's share of what gets used. However, there's nothing really special about those methods. If you look inside them, they mostly just delegate to assertThat() with a constraint object that defines the assertion. As long as a function eventually ends up calling assertThat(), it's a useful assertion method.
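
As a rough illustration of that delegation (this is not PHPUnit's own source), a hand-written constraint plus a thin wrapper around assertThat() might look like the sketch below. HasName and assertHasName() are hypothetical names invented for this example; Constraint and Assert::assertThat() are the real PHPUnit pieces it builds on, so PHPUnit's autoloader needs to be available.

```php
<?php
declare(strict_types=1);

use PHPUnit\Framework\Assert;
use PHPUnit\Framework\Constraint\Constraint;

// Hypothetical constraint: passes when the value is an object whose
// getName() method returns the expected name.
final class HasName extends Constraint
{
    private string $expected;

    public function __construct(string $expected)
    {
        $this->expected = $expected;
    }

    // Parameter left untyped so the signature stays compatible with
    // PHPUnit versions that declare it with or without a type.
    protected function matches($other): bool
    {
        return is_object($other)
            && method_exists($other, 'getName')
            && $other->getName() === $this->expected;
    }

    // Used by PHPUnit when it builds the failure message.
    public function toString(): string
    {
        return sprintf('has the name "%s"', $this->expected);
    }
}

// Hypothetical assertion helper: all it does is hand the value and the
// constraint to assertThat().
function assertHasName(string $expected, object $actual, string $message = ''): void
{
    Assert::assertThat($actual, new HasName($expected), $message);
}
```

The rest of this post takes the lighter route of composing existing assertions, which also ends in assertThat() and needs no constraint class.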

So let's make our own.

To expand our example, suppose that we expect $result to be a container object that holds other objects, not in a single linear list but in several, for different types of object, and that those objects can reference each other. That is, a successful run may produce something like this:

$result = new PenpalConfiguration();
$larry = new Person('Larry');
$typo3 = new Company('TYPO3 GmbH');
$larry->setEmployer($typo3);
$result->addReceiver($larry);
$result->addReceiver(new Person('Benni'));
$result->addReceiver($typo3);
$result->addSender($larry);
$result->addSender(new Person('Oliver'));
$result->addSender($typo3);
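
The classes behind this graph are never shown in this post. As a minimal sketch, assuming Person and Company share an interface (called Nameable here, matching the type hint used in the helper code further down) and that PenpalConfiguration simply keeps two arrays, they might look roughly like this. Every method body is a guess made for illustration.

```php
<?php
declare(strict_types=1);

// Assumed shared interface; the post only hints that one should exist.
interface Nameable
{
    public function getName(): string;
}

final class Company implements Nameable
{
    private string $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function getName(): string
    {
        return $this->name;
    }
}

final class Person implements Nameable
{
    private string $name;
    private ?Company $employer = null;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function getName(): string
    {
        return $this->name;
    }

    public function setEmployer(Company $employer): void
    {
        $this->employer = $employer;
    }

    // Returns null for a person without a known employer.
    public function getEmployer(): ?Company
    {
        return $this->employer;
    }
}

final class PenpalConfiguration
{
    /** @var Nameable[] */
    private array $receivers = [];

    /** @var Nameable[] */
    private array $senders = [];

    public function addReceiver(Nameable $receiver): void
    {
        $this->receivers[] = $receiver;
    }

    public function addSender(Nameable $sender): void
    {
        $this->senders[] = $sender;
    }

    /** @return Nameable[] */
    public function getReceivers(): array
    {
        return $this->receivers;
    }

    /** @return Nameable[] */
    public function getSenders(): array
    {
        return $this->senders;
    }
}
```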

We want to ensure that a given YAML configuration turns into this object graph. How do we test it?

Let's start with the naive way:

$receivers = $result->getReceivers();
$names = [];
foreach ($receivers as $receiver) {
    $names[] = $receiver->getName();
}
sort($names);
self::assertEquals(['Benni', 'Larry', 'TYPO3 GmbH'], $names);

$senders = $result->getSenders();
$names = [];
foreach ($senders as $sender) {
    $names[] = $sender->getName();
}
sort($names);
self::assertEquals(['Larry', 'Oliver', 'TYPO3 GmbH'], $names);

foreach ($result->getReceivers() as $receiver) {
    if ($receiver->getName() === 'Larry') {
        $employer = $receiver->getEmployer();
        self::assertEquals('TYPO3 GmbH', $employer->getName());
    }
}

That works, but... eew. That's extremely brittle, redundant, and we're going to have to repeat it for every test data set we write. Surely we can do better.

Let's start by noting that the receiver check and sender check have identical behavior. (They also implicitly rely on Person and Company both having a getName() method. Presumably they share an interface? If not, this is a good indication that they should!) That should trigger our refactoring Spider Sense that we can pull that logic out to a separate method. What kind of method? We could do anything, but let's mimic the pattern of PHPUnit's own assertions and make it a static method.

protected static function assertNamesMatch(array $objects, array $expected): void
{
    $names = array_map(static fn (Nameable $o): string => $o->getName(), $objects);
    sort($names);
    sort($expected);

    self::assertEquals($expected, $names);
}

And now we can call that from our test code:

self::assertNamesMatch($result->getReceivers(), ['Larry', 'Benni', 'TYPO3 GmbH']);
self::assertNamesMatch($result->getSenders(), ['Larry', 'Oliver', 'TYPO3 GmbH']);

foreach ($result->getReceivers() as $receiver) {
    if ($receiver->getName() === 'Larry') {
        $employer = $receiver->getEmployer();
        self::assertEquals('TYPO3 GmbH', $employer->getName());
    }
}

That's a lot better! It's shorter, easier to read, allows us to throw in sorting for both arrays to make life even easier (assuming for the moment that order doesn't matter in our use case), and we now have a nice utility we can reuse for other name matching. Personally, I like to go even one step further and make wrappers for each case, although I'm sure some will consider that excessive:

protected static function assertHasReceivers(PenpalConfiguration $result, array $names): void
{
    self::assertNamesMatch($result->getReceivers(), $names);
}

protected static function assertHasSenders(PenpalConfiguration $result, array $names): void
{
    self::assertNamesMatch($result->getSenders(), $names);
}

self::assertHasReceivers($result, ['Larry', 'Benni', 'TYPO3 GmbH']);
self::assertHasSenders($result, ['Larry', 'Oliver', 'TYPO3 GmbH']);

I find that reads a bit more naturally, and allows the test cases in each test data set to be more self-descriptive. I want to assert that the result has certain receivers, and has certain senders. That's exactly what the code says it's doing.

We can do the same to the other block, too. Assuming for the moment that PenpalConfiguration has no direct-lookup methods (maybe it should?), we can pull that logic out separately, too.

protected static function assertUserWorksFor(PenpalConfiguration $result, string $person, string $companyName): void
{
    foreach ([...$result->getReceivers(), ...$result->getSenders()] as $user) {
        if ($user->getName() === $person) {
            self::assertEquals($companyName, $user->getEmployer()->getName());
        }
    }
}

While we're at it, we can improve the test to search through all senders and receivers to look for the person in question. Our test list for the specific test data set is now this highly self-descriptive block:

self::assertHasReceivers($result, ['Larry', 'Benni', 'TYPO3 GmbH']);
self::assertHasSenders($result, ['Larry', 'Oliver', 'TYPO3 GmbH']);
self::assertUserWorksFor($result, 'Larry', 'TYPO3 GmbH');

Once you get used to this pattern, it makes other tests easier to consider, too. For example, suppose we want to ensure that any given Person or Company is instantiated only once. If someone is both a sender and receiver, or a company is someone's employer and also a sender/receiver, it should be the same object used in both cases. That's likely more work than we want to put into each test case, even if it's important functionality. But if we make it a custom assertion function:

protected static function assertUniqueParticipants(PenpalConfiguration $config): void
{
    $items = [...$config->getReceivers(), ...$config->getSenders()];

    $people = array_filter($items, static fn (Nameable $o): bool => $o instanceof Person);
    $employers = array_map(static fn (Person $p): ?Company => $p->getEmployer(), $people);
    $employers = array_filter($employers);

    $seen = [];
    foreach ([...$items, ...$employers] as $item) {
        if (isset($seen[$item->getName()])) {
            self::assertSame($seen[$item->getName()], $item, 'The configuration should not contain duplicate entries.');
        } else {
            $seen[$item->getName()] = $item;
        }
    }
}

That would be highly annoying to repeat in each test, but now it's a single line we can add to every test case and get that correctness check on every data set.
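
The uniqueness check depends on assertSame(), which compares object identity rather than property values. A small standalone illustration in plain PHP, using a hypothetical Participant class as a stand-in for the post's Person:

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a named participant.
final class Participant
{
    public string $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}

$shared = new Participant('Larry');
$alias = $shared;                    // second reference to the same instance
$copy = new Participant('Larry');    // separate instance with equal data

var_dump($shared == $copy);   // bool(true): same class, same property values
var_dump($shared === $copy);  // bool(false): two distinct instances
var_dump($shared === $alias); // bool(true): one instance, two variables
```

An assertEquals() check would accept the duplicate Larry; assertSame() is what catches the same person being instantiated twice.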

Putting it all together, our test class now looks like this:

class ComplexObjectTest extends TestCase
{

    protected static function assertNamesMatch(array $objects, array $expected): void
    {
        $names = array_map(static fn (Nameable $o): string => $o->getName(), $objects);
        sort($names);
        sort($expected);
    
        self::assertEquals($expected, $names);
    }
    
    protected static function assertHasReceivers(PenpalConfiguration $result, array $names): void
    {
        self::assertNamesMatch($result->getReceivers(), $names);
    }
    
    protected static function assertHasSenders(PenpalConfiguration $result, array $names): void
    {
        self::assertNamesMatch($result->getSenders(), $names);
    }
    
    protected static function assertUserWorksFor(PenpalConfiguration $result, string $person, string $companyName): void
    {
        foreach ([...$result->getReceivers(), ...$result->getSenders()] as $user) {
            if ($user->getName() === $person) {
                self::assertEquals($companyName, $user->getEmployer()->getName());
            }
        }
    }
    
    protected static function assertUniqueParticipants(PenpalConfiguration $config): void
    {
        $items = [...$config->getReceivers(), ...$config->getSenders()];

        $people = array_filter($items, static fn (Nameable $o): bool => $o instanceof Person);
        $employers = array_map(static fn (Person $p): ?Company => $p->getEmployer(), $people);
        $employers = array_filter($employers);

        $seen = [];
        foreach ([...$items, ...$employers] as $item) {
            if (isset($seen[$item->getName()])) {
                self::assertSame($seen[$item->getName()], $item, 'The configuration should not contain duplicate entries.');
            } else {
                $seen[$item->getName()] = $item;
            }
        }
    }

    /**
     * @test
     * @dataProvider configurationProvider()
     */
    public function verifyProcessing(
        string $yamlConfig,
        array $setupObjects,
        callable $test): void
    {
        $config = Yaml::parse($yamlConfig) ?? [];
        
        $subject = new ClassBeingTested($config);
        
        $subject->populate($setupObjects);
        
        $result = $subject->process();
        
        $test($result);
    }
    
    public function configurationProvider(): iterable
    {
        $defaultObjects = [new A(), new B()];
    
        $defaultConfig = '...';
    
        yield 'default behavior' => [
            'yamlConfig' => $defaultConfig,
            'setupObjects' => $defaultObjects,
            'test' => function(PenpalConfiguration $result) {
                self::assertUniqueParticipants($result);
                self::assertHasReceivers($result, ['Larry', 'Benni', 'TYPO3 GmbH']);
                self::assertHasSenders($result, ['Larry', 'Oliver', 'TYPO3 GmbH']);
                self::assertUserWorksFor($result, 'Larry', 'TYPO3 GmbH');          
            },
        ];
        
        // ...
    }
}

Is there some setup involved? Certainly. But the result is that adding more test cases to our data set is as simple as can be. Create a new set of preconditions and then yield them, along with the domain-context-aware tests that ensure everything went smoothly. If someone finds a bug, toss a new config file (precondition) at it and drop in the tests that describe what you expect.

Conclusion

In this mini-series, we've looked at ways to make writing PHPUnit tests easier and more ergonomic. To recap:

Use data providers. They rock.
Name your data sets.
Name your arguments in your data sets.
Use yield in your data providers to make them easier to maintain.
Pass extra assertions and post-condition checks to your test method from your provider using a closure.
Write custom assertions. They're straightforward and simple, and they make for far more self-documenting and maintainable tests.

Happy testing!