Property hooks in Practice


Two of the biggest features in the upcoming PHP 8.4 are property hooks and asymmetric visibility (or "aviz" for short). Ilija Tovilo and I worked on them over the course of two years, and they're finally almost here!

OK, so now what?

Rather than just reiterate what's in their respective RFCs (there are many blog posts that do that already), today I want to walk through a real-world application I'm working on as a side project, where I just converted a portion of it to use hooks and aviz. Hopefully that will give a better understanding of the practical benefits of these tools, and where there may be a rough edge or two still left.

One of the primary use cases for hooks is to not use them: They're there in case you need them, so you don't need to make boilerplate getter/setter methods "just in case." However, that's not their only use. They're also really nice when combined with interface properties, and delegation. Let's have a look.
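
To make the "not using them" point concrete, here is a minimal sketch with a hypothetical Subscriber class (not part of the project). Assume it originally declared a plain public property; the set hook is bolted on later, and the calling code does not change at all.

```php
<?php
declare(strict_types=1);

// Hypothetical class for illustration; not part of the project described here.
final class Subscriber
{
    // Version 1 of this class declared a plain `public string $email;`.
    // The set hook below was added later. Callers keep reading and writing
    // $subscriber->email exactly as before.
    public string $email {
        set {
            $this->email = strtolower(trim($value));
        }
    }

    public function __construct(string $email)
    {
        // Assignments from inside the class go through the hook as well.
        $this->email = $email;
    }
}

$subscriber = new Subscriber('  Larry@Example.COM ');
$initial = $subscriber->email;          // normalized by the hook

$subscriber->email = 'NEW@Example.com'; // same syntax as a plain property
$updated = $subscriber->email;
```

Because the hook can be added after the fact, there's no reason to write getSomething()/setSomething() pairs up front.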

The use case

The project I'm working on includes a component that represents a file system, where each Folder contains one or more Page objects. Pages are keyed by the file base name, and may be composed of one or more PageFiles, each of which corresponds to a physical file on disk.

So, for instance, form.latte and form.php would both be represented by PageFiles, and grouped together into an AggregatePage, form. (Do those file names suggest what I'm doing...?) However, if there's only a single news.html file, then it would be just a PageFile on its own. AggregatePage and PageFile both implement the same Page interface, which includes various metadata derived from the file (title, summary, tags, last-modified time, etc.)

Additionally, a Folder can be represented by a page inside it named index. That means a Folder also implements Page. As you can imagine, this makes the Page interface rather important. It's actually two interfaces, though: PageInformation, which has the bare metadata, and a child interface, Page, which adds logic around the file multiplexing. The data about a folder is also lazy-loaded and cached for performance, which means we need to handle that lazy-loading transparently.

(Why am I doing something so weird? It makes routing easier. Stay tuned for more details.)

The 8.3 version

This is exactly the situation where interfaces shine. However, in PHP 8.3, interfaces are limited to methods. That means in PHP 8.3, the various interfaces look like this:

interface Hidable
{
   public function hidden(): bool;
}

interface PageInformation extends Hidable
{
   public function title(): string;
   public function summary(): string;
   public function tags(): array;
   public function slug(): ?string;

   public function hasAnyTag(string ...$tags): bool;
   public function hasAllTags(string ...$tags): bool;
}

interface Page extends PageInformation
{
   public function routable(): bool;
   public function path(): string;

   /**
    * @return array<Page>
    */
   public function variants(): array;
   public function variant(string $ext): ?Page;
   public function getTrailingPath(string $fullPath): array;
}

Several of those are quite reasonable. However, nearly any of the methods that have no arguments... don't really need to be methods. Conceptually, the "title" of a page is just data about it. It's an aspect of the page, not an operation. We're used to capturing that as an operation (method), because that's all PHP let us do historically: Properties are basic, and if you expose them directly you lose a lot of flexibility, as well as safety. You cannot have interesting logic for them, and you cannot prevent someone from setting them externally (unless you make them readonly, which has its own challenges). The tools don't let us do it right.

For example, I have a degenerate case implementation called BasicPageInformation, like so:

readonly class BasicPageInformation implements PageInformation
{
   public function __construct(
       public string $title = '',
       public string $summary = '',
       public array $tags = [],
       public ?string $slug = null,
       public bool $hidden = false,
   ) {}

   public function title(): string
   {
       return $this->title;
   }

   public function summary(): string
   {
       return $this->summary;
   }

   public function tags(): array
   {
       return $this->tags;
   }

   public function slug(): ?string
   {
       return $this->slug;
   }

   public function hidden(): bool
   {
       return $this->hidden;
   }

   public function hasAnyTag(string ...$tags): bool { ... }

   public function hasAllTags(string ...$tags): bool { ... }
}

That's... a lot of code. Five methods that do nothing but expose a primitive property. Of course, I also have the properties public, as the class is readonly. But I cannot rely on that because the interface cannot guarantee the presence of the properties, only the methods. So even though I could just have public properties in this case, they're still not reliable.

Enter Interface Properties

A part of the property hooks RFC, interface properties really deserve to be billed as their own third feature. They integrate well with hooks and aviz, and make those better, but they're a standalone feature.

The change in this case is pretty simple:

interface Hidable
{
   public bool $hidden { get; }
}

interface PageInformation extends Hidable
{
   public string $title { get; }
   public string $summary { get; }
   public array $tags { get; }
   public ?string $slug { get; }
   public bool $hidden { get; }

   public function hasAnyTag(string ...$tags): bool;
   public function hasAllTags(string ...$tags): bool;
}

interface Page extends PageInformation
{
   public bool $routable { get; }
   public string $path { get; }

   public function variants(): array;
   public function variant(string $ext): ?Page;
   public function getTrailingPath(string $fullPath): array;
}

Now, instead of read-only methods to implement, the interfaces require readable properties. In this case we don't need to set anything, so the properties are marked to only require a get operation. Whether we satisfy that requirement with a public property, a public readonly property, a public private(set) property, or a virtual property with just a get hook is entirely up to us. In fact, we'll do all of the above.
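
As a quick illustration of those four options, here is a sketch using a hypothetical HasTitle interface (the interface, class names, and values are made up for the example). Each class satisfies the same get-only requirement in a different way, and the consuming code cannot tell them apart.

```php
<?php
declare(strict_types=1);

// Hypothetical interface and classes, used only to illustrate the four options.
interface HasTitle
{
    public string $title { get; }
}

// 1. A plain public property.
final class PlainTitle implements HasTitle
{
    public string $title = 'Plain';
}

// 2. A public readonly property.
final class ReadonlyTitle implements HasTitle
{
    public function __construct(public readonly string $title) {}
}

// 3. A public private(set) property.
final class PrivateSetTitle implements HasTitle
{
    public private(set) string $title = 'Private set';
}

// 4. A virtual property with only a get hook.
final class VirtualTitle implements HasTitle
{
    public string $title { get => 'Virtual'; }
}

// The consumer depends only on the interface.
function titleOf(HasTitle $item): string
{
    return $item->title;
}

$titles = array_map(titleOf(...), [
    new PlainTitle(),
    new ReadonlyTitle('Readonly'),
    new PrivateSetTitle(),
    new VirtualTitle(),
]);
```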

Right off the bat, that makes BasicPageInformation shorter and easier:

readonly class BasicPageInformation implements PageInformation
{
   public function __construct(
       public string $title = '',
       public string $summary = '',
       public array $tags = [],
       public ?string $slug = null,
       public bool $hidden = false,
   ) {}

   public function hasAnyTag(string ...$tags): bool { ... }

   public function hasAllTags(string ...$tags): bool { ... }
}

In this case, simple readonly properties are all we need. We can conform to the interface with about 25 fewer lines of boring, boilerplate code compared to the listing above. Neat.

Where it gets more interesting is the other Page implementations.

The PageFile

In 8.3, PageFile looks like this (stripping out irrelevant bits for now to save space):

readonly class PageFile implements Page
{
   public function __construct(
       public string $physicalPath,
       public string $logicalPath,
       public string $ext,
       public int $mtime,
       public PageInformation $info,
   ) {}

   public function title(): string
   {
       return $this->info->title()
           ?: ucfirst(pathinfo($this->logicalPath, PATHINFO_FILENAME));
   }

   public function summary(): string
   {
       return $this->info->summary();
   }
  
   // tags(), slug(), and hidden() all look exactly the same.

   public function path(): string
   {
       return $this->logicalPath;
   }

   public function routable(): true
   {
       return true;
   }

   // ...
}

The PageFile delegates to an inner PageInformation object, and handles some defaults and extra logic. It works, but as you'll note, it's so verbose I didn't want to ask you to read such a long code sample.

In 8.4, we can remove those methods and instead use properties.

class PageFile implements Page
{
   public private(set) string $title {
       get => $this->title ??=
           $this->info->title
           ?: ucfirst(pathinfo($this->logicalPath, PATHINFO_FILENAME));
   }
   public string $summary { get => $this->info->summary; }
   public array $tags { get => $this->info->tags; }
   public string $slug { get => $this->info->slug ?? ''; }
   public bool $hidden { get => $this->info->hidden; }

   public private(set) bool $routable = true;
   public string $path { get => $this->logicalPath; }

   public function __construct(
       public readonly string $physicalPath,
       public readonly string $logicalPath,
       public readonly string $ext,
       public readonly int $mtime,
       public readonly PageInformation $info,
   ) {}

   // The boring methods omitted.
}

Much more compact, much more readable, much easier to digest. In this case, we're using hooks to create virtual properties, which have no internal storage at all. There is no "slug" slot in the memory of PageFile. Internally to the engine, it still looks and acts like a method. Because most of the properties are virtual, we don't need to bother with the set side, as it will be an engine error to even try. There are two special cases, however.

First, $routable is hard-coded to true. We can do that. Just... not with readonly, which cannot have a default value. We'd have to define it un-initialized and then manually initialize it in the constructor, which is too much work. Now, however, we can set it to public private(set) and give it a default value. In theory the class could still modify that property internally, but it's my class and I know I'm not doing that, so there's nothing to worry about.

Second, $title has some non-trivial default value logic. I don't want to run that multiple times, so it's cached onto the property itself. On subsequent calls, $this->title will have a value, so it will just get returned. That makes $title a "backed property," meaning there is a set operation. But we don't want anyone to set the title externally, so again we make it private(set).
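
The caching behavior is easy to observe in isolation. This is a minimal sketch with a hypothetical CachedTitle class; the $computed counter is there only to show that the default logic runs once.

```php
<?php
declare(strict_types=1);

// Hypothetical class; the counter exists only to make the caching observable.
final class CachedTitle
{
    public int $computed = 0;

    public private(set) string $title {
        // Inside the hook, $this->title is the backing value. It starts out
        // uninitialized, so ??= runs the right-hand side once and stores it.
        get => $this->title ??= $this->buildTitle();
    }

    public function __construct(private readonly string $path) {}

    private function buildTitle(): string
    {
        $this->computed++;
        return ucfirst(pathinfo($this->path, PATHINFO_FILENAME));
    }
}

$page = new CachedTitle('/blog/news.html');
$first = $page->title;   // computes and stores the value
$second = $page->title;  // returns the stored value
```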

Also note that hooked properties cannot be readonly. That means the class cannot be readonly, and the individual promoted constructor properties need to be marked readonly instead. (We could just as easily have made them private(set). It would have the same effect in this case.)
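
To see that all three flavors really are read-only from the outside, here is a small sketch with a hypothetical class that has one private(set) property with a default, one virtual property, and one readonly promoted property. Each external write should be rejected with an Error; I'm not reproducing the exact messages here.

```php
<?php
declare(strict_types=1);

// Hypothetical class showing the kinds of property used in PageFile.
final class Example
{
    // Backed property with a default value, writable only from inside the class.
    public private(set) bool $routable = true;

    // Virtual property: get hook only, no storage.
    public string $path { get => '/' . $this->name; }

    public function __construct(public readonly string $name) {}
}

$example = new Example('news');
$rejected = [];

// Try to write each property from outside the class.
foreach (['routable' => false, 'path' => '/other', 'name' => 'x'] as $property => $value) {
    try {
        $example->$property = $value;
    } catch (\Error $e) {
        $rejected[] = $property;
    }
}
```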

The Folder

The Folder object is even more interesting. It does a number of things that are off-topic for us here, so I'll hand-wave over them and focus on the property refactoring.

In PHP 8.3, Folder works roughly like this:

class Folder implements Page, PageSet, \IteratorAggregate
{
   public const string IndexPageName = 'index';

   private FolderData $folderData;

   public function __construct(
       public readonly string $physicalPath,
       public readonly string $logicalPath,
       protected readonly FolderParser $parser,
   ) {}

   public function routable(): bool
   {
       return $this->indexPage() !== null;
   }

   public function path(): string
   {
       return str_replace('/index', '', $this->indexPage()?->path() ?? $this->logicalPath);
   }
  
   public function variants(): array
   {
       return $this->indexPage()?->variants() ?? [];
   }

   public function variant(string $ext): ?Page
   {
       return $this->indexPage()?->variant($ext);
   }

   public function title(): string
   {
       return $this->indexPage()?->title()
           ?? ucfirst(pathinfo($this->logicalPath, PATHINFO_FILENAME));
   }

   public function summary(): string
   {
       return $this->indexPage()?->summary() ?? '';
   }
  
   // tags(), slug(), and hidden() omitted as they're just like summary().

   public function all(): iterable
   {
       return $this->folderData()->all();
   }

   public function indexPage(): ?Page
   {
       return $this->folderData()->indexPage;
   }

   protected function folderData(): FolderData
   {
       return $this->folderData ??= $this->parser->loadFolder($this);
   }

   // Various other methods omitted.
}

(Although not relevant here, PageSet is an interface for a collection of pages. It extends Countable and Traversable, and adds a few other operations like filter() and paginate(). None of its methods are relevant to hooks, though, so we will skip over that.)

That's a lot of code for what is ultimately a very simple design: A folder is given a path that it represents. (Ignore the physical vs logical paths for now, that's also not relevant.) It lazily builds a folderData value that is a collection of Pages the Folder contains. One of those pages may be an index page, in which case the Folder can be treated the same as its index page. If not, there are reasonable defaults.

But that's a lot of dancing around. Let's see if we can simplify it using PHP 8.4.

class Folder implements Page, PageSet, \IteratorAggregate
{
   public const string IndexPageName = 'index';

   protected FolderData $folderData { get => $this->folderData ??= $this->parser->loadFolder($this); }
   public ?Page $indexPage { get => $this->folderData->indexPage; }

   public private(set) string $title {
       get => $this->title ??=
           $this->indexPage?->title
           ?? ucfirst(pathinfo($this->logicalPath, PATHINFO_FILENAME));
   }
   public private(set) string $summary { get => $this->summary ??= $this->indexPage?->summary ?? ''; }
   public private(set) array $tags { get => $this->tags ??= $this->indexPage?->tags ?? []; }
   public private(set) string $slug { get => $this->slug ??= $this->indexPage?->slug ?? ''; }
   public private(set) bool $hidden { get => $this->hidden ??= $this->indexPage?->hidden ?? true; }

   public bool $routable { get => $this->indexPage !== null; }
   public private(set) string $path { get => $this->path ??= str_replace('/index', '', $this->indexPage?->path ?? $this->logicalPath); }

   public function __construct(
       public readonly string $physicalPath,
       public readonly string $logicalPath,
       protected readonly FolderParser $parser,
   ) {}

   public function count(): int
   {
       return count($this->folderData);
   }

   public function variants(): array
   {
       return $this->indexPage?->variants() ?? [];
   }

   public function variant(string $ext): ?Page
   {
       return $this->indexPage?->variant($ext);
   }

   public function all(): iterable
   {
       return $this->folderData->all();
   }
  
   // Various other methods omitted.
}

Now, we've done a few things.

  1. folderData was already a property, and a method. You had to do that if you wanted caching. Now, they're combined into a single lazy-initializing, caching property. It's still protected, though.
  2. The indexPage was always just a silly little wrapper around folderData. Now that wrapper is even thinner, in a property. Code calling it can just blindly assume it's there and use it safely.
  3. The various other simple data from Page/PageInformation are also now just properties. Also, it's super easy for us to cache them so defaults don't need to be handled again in the future. As before, we make the properties private(set) so they're read-only to the outside world without any of the shenanigans of readonly.
  4. Features like null-coalesce assignment, the null-safe operator, and shortened ternaries make the code overall really nice and compact. (Those aren't new in PHP 8.4, I just like them.)
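
The lazy-loading and fallback pattern from the list above can be tried out in a stripped-down form. Everything in this sketch is a hypothetical stand-in (Titled, StubPage, StubFolderData, CountingParser, MiniFolder); the real Page, FolderData, and FolderParser do considerably more. The parser counts its calls so the single load is visible.

```php
<?php
declare(strict_types=1);

// All names here are hypothetical stand-ins for the real types,
// reduced to what the pattern needs.
interface Titled
{
    public string $title { get; }
}

final class StubPage implements Titled
{
    public function __construct(public readonly string $title) {}
}

final class StubFolderData
{
    public function __construct(public readonly ?Titled $indexPage) {}
}

final class CountingParser
{
    public int $loads = 0;

    public function __construct(private readonly ?Titled $indexPage = null) {}

    public function loadFolder(string $logicalPath): StubFolderData
    {
        $this->loads++; // lets us observe how often the folder is parsed
        return new StubFolderData($this->indexPage);
    }
}

class MiniFolder implements Titled
{
    // Loaded on first read, then cached in the backing value.
    protected StubFolderData $folderData {
        get => $this->folderData ??= $this->parser->loadFolder($this->logicalPath);
    }

    // Thin virtual wrapper; null when the folder has no index page.
    public ?Titled $indexPage { get => $this->folderData->indexPage; }

    // Delegate to the index page if there is one, else derive a default.
    public private(set) string $title {
        get => $this->title ??= $this->indexPage?->title
            ?? ucfirst(pathinfo($this->logicalPath, PATHINFO_FILENAME));
    }

    public function __construct(
        public readonly string $logicalPath,
        protected readonly CountingParser $parser,
    ) {}
}

// A folder with no index page falls back to a title derived from its path.
$parser = new CountingParser();
$bare = new MiniFolder('/docs/guides', $parser);
$bareTitle = $bare->title;
$again = $bare->title;        // second read: served from the cached value
$noIndex = $bare->indexPage;  // reuses the already-loaded folder data

// A folder with an index page delegates to it.
$withIndex = new MiniFolder('/docs/guides', new CountingParser(new StubPage('Guide Home')));
$indexedTitle = $withIndex->title;
```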

In the end, we have less code, more self-descriptive code, and no loss in flexibility. Score! The performance should be about a wash; hooks cost very slightly more than a method call, but not enough that you'll notice a difference.

Declaration interfaces

Another place where interface properties came in handy is in my "File Handlers." The interface for those in PHP 8.3 looks like this:

interface PageHandler
{
    public function supportedMethods(): array;

    public function supportedExtensions(): array;

    public function handle(ServerRequestInterface $request, Page $page, string $ext): ?RouteResult;
}

supportedMethods() and supportedExtensions() are both, well, boring. Those methods will, 95% of the time, just return a static array value. However, the other 5% of the time they will need some minimal logic. That means they cannot be attributes, and have to be methods.

Which means most implementations have this verbose nonsense:

readonly class MarkdownLatteHandler implements PageHandler
{
    public function __construct( /* ... */) {}

    public function supportedMethods(): array
    {
        return ['GET'];
    }

    public function supportedExtensions(): array
    {
        return ['md'];
    }

    // ...
}

Which is like... why?

In PHP 8.4, interface properties let us shorten both the interface and implementations to this:

interface PageHandler
{
    public array $supportedMethods { get; }
    public array $supportedExtensions { get; }

    public function handle(ServerRequestInterface $request, Page $page, string $ext): ?RouteResult;
}

class MarkdownLatteHandler implements PageHandler
{
    public private(set) array $supportedMethods = ['GET'];
    public private(set) array $supportedExtensions = ['md'];

    public function __construct(/* ... */) {}

    // ...
}

Much shorter and easier! We can just declare the properties directly, with values, and keep them private-set, then never set them. It's marginally faster, too, as there's no function call involved (though in practice it doesn't matter). We don't even need hooks most of the time, just aviz!

And in that other 5%, well, we can use hooks just as well:

class StaticFileHandler implements PageHandler
{
    public private(set) array $supportedMethods = ['GET'];
    public array $supportedExtensions {
        get => array_keys($this->config->allowedExtensions);
    }

    public function __construct(
        /* ... */
        private readonly StaticRoutes $config,
    ) {}
}
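
From the consuming side, the two styles are indistinguishable. Here is a rough sketch of a lookup that reads those properties; the trimmed-down DeclaresSupport interface, both classes, and findHandler() are hypothetical and only loosely modeled on the handlers above.

```php
<?php
declare(strict_types=1);

// Hypothetical, trimmed-down interface: only the two declarative properties.
interface DeclaresSupport
{
    public array $supportedMethods { get; }
    public array $supportedExtensions { get; }
}

// The common case: static values, no hooks needed.
final class MarkdownSupport implements DeclaresSupport
{
    public private(set) array $supportedMethods = ['GET'];
    public private(set) array $supportedExtensions = ['md'];
}

// The rarer case: the extension list is derived from configuration.
final class ConfiguredSupport implements DeclaresSupport
{
    public private(set) array $supportedMethods = ['GET'];
    public array $supportedExtensions {
        get => array_keys($this->allowedExtensions);
    }

    /** @param array<string, string> $allowedExtensions extension => MIME type */
    public function __construct(private readonly array $allowedExtensions) {}
}

/** @param list<DeclaresSupport> $handlers */
function findHandler(array $handlers, string $method, string $ext): ?DeclaresSupport
{
    foreach ($handlers as $handler) {
        if (in_array($method, $handler->supportedMethods, true)
            && in_array($ext, $handler->supportedExtensions, true)) {
            return $handler;
        }
    }
    return null;
}

$handlers = [
    new MarkdownSupport(),
    new ConfiguredSupport(['png' => 'image/png', 'css' => 'text/css']),
];

$forMarkdown = findHandler($handlers, 'GET', 'md');
$forCss = findHandler($handlers, 'GET', 'css');
$forPost = findHandler($handlers, 'POST', 'md');
```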

One more thing...

There's one other place where PageInformation gets used, and where PHP 8.4's new features help out in hilarious ways.

Another task this project does is loading Markdown files off disk, with YAML frontmatter (which is, you guessed it, PageInformation's properties). The way I'm doing so is to load the file, rip off the YAML frontmatter, and deserialize that into a MarkdownPage object using Crell/Serde. Serde creates an object by bypassing the constructor and then populating it, but one thing that won't be populated is the content property. That gets set by just writing to it afterward.

The relevant loading code looks like this (abbreviated):

   public function load(string $file): MarkdownPage|MarkdownError
   {
       $fileSource = file_get_contents($file);

       if ($fileSource === false) {
           return MarkdownError::FileNotFound;
       }

       [$header, $content] = $this->extractFrontMatter($fileSource);

       $document = $this->serde->deserialize($header, from: 'yaml', to: MarkdownPage::class);
       $document->{$this->documentStructure->contentField} = $content;

       return $document;
   }

(The property to write the content to is configurable via attributes, for reasons unrelated to the topic at hand.) Problem: That means the content property needs to be publicly writable, which is generally not ideal. Technically we could use a bound closure to dance around that and set it from private scope, but PHP 8.4 lets us do something even more wild:

class MarkdownPage implements PageInformation
{
   public function __construct(
       #[Content]
       public(set) readonly string $content,
       public readonly string $title = '',
       public private(set) string $summary = '' { get => $this->summary ?: $this->summarize(); },
       public readonly string $template = '',
       public readonly array $tags = [],
       public readonly ?string $slug = null,
       public readonly bool $hidden = false,
       public readonly array $other = [],
   ) {}

   private function summarize(): string { ... }
  
   // And other stuff.
}

(I'm skipping over the fact that in 8.3 we needed a bunch of extra do-nothing methods, as we've already discussed those benefits.)

That's right. I have found a use case for public(set) readonly! Really, no one is more surprised at this than I am. With this configuration, $content can be set only once, but it can be set externally. Trying to set it a second time, from anywhere, results in an error. (Yes, we could have just used a bound closure, but this is more fun.)
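
Here's that behavior in isolation, as a minimal sketch. The Document class is hypothetical, and ReflectionClass::newInstanceWithoutConstructor() stands in for what a deserializer does when it bypasses the constructor; it is not a claim about how Serde works internally.

```php
<?php
declare(strict_types=1);

// Hypothetical class; newInstanceWithoutConstructor() stands in for a
// deserializer that creates the object without calling the constructor.
final class Document
{
    public function __construct(
        public(set) readonly string $content,
        public readonly string $title = '',
    ) {}
}

$document = (new \ReflectionClass(Document::class))->newInstanceWithoutConstructor();

// First write from outside the class: allowed, because set visibility is public.
$document->content = 'Body text';

// Second write: rejected, because the property is still readonly.
$secondWriteRejected = false;
try {
    $document->content = 'Something else';
} catch (\Error $e) {
    $secondWriteRejected = true;
}
```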

Also note that most properties are just public readonly, which fully satisfies the interface. The exception is $summary, which has more interesting default logic, and thus uses a hook, and thus uses private(set) instead of readonly. Nothing especially new here.

Conclusion

I am overall happy with the result. I think it makes the code cleaner, more compact, and easier to extend. When adding more properties to the PageInformation interface, as I expect I will, adding that property to all the places it gets used will be less work, too.

The one complaint I have is that I do miss the double-short syntax that we removed from the hooks RFC, as it had too much pushback. Since the property hooks above are all get-only, they could have been abbreviated even further. Using the Folder example:

public private(set) array $tags => $this->tags ??= $this->indexPage?->tags ?? [];
public private(set) string $slug => $this->slug ??= $this->indexPage?->slug ?? '';
public private(set) bool $hidden => $this->hidden ??= $this->indexPage?->hidden ?? true;

I find that perfectly readable, and without the visual noise of the { get ... } wrapped around it. If folks agree, maybe we can try to re-add it in a future version.

So there we are: Interface properties, hooks, and asymmetric visibility, all dovetailing together to make code shorter, tidier, and more flexible. Welcome to PHP 8.4!

You can see a complete diff of all the PHP 8.4 upgrades I made as well. Looks like it shaved off around 150 lines of code, too.

(Note: If you're reading this article in the future, the code this is from will almost certainly have evolved further. This represents the code at the time of this blog post.)
