Upgrading PHP upgrades


PHP 8.2 was released on 8 December, to much fanfare. And, as always, to much wailing and gnashing of teeth about how the PHP language is evolving too quickly and breaking everyone's code. More specifically, the trigger was the earlier, twin announcement that PHP 7.4 reached end-of-life on 28 November, which has, somehow, forced everyone to suddenly rewrite their entire code base in a hurry.

And... while I sympathize with some of the complaints, I am once again left wondering "how?"

This question is inspired in part by a recent post by Ed Barnard as part of 24 Days in December, PHP's advent blog series. To be crystal clear: I like Ed, he's a good guy, very skilled, and this is in no way, shape, or form an attack on him. But I am confused by his assertions, which echo those made by many others this time every year.

Some choice quotes from the post:

PHP 8 has, in my view, mandated that the way we design our PHP software must change for the better.

Remember, it’s not the deprecations–it’s that PHP 8 is no longer PHP.

Ed laments that it used to be easy to write code that worked from early PHP 5 through late PHP 7, but somehow that is no longer possible. That's quite simply not true.

He also falls back on the tired trope that PHP 5 and earlier were the tool of casual non-computer scientists, and then the computer scientists came in and ruined everything by making the language... consistent? Less error prone? This has always been a nonsensical argument, trotted out any time someone doesn't like an improvement in any software. (We heard it a ton when rebuilding Drupal 7 to 8 about how all the fancy computer science types were forcing these fancy PhD-only features down people's throats, like... classes.)

The gist of the article is summed up in this line:

We’re forced into a rewrite, or something very like a rewrite, while at the same time remaining in production and producing new features to deal with rapid growth.

And I have to ask... how?

Before I go further, let's get a few common arguments (either direction) out of the way:

Open Source is not about you

Free Software and Open Source Software are great. I have built my career on them. But an important aspect of Open Source is that in most cases, you're getting someone else's work for free. You have paid nothing. That means, guess what, you're entitled to nothing. Absolutely nothing. The author(s) of the software you're using don't owe you even a response on GitHub, much less support, unless you have some additional contract with them. If you want some kind of guaranteed support, pony up and pay them.

Neither is there any "moral" expectation of indefinite support. That is an entirely made-up concept, mostly by self-interested companies that want the benefits of someone else's free code and none of the liability of not having a "throat to choke" vendor they're paying.

The only reason anyone contributing to the PHP engine itself cares what people think about BC breaks is because either 1) they use PHP themselves and don't want to break their own work or 2) they get warm-and-fuzzies from seeing people using their work. (Or often both.) So that's the level and scope we're talking about.

PHP release managers sign on for 3.5 years of volunteer work to fix bugs and security issues for free. (Or at least coordinate such fixing.) That's a lot of free labor. Asking them to sign on for longer without pay is abusive entitlement.

"Never break anything" is not a strategy

Or, rather, it cannot be a strategy for a tool like PHP. Rust or Go may be able to get away with that, because they were actually designed, and designed with a small footprint, and with a careful plan for evolution. PHP was none of those things; PHP began life as a series of useful template hacks that outgrew their baby pants and became Turing complete. Those hacks are almost universally recognized as being antithetical to stable software. That's been known for 20 years, in most cases.

PHP has technical debt that it needs to clean up. It is completely unreasonable to expect a project to never address technical debt so that you never have to do work, doubly so when you're not paying for it.

If PHP is going to be used to build reliable software, it needs to be a reliable language. And that means, somehow, removing or addressing the inherently-unreliable bits. The question isn't if, but how.

[via: https://xkcd.com/1172/]

Commercial support is available

If for whatever reason your system cannot be upgraded off an old version of PHP... so be it. PHP 7.4 doesn't suddenly become a buggy mess the instant the volunteers stop agreeing to fix security issues if they are found.

And if you really need support for older versions, that's available. Companies like Zend offer paid LTS support. Many Linux distributions will maintain select versions of PHP (whatever they shipped with) for many years past when the PHP team does, backporting security fixes as needed.

Deprecations are not breaks, yet

Most changes to PHP that have BC implications are first deprecated, meaning nothing changes in their behavior but they will trigger a deprecation message. By default, deprecations do not stop execution; they just leave a note in your log. That means in most cases developers have years to update their code before the behavior actually changes.

Some, perhaps many, systems, however, are configured to treat all deprecations as errors. These systems are Doing It Wrong(tm). Period. They are inventing problems that do not exist, and the fault lies not with the PHP team but with the people who configure their systems wrong.

PHPUnit used to also treat deprecations as test fails by default. It no longer does. It never should have, but it doesn't anymore, so that's no longer an excuse.
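
As a rough illustration of the "leave a note, keep running" behavior, here's a minimal sketch that collects deprecations in a list and lets the script carry on. The `collectDeprecations()` helper and the in-memory list are hypothetical, made up for this example; a real setup would more likely just rely on the `error_reporting` and `log_errors` settings.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: record deprecations instead of letting them
 * surface as failures. Returns a closure that yields what was collected.
 */
function collectDeprecations(): callable
{
    $seen = new ArrayObject();

    // Only E_DEPRECATED and E_USER_DEPRECATED are routed to this handler;
    // every other error level keeps its normal behavior.
    set_error_handler(
        static function (int $level, string $message) use ($seen): bool {
            $seen->append($message);
            return true; // handled: execution continues
        },
        E_DEPRECATED | E_USER_DEPRECATED
    );

    return static fn(): array => $seen->getArrayCopy();
}

$deprecations = collectDeprecations();

// Stand-in for calling something the engine or a library has deprecated.
trigger_error('oldApi() is deprecated, use newApi()', E_USER_DEPRECATED);

$stillRunning = true; // reached: the deprecation did not stop the script
```

The point is only that a deprecation is information, not a failure. What you do with the collected messages (log them, count them in CI, ignore them for now) is up to you.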

Code is an opex

In accounting, there's the concept of a "capital expenditure" (capex) and an "operating expenditure" (opex). A capex is a one-time cost, like buying a new printer. An opex is an ongoing cost that needs to be budgeted for, like the paper and ink for the printer.

Most code is an opex. Many accountants still like to treat it as a capex. They are wrong. All code entails ongoing maintenance. Companies that forget this suddenly find themselves unable to reboot their systems because they fired their developers (or they retired). If you don't schedule time and cost for system maintenance, your system will schedule it for you. That includes routine upgrades for security fixes if nothing else.

Calling yourself a "fast moving startup" doesn't change that fact, and a manager at a fast moving startup who doesn't get the need for maintenance will soon find himself a manager at a fast failing startup. (Or jump ship to another fast moving startup where he can screw up again, but that's a different blog post.)

Open source developers matter, too

Conversely, though, I will give a lot more leeway to people maintaining Open Source PHP libraries than to companies using PHP commercially. They are also often and largely volunteers; many OSS maintainers lament that they hate releasing code because it becomes a second job. (See the first point.) While the PHP project itself doesn't owe those developers anything, and vice versa, it's still considerate to not make their lives harder unnecessarily. (This applies only to indies, not to companies.) Many of them are maintaining projects small enough that turning maintenance for their libraries into a paid service isn't viable, certainly not enough to compete with what their day job pays.

Tests or go home

This is a simple one. It's 2022. Nearly 2023. I don't care how old your code base is. If you don't have a reasonable test suite in place, at least functional tests, then it's not PHP's problem, it's your problem. "This code is too old to have tests" is BS. You've just not prioritized writing tests for the mystery meat code base you have. (And I say this having inherited such minimal-test code bases before.) This is not an argument, it's a lazy copout, and I will not give it any more air.

The release cycle

I often hear people complaining that when an old PHP release reaches end-of-life, they "suddenly" have to upgrade everything. This is absolutely false, and disingenuously so.

All PHP releases (in the modern version scheduling) get two years of bug and security fixes from the date of their release, plus another year of security-only fixes. That means even if you only run the oldest-supported release (security only), there's a two year window in which you know what needs to be done for compatibility. From the time PHP 8.0 was released to the date free, volunteer support for 7.4 was dropped was about two years. Nothing about what's needed is a surprise if someone is taking the bare minimum responsibility to remain informed. (Just read php.net once or twice a year.)

And PHP has a many-months-long feature freeze before a release, and a many-month dev cycle before that. Most changes have at least an extra 6 months that they're known, maybe even a year, before the actual release.

Anyone who is surprised by some change at the last minute has no one but themselves to blame.

Not all changes are equal

Let's also bear in mind that people tend to be very generic and non-specific in their complaints, but not all changes to the language are equal.

I cannot comprehend a codebase in which the addition of short lambdas (7.4), constructor property promotion (8.0), enums (8.1), or allowing constants in traits (8.2) would break an existing codebase. If that somehow happens, the code in question was beyond broken already.

Others may have very slight chances of breakage, but they're small and usually easily fixed. If someone had a package that used the Random namespace and had a class named Randomizer, that would break with the new "Random extension" in 8.2. However, anyone doing so would be going against 14 years of well documented namespace conventions set up to protect them from exactly that situation. I have no sympathy.

Every new keyword in the language has the potential to conflict with some existing function name or constant. Sometimes there are ways to hack around that, other times not. It's reasonable to ask the PHP team to check before they introduce a new keyword to see how much it would break... and most of the time that is exactly what happens. Not always, perhaps not as much as some would like, but often.

At the same time, though, with billions of lines of PHP code in the world, knowing what keyword might break something is impossible. That's where major projects bear some responsibility to protect themselves by... being engaged and calling out things they know will break, when it's still early enough to fix them. Some projects do a decent job here, others pointedly do not. (Looking at you, WordPress.)

In some cases there's no good workaround but to rename something in user-space:

I know one esports tournament platform that can't upgrade to PHP 8 because they have a class in their codebase called 'Match'.

[cf: https://mastodon.me.uk/@mintopia/109480647003836301]

Which is a fair criticism! But also not a world-breaker. In most cases it's a single IDE "rename" refactor operation.

And then there's the subtle (or not so subtle) behavior changes, which is what people are usually complaining about, but don't bother to separate from everything else. Things like promoting missing variables from a Notice to a Warning (8.0), or converting some resource values to objects (8.0), or adding types on core interfaces like Iterator (8.1), or deprecating dynamic properties (8.2). Depending on your codebase, those could require small to medium work to address. Grumbling about those is, at times, valid.

But what scale?

What is not valid, though, is this:

We’re forced into a rewrite, or something very like a rewrite, while at the same time remaining in production and producing new features to deal with rapid growth.

This is pure hyperbole. While it is certainly true that the optimal, recommended way to write PHP code has changed over the years, most of that change happened before PHP 5.3 in 2009. The language has just gotten better at encouraging you to write that way since then. But, for instance, "arrays as pseudo-objects" has been a known-bad-practice since at least 2007. That's not new, and yet code that does that still works today.

While newer PHP versions have, in many cases, included changes that necessitated code updates, none of them have required full rewrites. None. And in most cases, well-behaved code didn't even need changes. There's just a lot of not-well-behaved code out there.

As a point of comparison, I did most of the PHP 8.0 compatibility work for TYPO3 v11. TYPO3 is a 20+ year old system. It's one of the very few applications that has a continuous history since PHP 3. It relies very, very heavily on anonymous arrays, still. It has code debt upon code debt. There's over 800,000 lines of code in the core system and almost 4000 classes, not including dependencies. It's huge. And I was able to do the PHP 8.0 compatibility upgrades in about 4-5 weeks. Over 85% of them were some variation on adding ?? null to some line.

Should the code get refactored and updated? Absolutely. Is it necessary for modern PHP-compatibility? Absolutely not.

But let's consider a few recent changes and consider their impact.

Attributes

Attributes were introduced in 8.0. There was some drama around selecting the syntax, but in the end, the syntax that was chosen has a great extra feature: In earlier versions of PHP, it's a comment. It's new syntax that gets ignored. That was a major reason the current syntax was chosen, and it makes it easy to add attributes gracefully without any breakage, even in older versions. That helped a great deal to mitigate other changes, as we'll see below.

Undefined variables

PHP 8.0 raised the error level of undefined variables from a notice to a warning, effectively making them "real errors" when they weren't before. This impacts a lot of old code, it's true.

But relying on undefined variables to silently turn to null has been a known bad-practice that introduces security holes since at least 2005. It's been a good-practice recommendation to develop under E_ALL (report all errors, including notices) since at least 2007, probably longer. 17 years is, I would argue, entirely sufficient time to fix such issues, and anyone who has been developing under E_ALL wouldn't even notice this change.

Even if a code base has been sloppy for two decades, as was the case in TYPO3, then as noted this change doesn't require a rewrite. I fixed it with a few hundred ?? null or similar additions. Tedious, yes, but not world-breaking.

Refactoring the code to avoid those undefined vars being possible in the first place is wise, but not required. And it's been wise for 17 years at least. It's not some new development.
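
To give a sense of what those fixes look like, here's a minimal sketch. The `$config` array, its keys, and `$maybeUnset` are made up for illustration; they are not taken from TYPO3.

```php
<?php
declare(strict_types=1);

// Hypothetical legacy-style configuration array; 'cache' is never set.
$config = ['title' => 'Home'];

// Before: reading a missing key emits a warning on PHP 8.0+
// (a notice on earlier versions), then evaluates to null.
// $cache = $config['cache'];

// After: the null coalescing operator (PHP 7.0+) makes the fallback
// explicit and emits nothing on any version.
$cache = $config['cache'] ?? null;

// The same operator covers variables that may never have been assigned.
$limit = $maybeUnset ?? 10;
```

The behavior is identical to what the old code did by accident. It's just now stated on purpose.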

Interface types

PHP 8.1 added parameter and return types to most PHP-provided interfaces, such as Countable or Iterator. Adding those type declarations is unquestionably an improvement, as having the types there helps prevent people from doing something that is broken and may cause unexpected behavior. However, it's also entirely true that it creates a problem for existing classes that have no return types, since that's now a type violation. That requires work to fix.

Work, but not a full rewrite. A full rewrite doesn't even make sense here. And the work needed is mitigated by three factors:

  1. Narrowing the return type is legal, so it's been possible to add return types to those methods since PHP 7.0.
  2. It's a deprecation, not an error. No code actually breaks. See the above point about deprecations.
  3. Even then, there's an opt-in attribute that can be placed on a method, #[\ReturnTypeWillChange], that will suppress the deprecation until PHP 9.0, and the attribute will be ignored on PHP 7 (which didn't have attributes) thanks to it being interpreted as a comment.

So, assuming a worst case scenario, the work involved is "paste #[\ReturnTypeWillChange] a bunch of times around the codebase and get back to it later." Annoying perhaps, but far from forcing a full rewrite.
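
Both options from the list above can be sketched roughly like this. The `Basket` and `Settings` classes are hypothetical examples, not from any real project.

```php
<?php
declare(strict_types=1);

// Option 1: declare the return type the interface now expects.
// A scalar return type like this parses on PHP 7.0 and later.
final class Basket implements Countable
{
    private $items = [];

    public function add(string $item)
    {
        $this->items[] = $item;
    }

    public function count(): int
    {
        return count($this->items);
    }
}

// Option 2: postpone the decision. The attribute must sit on its own line,
// because PHP 7 treats everything after "#" on that line as a comment.
final class Settings implements ArrayAccess
{
    private $values = [];

    public function offsetExists($offset): bool
    {
        return isset($this->values[$offset]);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        return $this->values[$offset] ?? null;
    }

    #[\ReturnTypeWillChange]
    public function offsetSet($offset, $value)
    {
        $this->values[$offset] = $value;
    }

    #[\ReturnTypeWillChange]
    public function offsetUnset($offset)
    {
        unset($this->values[$offset]);
    }
}
```

Neither class triggers the 8.1 deprecation, and neither needed anything resembling a redesign.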

One place I will raise a note is that some of the return or parameter types added were mixed, which was only introduced a year earlier in 8.0. That means adding those type hints (to avoid the attribute) introduced an incompatibility with 7.4 and earlier, which was, frankly, too short of a window. It might have been better to wait until 8.2 here to allow for a larger number of concurrent versions, although with the attribute such code is valid and error-free from 7.0 until 9.0 so that's still a minor complaint.

Stricter type internal functions

Many of PHP's older standard library functions (which are implemented in C and therefore behave subtly differently than user-space code already) have, historically, silently accepted null arguments and done some one-off "it made sense at the time" logic to avoid throwing an error. Just as with undefined variables, this is a convenience feature from PHP's very early days that is not, it turns out, very convenient. In fact, it's quite inconvenient to potentially get errors when PHP tells you a string is of length 0 when it's not even a string, causing bugs elsewhere in the code that you could have caught earlier. That's the entire value of types, but sometimes those older functions weren't smart about it.

In PHP 8.1, those functions were changed to trigger a deprecation when used with null. This also affected a not-small amount of code.

But again, there are many mitigating factors:

  1. Once again, it's a deprecation. See above.
  2. Typed parameters and returns have been around since PHP 7.0. Code that's been gradually adding proper types over the last seven years has probably caught and fixed a lot of null errors already, to the point that such code paths had already been eliminated.
  3. Passing null to strlen() is a bug. Period. If code out there was doing so, it was already broken and buggy. The developers just didn't realize it because they weren't relying on 7 year old features of the language to catch bugs for them. The noose on bugs has slowly tightened, so... yeah, newer PHP versions do a better job of telling you ahead of time that your code has a bug so you can fix it. Fix your bugs.

Once again, though, this doesn't necessitate a full rewrite. strlen((string)$buggy_might_be_null) will make the deprecation go away, and instead you'll be passing an empty string. That still means your code likely has a bug in it, but PHP won't tell you about it.
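
Here's a small sketch of the quick fix next to a more honest one. The `$nickname` variable and the `nicknameLength()` function are hypothetical names for the example.

```php
<?php
// No strict_types here on purpose: the 8.1 deprecation concerns the default
// (coercive) mode. In strict mode, strlen(null) was already a TypeError.

// Hypothetical input that may legitimately be absent.
$nickname = null;

// Deprecated since PHP 8.1: null passed to a non-nullable internal parameter.
// $length = strlen($nickname);

// Quick fix: cast, so strlen() always receives a string.
$length = strlen((string) $nickname);

// More explicit: decide what "missing" means instead of hiding it.
function nicknameLength(?string $nickname): ?int
{
    return $nickname === null ? null : strlen($nickname);
}
```

The cast silences the message. The nullable version forces the caller to think about the missing case, which is where the actual bug usually lives.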

Resource to object conversion

PHP has an ancient variable type called resource that is unlike anything else. It's mostly useless, un-introspectable, and breaks the type system in exciting ways. Certain things being resources (like file references, some database connections, etc.) make certain improvements to PHP impossible. In PHP 8.0, many of those were converted to be objects. Not all of them were, due almost entirely to lack of people and time. Another batch was converted in PHP 8.1.

Most use cases won't notice; but there is code that has historically needed to check is_resource($var) or get_resource_type($var) or similar for various valid reasons. That code now has to do a version-check first, and then depending on the version handle the value differently.
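
For cURL handles, which became CurlHandle objects in 8.0, one way to write such a check is sketched below. The `isCurlHandle()` helper is hypothetical. It also tests the value's type rather than the PHP version number, which is a swap on my part for the sake of a shorter example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true if $handle is a cURL handle on any PHP version.
 * cURL handles are resources of type "curl" before PHP 8.0 and
 * CurlHandle objects from 8.0 on.
 */
function isCurlHandle($handle): bool
{
    // PHP 8.0+. instanceof is safe even where the class does not exist.
    if ($handle instanceof \CurlHandle) {
        return true;
    }

    // PHP 7.x and earlier.
    return is_resource($handle) && get_resource_type($handle) === 'curl';
}
```

It works, but every library that cares has to carry a shim like this for each converted type, which is exactly the annoyance.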

Honestly... I'll grant this one. To be fair, it doesn't affect most code, as most code doesn't even use resources, or if it does it doesn't care about implementation details. But for the small percentage of code that does care about resources, it's an annoying change without a clear transition path.

That said, I believe there are steps that the engine could have taken to make it a more graceful transition. For instance, Matthew Weier O'Phinney (project lead for Laminas) has suggested having is_resource() and friends hard-code exceptions for objects that used to be resources, at least for a while. I cannot speak for that part of the engine to say how hard it would be (or would have been), but on the surface that sounds reasonable. Such a hack could be removed in 9.0.

Dynamic properties

PHP 8.2 deprecates dynamic properties on objects. Much like undefined variables, this is an old PHP misfeature that has long fallen out of favor. I've not seen anyone seriously treat it as a good feature since at least PHP 5.3. Maybe earlier. It's bad for documentation, a great bug breeding ground, and also terrible for performance. Most of the time, it's a bug hiding a typo.

As of 8.2, assigning to a property that doesn't exist is deprecated. In 9.0, it will turn into an error. That is, unless the #[\AllowDynamicProperties] attribute is added to the class, which will continue to work even after 9.0.

This is perhaps the change I am least forgiving of complaints about. I've not seen a production class that deliberately relied on undefined dynamic properties since... I dunno, George W. Bush's first term? But I have seen typos that created them by accident. Dynamic properties have been considered a misfeature for a long, long time. As before, well-behaved code won't even notice this change.

Even then:

  1. It's a deprecation only. See above.
  2. There's an easy opt-in attribute to add.
  3. The __get/__set magic methods still work fine, and it's trivial to emulate the old behavior with them if you really want and don't want to use an attribute for some reason.
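
The last two options can be sketched as follows. `LegacyRecord` and `MagicRecord` are hypothetical class names, and the magic-method version is just one simple way to emulate the old behavior.

```php
<?php
declare(strict_types=1);

// Option 1: opt in explicitly. On PHP 7 the attribute line is just a comment.
#[\AllowDynamicProperties]
class LegacyRecord
{
}

// Option 2: emulate dynamic properties with magic methods (PHP 7.1+ as written).
class MagicRecord
{
    private $extra = [];

    public function __get(string $name)
    {
        return $this->extra[$name] ?? null;
    }

    public function __set(string $name, $value): void
    {
        $this->extra[$name] = $value;
    }

    public function __isset(string $name): bool
    {
        return isset($this->extra[$name]);
    }

    public function __unset(string $name): void
    {
        unset($this->extra[$name]);
    }
}

$legacy = new LegacyRecord();
$legacy->nickname = 'crell'; // allowed by the attribute, no deprecation

$magic = new MagicRecord();
$magic->nickname = 'crell';  // routed through __set()
```

Either way, the calling code that assigns arbitrary properties stays untouched.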

So even if you have the one PHP project in the world that relies on dynamic properties as a core feature... your code is still not broken. You have several years before PHP 9.0 when you will have to include an attribute for it to keep working. There's no schedule at the moment for removing dynamic properties outright, although some people wanted to.

So yeah, worst-case, in a few years you may need to sprinkle some attributes around your code, but that's it. Or, you know, declare your properties, which has many other advantages. You probably should completely rewrite your code if you're relying that heavily on dynamic properties, but aren't forced to.

So where are we?

I could go on, but I think you get the idea. Matthew refers to it as "death by a thousand cuts," and while I disagree with him about the extent of it I do feel the challenge is much more in that territory.

Could the PHP project do a better job in some cases of graceful upgrades to reduce breakage? Honestly, yes. Better long-term planning would help the project in many ways, that included.

But let's stop pretending that the PHP team "doesn't care," or "PHP 8 is no longer PHP," or that developers are "forced to rewrite" their entire codebase. That's straight up FUD. It doesn't make PHP change less. It makes people not want to work on PHP at all, and then even the absolutely-zero-breakage features (like short lambdas, constructor promotion, attributes, enums, union and intersection types, etc.) don't happen.

If you want to improve PHP's BC... suggest specifics, that allow the language to clean up code and design debt in a way that minimizes breakage. Actually participate in the volunteer community that is PHP and flag things you know are going to be problematic for your project. No one in PHP likes breaking changes for their own sake, but they're a useful means to an end that are weighed for their net effect on balance. If you want to impact that discussion, come with solutions, not complaints.

How to engage

As I hinted above, PHP as an open source project is open for participation. Writing patches for PHP itself is hard (because PHP is written in a custom, proprietary macro language that is itself written in C), but reviewing RFCs is easy. There's a public mailing list (php-internals, specifically) plus a few public chat channels, which are open for anyone to join. All RFCs have at minimum two weeks of discussion before a two week vote. Most have longer. That's ample time to see if a particular RFC will affect you in some way. All active RFCs are also publicly listed.

If you're concerned, sign up for the mailing list and ignore anything that isn't the announcement of a new RFC. Then check out the RFC to see if it might cause issues for your project. If so, you can and should point that out -- politely and calmly works best. People do bring up issues from time to time, and while they are not always acted on (as noted, it's always a balancing act), they're always listened to. Many people on the Internals list do speak up about maintaining backward compatibility. It is a tricky subject, and takes genuine human effort. So put in that effort. Or at least see and appreciate the effort that others are putting in.

Voting on RFCs requires having contributed to PHP itself directly. That can be done via code, but also by contributing to the documentation (that's how I got on the voting list) or various other means. If you go that route and put in the work, you get a seat at the table.

That said, please don't use that seat to just block things. PHP does need to evolve, it does need to get rid of code debt from dumb designs in its early days, and it does need to do so responsibly. If you think "responsibly" is an easy thing to define, then you're not being responsible.

If you work for a company that uses PHP, and you're not sure you have the time to engage in this fashion... then your company is not actually engaged in PHP. Just watching the list and chiming in now and again, and running some pre-tests, is not a huge time commitment. It's an hour or two a week if all you're doing is keeping an eye out for things that might cause trouble for you. Tell your company to include that time in your job. If they won't, then the problem isn't PHP, it's your manager.

Room for improvement

Could the PHP project do better itself about BC? Yes, I agree it could. In the interest of cooperation, here's some thoughts, not all of which are necessarily actionable but at least they can be a starting point.

Extended security releases

PHP 5.6 was given a two year security-only period instead of one year, in recognition that 7.0 might be a bumpy ride. That wasn't done for 7.4->8.0. I couldn't speak to why, but I believe it's worth considering as a standard procedure for the last release in a major-series.

(Of course, there was still wailing when 5.6 was retired that it was "too soon," so you can't please everyone.)

Tag-team release managers

Release managers used to be assigned to a release in pairs, one veteran and one new person. The last 2 releases have switched to one veteran and two new people, which is even better for avoiding bus-factors. But that's still a 3+ year commitment.

Would it make sense to have different people take over at different times, say when a release goes into security-only mode? I don't know, but 8.1/8.2 release manager Ben Ramsey has hinted that it's worth considering, and I agree.

Automate project CI

In the past, there have been sporadic tools used to gauge the impact of a change. Sometimes people have run CI servers to run the test suite of major projects with dev PHP versions. Nikita Popov has a script that downloads the top 1000 packages from Packagist to be able to scan them for some usage, to see how common something is. (A change that we know will break code for 3 users is quite different from a change we know will break for 30,000 users.) Both of those have been intermittent and sporadic, though.

Could the tooling be improved so those are easier to do? Probably. I'm not certain what it would look like, but it's something to consider. This would also be partially a cultural change, to ask people to run Nikita's "top 1000" script more often. Right now it's very ad hoc.

To be fair, though, running CI of major projects is... frankly those projects' job. So what could PHP do packaging-wise to make it easier for Symfony, Laravel, WordPress, TYPO3, etc. to build such CI themselves so they get an early-warning that something might break?

More effort on "temp hacks"

This will absolutely vary depending on the type of change, but as noted before, perhaps the resource-to-object transition could have had more BC shims in it? I don't know how easy that would be, but if feasible it would probably help to include those. And, especially, have a clear sunset date for them, because some changes need to happen for other functionality to be possible. (Making the streams API not suck, for instance.)

The goal should be that "well-behaved" code can be written that works across several versions without specific version checks, not that "any code" works indefinitely. There is a reasonable middle-ground.

No-final-version deprecations

The point of deprecations is to give people ample heads-up that a change is coming. As noted repeatedly, if someone chooses to treat deprecations as errors that's their own problem. But we still want to ensure that there is "ample heads-up."

So perhaps we could disallow new deprecations in the final minor release of a series. (That would probably be 8.4, if the recent pattern holds and it gets followed by 9.0.) That would mean all deprecations have at least a two year period before the actual change happens.

This would be a change from the previous approach of having a big "what shall we deprecate" discussion for the last release, but I think it would be a reasonable good-will gesture to ensure at least a two year deprecation window.

Note that this applies to things that use deprecations; changes like making undefined variables a warning have been de facto-deprecated for long enough that they could vote in the US, so it wouldn't apply there.

Conclusion

So yeah, PHP could do better. But, to be blunt, so could those objecting to change. Most notably, honesty about the actual situation, without hyperbole, would go a long way to being able to address the issue.

And most of all, don't make shit up about how you have to "rewrite everything." That's simply not true, and it does no one any good to spread that kind of FUD.

Remember:

Not all change is an improvement, but all improvement requires change.

--Larry Garfield

As a C# .NET developer of 20+ years, I have had to use PHP for some projects for several years now. This year much more often than ever before.

As much as I "hate" the language/libraries, I am so excited about the more recent versions of PHP that go in the right direction (in my opinion). RFCs like Property Hooks make it so much easier to write C#-like code and have less to type.

So in my opinion, moving forward in improving PHP should go at an ever increasing speed.

Thank you very much for your efforts.