https://xkcd.com/963/ (October 2011)
[Mouseover text] Thomas Jefferson thought that every law and every constitution should be torn down and rewritten from scratch every nineteen years–which means X is overdue.
Back in the day, X was a great protocol that reflected the needs of the time:
- Applications asked it to draw some lines and text.
- It sent input events to applications.
People also wanted more flexible control over how their windows were laid out, so the window manager appeared. It moved all of your windows around for you and provided some global shortcuts.
Then graphics got more complicated. All of a sudden the simple drawing primitives of X weren't sufficient. Beyond lines, text and rectangles, applications wanted gradients, rounded corners and rich graphics. So instead of using X's drawing APIs, applications rendered everything themselves and just uploaded big bitmaps to the X server. At this point 1/3 of what the X server had been doing became obsolete.
Next, people wanted fancy effects and transparency (like drop shadows), so window managers started compositing the display. This is great, but now the window manager needs more control than just moving windows around, because windows may be warped, rendered somewhere slightly differently or placed on a different workspace. So all input events go first from X to the window manager, then back to X, then to the application. Output also needs to be processed by the window manager, so it is sent from the client to X, then to the window manager, and then the composited output is sent back to X. So another 1/3 of what X was doing became obsolete.
So what is the X server still doing?
1. Outputting the composited image to the display.
2. Receiving input from input devices.
3. Shuffling messages and graphics between the window manager and applications.
It turns out that 1 and 2 have become vastly simpler over the years and can now basically be handled by a few libraries. 3 is just overhead, especially if you are using X over a network, because input and output each need to make multiple round-trips.
So 1 and 2 turned into libraries and 3 was simply removed. Basically, this made the X server disappear. Now the window manager (the compositor) reads input and displays output directly, usually using some common libraries such as libinput for input devices.
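To make the round-trip argument a bit more concrete, here is a toy Python model that simply counts the handoffs an input event or a frame goes through in each design. The paths follow the description above and are simplifications; the component names are illustrative assumptions, not traces of real X or Wayland traffic.

```python
# Toy model: each list is the chain of components an event or frame passes
# through. Simplified from the description above; real systems differ in detail.
X_COMPOSITED = {
    "input": ["input device", "X server", "window manager", "X server", "application"],
    "output": ["application", "X server", "window manager", "X server", "display"],
}

WAYLAND = {
    # The compositor reads input devices itself and forwards events to the app.
    "input": ["input device", "compositor", "application"],
    # The app hands a finished buffer to the compositor, which puts it on screen.
    "output": ["application", "compositor", "display"],
}


def hops(path):
    """Number of handoffs along a path: one fewer than the number of stops."""
    return len(path) - 1


for name, model in [("X + compositing WM", X_COMPOSITED), ("Wayland", WAYLAND)]:
    print(f"{name}: input {hops(model['input'])} hops, output {hops(model['output'])} hops")
# X + compositing WM: input 4 hops, output 4 hops
# Wayland: input 2 hops, output 2 hops
```

The exact numbers matter less than the shape: every extra stop through the X server is another message (and, over a network, potentially another round-trip) that the Wayland design simply does not have.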
Removing the X server was a breaking change, so it was a great time to rethink a lot of decisions. Some of the highlights are:
- Accessing other applications' information (output and input capture) requires explicit permission. This is a key piece of sandboxing applications.
- The system is organized around frames to avoid tearing, except when tearing is desired (X doesn't really have the concept of a frame).
- Lots of basically unused APIs, such as fonts, drawing and many others, were removed.
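The frame-oriented design can be illustrated with a toy model of double-buffered surface state: the client stages changes as "pending", and they only become visible together when it commits, so the compositor never sees a half-updated window. This is loosely based on the Wayland rule that surface state is double-buffered and applied atomically by `wl_surface.commit`; the Python classes and method names below are hypothetical and are not a real Wayland binding.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class SurfaceState:
    buffer: Optional[str] = None      # name of the attached pixel buffer
    damage: Tuple[tuple, ...] = ()    # regions changed since the last frame


class ToySurface:
    """Hypothetical double-buffered surface; not a real Wayland API."""

    def __init__(self):
        self.current = SurfaceState()   # what the compositor is allowed to show
        self.pending = SurfaceState()   # what the client is still assembling

    def attach(self, buffer):
        self.pending = replace(self.pending, buffer=buffer)

    def add_damage(self, region):
        self.pending = replace(self.pending, damage=self.pending.damage + (region,))

    def commit(self):
        # Every pending change becomes visible at once, as one complete frame.
        self.current = self.pending
        # Damage describes a single frame, so the next frame starts with none.
        self.pending = replace(self.pending, damage=())


surface = ToySurface()
surface.attach("frame-1")
surface.add_damage((0, 0, 100, 100))
print(surface.current.buffer)  # None: nothing is shown before the commit
surface.commit()
print(surface.current.buffer)  # frame-1
```

This only shows the atomicity half of the story; actually avoiding tearing also depends on the compositor presenting each complete frame in sync with the display.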
So the future is great: simpler, faster, more secure and more extensible. However, getting there takes time.
The transition was also slowed down by some people resisting features that X had (such as applications being able to position their own windows). Without a few features like that, it can be impossible to make a nice Wayland port of some applications. Over time, however, these features are being added, and these days most applications have good Wayland support.
X is old and very hard to maintain. Many rules about how displays work have changed drastically since X was created. X adapted to most of those changes, which meant adding more and more hacks to keep it running.
Over time X became harder and harder to work on, and people realized that it was easier to write something new from scratch than to try to fix decades of technical debt in X.
That new thing was Wayland, and over time most, if not all, people who were interested in working on desktop compositing moved away from X.
Wayland (as is always the case with new software of that size) didn't hit the ground running. It had various issues at the beginning and also follows a different design philosophy than X.
Despite many of those issues being fixed, some people are still very vocal about not wanting to use Wayland for one reason or another. While some of those reasons are valid, most come from ignorance or an unwillingness to adapt.
Applications need some coordination with each other in order to behave the way you would expect: only one window at a time has focus and receives keyboard input, windows need to be positioned on the screen and assigned to a display, there is a shared clipboard, and various other things.
X is a server and a set of protocols that applications can implement to allow all this behaviour. X11 is the 11th version of the protocol. X was first created in 1984, and X11 has been around since 1987. Small changes have been made to X11 over the years, but the last release of the standard (X11R7.7) was in 2012.
That makes it a very old protocol, and one that is showing its age. Advances in hardware and changes in the way we use devices have exposed many shortcomings in the protocol, and while it has adapted somewhat to keep up with modern tech, it has not done so in the best of ways. I also believe its codebase is quite complex and hard to work with, so changes are hard to make.
Thus it has quite a lot of limitations that modern systems are rubbing up against. For instance, it does not really support multiple cursors or input that is not a mouse and keyboard, so things like touch screens or pens/tablets tend to emulate a mouse and thus affect the only pointer X has. It is also not great with touchpads and things like touchpad gestures: while they do work, they are often clunky or not as flexible as some applications need.
It is also very insecure and has no real security measures in place: any GUI application has far more access to the system and input than it really requires. For instance, any application can grab the contents of the screen at any point in time, which is not something you want when you have a banking web page open.
Wayland is basically a new set of protocols that takes modern hardware and security practices into account. It does the same fundamental job as X11, but without the limitations X11 has, and it fixes a lot of the security issues with X.
One big difference from X, though, is that Wayland is just a protocol, not a protocol plus a server like X. Instead, it shifts the responsibilities of the X server into the window manager/compositor (which used to manage window placement and window borders as well as global effects such as animations or transparency). It also has better control over things like screen capture, so not every application can take a screenshot at any moment, register global shortcut keys, or do various things like that. For a while this was a problem, because screen-sharing applications and even screenshot tools did not work, but over time this functionality has been added back in more secure ways than how X11 did it (for example through xdg-desktop-portal and PipeWire).
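The difference in screen-capture policy can be sketched as a toy model: in the X-style model any client can read every window, while in the Wayland-style model the compositor only hands screen contents to clients the user has approved. The class names and the explicit `user_grants_capture` step are hypothetical simplifications for illustration; on real desktops the approval typically comes from a user prompt rather than an API like this.

```python
class XStyleServer:
    """Toy model: every connected client can read every window."""

    def __init__(self, windows):
        self.windows = windows  # window name -> pixel data (a string here)

    def capture(self, client):
        return dict(self.windows)  # no check on who is asking


class WaylandStyleCompositor:
    """Toy model: capture only for clients the user has approved."""

    def __init__(self, windows):
        self.windows = windows
        self.approved = set()

    def user_grants_capture(self, client):
        # Hypothetical stand-in for the user confirming a screen-sharing prompt.
        self.approved.add(client)

    def capture(self, client):
        if client not in self.approved:
            raise PermissionError(f"{client} is not allowed to capture the screen")
        return dict(self.windows)


windows = {"browser": "<banking page>", "editor": "<notes>"}

x = XStyleServer(windows)
print(x.capture("random-app"))  # any client sees the banking page

w = WaylandStyleCompositor(windows)
try:
    w.capture("random-app")
except PermissionError as err:
    print("denied:", err)
w.user_grants_capture("screen-recorder")
print(w.capture("screen-recorder"))  # allowed after explicit approval
```

The point of the design is where the check lives: because the compositor owns the screen contents, it is the single place that can say no, whereas the X server passes window contents to any client that asks.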
Does that mean that every application will need to be updated to work with Wayland?
In theory, yes. In practice, most X11 applications can be run using Xwayland as a compatibility layer.
Additionally, any application using a GUI toolkit (such as Qt, KDE Frameworks or GTK) only needs to update to a toolkit version that has native Wayland support, which means most applications already support it, at least if they don't use any X11 APIs directly (which is not that common).
Yes, nominally, but there is a compatibility layer called Xwayland that keeps X11 applications working, so in practice it's not really a concern.