Nowadays, there is a lot of desktop hatred around. KDE3 vs. KDE4, GNOME2 vs. GNOME3, KDE vs. GNOME vs. XFCE vs. you-name-it… Everyone hates something, especially when it comes to new things.
I will instead describe what I would love – my dream of a desktop.
All modern desktops follow the old Xerox-born 2D paradigm. You have a fixed viewpoint over a flat surface (the desktop). A pointer (the mouse) moves on it in two directions. The active applications / data are represented on it as flat surfaces (windows or icons), and you navigate and interact with them via the mouse pointer.
This paradigm is simple, convenient and productive for many tasks. However, it is inherently limited; for some tasks a 3D paradigm would be more productive. Newer desktops add some cosmetic effects that hint at 3D (lensing, tilting windows, shadows, transparency, cube switching, etc.), but nothing that really leaves 2D. What I dream of is a real 3D interface.
1. The paradigm
You have a viewpoint (“fly”) that moves and rotates along three axes in a 3D space (“room”). The applications / data are represented in it as 3D objects. You navigate around them via the fly, and interact with them via one or more pointers (“hands”).
Similar things have been done before for games etc. – why not for a desktop? Yes, it will be awful for some things, but it will be great for many others: social stuff, virtual environments, education, research… you name it. You can always use it for what it is great for, and choose for other tasks whatever is great for them.
2. Hardware mapping
In the 2D paradigm, the desktop is mapped to the monitor screen, and the mouse pointer is mapped to a mouse, touchpad or a similar pointer-oriented device (rarely, to some keyboard keys). In the 3D paradigm, the fly's vision might be mapped to the monitor screen (“mono mode”), to a stereoscopic device (3D glasses, helmet…) (“stereo mode”), to a panoramic vision device, etc. The fly's “hearing” might be mapped to a sound device, either in surround mode or (if the device has a directional sensor) in stereo mode. The fly's movement might be mapped to a pointer-oriented device, to keyboard keys, or to both. The actions of the hands (which may also need some freedom of movement relative to the fly; with more than one hand, this is mandatory) might likewise be mapped to a pointer-oriented device and / or keyboard keys. Standard mappings might be established, but configurability is mandatory.
A very crude sample mapping of the fly movement to a mouse with a scroller and three buttons might be:
– Moving forward / backward: moving the mouse forward / backward.
– Moving left / right: moving the mouse left / right.
– Moving up / down: rotating the scroller backward / forward.
– Rotating left / right: holding mouse button 3 and moving the mouse left / right.
– Rotating up / down: holding mouse button 3 and moving the mouse forward / backward.
– Tilting left / right: holding mouse button 3 and rotating the scroller backward / forward.
The movements of hand 1 or 2 relative to the fly can be mapped to the same mouse movements, but with mouse button 1 or 2 held down.
This mapping does not allow moving and rotating the fly simultaneously, or moving / rotating the fly while also moving / rotating a hand relative to it. Additional keyboard key mappings, or devices with more sensors (e.g. mice with additional scrollers, joysticks with 3D sensing, gloves…), may help.
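Just to illustrate, here is a minimal sketch (in C++, with invented names such as MouseEvent and FlyState – not an actual API) of how this crude mapping could be turned into code:

    // Hypothetical types; real event structures would come from the toolkit in use.
    struct MouseEvent {
        float dx, dy;      // mouse movement since the last event
        float dscroll;     // scroller rotation since the last event
        bool  button1, button2, button3;
    };

    struct FlyState {
        float x, y, z;            // position in the room
        float yaw, pitch, roll;   // orientation
    };

    // Apply the crude three-button mapping described above to the fly itself.
    // (Hand movement would use the same code path, selected by buttons 1 / 2.)
    void applyMouseToFly(const MouseEvent &e, FlyState &fly)
    {
        if (e.button1 || e.button2)
            return;                    // buttons 1 / 2 drive the hands, not the fly
        if (e.button3) {               // button 3 held: rotate instead of move
            fly.yaw   += e.dx;
            fly.pitch += e.dy;
            fly.roll  += e.dscroll;
        } else {                       // no button held: translate
            fly.x += e.dx;             // left / right
            fly.z += e.dy;             // forward / backward
            fly.y += e.dscroll;        // up / down via the scroller
        }
    }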
For ease of use, the movement / rotation picked up by the input device might be amplified when translated into movement of the fly and / or hands.
The size of a screen, measured in pointer movement “quanta”, is typically close to the size of a mouse pad, measured in device movement “quanta”. The size of a 3D “room”, however, might be much larger. For this reason, it might be more convenient to translate the movement / rotation distance of the hardware not into movement / rotation distance of the fly or hands, but into movement / rotation speed in the appropriate direction. E.g. moving the mouse a little will not move the fly some distance, but will give it some speed in that direction; moving the mouse back will not return the fly to its original position, but will slow it down and eventually stop it. Alternatively, the user might be able to choose between the two modes of movement (e.g. by holding an additional button for one of them).
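A minimal sketch of this “speed” mode, reusing the hypothetical MouseEvent and FlyState types from the previous sketch and assuming a per-frame update loop:

    struct FlyMotion {
        float vx = 0, vy = 0, vz = 0;  // current velocity of the fly
    };

    // "Speed" mode: mouse deltas change the velocity rather than the position.
    void adjustVelocity(const MouseEvent &e, FlyMotion &m, float gain)
    {
        m.vx += gain * e.dx;
        m.vz += gain * e.dy;
        m.vy += gain * e.dscroll;
    }

    // Called once per frame: the accumulated velocity moves the fly, so moving
    // the mouse back slows the fly down instead of "rewinding" its position.
    void integrate(FlyState &fly, const FlyMotion &m, float dt)
    {
        fly.x += m.vx * dt;
        fly.y += m.vy * dt;
        fly.z += m.vz * dt;
    }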
3. Paradigm extensions
As mentioned above, the fly may have one or two viewpoints and sound perceptors. At the programming level, it may have a number of “sensors”. Sensors might be of different types (2D, e.g. a camera; 1D, e.g. a slit camera; point, e.g. a microphone), have different “channels” (e.g. visible light, infrared, sound, ultrasonics…), and have different sensitivity to the entire channel and across its spectrum, different resolution, directionality, etc. Hands might not be simple pointers, but devices with many degrees of freedom, e.g. human-like hands with separately movable fingers.
All sensors of a fly, together with its hands, are termed a “presence”. A typical presence might have two visible-light 2D cameras, two microphones, possibly some “tactile”, “odor” and “taste” sensors, and one or two hands. It might also have its own 3D object (an “avatar”).
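A rough sketch of how sensors, hands and a presence could be modelled (all names here are illustrative, not a defined interface):

    #include <string>
    #include <vector>

    enum class SensorGeometry { Point, Line1D, Surface2D };
    enum class Channel { VisibleLight, Infrared, Sound, Ultrasound, Tactile, Odor, Taste };

    struct Sensor {
        SensorGeometry geometry;    // point (microphone), 1D (slit camera), 2D (camera)...
        Channel        channel;     // what kind of interaction it perceives
        float          sensitivity;
        float          resolution;
        float          fieldOfView; // directionality, in degrees (ignored for point sensors)
    };

    struct Hand {
        int degreesOfFreedom;       // a simple pointer, or a full hand with fingers
    };

    // A presence: all sensors and hands of a fly, plus its own visible 3D object.
    struct Presence {
        std::vector<Sensor> sensors;
        std::vector<Hand>   hands;
        std::string         avatarObjectId;  // the "avatar" shown to other presences
    };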
3D objects have “characteristics” for the different sensor “channels” – that is, they emit and / or reflect to some degree the “interaction” carried by that channel (light, sound, physical contact…). In this way they project “images” onto the sensors around them, e.g. those of a presence. That is how presences (or non-presence sensors) are able to “feel” the virtual objects.
The interactions are two-way – not only do objects emit or reflect an interaction, they are also notified of contact with it. Thus, they are able to react in some way to this contact. E.g. a glass might break when interacting with a hammer, or a light cell might start producing electricity when exposed to light. Object reactions might have to be controlled by the room system software (e.g. by being “mediated” by a layer in it) if they must conform to a standard (e.g. in a room where glass cannot be unbreakable, or gravity is mandatory).
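A possible shape of this two-way interaction, reusing the Channel enumeration from the previous sketch; the room mediates every delivery, so room-wide rules can be enforced:

    // One "quantum" of interaction on some channel (light, sound, contact...).
    struct Interaction {
        Channel channel;
        float   intensity;
    };

    // Hypothetical object interface: objects project images onto surrounding
    // sensors (emit / reflect) and are themselves notified of incoming interactions.
    class RoomObject {
    public:
        virtual ~RoomObject() = default;
        virtual float reflectance(Channel c) const = 0;        // how much of c it reflects
        virtual void  onInteraction(const Interaction &i) = 0; // react: break, glow...
    };

    // The room mediates every contact, so it can enforce room-wide rules
    // (e.g. "no unbreakable glass", "gravity is mandatory").
    class Room {
    public:
        void deliver(RoomObject &target, Interaction i) {
            i.intensity = clampToRoomRules(i.intensity);
            target.onInteraction(i);
        }
    private:
        float clampToRoomRules(float intensity) const { return intensity; } // placeholder
    };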
A room is a virtual space, typically running on a single hardware platform (server, cluster…). Rooms typically have “walls” that constrain the room volume. Information exchange between rooms can be done through “windows” (wall spaces where some interactions are carried between the two rooms, but objects and presences cannot cross) and “doors” (wall spaces where objects and / or presences may pass from one room to another, and possibly interactions are carried between the rooms). Windows and doors can be one-way.
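A window or door could be as simple as a record describing what crosses the wall space and in which direction – for example:

    #include <string>

    // A wall opening connecting two rooms. A "window" carries interactions only;
    // a "door" also lets objects and presences pass. Either may be one-way.
    struct WallOpening {
        std::string fromRoom;
        std::string toRoom;
        bool carriesInteractions;  // light, sound... cross the opening
        bool passesObjects;        // objects / presences may cross (door) or not (window)
        bool oneWay;               // if true, traffic flows only from fromRoom to toRoom
    };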
In a room, scaling might also be limited. For example, in a room where objects cannot present themselves correctly at the molecular / atomic level, scaling a presence down to a size that would let it see molecules and atoms should not be allowed. Similarly, scaling a presence up to a size that would let it see too many objects at once, and thus overload the system resources, should not be allowed. Typically, a room might have limits on scaling up / down, and will require objects to present adequately at the smallest scale and not to consume enough resources to overload the system at the largest scale. The scaling caused by distance might be handled by applying “blur” to distant objects, by “greeking” them, etc. A client application might have tighter limits if the client platform is not powerful enough, or it might blur or greek objects to keep the extra load manageable.
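A crude sketch of how such limits and the distance handling might look (the thresholds are made-up numbers):

    #include <algorithm>

    // Room-wide scaling limits: presences may only scale between minScale and
    // maxScale; objects whose apparent size drops below a threshold are
    // "greeked" (drawn as a rough placeholder) or blurred to keep the load low.
    struct ScaleLimits { float minScale, maxScale; };

    float clampPresenceScale(float requested, const ScaleLimits &l)
    {
        return std::clamp(requested, l.minScale, l.maxScale);
    }

    enum class Detail { Full, Blurred, Greeked };

    Detail detailFor(float objectSize, float distance)
    {
        float apparent = objectSize / distance;   // very rough apparent size
        if (apparent < 0.001f) return Detail::Greeked;
        if (apparent < 0.01f)  return Detail::Blurred;
        return Detail::Full;
    }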
4. Programming (as I see it)
Free software. At first thought, some parts would be best under GPLv3 and others under the LGPL. Since it will be modular, however, every module might have versions under different licenses (incl. commercial ones).
An OO paradigm is inevitable. GNOME / LXDE-style GObjects would eat fewer resources, but would also be harder to deal with (and to attract programmers to). True OO with C++ might be neater and easier to do.
Visuals will be based upon a thin OO binding to an underlying OpenGL (I see OpenGL-capable hardware and drivers as a must). Other “channels” can be dealt with according to need.
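As an example of what “thin” could mean, here is a minimal RAII wrapper around an OpenGL vertex buffer – assuming a context is already created and the buffer-object entry points (OpenGL 1.5+) are available, on some platforms through a loader such as GLEW:

    #include <GL/gl.h>   // assumes an OpenGL context and, where needed, a function loader

    // A "thin" OO binding: the class adds ownership and type safety,
    // but stays a hair's breadth above the raw OpenGL calls.
    class VertexBuffer {
    public:
        VertexBuffer(const void *data, GLsizeiptr size) {
            glGenBuffers(1, &id_);
            glBindBuffer(GL_ARRAY_BUFFER, id_);
            glBufferData(GL_ARRAY_BUFFER, size, data, GL_STATIC_DRAW);
        }
        ~VertexBuffer() { glDeleteBuffers(1, &id_); }
        VertexBuffer(const VertexBuffer &) = delete;
        VertexBuffer &operator=(const VertexBuffer &) = delete;
        void bind() const { glBindBuffer(GL_ARRAY_BUFFER, id_); }
    private:
        GLuint id_ = 0;
    };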
The software will feature “client” and “server” parts. The client will be responsible for the visualisation etc. of the surrounding objects (hopefully using hardware acceleration). The server will deal with the interaction between the objects (hopefully also hardware-accelerated in some form). Info will flow between them in the form of object and presence descriptions, and changes to these. They can be transmitted over a network, local sockets, direct function calls (if on the same machine), etc. A protocol will have to be defined for this; I currently imagine it as JSON-based, but anything that works well will do.
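Purely as an illustration, one such message might look like the following; the field names and the JSON layout are invented, not a defined protocol:

    #include <cstdint>
    #include <string>

    // Hypothetical shape of one protocol message. Serialized as JSON it
    // might look like:
    //   { "type": "object-update", "object": 37383944, "since": 10021,
    //     "changes": { "position": [1.0, 2.0, 0.5], "speed": [0.0, 0.1, 0.0] } }
    struct ObjectUpdate {
        uint64_t    objectId;
        uint64_t    sinceRevision;   // the revision this delta applies to
        std::string changesJson;     // the changed fields, JSON-encoded
    };

    // The transport is irrelevant (network socket, local socket, direct call),
    // as long as both sides agree on the message format.
    void sendUpdate(const ObjectUpdate &u);  // implemented by the chosen transport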
Most of the object description contents will be IDs of description elements (e.g. texture 3638549036, movement formula 748950, object form 37383944…). The elements themselves will be downloaded separately from trusted repositories or from the room servers, and will be cached at the client (much like the images in an HTML page are downloaded separately, cached and used to render the page). In some cases description elements may be embedded in the description itself.
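A minimal sketch of such a client-side cache, with the actual download left out:

    #include <cstdint>
    #include <string>
    #include <unordered_map>

    // A client-side cache of description elements (textures, forms, movement
    // formulas...), keyed by their IDs. fetchFromRepository() stands in for the
    // download from a trusted repository or the room server.
    class ElementCache {
    public:
        const std::string &get(uint64_t elementId) {
            auto it = cache_.find(elementId);
            if (it == cache_.end())
                it = cache_.emplace(elementId, fetchFromRepository(elementId)).first;
            return it->second;
        }
    private:
        std::string fetchFromRepository(uint64_t id);  // network download, not shown
        std::unordered_map<uint64_t, std::string> cache_;
    };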
The client will supply the server with data about the presence actions. The server will supply the client with everything else, and may modify the presence actions according to the room rules and the interactions with objects and/or other presences.
Once object descriptions are exchanged, most info flowing between the server and the client will be changes in the objects' data (e.g. movement direction and speed, form, color, size…). The server will supply the changes relative to the previous data exchange regarding the object. The client might request a “rewind” – the changes in the object since a given moment. The server will supply them, or, if this is not possible (e.g. the info about that particular moment has already been deleted to save memory), will supply the entire object description again.
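A sketch of the server side of this “rewind”, assuming for simplicity that deltas can just be concatenated (a real protocol would merge them properly):

    #include <cstddef>
    #include <cstdint>
    #include <map>
    #include <optional>
    #include <string>

    // Keep the recent deltas per object; when a client asks for changes since a
    // revision that is no longer stored, the caller resends the full description.
    class ObjectHistory {
    public:
        void record(uint64_t revision, std::string delta) {
            deltas_[revision] = std::move(delta);
            if (deltas_.size() > kMaxKept)
                deltas_.erase(deltas_.begin());   // forget the oldest to save memory
        }
        // Returns the deltas since `sinceRevision`, or nothing if they are gone.
        std::optional<std::string> rewind(uint64_t sinceRevision) const {
            if (deltas_.empty() || sinceRevision < deltas_.begin()->first)
                return std::nullopt;              // too old: resend the full description
            std::string combined;
            for (auto it = deltas_.upper_bound(sinceRevision); it != deltas_.end(); ++it)
                combined += it->second;
            return combined;
        }
    private:
        static constexpr std::size_t kMaxKept = 1024;
        std::map<uint64_t, std::string> deltas_;  // revision -> delta, ordered
    };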
Objects might be able to group together into a single object to save resources, or to split back apart (to provide better detail) when needed. For example, a tree might be represented as a single object when viewed from afar, and as a set of leaf, branch etc. objects when scrutinized up close.
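A sketch of such grouping / splitting, with the distance test reduced to a flag:

    #include <memory>
    #include <vector>

    // Far away, the tree is described as one coarse object;
    // up close, it splits into its branch and leaf objects.
    class SceneObject {
    public:
        virtual ~SceneObject() = default;
        virtual void describe() const = 0;   // send the description to the client
    };

    class GroupedObject : public SceneObject {
    public:
        void describe() const override {
            if (viewerIsFar_)
                describeCoarse();                                  // one object, few resources
            else
                for (const auto &part : parts_) part->describe();  // full detail
        }
    private:
        void describeCoarse() const {}       // coarse single-object description, not shown
        bool viewerIsFar_ = true;
        std::vector<std::unique_ptr<SceneObject>> parts_;
    };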
Objects must be able to have activities of their own. This could be done with a sandboxed scripting language, using appropriate functions and / or libraries – or even with different languages using the same bindings, e.g. Java and / or Python.
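A sketch of how object activities could be delegated to a script engine behind a language-neutral interface (the sandboxing itself would be the engine's job; the names are illustrative):

    #include <memory>
    #include <string>

    // Hypothetical interface to a sandboxed script engine; concrete
    // implementations could wrap Python, Java or anything else that has
    // bindings to the room API.
    class ScriptEngine {
    public:
        virtual ~ScriptEngine() = default;
        virtual void load(const std::string &source) = 0;
        virtual void call(const std::string &function) = 0;
    };

    // An object whose activity is driven by its script: the room calls tick()
    // every simulation step, and the script decides what the object does.
    class ScriptedObject {
    public:
        ScriptedObject(std::unique_ptr<ScriptEngine> engine, const std::string &source)
            : engine_(std::move(engine)) { engine_->load(source); }
        void tick() { engine_->call("on_tick"); }
    private:
        std::unique_ptr<ScriptEngine> engine_;
    };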
“Designer” software might be developed, allowing anyone to create their own objects and / or object elements (textures etc.), rooms, etc. Copyright over these objects will belong, of course, to the creator…
… Just dreaming? Yes. I don’t have the free time and productivity to create a first demo alone, even if very limited. But others might pick the idea, and might decide to go for it. If yes, I’ll be glad to help, as much as I can.