Ubuntu Desktop: Contacts as Indexed Files

Some people have seen me mention some ideas about how we can access data more effectively on the Ubuntu Desktop, making it much better than even Mac OS X for standard data types. My main grudge with current methods is the “hide everything in a database” pattern, which seriously stuffs up some basic user interaction and takes control of user data away from the user outside of the applications that generate it. These are some quick mock-ups of what I want from my computer:

[Mock-up image: contacts-view]

[Mock-up image: contacts-inner-view]

If the controlling mechanism is the type of content (contact, photo and so forth), then it can be argued that XDG directories are the primary sorter for the first-level directory structure. Each type of data is welcomed into its configured directory, or disabled if that directory does not exist.
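As a rough sketch of the idea above, the mapping from content type to directory could be read from the `user-dirs.dirs` file that xdg-user-dirs writes. The `XDG_CONTACTS_DIR` key used in the usage note is hypothetical (it is not one of the directories defined today), and the function names are made up for illustration.

```python
import os
import re

# Matches lines such as: XDG_PICTURES_DIR="$HOME/Pictures"
_LINE = re.compile(r'^(XDG_[A-Z]+_DIR)="(.*)"$')


def parse_user_dirs(text, home):
    """Return {key: absolute path} from user-dirs.dirs style text."""
    dirs = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:  # comments and blank lines do not match
            key, value = match.groups()
            dirs[key] = value.replace("$HOME", home)
    return dirs


def enabled_types(dirs, type_keys):
    """A content type is enabled only if its configured directory exists."""
    return {
        content_type: dirs[key]
        for content_type, key in type_keys.items()
        if key in dirs and os.path.isdir(dirs[key])
    }
```

Calling `enabled_types(dirs, {"photo": "XDG_PICTURES_DIR", "contact": "XDG_CONTACTS_DIR"})` would then leave out contacts for any user who has no contacts directory, which is the "disabled if the directory does not exist" behaviour.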

In order to allow files to express their content in multiple places, we need to be able to mount new directories inside the XDG directories for certain dynamic collections. These would not be real files on disk but fusefs mounts. In some cases they would provide dynamic access to data; in others they would cache files from hardware devices and online services and present that data in those collections.
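To make the dynamic collection idea a little more concrete, here is a minimal sketch of the object that could sit behind such a mount. It is not FUSE code: a real driver would wire `readdir` and `read` to the corresponding FUSE callbacks. The class name, the `fetch` callable and the `max_age` parameter are all assumptions for illustration.

```python
import time


class DynamicCollection:
    """A directory-like view whose entries come from a data source.

    `fetch` is any callable returning {filename: bytes}; it stands in for a
    hardware device or online service (hypothetical).
    """

    def __init__(self, fetch, max_age=60.0, clock=time.monotonic):
        self._fetch = fetch
        self._max_age = max_age
        self._clock = clock
        self._cache = {}
        self._fetched_at = None

    def _refresh(self):
        # Re-fetch only when the cache is empty or older than max_age seconds.
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at > self._max_age:
            self._cache = dict(self._fetch())
            self._fetched_at = now

    def readdir(self):
        """Names that would appear inside the mounted directory."""
        self._refresh()
        return sorted(self._cache)

    def read(self, name):
        """Contents of one entry, served from the cache."""
        self._refresh()
        return self._cache[name]
```

The cache is what lets a slow online service still look like an ordinary directory of contact files to the file manager.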

Now onto the problem of metadata; this includes tagging and other useful titbits. This requires a complex set of non-relational data, similar to XML but not suited to a typical database such as MySQL or SQLite. This data should be stored with the files themselves if possible, and somewhere nearby in a worst-case scenario. Some content files don’t need much metadata, as they are mostly data themselves. This includes contacts and events, whose internal data has a structure similar to metadata. I recommend the use of ext metadata fields (extended attributes) plus XML caches for indexing.
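A minimal sketch of "with the file if possible, nearby otherwise", assuming Python’s `os.setxattr` and `os.getxattr` (Linux only) for the first case. The sidecar naming convention (`.<name>.meta.json`) and the use of JSON rather than XML are assumptions made to keep the example short.

```python
import json
import os


def _sidecar_path(path):
    # Hypothetical naming convention: ".<name>.meta.json" next to the file.
    directory, name = os.path.split(path)
    return os.path.join(directory, "." + name + ".meta.json")


def set_tag(path, key, value):
    """Store metadata on the file itself, or next to it as a fallback."""
    try:
        os.setxattr(path, "user." + key, value.encode("utf-8"))
        return "xattr"
    except (AttributeError, OSError):
        # No xattr support on this platform or file system.
        sidecar = _sidecar_path(path)
        data = {}
        if os.path.exists(sidecar):
            with open(sidecar, encoding="utf-8") as handle:
                data = json.load(handle)
        data[key] = value
        with open(sidecar, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
        return "sidecar"


def get_tag(path, key):
    """Read metadata back from wherever set_tag managed to put it."""
    try:
        return os.getxattr(path, "user." + key).decode("utf-8")
    except (AttributeError, OSError):
        sidecar = _sidecar_path(path)
        if not os.path.exists(sidecar):
            return None
        with open(sidecar, encoding="utf-8") as handle:
            return json.load(handle).get(key)
```

One thing this makes visible: extended attributes do not always survive being copied to other file systems or sent over the network, so the fallback path would probably see real use.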

We should also consider indexing. I’m not happy about Tracker; at the moment it seems to have a very monolithic design. We can’t afford to have unindexable metadata or content from files, and every field must be separable and searchable in its own form; this includes date ranges and other data. For this I’m thinking of the Xapian full-text searcher and indexer with heavy use of field indexing; it’s faster than Lucene, better than trackerd and more flexible than most indexers.
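To show what "every field separable and searchable in its own form" means in practice, here is a toy in-memory index. It is deliberately not Xapian and makes no use of its API; it only illustrates per-field terms and date-range lookups. The class and method names are invented for the example.

```python
from collections import defaultdict
from datetime import date


class FieldIndex:
    """Tiny in-memory index: per-field terms plus comparable date values."""

    def __init__(self):
        self._terms = defaultdict(set)   # (field, term) -> document ids
        self._dates = defaultdict(dict)  # field -> {document id: date}

    def add(self, doc_id, fields):
        for field, value in fields.items():
            if isinstance(value, date):
                self._dates[field][doc_id] = value
            else:
                # Split text into lower-case terms, each tagged with its field.
                for term in str(value).lower().split():
                    self._terms[(field, term)].add(doc_id)

    def search(self, field, term):
        """Documents whose `field` contains `term`."""
        return set(self._terms.get((field, term.lower()), set()))

    def date_range(self, field, start, end):
        """Documents whose `field` date lies in [start, end]."""
        return {
            doc_id
            for doc_id, value in self._dates.get(field, {}).items()
            if start <= value <= end
        }
```

With contacts indexed this way, "everyone called Smith" and "everyone with a birthday in the 1990s" are separate queries on separate fields rather than one blob of full text.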

Consider re-indexing. It’s not enough to just index things periodically; this puts way too much strain on the system. What is required is a more selective and progressive indexing mechanism. Unfortunately, the modification dates of directories do not currently cascade up the tree. If they did, this would have provided an ideal mechanism for checking at service start-up, allowing the system to build a list of out-of-date targets to re-index.
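Because directory modification times do not cascade, a start-up check has to visit every directory rather than prune whole subtrees. A minimal sketch, assuming the indexer remembers the time of its last run as epoch seconds (`last_indexed` and the function name are hypothetical):

```python
import os


def stale_files(root, last_indexed):
    """Yield files under `root` modified after `last_indexed` (epoch seconds).

    Every directory is visited, because a change deep in the tree does not
    update the modification time of its ancestors.
    """
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            try:
                if os.stat(path).st_mtime > last_indexed:
                    yield path
            except OSError:
                continue  # file vanished between listing and stat
```

This is exactly the full walk that cascading modification dates would have avoided, so it should only need to run once at service start-up.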

To index changes progressively as they happen, we should use inotify; this hands the job of noticing changes to the running system at the moment they occur. Threaded, obviously.
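A minimal sketch of listening for changes, assuming Linux and calling the kernel’s inotify interface through `ctypes` rather than any particular binding library. The choice of events to watch and the function names are assumptions; note also that an inotify watch covers a single directory, so a real indexer would need one watch per directory.

```python
import ctypes
import ctypes.util
import os
import struct

# Mask values from <sys/inotify.h>.
IN_CLOSE_WRITE = 0x00000008
IN_MOVED_TO = 0x00000080
IN_DELETE = 0x00000200

# struct inotify_event header: int wd; uint32_t mask, cookie, len.
_HEADER = struct.Struct("iIII")


def parse_events(buffer):
    """Decode raw bytes read from an inotify descriptor into (mask, name)."""
    events, offset = [], 0
    while offset + _HEADER.size <= len(buffer):
        _wd, mask, _cookie, length = _HEADER.unpack_from(buffer, offset)
        offset += _HEADER.size
        name = buffer[offset:offset + length].rstrip(b"\0")
        offset += length
        events.append((mask, os.fsdecode(name)))
    return events


def watch(directory):
    """Yield (mask, filename) for changes in `directory` (Linux only)."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    fd = libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    try:
        mask = IN_CLOSE_WRITE | IN_MOVED_TO | IN_DELETE
        if libc.inotify_add_watch(fd, os.fsencode(directory), mask) < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
        while True:
            # Blocks until the kernel reports at least one event.
            yield from parse_events(os.read(fd, 4096))
    finally:
        os.close(fd)
```

For the threaded part, `watch()` could run in a worker thread that pushes each event onto a `queue.Queue` for the indexer to consume, so a burst of file changes never blocks the desktop.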

We’ll also need a data services manager, something that can handle online authorisation, access to hardware and other data sourcing. This then passes off to fusefs drivers for data provisioning.

We also need to think about data standardisation and how to deal with malformed data. Do we convert badly written vCards to something useful and save those, or do we leave that to the program? Do we create a filtering system for all data types or simply allow the applications to handle them? All told, the whole thing would need the following technologies:

  • Xapian Indexer / Searcher
  • inotify Kernel FS Events
  • New XDG Directories for new data collections
  • FuseFS for Mountable Dynamic Data Sources
  • D-Bus for hanging system interfaces
  • Data Source Manager (doesn’t exist)
  • Data Filtering System (doesn’t exist)
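
On the malformed-data question raised before the list, one option for the filtering system is a lenient pass that repairs what is safe to repair and reports the rest. A minimal sketch for vCards follows; the function name is hypothetical, the checks are far from a full validator, and defaulting a missing VERSION to 3.0 is purely an assumption.

```python
def normalise_vcard(text, default_version="3.0"):
    """Return (cleaned vCard text, list of problems that were found)."""
    # Drop blank lines but keep leading spaces, which mark folded lines.
    lines = [line for line in text.splitlines() if line.strip()]
    problems = []

    def has(prefix):
        return any(line.upper().startswith(prefix) for line in lines)

    if not lines or lines[0].strip().upper() != "BEGIN:VCARD":
        problems.append("missing BEGIN:VCARD")
        lines.insert(0, "BEGIN:VCARD")
    if lines[-1].strip().upper() != "END:VCARD":
        problems.append("missing END:VCARD")
        lines.append("END:VCARD")
    if not has("VERSION:"):
        problems.append("missing VERSION")
        lines.insert(1, "VERSION:" + default_version)
    if not has("FN:") and not has("FN;"):
        # Not repaired: guessing a display name is left to the application.
        problems.append("missing FN")
    return "\r\n".join(lines) + "\r\n", problems
```

Returning the list of problems alongside the cleaned text keeps both answers open: the system can save the repaired card, or hand the original and the problem list to the application and let it decide.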

This blog post has mainly been written to get a lot of these thoughts out of my head. Feel free to poke over them.

Responses to “Ubuntu Desktop: Contacts as Indexed Files”

  1. Awesome.

    I love this idea for a number of reasons, not the least of which is that it sort of ‘just-feels-right’ in the *nixlike environments where everything is a mounted file point.

    I can’t help but feel this is pretty darn insightful and innovative.

  2. ethana2 says:

    Everything in this room is a file. In fact even I am a file, but that is called unixism my dear children and is frowned upon in most civilizations.

    Me, I’m all for it. Just feels right. On Facebook I can’t right click on a person and call them. With this, I could.

  3. ethana2 says:

    ..or just make calling the default action for double clicking with that file type.

  4. Dylan McCall says:

    I like the fusefs idea, definitely. Any theories why apps don’t usually do this already? Maybe this spot is in need of a magic library to make the job easier, or some nifty fuse-fs on the desktop documentation :)

    On the other hand, my thought on stuff like contact management is that this task should ALWAYS be handled by an app designed for contact management. Meanwhile, a file manager should just be for managing files and disks, without the wobbling monolith of cruft we see today. It’s important to accept that the reason we are even picturing a file manager doing contact management is because something is broken. (And it makes the gods of Unix sad).

    Still, these things fit together well: the file system IS an awesome database of databases, of sorts. It is already up and running, it’s reasonably fast, it’s flexible, and any app has continuous access to it at any given time. The last point enables an awesome feature: integration.
    I don’t understand why so many applications abandon that awesome file system for their own self-contained, hidden databases, other than maybe being control freaks. Sure, it makes sense for performance-critical stuff given that content is quick to search then (as I understand it). But for photos (eg: iPhoto, not fspot) and contacts? Really?

    That’s why I like your idea :)
    If we can get apps on the normal file system, they’ll be able to talk together properly whether we go the monolith file manager route or not. Fuse is a great tool for such stuff, so kudos for the reminder; people seem to forget it exists.
    No matter what, this seems like a very natural, very important next step. I’m also crossing my fingers, dreaming of the day when desktop search solves all the other problems immediately after.

  5. Omer says:

    Sounds great! This is exactly the kind of user experience innovation that will allow us to get people to convert to an open system and never go back. Definitely needs to be discussed at the next UDS.

  6. foo says:

    Fucking awesome! Please buy yourself a beer!

  7. Seif Lotfy says:

    hey Mo
    Fucking awesome blog post, but I think you got something wrong: there is the “Tracker Storage” and the “Tracker Indexer”. The storage is not a file storage but rather a metadata storage that conforms to the RDF standards, and I believe it would help the central data service more than you think, since it uses a unified ontology to represent stuff across desktops, giving each file type its own attributes: contacts have birthdays and songs have artists.
    As for notification, you should not depend on inotify but rather on Zeitgeist, since Zeitgeist is an event logging and distribution framework. Instead of listening to the FS we listen to the apps, since apps are what change local files or online stuff.
    Tell me when you want to hack. We might have a Zeitgeist hackfest after UDS; contact me by mail if you want to.

  8. John Drinkwater says:

    vCards, embedded PHOTO, and a nautilus thumbnailer… it’s already working for me.
    http://i28.tinypic.com/1sjcrp.jpg

  9. Scaine says:

    This will also simplify backup granularity. I love the idea – the mockups are very convincing and the overall idea is insightful. I’d love to hear how likely you think this is and who you think might take responsibility for coding it.

  10. directhex says:

    Know what? BeOS was doing this back in the 1990s.

    http://www.birdhouse.org/beos/byte/24-scripting_the_bfs/before.jpg

    Another reason why Be is awesome

    *wistful gaze*

  11. Rob Taylor says:

    Tracker 0.7 is no longer monolithic. The new core is an efficient RDF/SPARQL store/query engine. The indexer and crawler are both separate and form just one way of feeding data into the metadata store.

    There are quite a few people thinking along much the same lines as yourself; come onto #tracker on freenode and chat about it :)

  12. Murat Gunes says:

    I can see things along these lines happening with Zeitgeist FS.

  13. pvanhoof says:

    We should also consider indexing, I’m not happy about Tracker, at the moment it seems to serve a very monolithic design

    In master we are working on a split design, with tracker-store being a storage service that uses Nepomuk as its ontology and SPARQL as its query language, and the other part, tracker-miner-fs, being the ‘indexer’ for the filesystem.

    The tracker-store can be separately packaged and works independently of tracker-miner-fs. The tracker-store doesn’t do anything related to filesystem ‘mining’ (it has no dependency on, for example, inotify, nor does it do file monitoring).

    With Nepomuk and SPARQL, Tracker in master fulfills the requirements that you gave in this blog perfectly.

    I think you should really give Tracker’s current master a try. We plan to release this as the 0.7.x series (being unstable) very soon, although we will only call it stable and production-ready at 0.8.0.

  14. doctormo says:

    Would we have to write a plugin for every possible interaction? Could you guarantee that files placed into a directory via ssh, ftp, or via wget or some other yet to be invented tool would automatically work?

    I think of inotify because it is a one-to-many event handler that uses the very basis of how the system functions. But I may be able to be persuaded via IRC.

  15. Haschek says:

    I hope the Ubuntu/Gnome community is checking out what KDE is working on: The Semantic Desktop. I really would like to have something like this in Gnome :) You are talking about metadata and XML; why not use RDF? It’s possible to serialize it to XML. Coming back to your example, the contact data could be in a triple store (for example, Evolution should use this store too) and something like a WebDAV port shows you the contact groups as directories and the persons as files. It’s all about overcoming the data silos :)

  16. Seif Lotfy says:

    Nope, not really; interactions are defined by an ontology. We will use inotify as a fallback for some events. We are also working on storing our events as metadata in the Tracker storage! Right now a lot of applications are being monitored by Zeitgeist, but we are working on letting apps send their events to Zeitgeist. The apps know more about their data than the filesystem.

  17. Daeng Bo says:

    On a related note, I tried asking on the XDG mailing list about having a standard format for user mail directories (preferably simply MailDir in ~/Maildir), but was shot down because every client has a different database format (which was, ironically, the fact which prompted me to make my proposal).

    I’d like to see more in the “everything is a granular file” direction (including browser history, bookmarks, etc.) and use indexing to mitigate the speed loss. Tracker 0.7 looks like a great way to do this, so I say “Great work on this post, and keep it up.”

  18. G says:

    That’s exactly how People files worked in the BeOS!
