Ubuntu Desktop: Contacts as Indexed Files

Posted in Hat Talk, Programming and Technical, Ubuntu on July 25th, 2009 by doctormo

Some people have seen me mention some ideas about how we can effectively access data on the Ubuntu Desktop, making it much better than even Mac OS X for standard data types. My main grudge with current methods is the “hide everything in a database” pattern, which seriously stuffs up some basic user interaction and stops users from controlling their own data outside of the applications that generate it. These are some quick mock-ups of what I want from my computer:

[Mock-up image: contacts-view]

[Mock-up image: contacts-inner-view]

If the controlling mechanism is the type of content (contact, photo and so forth), then it can be argued that XDG directories are the primary sorter for the first-level directory structure. Each type of data is welcomed into its configured directory, or disabled if that directory does not exist.
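As a rough sketch of that rule, the snippet below reads text in the format of ~/.config/user-dirs.dirs and keeps only the data types whose directory actually exists. The keys XDG_CONTACTS_DIR and XDG_EVENTS_DIR used in the usage example are hypothetical; xdg-user-dirs does not define them today. The function names are my own.

```python
import os
import re

# Matches lines such as: XDG_PICTURES_DIR="$HOME/Pictures"
_ENTRY = re.compile(r'^\s*(XDG_[A-Z]+_DIR)\s*=\s*"(.*)"\s*$')


def parse_user_dirs(text, home):
    """Return {key: absolute path} from user-dirs.dirs style text."""
    dirs = {}
    for line in text.splitlines():
        match = _ENTRY.match(line)
        if not match:
            continue  # comments and blank lines are skipped
        key, value = match.groups()
        if value == "$HOME":
            value = home
        elif value.startswith("$HOME/"):
            value = os.path.join(home, value[len("$HOME/"):])
        dirs[key] = value
    return dirs


def enabled_types(dirs):
    """A data type is enabled only if its configured directory exists."""
    return {key: path for key, path in dirs.items() if os.path.isdir(path)}
```

With a config containing XDG_CONTACTS_DIR="$HOME/Contacts" and XDG_EVENTS_DIR="$HOME/Events", and only ~/Contacts present on disk, enabled_types() would report contacts as enabled and leave events disabled.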

In order to allow files to appear in multiple places, we need to be able to mount new directories inside the XDG directories for certain dynamic collections. These would not be real files on disk but FUSE mounts. Some would provide dynamic access to data; others would cache data from hardware devices and online services and present it in those collections.

Now onto the problem of metadata, which includes tagging and other useful tidbits. This requires a complex set of non-relational data, similar to XML and not well suited to a typical database such as MySQL or SQLite. This data should be stored with the files themselves if possible, and somewhere nearby in the worst case. Some content files don’t need much metadata because they are mostly data themselves. This includes contacts and events, whose internal data is structurally similar to metadata. I recommend using ext metadata fields (extended attributes) plus XML caches for indexing.
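A minimal sketch of "with the file if possible, nearby otherwise": tags go into an extended attribute when the filesystem allows it, and into an XML sidecar file next to the content when it doesn't. The attribute name user.tags and the .meta.xml suffix are assumptions of mine, not an existing convention, and the comma-joined encoding assumes tags contain no commas.

```python
import os
import xml.etree.ElementTree as ET

ATTR = "user.tags"          # hypothetical attribute name
SIDECAR_SUFFIX = ".meta.xml"  # hypothetical sidecar naming


def write_tags(path, tags):
    """Store tags on the file itself, or in an XML sidecar as a fallback."""
    value = ",".join(sorted(tags)).encode("utf-8")
    try:
        os.setxattr(path, ATTR, value)  # Linux only
        return
    except (AttributeError, OSError):
        pass  # no xattr support on this platform or filesystem
    root = ET.Element("metadata")
    for tag in sorted(tags):
        ET.SubElement(root, "tag").text = tag
    ET.ElementTree(root).write(path + SIDECAR_SUFFIX, encoding="utf-8")


def read_tags(path):
    """Read tags from the extended attribute, then from the sidecar."""
    try:
        raw = os.getxattr(path, ATTR).decode("utf-8")
        return {tag for tag in raw.split(",") if tag}
    except (AttributeError, OSError):
        pass  # attribute missing or unsupported
    sidecar = path + SIDECAR_SUFFIX
    if os.path.exists(sidecar):
        return {el.text for el in ET.parse(sidecar).getroot().iter("tag")}
    return set()
```

An indexer could call read_tags() without caring which of the two storage locations was used.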

We should also consider indexing. I’m not happy about Tracker; at the moment it seems to have a very monolithic design. We can’t afford to have unindexable metadata or content in files, and every field must be separable and searchable in its own form; this includes date ranges and other data. For this I propose the Xapian full-text indexer and searcher with heavy use of field indexing. In my view it’s faster than Lucene, better than trackerd and more flexible than most indexers.
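To make "every field separable and searchable" concrete, here is a toy in-memory index in plain Python. It is not Xapian and makes no claims about Xapian's API; it only illustrates the two ideas a fielded indexer relies on: terms stored per field, and dates kept as comparable values so ranges can be queried. The class and method names are invented for this sketch.

```python
from collections import defaultdict
from datetime import date


class FieldIndex:
    """Toy fielded index: per-field terms plus per-field date values."""

    def __init__(self):
        self.terms = defaultdict(set)  # "field:word" -> set of doc ids
        self.dates = {}                # doc id -> {field: date}

    def add(self, doc_id, fields, dates=None):
        # Index each word under its own field so fields stay separable.
        for field, text in fields.items():
            for word in text.lower().split():
                self.terms[f"{field}:{word}"].add(doc_id)
        self.dates[doc_id] = dict(dates or {})

    def search(self, field, word):
        """Documents whose given field contains the word."""
        return set(self.terms.get(f"{field}:{word.lower()}", set()))

    def date_range(self, field, start, end):
        """Documents whose date field falls within [start, end]."""
        return {
            doc_id
            for doc_id, values in self.dates.items()
            if field in values and start <= values[field] <= end
        }
```

A search for "smith" in the name field then never matches a contact who merely lives on Smith Street, and a birthday range query only looks at the birthday values.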

Consider re-indexing. It’s not enough to just index things periodically; this puts way too much strain on the system. What is required is a more selective and progressive indexing mechanism. Unfortunately, the modification dates of directories do not currently cascade up to their parents. Had they done so, they would have provided an ideal check at service start-up, allowing the system to build a list of out-of-date targets to re-index.
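Because directory modification dates don't cascade, a start-up check has to visit every file rather than skip unchanged subtrees. A minimal sketch of that scan, assuming the indexer remembers the timestamp of its last run (the function name and the last_indexed parameter are mine):

```python
import os


def stale_targets(root, last_indexed):
    """List files under root modified after the last indexing run.

    last_indexed is a Unix timestamp. Every directory is walked, since
    a directory's mtime says nothing about changes deeper in the tree.
    """
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime > last_indexed:
                    stale.append(path)
            except OSError:
                continue  # file vanished between listing and stat
    return sorted(stale)
```

The cost of this walk grows with the size of the whole tree, which is exactly why cascading dates would have been so useful.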

To index changes progressively as they happen, we should use inotify, so that the work is handled at run time, the moment a change occurs, rather than in periodic sweeps. Threaded, obviously.

We’ll also need a data services manager, something that can handle online authorisation, access to hardware and other data sourcing. It then hands off to FUSE drivers for data provisioning.

We also need to think about data standardisation, or dealing with malformed data. Do we convert badly written vCards to something useful and save those, or do we leave that to the program? Do we create a filtering system for all data types, or simply allow the applications to handle them? All of this means the whole thing would need the following technologies:

  • Xapian Indexer / Searcher
  • inotify Kernel FS Events
  • New XDG Directories for new data collections
  • FUSE for Mountable Dynamic Data Sources
  • D-Bus for exposing system interfaces
  • Data Source Manager (doesn’t exist)
  • Data Filtering System (doesn’t exist)

This blog post has mainly been written to get a lot of these thoughts out of my head. Feel free to poke over them.