Tobias recently blogged about Nepomuk, and from the comments it seems that people are a bit in the dark about what Akonadi, Nepomuk and Strigi actually do and how they interact with each other. So if you want to understand those technologies, read on! This blog post is an attempt to clear things up a bit.
Let me start with Soprano. Soprano is a Qt library for accessing semantic storage (RDF). In many ways, Soprano can be compared to the QtSql module, the key difference is that QtSql accesses relational data with SQL as the query language, whereas Soprano accesses semantic data with SPARQL as the query language.
But what is semantic data, and what is the difference to relational data? You probably all know relational databases, which use tables to store data. Semantic databases, on the other hand, use statements, also sometime referred to as sentences, to store data. Statements consist, just like real-world sentences, of a subject (noun), a predicate (verb) and an object. By storing many sentences in a database, one can create a big network of data. The best way to make this clear is probably by examples, so here are some:
- “image.jpg” “has the width” “800 pixels”
- “image.jpg” “is tagged with” “example-tag”
- “image.jpg” “was photographed by” “Max Mustermann”
- “example-tag” “has the title” “Holidays”
- “example-tag” “has the icon” “beach.svg”
- “Max Mustermann” “lives in” “London”
- “Max Mustermann” “was born on” “01.01.1970”
As you can see in those examples, it is possible to link together totally different topics, such as information about a file, a tag and a person. Those examples are of course a bit fabricated, real data would look different. However, you can see the basic sentence structure consisting of subjects, predicates and objects. One can add an arbitrary number of statements about things to a semantic database. The powerful aspect about semantic data is the way it is linked together. With the above statements, you could for example search for all images tagged with “example-tag”. You could also do much more interesting searches, like searching for all pictures taken by people living in London, or searching for all files that were sent as e-mail attachments from your boss.
Read the RDF Primer from the W3C for an in-depth introduction about RDF and semantic data (RDF is one way to describe semantic data, but not the only way).
Like QtSql, which supports different backends such as SQLite or MySQL, Soprano also supports different backends. Currently, there are three backends for Soprano: Redland, Sesame2 and Virtuoso. Redland is C++ based and orders of magnitudes too slow for what we need in Akonadi, so basically Redland should never be used at all. Sesame2 is Java based and performs well. This currently seems to be the only usable backend for Soprano. Read Tobias’ blog entry for the details about this. The third backend is Virtuoso, which combines the strength of the other two backends: It is C++ based and performs well. To my knowledge, this backend is currently in development and therefore not usable, but it will certainly be an interesting choice in the future.
Ok, now you should have a basic understanding about what semantic data is and what Soprano is. Read on to find out what Nepomuk is!
Nepomuk is the KDE library for accessing semantic data. It uses Soprano for storage access. Nepomuk provides a KDE API for many high-level functions such as tagging and annotating. An important point is that Nepomuk also provides a set of standard ontologies, and convenience classes to use them.
Let me explain what an ontology is. Although you can store arbitrary statements in a semantic database, that rarely is useful. Consider the case that you store the sentence “Laura lives in Leeds” and the sentence “Ralf resides in Leeds”. Notice that the predicate, i.e. the verb, is different in those sentences, once it is “resides in”, and once it is “lives in”. Now, if you attempt to do a semantic search for all people living in Leeds, you will not find Ralf, since the statement about him uses a different predicate. Therefore it is a good idea to have a set of standard predicates and other terms, to have a clearly defined vocabulary about things. This is what ontologies are. Nepomuk comes with a set of standard ontologies which define vocabulary for talking about annotations, files, contacts, mails, calendars, music and more. These ontologies are now also a freedesktop standard, GNOME’s Tracker uses them as well.
Now, Nepomuk would be useless without any data. There are basically two ways of getting data into Nepomuk: One way is by manual user action, for example when a user tags a file in Dolphin. The other way is automatic indexing, which is done by both Strigi and Akonadi. Read the next sections for details.
Strigi is the file indexer for KDE. It looks at every file on your hard disk, extracts semantic data out of the file, and then feeds the data into Nepomuk. When saying that Nepomuk uses a lot of CPU or IO, that is usually because Strigi is indexing files in the background. There are however many settings to improve this, for example indexing is disabled while on battery, and the IO niceness is set to a low level. Also, Strigi indexing can be disabled completely, without disabling any other parts of Nepomuk, because the file indexing is just one way to get data into Nepomuk.
What currently is badly missing, in my opinion, is a good GUI client to actually search for all the data that has been indexed. It seems that there was some very nice progress during this year’s summer of code with that, so I am sure the situation will get a lot better in the future
Now, on to the last technology of this blog post! Akonadi is a framework to access PIM data like mails, contacts and calendar events. Think of it as a cache or a proxy to your PIM data: The real data is still stored in local Maildir folders, local vCard files, IMAP servers or in your Google address book. Akonadi provides an easy API to access that PIM data in an uniform way. Additionally, it can act as a cache, for things like disconnected IMAP or for offline access to your Google address book. Another advantage is that the PIM data can easily be shared between applications. Now not only KMail can access your mail, but also LionMail or Mailody. Additionally, there is no need to have KMail running to access your calendars and contacts on your Kolab server. Akonadi furthermore replaces the brittle system of index files in KMail, and the new Akonadi IMAP resource is already much faster than KMail’s old IMAP code.
So as you can see, Akonadi will bring many advantages to the end user, once the applications are ported to use Akonadi. For KDE 4.4, only the new KAddressbook and KPilot will use Akonadi natively. Ports of KOrganizer, Akregator and KMail are in progress and (hopefully) expected to be released with KDE 4.5.
Akonadi of course needs to store information about the PIM items and folders it knows about somewhere. For this, we use a classical SQL database. For now, we support only MySQL, but there is work done on PostgreSQL and SQLite support. Those two database backends are both work in progress, help there would be very welcome.
Now, how is Akonadi related to Nepomuk? Applications which use Akonadi require a fast search and good support for virtual folders. Now, we didn’t want to code our own search support into Akonadi. It is quite a lot of work and difficult to get right. The virtual folders in KMail 1 are for example too slow to be useful for larger volumes of mail. What we did for searching instead was to use a technology that is actually good at finding stuff: Nepomuk.
We use Akonadi agents to feed information about contacts, events and mails into Nepomuk. So just like Strigi, those Akonadi agents will put data into Nepomuk. We use the standard ontologies, like the NMO (Nepomuk Mail Ontology) to store the mails. This data is then used for searches and virtual folders. By using Nepomuk, we hope to overcome many of the KMail 1 shortcomings, like the slow virtual folders mentioned earlier or the inability to search in base64-encoded attachments. It already works quite good, for example we have a working tag resource to show all your mails tagged with specific tags, and searches with SPARQL are also working (although there is no GUI for it, yet, for now you need to use the development tool akonadiconsole to see them).
That’s it, folks. I hope I made some things clearer to you. If anything is unclear, please ask in the comments section.
My next blog post will have screenshots again, I promise