Posted by: Thomas McGuire | 2009/10/03

Akonadi, Nepomuk and Strigi explained

Tobias recently blogged about Nepomuk, and from the comments it seems that people are a bit in the dark about what Akonadi, Nepomuk and Strigi actually do and how they interact with each other. So if you want to understand those technologies, read on! This blog post is an attempt to clear things up a bit.

Soprano

Let me start with Soprano. Soprano is a Qt library for accessing semantic storage (RDF). In many ways, Soprano can be compared to the QtSql module; the key difference is that QtSql accesses relational data with SQL as the query language, whereas Soprano accesses semantic data with SPARQL as the query language.

Semantic Data

But what is semantic data, and how does it differ from relational data? You probably all know relational databases, which use tables to store data. Semantic databases, on the other hand, store data as statements, sometimes also referred to as sentences. Just like real-world sentences, statements consist of a subject (noun), a predicate (verb) and an object. By storing many statements in a database, one can create a big network of data. This is probably best explained with examples, so here are some:


  • “image.jpg” “has the width” “800 pixels”
  • “image.jpg” “is tagged with” “example-tag”
  • “image.jpg” “was photographed by” “Max Mustermann”
  • “example-tag” “has the title” “Holidays”
  • “example-tag” “has the icon” “beach.svg”
  • “Max Mustermann” “lives in” “London”
  • “Max Mustermann” “was born on” “01.01.1970”
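To make the statement model concrete, here is a minimal Python sketch. It keeps the example statements as (subject, predicate, object) tuples and answers simple pattern queries. It assumes a plain in-memory list and only illustrates the data model. The `match` function is hypothetical and is not part of Soprano's (C++) API.

```python
# A toy in-memory triple store: every statement is a (subject, predicate, object)
# tuple. This only illustrates the data model; it is not Soprano's API.
Triple = tuple[str, str, str]

STATEMENTS: list[Triple] = [
    ("image.jpg", "has the width", "800 pixels"),
    ("image.jpg", "is tagged with", "example-tag"),
    ("image.jpg", "was photographed by", "Max Mustermann"),
    ("example-tag", "has the title", "Holidays"),
    ("example-tag", "has the icon", "beach.svg"),
    ("Max Mustermann", "lives in", "London"),
    ("Max Mustermann", "was born on", "01.01.1970"),
]


def match(statements, s=None, p=None, o=None):
    """Return every statement that fits the pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in statements
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


# Everything the store knows about image.jpg:
for _, predicate, obj in match(STATEMENTS, s="image.jpg"):
    print(f"{predicate}: {obj}")
```

Note that "example-tag" is the object of one statement and the subject of others. That reuse is what links individual statements into a network.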

As you can see from the statements above, it is possible to link together totally different topics, such as information about a file, a tag and a person. The examples are of course somewhat contrived; real data would look different. However, they show the basic sentence structure of subject, predicate and object. One can add an arbitrary number of statements about things to a semantic database. The powerful aspect of semantic data is the way it is linked together. With the above statements, you could for example search for all images tagged with “example-tag”. You could also do much more interesting searches, like searching for all pictures taken by people living in London, or for all files that your boss sent you as e-mail attachments.
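The linked searches described above can be sketched as a two-step join over (subject, predicate, object) tuples. This is a hypothetical Python illustration, not how Soprano or a SPARQL engine actually evaluates queries. The extra picture and person ("beach.png", "Erika Musterfrau") are invented so the query has something to filter out.

```python
# Statements are (subject, predicate, object) tuples, as in the list above.
STATEMENTS = {
    ("image.jpg", "is tagged with", "example-tag"),
    ("image.jpg", "was photographed by", "Max Mustermann"),
    ("beach.png", "was photographed by", "Erika Musterfrau"),  # invented
    ("Max Mustermann", "lives in", "London"),
    ("Erika Musterfrau", "lives in", "Berlin"),  # invented
}


def subjects_with(statements, predicate, obj):
    """All subjects s for which the statement (s, predicate, obj) exists."""
    return {s for (s, p, o) in statements if p == predicate and o == obj}


def pictures_by_residents_of(statements, city):
    """Two-hop query: find the people living in `city`, then the pictures they
    took. In SPARQL this would be one pattern sharing a ?person variable."""
    residents = subjects_with(statements, "lives in", city)
    return {s for (s, p, o) in statements if p == "was photographed by" and o in residents}


print(sorted(subjects_with(STATEMENTS, "is tagged with", "example-tag")))  # ['image.jpg']
print(sorted(pictures_by_residents_of(STATEMENTS, "London")))  # ['image.jpg']
```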

Read the RDF Primer from the W3C for an in-depth introduction to RDF and semantic data (RDF is one way to describe semantic data, but not the only one).

Backends

Like QtSql, which supports different backends such as SQLite or MySQL, Soprano also supports different backends. Currently, there are three backends for Soprano: Redland, Sesame2 and Virtuoso. Redland is written in C and is orders of magnitude too slow for what we need in Akonadi, so in practice it should not be used at all. Sesame2 is Java-based and performs well; currently it seems to be the only usable backend for Soprano. Read Tobias’ blog entry for the details. The third backend is Virtuoso, which combines the strengths of the other two: it is native code and performs well. To my knowledge, this backend is still in development and therefore not yet usable, but it will certainly be an interesting choice in the future.
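The backend idea can be sketched as an interface that every storage implementation fulfils, plus a registry that picks one by name. This is similar in spirit to how QtSql selects a driver with `QSqlDatabase::addDatabase("QSQLITE")`. Everything below (`Backend`, `MemoryBackend`, `create_backend`) is a hypothetical Python sketch and does not mirror Soprano's actual plugin mechanism.

```python
from typing import Optional, Protocol

Triple = tuple[str, str, str]


class Backend(Protocol):
    """The operations every storage backend must offer (hypothetical, minimal)."""

    def add(self, triple: Triple) -> None: ...

    def query(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]: ...


class MemoryBackend:
    """One concrete backend that keeps everything in a Python set. A backend
    talking to an external store would implement the same two methods."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def add(self, triple: Triple) -> None:
        self._triples.add(triple)

    def query(self, s=None, p=None, o=None) -> list[Triple]:
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
        )


# Registry of available backends, looked up by name.
_BACKENDS: dict[str, type] = {"memory": MemoryBackend}


def create_backend(name: str) -> Backend:
    try:
        return _BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend: {name!r}") from None


# Application code only relies on the Backend interface, so switching
# backends means changing this one string.
store = create_backend("memory")
store.add(("image.jpg", "is tagged with", "example-tag"))
print(store.query(p="is tagged with"))
```

Because the application only depends on the interface, a slow backend can be swapped for a faster one without touching the code that stores and queries data.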

Nepomuk

Ok, now you should have a basic understanding of what semantic data is and what Soprano is. Read on to find out what Nepomuk is!

Nepomuk is the KDE library for accessing semantic data. It uses Soprano for storage access. Nepomuk provides a KDE API for many high-level functions such as tagging and annotating. An important point is that Nepomuk also provides a set of standard ontologies, and convenience classes to use them.

Let me explain what an ontology is. Although you can store arbitrary statements in a semantic database, that is rarely useful. Suppose you store the sentence “Laura lives in Leeds” and the sentence “Ralf resides in Leeds”. Notice that the predicates, i.e. the verbs, differ: one is “lives in”, the other is “resides in”. Now, if you do a semantic search for all people living in Leeds, you will not find Ralf, since the statement about him uses a different predicate. Therefore it is a good idea to have a set of standard predicates and other terms, i.e. a clearly defined vocabulary for talking about things. This is what ontologies are. Nepomuk comes with a set of standard ontologies which define vocabulary for annotations, files, contacts, mails, calendars, music and more. These ontologies are now also a freedesktop standard; GNOME’s Tracker uses them as well.
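The Laura/Ralf problem, and how a shared vocabulary fixes it, can be sketched in a few lines of Python. This is a deliberately simplified illustration. Real ontologies also define classes, types and value ranges, and the sketch assumes a store that rejects predicates the ontology does not define. The "ex:" terms are invented and are not actual Nepomuk ontology terms.

```python
# Hypothetical ontology: the agreed-upon predicate identifiers.
# The "ex:" prefix and the term names are invented for this sketch.
ONTOLOGY = {"ex:livesIn", "ex:bornOn"}


def add_statement(store, s, p, o):
    """Only accept statements whose predicate the ontology defines."""
    if p not in ONTOLOGY:
        raise ValueError(f"predicate {p!r} is not defined in the ontology")
    store.append((s, p, o))


def people_living_in(store, city):
    return sorted(s for (s, p, o) in store if p == "ex:livesIn" and o == city)


# Free-form predicates: the query for "lives in" misses Ralf.
ad_hoc = [("Laura", "lives in", "Leeds"), ("Ralf", "resides in", "Leeds")]
print(sorted(s for (s, p, o) in ad_hoc if p == "lives in" and o == "Leeds"))  # ['Laura']

# Shared vocabulary: both writers must use ex:livesIn, so both are found.
store = []
add_statement(store, "Laura", "ex:livesIn", "Leeds")
add_statement(store, "Ralf", "ex:livesIn", "Leeds")
print(people_living_in(store, "Leeds"))  # ['Laura', 'Ralf']
```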

Now, Nepomuk would be useless without any data. There are basically two ways of getting data into Nepomuk: One way is by manual user action, for example when a user tags a file in Dolphin. The other way is automatic indexing, which is done by both Strigi and Akonadi. Read the next sections for details.

Strigi

Strigi is the file indexer for KDE. It looks at every file on your hard disk, extracts semantic data from it, and then feeds that data into Nepomuk. When people say that Nepomuk uses a lot of CPU or IO, that is usually because Strigi is indexing files in the background. There are, however, several settings that mitigate this: for example, indexing is disabled while on battery, and Strigi runs with a low IO priority. Also, Strigi indexing can be disabled completely without disabling any other parts of Nepomuk, because file indexing is just one way to get data into Nepomuk.
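The indexing flow (walk the files, extract facts, hand them to the semantic store) can be sketched roughly like this. It is a hypothetical Python illustration, not Strigi's code. The predicate names are invented, and `on_battery` is an assumed flag that mirrors the "disabled while on battery" setting. A real indexer would use ontology terms and format-specific analyzers, and would throttle its own IO.

```python
import os
import tempfile
from pathlib import Path


def extract(path: Path):
    """Yield a few facts about one file as (subject, predicate, object) triples.
    Predicate names are invented; a real indexer would use ontology terms and
    format-specific analyzers (image dimensions, MP3 tags, ...)."""
    subject = path.resolve().as_uri()
    yield (subject, "ex:fileName", path.name)
    yield (subject, "ex:fileSize", path.stat().st_size)
    if path.suffix:
        yield (subject, "ex:fileExtension", path.suffix.lower().lstrip("."))


def index_tree(root, store, on_battery=False):
    """Walk `root` and append the extracted triples to `store`.
    `on_battery` is a hypothetical flag: when set, nothing is indexed."""
    if on_battery:
        return 0
    indexed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            store.extend(extract(Path(dirpath) / name))
            indexed += 1
    return indexed


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "notes.TXT").write_text("hello")
    store = []
    print(index_tree(tmp, store))  # 1
    for triple in store:
        print(triple)
```

The indexer only produces triples. Where they end up (here a list, in KDE the Nepomuk store) is a separate concern. That separation is why file indexing can be switched off without affecting the rest of Nepomuk.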

What is badly missing right now, in my opinion, is a good GUI client to actually search all the data that has been indexed. There seems to have been some very nice progress on that during this year’s Summer of Code, so I am sure the situation will get a lot better in the future.


Akonadi

Now, on to the last technology of this blog post! Akonadi is a framework to access PIM data like mails, contacts and calendar events. Think of it as a cache or a proxy for your PIM data: the real data is still stored in local Maildir folders, local vCard files, IMAP servers or your Google address book. Akonadi provides an easy API to access that PIM data in a uniform way. Additionally, it can act as a cache, for things like disconnected IMAP or offline access to your Google address book. Another advantage is that the PIM data can easily be shared between applications: not only KMail can access your mail, but also LionMail or Mailody. Also, there is no longer any need to have KMail running to access the calendars and contacts on your Kolab server. Furthermore, Akonadi replaces KMail’s brittle system of index files, and the new Akonadi IMAP resource is already much faster than KMail’s old IMAP code.
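The "cache or proxy" idea can be sketched as follows. Applications ask one shared proxy for items. The proxy fetches from the real source when possible and falls back to its last known copy when the source is unreachable. `RemoteSource` and `CachingProxy` are hypothetical Python names; Akonadi's real design (a server process, resources and agents) is considerably more involved.

```python
class RemoteSource:
    """Hypothetical stand-in for where the real data lives (an IMAP server,
    a Google address book, ...). It can go offline."""

    def __init__(self, items):
        self._items = dict(items)
        self.online = True

    def fetch(self, item_id):
        if not self.online:
            raise ConnectionError("server unreachable")
        return self._items[item_id]


class CachingProxy:
    """Applications talk to the proxy instead of the source. Every successful
    fetch refreshes the cache; while offline, cached copies are served."""

    def __init__(self, source):
        self._source = source
        self._cache = {}

    def get(self, item_id):
        try:
            item = self._source.fetch(item_id)
        except ConnectionError:
            if item_id in self._cache:
                return self._cache[item_id]  # offline access to the last known copy
            raise  # never fetched before: nothing to serve offline
        self._cache[item_id] = item
        return item


source = RemoteSource({"contact-1": "Max Mustermann <max@example.com>"})
proxy = CachingProxy(source)  # one proxy shared by all applications
print(proxy.get("contact-1"))  # fetched from the source and cached
source.online = False
print(proxy.get("contact-1"))  # still available, served from the cache
```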

So as you can see, Akonadi will bring many advantages to the end user once the applications are ported to it. For KDE 4.4, only the new KAddressBook and KPilot will use Akonadi natively. Ports of KOrganizer, Akregator and KMail are in progress and will hopefully be released with KDE 4.5.

Of course, Akonadi needs to store information about the PIM items and folders it knows about somewhere. For this, we use a classical SQL database. For now, we support only MySQL, but work is being done on PostgreSQL and SQLite support. Both of those database backends are still works in progress, and help there would be very welcome.
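To give an idea of what "information about PIM items and folders" in an SQL database can look like, here is a small sketch using Python's built-in sqlite3 module. The schema is a hypothetical illustration, not Akonadi's actual schema. It stores only metadata plus a remote identifier pointing back to where the real data lives (IMAP, Maildir, ...).

```python
import sqlite3

# Hypothetical schema: folders ("collections") and the items inside them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE collections (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES collections(id),
    remote_id TEXT            -- where the folder lives in the real backend
);
CREATE TABLE items (
    id            INTEGER PRIMARY KEY,
    collection_id INTEGER NOT NULL REFERENCES collections(id),
    remote_id     TEXT,       -- e.g. an IMAP UID
    mime_type     TEXT NOT NULL,
    size          INTEGER
);
""")

conn.execute("INSERT INTO collections (id, name, remote_id) VALUES (1, 'INBOX', 'INBOX')")
conn.executemany(
    "INSERT INTO items (collection_id, remote_id, mime_type, size) VALUES (?, ?, ?, ?)",
    [(1, "uid-1", "message/rfc822", 2048), (1, "uid-2", "message/rfc822", 512)],
)

# Per-folder item count and total size, answered from metadata alone.
rows = conn.execute("""
    SELECT c.name, COUNT(i.id), SUM(i.size)
    FROM collections c JOIN items i ON i.collection_id = c.id
    GROUP BY c.id
""").fetchall()
print(rows)  # [('INBOX', 2, 2560)]
```

Questions about structure, like folder contents and sizes, suit a relational database well. Full-text and cross-item searches are a different kind of problem.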

Now, how is Akonadi related to Nepomuk? Applications which use Akonadi require fast search and good support for virtual folders. We didn’t want to code our own search support into Akonadi: it is quite a lot of work and difficult to get right. For example, the virtual folders in KMail 1 are too slow to be useful for larger volumes of mail. Instead, for searching we use a technology that is actually good at finding stuff: Nepomuk.

We use Akonadi agents to feed information about contacts, events and mails into Nepomuk, so just like Strigi, these Akonadi agents put data into Nepomuk. We use the standard ontologies, like NMO (the Nepomuk Message Ontology), to store the mails. This data is then used for searches and virtual folders. By using Nepomuk, we hope to overcome many of KMail 1’s shortcomings, like the slow virtual folders mentioned earlier or the inability to search in base64-encoded attachments. It already works quite well: for example, we have a working tag resource that shows all your mails tagged with specific tags, and searches with SPARQL are also working (although there is no GUI for them yet; for now, you need to use the development tool akonadiconsole to see them).
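Here is a rough sketch of the two halves described above. First, an agent turns each mail into statements. Second, a virtual folder is a saved query over those statements, re-run whenever the folder is opened. The URI scheme and the "ex:" predicates are invented stand-ins; the real agents use standard ontology terms such as NMO's. `Mail` and `VirtualFolder` are hypothetical Python names.

```python
from dataclasses import dataclass, field


@dataclass
class Mail:
    item_id: int
    subject: str
    sender: str
    tags: list[str] = field(default_factory=list)


def mail_to_triples(mail: Mail):
    """Triples a feeder agent might write for one mail. The URI scheme and the
    ex: predicates are invented stand-ins for real ontology terms."""
    uri = f"mail:{mail.item_id}"
    triples = [
        (uri, "ex:messageSubject", mail.subject),
        (uri, "ex:from", mail.sender),
    ]
    triples += [(uri, "ex:hasTag", tag) for tag in mail.tags]
    return triples


class VirtualFolder:
    """A virtual folder is just a saved query, re-evaluated against the semantic
    store whenever its contents are requested, so it never goes stale."""

    def __init__(self, name: str, predicate: str, value: str):
        self.name, self.predicate, self.value = name, predicate, value

    def contents(self, store):
        return sorted(s for (s, p, o) in store if p == self.predicate and o == self.value)


store = []
for mail in [
    Mail(1, "Release schedule", "boss@example.com", tags=["important"]),
    Mail(2, "Lunch?", "friend@example.com"),
]:
    store.extend(mail_to_triples(mail))

important = VirtualFolder("Important", "ex:hasTag", "important")
print(important.contents(store))  # ['mail:1']
```

The mail client never scans the mails itself. Searching is delegated to whatever holds the statements, which in KDE is Nepomuk.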

That’s it, folks. I hope I made some things clearer to you. If anything is unclear, please ask in the comments section.
My next blog post will have screenshots again, I promise :-)

Posted by: Thomas McGuire | 2009/08/24

Junior Job Achievements

In my last post, I talked a bit about Junior Jobs in KMail. Now I want to write about the progress that was made on those Junior Jobs during the last few weeks.

I think the Junior Jobs program was a success; the wiki page where I listed ideas was almost empty at one point. Some of the developers who started with Junior Jobs later picked up other tasks by themselves, which is the way it should go, especially since it is not easy to actually come up with ideas :)

Now, without further ado, let me present the progress, including pretty screenshots. Note that the order here is totally random, and I have probably forgotten many things, so don’t feel left out if you committed something that isn’t listed here.

James Bendig improved the usability of the options of the new message list. There is now a unified way to configure tooltips for the folder list and the message list. Remember all the buttons next to the quick search field that appeared in KDE 4.2? Those buttons confused new users, who often didn’t discover how to change the theme or the aggregation. Also, those buttons cluttered the UI a bit. This is how it looked:

Quick search line before the changes

This is how it looks now:

Quick search after the changes

As you can see, the buttons have been removed. Vincent Dupont helped with converting the filter back to a combo box. But where are the options to change the theme, aggregation and sort order now? They are now in two places: the View menu, and the context menu of the header. The global theme and aggregation can now also be changed in the config dialog. Setting a per-folder theme or aggregation is easier as well; it can now be done in the folder properties dialog:

Folder Properties Dialog

Overall the options for the new message list are now much more user-friendly.
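The relationship between the global setting in the config dialog and a per-folder setting in the folder properties dialog amounts to a lookup with a fallback. Here is a hypothetical Python sketch of that. The setting names and values are invented and do not match KMail's real configuration keys.

```python
# Hypothetical setting names and values; KMail's real config keys differ.
GLOBAL_DEFAULTS = {"theme": "Classic", "aggregation": "By Date"}

# Only folders that were customized in the folder properties dialog have entries.
FOLDER_OVERRIDES = {
    "inbox": {"theme": "Fancy"},
}


def effective_setting(folder: str, key: str) -> str:
    """Per-folder value if one was set, otherwise the global one."""
    return FOLDER_OVERRIDES.get(folder, {}).get(key, GLOBAL_DEFAULTS[key])


print(effective_setting("inbox", "theme"))        # Fancy
print(effective_setting("inbox", "aggregation"))  # By Date
print(effective_setting("outbox", "theme"))       # Classic
```

With this layout, changing the global theme affects every folder that has no override of its own. The inbox's aggregation comes from the global default because only its theme was customized.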

Bruno Bigras ported over some long-forgotten features from the old kdepim 3.5.5+ branch, for example an improved recipients picker that shows alternative mail addresses as children of the contact and has more grouping capabilities, like grouping per address book category.

New recipient picker

Bruno also added a new filter action that can add people to the address book automatically. Also, you can now filter messages in KMail before they are sent.

Torgny Nyblom again converted a hardcoded dialog into a UI file. I remember that a year ago or so there were no UI files in KMail, but now their number keeps increasing.

Jonathan Armond brought back searching by status and added searching by tag. Tags can now also be added by filters. Switching the identity in the composer now switches the template as well, if you have not modified the message yet. For some people, adding the signature at the beginning or the end is not enough, so the %SIGNATURE command is now supported in templates.
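Conceptually, a %SIGNATURE placeholder lets the template decide where the signature goes instead of forcing it to the top or bottom. The following is a minimal hypothetical sketch, not KMail's template engine. It assumes the conventional "-- " signature separator and a fallback of appending the signature at the end when the template has no placeholder.

```python
SIG_SEPARATOR = "-- \n"  # conventional signature separator ("dash dash space")


def expand_template(template: str, signature: str) -> str:
    """Put the signature wherever %SIGNATURE appears in the template.
    Appending at the end when the placeholder is absent is an assumed default."""
    sig_block = SIG_SEPARATOR + signature
    if "%SIGNATURE" in template:
        return template.replace("%SIGNATURE", sig_block)
    return template + "\n" + sig_block


print(expand_template("Hi Max,\n\n%SIGNATURE\n\nOn Monday you wrote:\n> ...", "Thomas"))
```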

Jaime Torres, whom you probably know as a member of the bugsquad, also contributed a couple of bugfixes.

Apart from the Junior Jobs listed above, there were of course more commits in KMail, but I don’t want to talk about those now. One person deserves special mention, though: Martin Koller. He is by no means a “junior”, since he was listed in the KMail About dialog long before I was added there. But recently he started coding for KMail again and fixed a lot of bugs, over 30 I think. He also went through Bugzilla and closed a lot of bugs there as well.

A big thank you to all the people who contributed to KMail and helped make it better!

Last but not least, there has been much progress in the akonadi-ports branch. Kevin blogged about progress with the ports of the message list and the reader widget; you should read that if you haven’t already. Constantin made good progress with the port of the composer, which does not sound exciting, but it is a very important step. His work will eventually make it easier to implement HTML replies, to share the composer library with other applications, to support native Exchange sending methods instead of SMTP, and much more. But there is so much going on in Akonadi-land that I really should do a separate post about it.

Posted by: Thomas McGuire | 2009/06/06

Hello Planet & KMail Junior Jobs

Hello Planet! If you don’t know me yet, let me introduce myself: I’m Thomas McGuire, and I’m a KDE developer. My main work area is KMail, as I’m the KMail maintainer, but I do touch other bits occasionally.

Why did I start a blog? Not many members of the KDEPIM team blog often, and I think it is important to tell the outside world what is going on, so I plan to give updates whenever something interesting related to KMail or KDEPIM happens. Also, when googling my name, the first result is still a certain (dead) air force major, and that has to change ;)

Ok, now that I have the mandatory introduction out of the way, let me get to today’s topic: Junior Jobs in KMail.

As you may know, for example through Allen’s blog, we have relatively few developers but a huge codebase that needs to be maintained, and thus we are always in need of new blood.

Getting into KDE development is not always easy, especially when there is a lot of code, like in KMail. To make it easier for potential developers, I’ve created a wiki page on TechBase with Junior Jobs for KMail. These are small, self-contained coding tasks that don’t require much knowledge about KMail’s internals, so working on them should be easy and fun.

So if you know C++ and a bit of Qt and want to help out with KMail development, have a look at the wiki page: http://techbase.kde.org/Projects/PIM/KMail_Junior_Jobs

I’m glad that two developers, Jonathan and Frank, have already joined in the fun.

Frank fixed a rather annoying regression which badly affected the speed of disconnected IMAP syncing in 4.2.3, and fixed a bug where renaming a sending account had no immediate effect.

Jonathan added support for tags that change the background color (this will be in KDE 4.4, not 4.3, due to the feature freeze) and fixed two bugs: one related to the grouping of messages when the week does not start on Monday, and one in the filter dialog.
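The week-start issue can be illustrated with date grouping in general. If "This Week" is computed as if every week started on Monday, messages land in the wrong group for users whose week starts on another day. Here is a hypothetical Python sketch that makes the first weekday a parameter. The group names are invented, and this is not the actual KMail code or fix.

```python
from datetime import date, timedelta


def start_of_week(day: date, first_weekday: int) -> date:
    """first_weekday uses Python's convention: 0 = Monday ... 6 = Sunday.
    Hardcoding Monday would be `day - timedelta(days=day.weekday())`."""
    offset = (day.weekday() - first_weekday) % 7
    return day - timedelta(days=offset)


def date_group(msg_day: date, today: date, first_weekday: int) -> str:
    if msg_day == today:
        return "Today"
    this_week = start_of_week(today, first_weekday)
    if msg_day >= this_week:
        return "This Week"
    if msg_day >= this_week - timedelta(days=7):
        return "Last Week"
    return "Older"


# Monday 2009-06-08: a Sunday message is "This Week" if weeks start on Sunday,
# but "Last Week" if they start on Monday.
today = date(2009, 6, 8)
print(date_group(date(2009, 6, 7), today, first_weekday=6))  # This Week
print(date_group(date(2009, 6, 7), today, first_weekday=0))  # Last Week
```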

So thanks to Frank and Jonathan!

Hopefully we’ll see more new developers picking up Junior Jobs soon!

The next posts will probably be about Akonadi or the Summer of Code project of my student Constantin, so stay tuned! Tell me if there is any topic you are particularly interested in.
