Managing my email with Guile - interfacing to notmuch's C library

in #ocd, 4 years ago

Manually creating the bindings for a sizable C program is a lot of work, and not very rewarding work either. I'm lazy and don't want to do that. I looked around for other options and was surprised by the solution I found.

In this series of posts I record how to use Guile as a scripting language and solve various tasks related to email work.

If my experience with Python has taught me anything, it is that interfacing programs between languages can be quite painful. I remember trying boost.python, then cython, and later hearing about pybind and python-cffi. With all those projects, why is there no simple solution? Each had a good start and then it was painful the rest of the way. With Guile I didn't search for long and I was quickly blown away.

NYACC is a project that did what I wanted very quickly. It can read the C code and automagically generate all the bindings you need. I still experienced some difficulties, and I still spent a lot of time looking at the C code in notmuch and at their Python bindings for guidance, yet the overall experience was a lot nicer than what I remember from the Python world. My work was focused on designing a usable interface and thinking about how I want my implementation to work. The typing and generation of the bindings was done entirely by NYACC. What I like about it is that you don't repeat yourself: you take the C header file and NYACC builds the bindings, because it directly understands the C code. In the many Python projects I have used, you must write a new copy of the declarations, which you then need to maintain.

Creating the module

(define-ffi-module (ffi notmuch)
  #:library '("libnotmuch")
  #:include '("notmuch.h"))

That is all you need to start with. Write it to a file called ffi/notmuch.ffi inside your Guile load path. Then, as the NYACC documentation says, just execute:

guild compile-ffi ffi/notmuch.ffi

and you get the ffi/notmuch.scm file with ALL the bindings defined in the notmuch.h file. It even provides wrappers/unwrappers between Guile and C types and their enums. I was really amazed how well it works. For reasons I'm unaware of, you still need to call string->pointer and pointer->string when dealing with string pointers. Since this is stated in the documentation, it might be a limitation or a design choice.
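The string conversion can be tried on its own, without notmuch, using only Guile's (system foreign) module. The sketch below round-trips a string through a C pointer and adds a small guard, pointer->string*, which is a hypothetical helper of mine and not part of Guile or NYACC. It is there because C functions often return NULL on error, and pointer->string should not be called on a null pointer.

```scheme
(use-modules (system foreign))

;; Convert a Scheme string to a pointer to a NUL-terminated C string.
(define tag-ptr (string->pointer "inbox"))

;; Hypothetical helper: return #f instead of dereferencing a NULL pointer.
(define (pointer->string* ptr)
  (if (null-pointer? ptr)
      #f
      (pointer->string ptr)))

(pointer->string* tag-ptr)        ; => "inbox"
(pointer->string* %null-pointer)  ; => #f
```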

With all the bindings already implemented for you, the only thing left is to implement some adapters to interact with the library the way you like and not the way it was written to be used in the C world.

Building the interface

Make sure that the generated file ffi/notmuch.scm is in your Guile load path and import it. The workflow is now much easier, since all the bindings are already at your disposal. I can directly use the module to create my adapters to use notmuch in Guile.
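If the directory is not yet on the load path, one way to add it from inside a script is Guile's add-to-load-path. This is only a sketch: the directory name below is a placeholder I made up, and it must be the directory that contains the ffi/ folder, not the ffi/ folder itself.

```scheme
;; Placeholder path: the directory that CONTAINS ffi/notmuch.scm.
(add-to-load-path "/tmp/guile-notmuch-demo")

;; The new directory is now searched first.
(car %load-path)   ; => "/tmp/guile-notmuch-demo"

;; After this, the generated module can be imported:
;; (use-modules (ffi notmuch))
```

The same effect is available from the command line with guile -L followed by the directory.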

Wrappers around the wrappers - the adapters

NYACC creates a binding for notmuch_database_open, which looks more complicated than what I presented in the previous post, yet that is because it provides additional wrappers/unwrappers for the types. The same holds for all other exposed functions.

NYACC also defines constructors for types. For example, make-notmuch_database_t* creates a pointer to that type, and I get it with a nice representation in the REPL, which is much nicer than what I had in the previous post with make-bytevector. My adapter to open the database is now much cleaner.

(use-modules
 (system foreign)
 (system ffi-help-rt) ;; functions from nyacc
 (ffi notmuch))       ;; the module just created

(define (open-database path mode)
  ;; nyacc provides the pointer "constructor"
  (let ((ffi-db (make-notmuch_database_t*)))
    (notmuch_database_open (string->pointer path) mode (pointer-to ffi-db))
    ffi-db))

Next I set up a query and, by default, exclude messages tagged deleted or spam. I should read those options from the notmuch config, yet I don't want to create that interface at the moment, so I just hard-code them here.

(define (query-db db str)
  (let ((query (notmuch_query_create db (string->pointer str))))
    (for-each (lambda (tag)
                (notmuch_query_add_tag_exclude query (string->pointer tag)))
              (list "deleted" "spam"))
    query))

To process the query I need to see the matching messages. For that I implement result-messages and the extra utility function count-messages.

(define (result-messages query)
  (let ((messages (make-notmuch_messages_t*)))
    (notmuch_query_search_messages query (pointer-to messages))
    messages))

(define (count-messages query)
  (let ((counter (make-int32)))
    (notmuch_query_count_messages query (pointer-to counter))
    (fh-object-ref counter)))

Iterating over the messages

The previous functions allowed me to get the messages matching the query, yet I need to be able to process them, which means iterating over each message. Looping in Guile is done via recursion. I use the named let to express recursion for an iterative process. Here I make heavy use of the notmuch functions to iterate over the messages, very similar to how it is done in the C++ code. It gets annoying to differentiate between plural and singular in the function names, because there are messageS and message. I'm guilty of this crime in my own software, yet with so many more prefixes and suffixes in the function names here, it was tougher on my eyes this time. The message iterator, with redundant inline explanations, is as follows:

(define (messages-iter query proc)
  ;; get all messages that match the query
  (let ((obj (result-messages query)))
    ;; This is the named let: LOOP is a procedure that takes one
    ;; argument per binding listed below.
    ;; ITEM is the current message; here I get the first one to initialize it.
    ;; ACC is a list accumulating the results of the iteration.
    (let loop ((item (notmuch_messages_get obj))
               (acc '()))
      ;; Terminate iteration if the obj, which is a pointer for the messageS
      ;; is not pointing to a valid message anymore.
      (if (= 0 (notmuch_messages_valid obj))
          (begin
            ;; Extremely important to clear memory of the messageS
            (notmuch_messages_destroy obj)
            ;; This is the return value, the list of results
            acc)
          (let ((result (proc item))) ;; I apply PROC to the message
            ;; Extremely important to clear the memory of the message
            (notmuch_message_destroy item)
            ;; This moves the pointer of messageS to the next message
            (notmuch_messages_move_to_next obj)
            ;; Recursion in play, LOOP is called, it gets the next message
            ;; because the pointer was just moved and RESULT is placed at the
            ;; head of ACC, for the next iteration
            (loop (notmuch_messages_get obj)
                  (cons result acc)))))))
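The named let pattern is easier to see on a plain list, with no C pointers involved. The sketch below uses a hypothetical procedure, collect, that I wrote only for illustration. It also shows one detail of the accumulator: because cons puts each result at the head of ACC, the list comes out in reverse order of iteration (this applies to messages-iter as well), and a final reverse restores the original order if that matters.

```scheme
;; Hypothetical illustration of the same loop shape on a plain list.
(define (collect proc items)
  (let loop ((rest items)
             (acc '()))
    (if (null? rest)
        ;; ACC was built head first, so reverse it to restore the order
        (reverse acc)
        (loop (cdr rest)
              (cons (proc (car rest)) acc)))))

(collect string-upcase '("a" "b" "c"))   ; => ("A" "B" "C")
```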

The power of Scheme is that I can abstract that iteration and pass a function to process the messages. In the C++ code I found all those pointer-manipulating functions being called all over the place, each time an iteration was needed.

A simple function to extract selected headers is again just an iteration, this time calling notmuch_message_get_header with different arguments. A function of variable arity lets me name each header I want, and it returns a new function that only takes a message as its argument.

(define (get-headers . labels)
  (lambda (message)
    (map (lambda (label)
           (pointer->string
            (notmuch_message_get_header message (string->pointer label))))
         labels)))

;; Use it like this, where msg is a notmuch message pointer
;; ((get-headers  "date" "to" "from") msg)

;; Use it with the iterator like this:
(let* ((db (open-database "/home/titan/.mail/" 0))
       (query (query-db db "discussions on some mailing list"))
       (result (messages-iter query (get-headers "date" "to" "from"))))
  ;; always clear memory
  (notmuch_query_destroy query)
  (notmuch_database_destroy db)
  ;; return the result
  result)
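The shape of get-headers, a variadic function that returns a closure, can be tried without a mail database. In this sketch an association list stands in for the message, and get-fields is a hypothetical name used only for the illustration.

```scheme
;; Hypothetical stand-in: an alist plays the role of the message.
(define (get-fields . labels)
  (lambda (record)
    (map (lambda (label) (assoc-ref record label))
         labels)))

(define fake-message
  '(("from" . "alice@example.org")
    ("to"   . "bob@example.org")
    ("date" . "Mon, 03 Jan 2022")))

;; The closure remembers which labels were requested, and in which order.
((get-fields "date" "from") fake-message)
;; => ("Mon, 03 Jan 2022" "alice@example.org")
```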

Tagging emails

Tagging is again a procedure I apply to a message, so I only need to implement that function like I did with get-headers. In this case apply-tags-to-message returns a function that consumes the message and applies the desired tags, which are all given in the same string. This lets me reuse my configured tags from previous setups (tags : "+sent +project -inbox"). The next function is quite dense, as it needs to iterate over each tag that is going to be applied or removed.

(define (apply-tags-to-message tags)
  (lambda (message)
    (let loop ((rest (string-tokenize tags)))
      (unless (null? rest)
        (let ((tag (string->pointer (substring (car rest) 1))))
          (if (string-prefix? "-" (car rest))
              (notmuch_message_remove_tag message tag)
              (notmuch_message_add_tag message tag)))
        (loop (cdr rest))))))
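The string handling in that function can be checked separately from notmuch. The sketch below uses a hypothetical helper, parse-tag-ops, that splits the tag string the same way (string-tokenize, then the first character decides between adding and removing) but returns the operations as data instead of calling the C functions.

```scheme
;; Hypothetical helper: turn "+sent -inbox" into a list of operations.
(define (parse-tag-ops tags)
  (map (lambda (token)
         (cons (if (string-prefix? "-" token) 'remove 'add)
               ;; drop the leading + or - to get the bare tag name
               (substring token 1)))
       (string-tokenize tags)))

(parse-tag-ops "+sent +project -inbox")
;; => ((add . "sent") (add . "project") (remove . "inbox"))
```

Like the original, this treats every token that does not start with - as an addition and always drops the first character, so the tags must be written with an explicit + or - prefix.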

I use this function just as I showed in the previous section. However, I need to open the database in READ_WRITE mode, that is, I must pass a 1 to the open-database function. Keep in mind that the tagging function does not return anything useful, so the result of the iterator will be a list of unspecified values.

Deleting message files

This task has some new challenges. Notmuch has two functions to get the message filename: notmuch_message_get_filename and notmuch_message_get_filenames, singular and plural cases again. The first one gets a filename of the message, and most of the time a message has exactly one corresponding file. However, when you interact with mailing lists you might end up with several copies of the same message, and the same happens if you manage multiple email accounts and receive the same message on many of them. For that reason there is the function with the plural case. It returns an iterator over the filenames, which is to be processed just like I did with the messages iterator.

The code is so similar that I will not write it here; take it as an exercise. In an upcoming post I'll show my attempt at using macros to write an iterator to deal with both messages and filenames.
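As a hint for the exercise, and as an alternative to macros, the four C calls that differ between messages and filenames can be passed in as procedures. This is only a sketch under my own assumptions: make-iterator and the cursor-* procedures are hypothetical names, and the cursor is a plain Scheme mock so that the example runs without notmuch.

```scheme
;; Hypothetical helper: build a mapping procedure from the four
;; operations every notmuch-style iterator offers.
(define (make-iterator valid? get next! destroy!)
  (lambda (proc obj)
    (let loop ((acc '()))
      (if (valid? obj)
          (let ((result (proc (get obj))))
            (next! obj)
            (loop (cons result acc)))
          (begin
            (destroy! obj)          ; free the iterator when done
            (reverse acc))))))

;; Mock cursor: a one-element list holding the remaining items.
(define (make-cursor items) (list items))
(define (cursor-valid? c) (pair? (car c)))
(define (cursor-get c) (caar c))
(define (cursor-next! c) (set-car! c (cdar c)))
(define (cursor-destroy! c) (set-car! c '()))

(define cursor-map
  (make-iterator cursor-valid? cursor-get cursor-next! cursor-destroy!))

(cursor-map string-length (make-cursor '("cur/a" "new/bb")))  ; => (5 6)

;; Untested idea of how the notmuch filenames functions could plug in:
;; (define filenames-map
;;   (make-iterator
;;    (lambda (fns) (not (= 0 (notmuch_filenames_valid fns))))
;;    (lambda (fns) (pointer->string (notmuch_filenames_get fns)))
;;    notmuch_filenames_move_to_next
;;    notmuch_filenames_destroy))
```

Unlike messages-iter, this sketch does not destroy each item after processing it, which a messages version would still need to do.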

Summary

As I already said at the beginning of the post, NYACC is an amazing solution for creating bindings from Guile to C. It felt much more comfortable to use than anything I used in the Python world when I had to work with those tools some years ago.

After creating some basic adapters, extending my script to process my emails with notmuch was an absolute pleasure. I did run into problems like running out of memory and running out of file handles. I had forgotten about those old problems since I moved to the land of Python, yet when interfacing with C, you need to take that responsibility again, whether you are in Python or Guile.

I implemented enough features in this email processing script that I can already do everything afew was able to do for me. As such, I have already replaced it with my script. It is not comparatively better, yet it makes me proud to use my own stuff, designed in the way I want to use it.

I still need to practice abstracting the behavior I want my code to express, and I still need to learn about macros and how the object system works.
