Doodle: find information on your computer

About

Doodle is a tool to quickly search the documents on a computer. Doodle builds an index using meta-data contained in the documents and allows fast searches on the resulting database. Doodle uses libextractor to support obtaining meta-data from various file-formats. The database used by doodle is a suffix tree, resulting in fast lookups. Doodle supports approximate searches.
Features that Doodle does not have at the moment include:

A web interface
Ordering of search results
Spidering (indexing the Internet or websites)

If you need these features, have a look at the alternatives section.

Doodle is licensed under the GNU GPL. Indexing large volumes can take several hundred MB of memory (depending on the amount of meta-data found). Searching should nevertheless require almost no memory. Using the latest version of libextractor is recommended. Doodle has so far only been tested under Debian and RedHat GNU/Linux. Doodle is expected to work on any platform supported by GNU libextractor.

Download

You can find the current release here. Man-pages for doodle, doodled and libdoodle are also on-line.

Debian packages can be found here. RedHat/Fedora RPM packages can be found here. A Python binding for doodle (and GNU libextractor) can be found here.

Using doodle

First the doodle database needs to be created. The simplest way to create the database is to run doodle with the -b option on the directories that are to be indexed. For example:

$ doodle -b

This will create the doodle database under ~/.doodle.
After creating the doodle database, you can search it. For example:

$ doodle keyword
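
Because a search only makes sense once the database exists, a small wrapper can check for it first. This is a minimal sketch, not part of doodle itself: the function name doodle_search is hypothetical, and it assumes the default database location ~/.doodle described above.

```shell
# Sketch: refuse to search until a database has been built.
# Assumes the default database location ~/.doodle.
doodle_search() {
    db="$HOME/.doodle"
    if [ ! -e "$db" ]; then
        echo "No doodle database at $db; build one first with 'doodle -b'." >&2
        return 1
    fi
    # Pass all arguments (keywords and options) through unchanged.
    doodle "$@"
}
```

Calling doodle_search keyword then behaves like doodle keyword, but fails early with a hint if no database has been created yet.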

Keeping the database up-to-date

If you want to keep your doodle database up-to-date, you can either periodically re-run doodle with the -b option, or you can use doodled, the doodle daemon. doodled uses fam to notice whenever a file is changed and instantly updates the doodle database. In order to use doodled, you must have famd running. If famd is running, you can start doodled by passing the same arguments that you would pass to doodle to construct the database, but without the -b option:

$ doodled

You can also use doodled to construct the initial database. While doodled is updating the database, any doodle search will block until the update is complete. Note that while you may want to index your entire disk (i.e., doodle -b /), it is typically not a great idea to have doodled monitor your entire system for changes -- especially since /usr is unlikely to change frequently. You can address this issue by first indexing / and then using doodled to monitor only directories that change frequently:

$ doodle -b /
$ doodled /home

This way, your entire system will be in the index, and your home directory will always be up-to-date.
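
If you prefer the first approach (periodically re-running doodle with the -b option), a cron entry is one way to do it. The following is only a sketch: the schedule (03:15 every night) and the indexed directory are assumptions chosen for illustration, and the variable names are hypothetical.

```shell
# Assumptions for this sketch: re-index / every night at 03:15.
INDEX_DIR="/"
CRON_LINE="15 3 * * * doodle -b $INDEX_DIR"

# Print the entry so it can be reviewed before installing it.
echo "$CRON_LINE"

# To install it for the current user, append it to the existing crontab:
#   (crontab -l 2>/dev/null; echo "$CRON_LINE") | crontab -
```

The install step is left as a comment on purpose, so that running the snippet does not modify your crontab.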

Full-text search

You can achieve a (limited) form of full-text search with doodle. For that, the dictionary-based plaintext extractors from libextractor are used. In order to use them, you need to pass the option -b LANG to doodle. LANG is a two-letter language code that selects the dictionary. The languages currently available are en, es, fr, it and no. Words and sentences found in the dictionary for the selected language will then be added to the index. While libextractor attempts to avoid full-text extraction for certain known binary formats, it may still find words in non-text files. Running with this option will dramatically increase the size of the index and the time it takes to build the index. Note that changing the options used to build a database will not (!) result in doodle re-indexing files that were previously processed with other options. The only way to force doodle to re-index files with different options is to either touch the files (change the modification timestamp) or to delete the old database and start from scratch.
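
The "touch the files" approach can be sketched as a small helper. The function name force_reindex is hypothetical; it only updates modification timestamps and does not call doodle itself.

```shell
# Sketch: update the modification time of every regular file under a
# directory, so that the next indexing run treats the files as changed.
force_reindex() {
    dir="$1"
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    find "$dir" -type f -exec touch {} +
}
```

After running force_reindex on a directory, re-run doodle on that directory with the new options. Note that touching files changes their timestamps for every other tool as well (backups, make), so deleting the old database may be the better choice for large trees.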

Hints for system administrators

If you are the system administrator, you might want to run doodle on the entire system periodically (cron job) and have doodled monitor the home directories in the background. In that case, it is suggested to make the doodle database group-readable for a group doodle. Set the permissions for the doodle binary to SGID to allow users to query the database. Doodle will ensure that information about files not accessible to the user is not leaked by checking whether files found in the database are accessible to the user. doodled has to run as root since otherwise it would be impossible to index the personal files of all users. If that is too risky, doodled will still work, but will only index the files readable by the user that runs doodled.
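
The permission setup described above might look roughly like the following sketch. The function name setup_shared_doodle is hypothetical, and the database and binary paths are left as parameters because their locations depend on your installation.

```shell
# Sketch: make a shared database group-readable and the binary SGID.
# Arguments: database file, doodle binary, group name.
setup_shared_doodle() {
    db="$1"; bin="$2"; group="$3"
    # Hand both files to the shared group.
    chgrp "$group" "$db" "$bin" &&
    # Database: owner read/write, group read-only, no access for others.
    chmod 640 "$db" &&
    # Binary: rwxr-sr-x, i.e. it runs with the group's privileges (SGID).
    chmod 2755 "$bin"
}
```

On a real system this would be run as root, for example with a group named doodle as suggested above and with the paths of your shared database and doodle binary as the first two arguments.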

Using different options for different directories

You can build a database from multiple doodle runs over distinct sets of files with different options. For example, the following can make sense:

$ doodle -b /usr/share/doc  # normal index
$ doodle -n -l elf -l mime -b /usr /opt /bin /lib # only ELF and Mime support
$ doodled /home # monitor /home (changes frequently)

A simple doodle search will then find files in all listed directories. You can also build multiple disjoint databases and search all of them in one run (see the doodle man page for option -d).

Bugtrack

Doodle uses Mantis for bugtracking. Visit https://bugs.gnunet.org/ to report bugs. You need to sign up for a reporter account. Please make sure you report bugs under Doodle and not under any of the other projects.

If Mantis does not work for you for some reason and you need to report a bug, contact christian@grothoff.org via e-mail.

Frontends

Articles

Alternatives

Christian Grothoff

Copyright (C) 2004, 2005, 2006, 2007, 2009, 2010, 2018 Christian Grothoff.
Verbatim copying and distribution of this entire article
is permitted in any medium, provided this notice is preserved.