Posted by Steve on Fri 22 Jul 2005 at 10:29
One of the new features currently being introduced into Debian's unstable distribution is a "tag" implementation. This allows small pieces of meta-data to be associated with each package in the archive. This data can be useful for searching and for finding new packages.
Historically the Debian archive has been managed by splitting it up into sections. There are a small number of sections available and each package belongs to one, and only one, section.
For example, a game would go into the Games section, and a Perl library would go into the Perl section. Each of the sections is listed, with a brief description, on the Debian website.
Whilst the sections provide a simple and efficient way of categorising software, the system suffers from two main flaws:
As a result of this, various people have proposed expanding the number of available sections at different times. Another more flexible and open-ended solution has also been proposed several times: adding "tags" to packages to allow them to be described and categorised more accurately.
The tags system is now live in Debian's unstable distribution (codenamed Sid) and should make it into the Etch release.
I first noticed this by accident when viewing the description of a package with apt-cache. If you view, for example, the description of the tidy package you will see the tag information at the bottom:
skx@mystery:~$ apt-cache show tidy
Package: tidy
Priority: optional
Section: web
Installed-Size: 40
Maintainer: Jason Thomas
Architecture: i386
Version: 20050415-1
Depends: libc6 (>= 2.3.2.ds1-21), libtidy0
Suggests: tidy-doc
Filename: pool/main/t/tidy/tidy_20050415-1_i386.deb
Size: 17020
MD5sum: 983571c271b64f93b01903f56479a70d
Description: HTML syntax checker and reformatter
 Corrects markup in a way compliant with the latest standards, and
 optimal for the popular browsers. It has a comprehensive knowledge
 of the attributes defined in the HTML 4.0 recommendation from W3C,
 and understands the US ASCII, ISO Latin-1, UTF-8 and the ISO 2022
 family of 7-bit encodings. In the output:
 .
  * HTML entity names for characters are used when appropriate.
  * Missing attribute quotes are added, and mismatched quotes found.
  * Tags lacking a terminating '>' are spotted.
  * Proprietary elements are recognized and reported as such.
  * The page is reformatted, from a choice of indentation styles.
 .
 Tidy is a product of the World Wide Web Consortium.
Tag: interface::commandline, use::checking, role::sw-utility, format::html, devel
As you can see, the last line of the output lists the tags attached to the package. Each one gives a detail about the software; "interface::commandline", for example, describes how it is used.
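To make the structure of that "Tag:" line concrete, here is a minimal Python sketch which splits it into pairs. It is only an illustration: the function name parse_tag_line is made up for this example, and calling the part before "::" a "prefix" is my own labelling rather than anything taken from the tools. Note that a tag such as "devel" has no prefix at all.

```python
def parse_tag_line(line):
    """Split a 'Tag:' field into (prefix, name) pairs.

    The prefix is None for tags that contain no '::', such as 'devel'.
    """
    field, _, value = line.partition(":")  # split on the first colon only
    if field.strip() != "Tag":
        raise ValueError("not a Tag field: %r" % line)
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        prefix, sep, name = item.partition("::")
        pairs.append((prefix, name) if sep else (None, item))
    return pairs


# The Tag line from the tidy package shown above.
TIDY = ("Tag: interface::commandline, use::checking, "
        "role::sw-utility, format::html, devel")

for prefix, name in parse_tag_line(TIDY):
    print(prefix, name)
```

Grouping the pairs by prefix would give a quick overview of which aspects of a package have been described.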
This information isn't contained in the Debian package itself, but instead it is contained inside the Packages file.
When you run "apt-get update" or "aptitude update", you connect to a number of repositories and download files which contain details about all the packages held on each repository, including their size, their description, etc. This information can be used to search for a package, and the same files now also include the tag information.
The package lists are stored in the directory /var/lib/apt/lists and are simple text files. You can examine them yourself if you wish to see the various "Tag:" entries.
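If you would rather pull the "Tag:" entries out with a script, something like the following Python sketch would do it. It assumes the usual layout of these lists: one stanza per package, stanzas separated by a blank line, and continuation lines starting with whitespace. The function name and the sample text are made up for the example (the second package is hypothetical), and a real list file would be read from disk rather than from a string.

```python
def tags_by_package(packages_text):
    """Map package name -> list of tags, for stanzas that have a Tag field."""
    result = {}
    for stanza in packages_text.split("\n\n"):
        fields = {}
        current = None
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t") and current:
                # Continuation line: belongs to the previous field.
                fields[current] += " " + line.strip()
            elif ":" in line:
                name, _, value = line.partition(":")
                current = name.strip()
                fields[current] = value.strip()
        if "Package" in fields and "Tag" in fields:
            tags = [t.strip() for t in fields["Tag"].split(",")]
            result[fields["Package"]] = [t for t in tags if t]
    return result


# Shortened sample in the same layout as the package lists.
SAMPLE = """Package: tidy
Section: web
Description: HTML syntax checker and reformatter
 Corrects markup in a way compliant with the latest standards.
Tag: interface::commandline, use::checking, role::sw-utility,
 format::html, devel

Package: example-untagged
Section: misc
Description: hypothetical package with no Tag field
"""

print(tags_by_package(SAMPLE))
```

Packages without a "Tag:" field are simply left out of the result, which makes it easy to see how much of a list has been tagged so far.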
If you wish you can now search for packages using the tags instead of any keywords which might be located inside the package description.
To do that you will need to install two new tools:
Installing both packages can be accomplished via apt-get:
apt-get install debtags debtags-edit
(Or "aptitude install debtags debtags-edit" - if you prefer aptitude.)
Once the debtags package has been installed you can run queries against the tags, such as finding packages which are related to a given package.
For example, you might be interested in seeing which packages are related to bash:
skx@mystery:~$ debtags related bash
bash3 - The GNU Bourne Again SHell (Version 3)
You can also search for packages which are related to IMAP mail:
skx@mystery:~$ debtags grep mail::imap
mutt: application, interface::text-mode, made-of::lang-c, mail::imap, mail::pop, protocol::imap, protocol::ipv6, protocol::pop, role::sw-client, uitoolkit::ncurses, works-with::mail
nail: interface::commandline, interface::shell, mail::imap, mail::list, mail::pop, mail::smtp, protocol::imap, protocol::pop, protocol::smtp, role::sw-client, special::completely-tagged, use::transmission, works-with::mail
cyrus21-imapd: interface::daemon, mail::filters, mail::imap, network::service, protocol::imap, protocol::ipv6, role::sw-server, works-with::mail
imapproxy: interface::daemon, mail::imap, protocol::imap, use::proxying
squirrelmail: interface::web, made-of::lang-php, mail::imap, protocol::imap, works-with::mail
getmail4: mail::imap, mail::pop, protocol::imap, protocol::pop, protocol::ssl
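Conceptually, a search like this is just a filter over a mapping from package names to sets of tags. The Python sketch below shows the idea using three of the packages from the output above. It is not how debtags itself is implemented, and the names INDEX and packages_with_tags are made up for the example.

```python
# Package -> set of tags, copied from three entries in the output above.
INDEX = {
    "imapproxy": {"interface::daemon", "mail::imap", "protocol::imap",
                  "use::proxying"},
    "squirrelmail": {"interface::web", "made-of::lang-php", "mail::imap",
                     "protocol::imap", "works-with::mail"},
    "getmail4": {"mail::imap", "mail::pop", "protocol::imap",
                 "protocol::pop", "protocol::ssl"},
}


def packages_with_tags(index, *wanted):
    """Return the sorted names of packages carrying every tag in 'wanted'."""
    wanted = set(wanted)
    # 'wanted <= tags' is true when wanted is a subset of the package's tags.
    return sorted(name for name, tags in index.items() if wanted <= tags)


print(packages_with_tags(INDEX, "mail::imap"))
print(packages_with_tags(INDEX, "mail::imap", "mail::pop"))
```

Asking for several tags at once narrows the result, which is the main advantage tags have over a single section per package.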
How did I know that mail::imap was the tag used for describing mail and IMAP? That was the result of a "tagsearch":
skx@mystery:~$ debtags tagsearch mail
mail::TODO - Need an extra tag
mail::filters - Filters
mail::imap - Mail access via IMAP
mail::list - Mailing Lists
mail::notification - Notification
mail::pop - Mail access via POP3
mail::smtp - Mail transfer via SMTP
media::mail - Email
protocol::pop - Mail access via POP3
protocol::smtp - SMTP Simple Mail Transport Protocol
works-with::mail - Email
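Judging purely from that output, the search appears to match the keyword against both the tag name and its short description, ignoring case; "protocol::smtp" only contains "Mail" in its description. The Python sketch below reproduces that behaviour as an assumption, not as a statement of what debtags really does. The names TAGS and tagsearch are made up, and the last entry has a placeholder description to show a non-match.

```python
# Tag -> short description. The first four come from the output above;
# the last description is a placeholder, not from the real vocabulary.
TAGS = {
    "mail::imap": "Mail access via IMAP",
    "mail::pop": "Mail access via POP3",
    "protocol::smtp": "SMTP Simple Mail Transport Protocol",
    "works-with::mail": "Email",
    "interface::web": "placeholder description",
}


def tagsearch(vocabulary, keyword):
    """Return sorted tag names whose name or description contains keyword."""
    needle = keyword.lower()
    return sorted(
        name
        for name, description in vocabulary.items()
        if needle in name.lower() or needle in description.lower()
    )


for name in tagsearch(TAGS, "mail"):
    print(name, "-", TAGS[name])
```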
There are several other options; perhaps the best way to learn more is to read the manpage by running "man debtags".
For much more detailed information please consult:
This article can be found online at the Debian Administration website at the following bookmarkable URL:
This article is copyright 2005 Steve - please ask for permission to republish or translate.