A simple introduction to Debian package tags
Posted by Steve on Fri 22 Jul 2005 at 10:29
One of the new features currently being introduced into Debian's unstable distribution is a "tag" implementation. This allows small pieces of meta-data to be associated with each package in the archive; this data is useful for searching, and for discovering new packages.
Historically, the Debian archive has been managed by splitting it up into sections. There are a small number of sections available, and each package belongs to one, and only one, section.
For example, a game would go into the Games section, and a Perl library would go in the Perl section. You can see each of the sections, with a brief description, on the Debian website.
Whilst the sections allow a simple and efficient way of categorising software, the system suffers from two main flaws:
- The sections are coarse rather than fine-grained.
- A package can only belong to one section.
As a result, various people have at different times proposed expanding the number of available sections. Another, more flexible and open-ended, solution has also been proposed several times: adding "tags" to packages so that they can be described and categorised more accurately.
The tags system is now live in Debian's unstable distribution (codenamed Sid) and should make it into the Etch release.
I first noticed this by accident when viewing the description of a package with apt-cache. If you view, for example, the description of the tidy package you will see the tag information at the bottom:
skx@mystery:~$ apt-cache show tidy
Package: tidy
Priority: optional
Section: web
Installed-Size: 40
Maintainer: Jason Thomas
Architecture: i386
Version: 20050415-1
Depends: libc6 (>= 2.3.2.ds1-21), libtidy0
Suggests: tidy-doc
Filename: pool/main/t/tidy/tidy_20050415-1_i386.deb
Size: 17020
MD5sum: 983571c271b64f93b01903f56479a70d
Description: HTML syntax checker and reformatter
 Corrects markup in a way compliant with the latest standards, and
 optimal for the popular browsers. It has a comprehensive knowledge
 of the attributes defined in the HTML 4.0 recommendation from W3C,
 and understands the US ASCII, ISO Latin-1, UTF-8 and the ISO 2022
 family of 7-bit encodings. In the output:
 .
  * HTML entity names for characters are used when appropriate.
  * Missing attribute quotes are added, and mismatched quotes found.
  * Tags lacking a terminating '>' are spotted.
  * Proprietary elements are recognized and reported as such.
  * The page is reformatted, from a choice of indentation styles.
 .
 Tidy is a product of the World Wide Web Consortium.
Tag: interface::commandline, use::checking, role::sw-utility, format::html, devel
As you can see, the last line of the output lists several tags, which describe things such as how the package is used ("interface::commandline", "use::checking", etc.).
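Each entry in the Tag: line is written as facet::tag, where the part before the "::" (interface, use, role, format) names the aspect being described, and a package can carry as many tags as it needs. As a minimal sketch, assuming only the comma-separated format seen in the tidy output, the Python below splits a Tag: value into (facet, tag) pairs. parse_tag_field is a hypothetical helper, not part of debtags, and a bare entry such as devel is simply kept with no facet.

```python
def parse_tag_field(value):
    """Split a comma-separated Tag: value into (facet, tag) pairs.

    An entry written as facet::tag is split on its first '::'.
    An entry with no '::' (such as 'devel') gets the facet None.
    """
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray or trailing commas
        facet, sep, tag = item.partition("::")
        if sep:
            pairs.append((facet, tag))
        else:
            pairs.append((None, item))
    return pairs


# The Tag: line from the tidy output, split at its first colon.
line = "Tag: interface::commandline, use::checking, role::sw-utility, format::html, devel"
field_name, _, value = line.partition(":")
for facet, tag in parse_tag_field(value):
    print(facet, tag)
```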
This information isn't contained in the Debian package itself, but instead it is contained inside the Packages file.
When you run "apt-get update" or "aptitude update", you connect to a number of repositories and download files describing every package held on each repository: its size, its description, and so on. This information can be used to search for a package, and the file now includes the tag information too.
The package lists are stored in the directory /var/lib/apt/lists as simple text files, so you can examine them yourself if you wish to see the various "Tag:" entries.
If you wish, you can now search for packages using the tags instead of keywords that might appear in the package description.
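Because the lists are plain text, pulling the tags back out is straightforward. The sketch below assumes the usual layout of a Packages file: stanzas separated by blank lines, "Field: value" lines, and continuation lines that begin with whitespace. parse_packages and tags_by_package are hypothetical helper names, and the sample is a cut-down illustration rather than a real file; on a Sid system you could instead pass in the contents of one of the files under /var/lib/apt/lists.

```python
def parse_packages(text):
    """Parse Packages-file text into a list of dicts, one per stanza."""
    stanzas = []
    current = {}
    last_field = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
            current, last_field = {}, None
        elif line[0] in " \t":
            # Continuation of the previous field, e.g. a long Description.
            if last_field is not None:
                current[last_field] += "\n" + line[1:]
        else:
            field, _, value = line.partition(":")
            last_field = field.strip()
            current[last_field] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def tags_by_package(stanzas):
    """Map each package name to the set of tags in its Tag: field.

    Packages without a Tag: field map to an empty set.
    """
    result = {}
    for stanza in stanzas:
        name = stanza.get("Package")
        if name is None:
            continue
        tags = stanza.get("Tag", "")
        result[name] = {t.strip() for t in tags.split(",") if t.strip()}
    return result


sample = """\
Package: tidy
Section: web
Description: HTML syntax checker and reformatter
 Corrects markup in a way compliant with the latest standards.
Tag: interface::commandline, use::checking, role::sw-utility, format::html

Package: nail
Tag: interface::commandline, mail::imap, mail::pop, mail::smtp
"""

for package, tags in tags_by_package(parse_packages(sample)).items():
    print(package, sorted(tags))
```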
To do that you will need to install two new tools:
- debtags - Commandline interface to libdebtags functions and Debtags administration tool
- debtags-edit - GUI application to search and tag packages
Installing both packages can be accomplished via apt-get:
apt-get install debtags debtags-edit
(Or "aptitude install debtags debtags-edit" - if you prefer aptitude.)
Once the debtags package has been installed you can run queries against the tags, such as finding packages related to others.
For example, you might be interested in seeing which packages are related to bash:
skx@mystery:~$ debtags related bash
bash3 - The GNU Bourne Again SHell (Version 3)
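The article doesn't say how debtags decides that two packages are related. One plausible approach, shown here purely as an illustration, is to rank packages by how much their tag sets overlap, using the Jaccard similarity (shared tags divided by all distinct tags). The related function and its scoring are assumptions, not the debtags algorithm, and the tag sets are taken (partly trimmed) from the debtags grep output that follows.

```python
def jaccard(a, b):
    """Shared tags divided by all distinct tags (0.0 for two empty sets)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def related(name, tags, limit=3):
    """Return up to `limit` other packages whose tags best match `name`'s.

    Packages sharing no tags at all are left out; ties sort by name.
    """
    target = tags[name]
    scored = [
        (jaccard(target, other_tags), other)
        for other, other_tags in tags.items()
        if other != name
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:limit] if score > 0]


tags = {
    "mutt": {"interface::text-mode", "mail::imap", "mail::pop", "protocol::imap",
             "protocol::pop", "role::sw-client", "works-with::mail"},
    "nail": {"interface::commandline", "mail::imap", "mail::pop", "mail::smtp",
             "protocol::imap", "protocol::pop", "protocol::smtp",
             "role::sw-client", "works-with::mail"},
    "imapproxy": {"interface::daemon", "mail::imap", "protocol::imap", "use::proxying"},
    "squirrelmail": {"interface::web", "made-of::lang-php", "mail::imap",
                     "protocol::imap", "works-with::mail"},
}

print(related("mutt", tags))  # ['nail', 'squirrelmail', 'imapproxy']
```

Here nail shares six of its tags with mutt, so it ranks first, while imapproxy shares only the two IMAP tags.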
You can also search for packages which are related to IMAP mail:
skx@mystery:~$ debtags grep mail::imap
mutt: application, interface::text-mode, made-of::lang-c, mail::imap, mail::pop, protocol::imap, protocol::ipv6, protocol::pop, role::sw-client, uitoolkit::ncurses, works-with::mail
nail: interface::commandline, interface::shell, mail::imap, mail::list, mail::pop, mail::smtp, protocol::imap, protocol::pop, protocol::smtp, role::sw-client, special::completely-tagged, use::transmission, works-with::mail
cyrus21-imapd: interface::daemon, mail::filters, mail::imap, network::service, protocol::imap, protocol::ipv6, role::sw-server, works-with::mail
imapproxy: interface::daemon, mail::imap, protocol::imap, use::proxying
squirrelmail: interface::web, made-of::lang-php, mail::imap, protocol::imap, works-with::mail
getmail4: mail::imap, mail::pop, protocol::imap, protocol::pop, protocol::ssl
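Conceptually, a tag query like this is a filter over the package-to-tags mapping. The sketch below uses a hypothetical grep helper, not the debtags implementation, which returns the packages carrying every tag you ask for; adding interface::daemon narrows the IMAP results down to the server-side tools. The tag sets are a subset of the output above, with tidy added for contrast.

```python
def grep(packages, *wanted):
    """Return, sorted by name, the packages tagged with every tag in `wanted`."""
    want = set(wanted)
    return sorted(name for name, tags in packages.items() if want <= tags)


packages = {
    "mutt": {"interface::text-mode", "mail::imap", "mail::pop", "role::sw-client"},
    "nail": {"interface::commandline", "mail::imap", "mail::pop", "mail::smtp", "role::sw-client"},
    "cyrus21-imapd": {"interface::daemon", "mail::imap", "network::service", "role::sw-server"},
    "imapproxy": {"interface::daemon", "mail::imap", "use::proxying"},
    "squirrelmail": {"interface::web", "mail::imap", "made-of::lang-php"},
    "getmail4": {"mail::imap", "mail::pop", "protocol::ssl"},
    "tidy": {"interface::commandline", "use::checking", "role::sw-utility", "format::html"},
}

print(grep(packages, "mail::imap"))
print(grep(packages, "mail::imap", "interface::daemon"))  # ['cyrus21-imapd', 'imapproxy']
```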
How did I know that mail::imap was the tag used for describing mail and IMAP? That was the result of a "tagsearch":
skx@mystery:~$ debtags tagsearch mail
mail::TODO - Need an extra tag
mail::filters - Filters
mail::imap - Mail access via IMAP
mail::list - Mailing Lists
mail::notification - Notification
mail::pop - Mail access via POP3
mail::smtp - Mail transfer via SMTP
media::mail - Email
protocol::pop - Mail access via POP3
protocol::smtp - SMTP Simple Mail Transport Protocol
works-with::mail - Email
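A tagsearch like this can be thought of as a case-insensitive keyword match against both the tag names and their descriptions. As an illustration only, the sketch below runs that kind of match over entries copied from the output above; tagsearch here is a hypothetical function, and the real command may match differently.

```python
def tagsearch(vocabulary, keyword):
    """Return (tag, description) pairs whose tag or description contains keyword."""
    kw = keyword.lower()
    return [
        (tag, desc)
        for tag, desc in sorted(vocabulary.items())
        if kw in tag.lower() or kw in desc.lower()
    ]


vocabulary = {
    "mail::filters": "Filters",
    "mail::imap": "Mail access via IMAP",
    "mail::list": "Mailing Lists",
    "mail::pop": "Mail access via POP3",
    "mail::smtp": "Mail transfer via SMTP",
    "media::mail": "Email",
    "protocol::pop": "Mail access via POP3",
    "protocol::smtp": "SMTP Simple Mail Transport Protocol",
    "works-with::mail": "Email",
}

for tag, desc in tagsearch(vocabulary, "pop"):
    print(tag, "-", desc)
```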
There are several other options; perhaps the best way to learn more is to read the manpage by running "man debtags".
For much more detailed information please consult:
I just can't wait until they implement the deb database in SQL - it's so close ;) But anyway, that's a great idea - consolidation and signs of Debian maturing.
Is it the influence of the new Debian Project Leader that I see more movement in Debian from Debian developers?
While I was waiting for Sarge, Debian development was, how to say, unaimed. Right now there are many movements in many directions, but I feel that they are well "led". Keep up the good work :) Etch will rock ;)
I don't think that many of the changes have come from the top, Debian really doesn't work like that.
The people implementing all the new ideas are really doing it of their own volition. Things like the tag support have been happening in the background for months before going live.
Whilst it's true that Branden is organising some teams for various purposes, he's not setting the technical direction - he's more making sure the project's infrastructure is OK.
Steve
-- Steve.org.uk
What I see now is that the Debian community has its destination, and is self-disciplined, so waste is reduced.
Luke
PS. Ah! LANG=en_rubbish, but I hope I'm understood ;)
I think a lot of the slowdown is due to the sheer size of the distribution. Also, before Sarge there were a lot of things that came too late in the day, and couldn't be introduced without postponing the release even further.
It's very difficult to gain a consensus amongst all the developers on any single topic, so a lot of people give up too soon, or have to practically implement a solution fully before people can see how useful / difficult it is.
In small groups a simple idea can be discussed and agreed upon by all present. But this rarely works for a big project like Debian.
Still it's great to see new, useful, and interesting developments "make it".
(Your language is just fine, don't worry!)
Steve
Stuff like this requires infrastructure changes. That requires a lot of time to design and plan, and it also requires proper timing. Until just recently, with the release of Sarge, everybody was holding off on any major developments because a release was perpetually pending "any day now".
Now that Sarge is released, we are seeing major developments being implemented: debtags, X.org, OpenOffice.org 2, etc.
If you're on Sarge you're out of luck.
The sarge package files will not contain the information, and the software tools in Sarge wouldn't know what to do with it even if the information was available.
Querying the web-browsable tag categories is possible though - they are linked to from the DebTags homepage included in the article.
(There's no reason why somebody with access to the Sid Packages files couldn't set up another simple online CGI script if that implementation goes away, or to allow different searching/browsing possibilities. The tags are very simple to work with.)
Steve
They will, however, be kept separate, and you won't see them when doing apt-cache show.
--Enrico