Weblog entry #91 for Steve

Looking up Audio CD-ROMs?
Posted by Steve on Wed 22 Mar 2006 at 14:19
Tags: none.
Audio CD-ROM Identification

There exist at least two systems for looking up song details by a "disk-id":

The problem is they suck.

Their details are limited to artist + song. There is no flexible notion of genre.

They do not support:

  • Lyrics
  • Cover images.
  • Albums which contain multiple disks

I wish to replace these systems with my own, which will correct these faults.

There are three problems:

  1. How do you identify a given audio CD-ROM?
  2. How do you store the data effectively?
  3. How do you allow clients to retrieve it?

I think the first one can be solved by:

request = sha1( audio track 1 ) . sha1( audio track 2 ) . ... . sha1( audio track N )

(Where the hashes in the request are delimited by some means. Perhaps the input will be XML?)
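
As a rough illustration of the identification idea, here is a minimal Python sketch. It assumes each track is already available as raw audio bytes (for instance ripped to .wav). The function name `build_request` and the ":" delimiter are placeholders chosen for the example, not part of any existing system.

```python
import hashlib


def build_request(tracks, delimiter=":"):
    """Join the SHA1 hex digest of every track, in disc order.

    `tracks` is an iterable of bytes objects, one per audio track.
    Assumption: the delimiter is an arbitrary choice for this sketch.
    """
    return delimiter.join(hashlib.sha1(audio).hexdigest() for audio in tracks)


# Two tiny stand-in "tracks"; real input would be the ripped audio data.
request = build_request([b"track one audio", b"track two audio"])
print(request)
```

Each digest is 40 hex characters, so a disc with N tracks gives a request of 40 * N characters plus the delimiters. This only identifies a disc if every read of a track yields the same bytes.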

The second is just a matter of database design:

  • Table: Songs
  • Table: Collections (has N songs)
  • Etc
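
A minimal sketch of how the two tables might look, assuming SQLite and Python's standard `sqlite3` module. The column names, the `position` column, and the helper `find_collections` are illustrative guesses, not a worked-out design.

```python
import sqlite3

# Assumption: one row per collection, one row per song, songs ordered by position.
SCHEMA = """
CREATE TABLE collections (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE songs (
    id            INTEGER PRIMARY KEY,
    collection_id INTEGER NOT NULL REFERENCES collections(id),
    position      INTEGER NOT NULL,
    sha1          TEXT NOT NULL,
    title         TEXT,
    genre         TEXT
);
CREATE INDEX songs_sha1 ON songs(sha1);
"""


def find_collections(conn, hashes):
    """Return names of collections whose ordered track hashes equal `hashes`."""
    hashes = list(hashes)
    if not hashes:
        return []
    # Narrow down to collections containing the first hash, then compare in full.
    candidates = conn.execute(
        "SELECT DISTINCT collection_id FROM songs WHERE sha1 = ?", (hashes[0],)
    ).fetchall()
    matches = []
    for (cid,) in candidates:
        stored = [row[0] for row in conn.execute(
            "SELECT sha1 FROM songs WHERE collection_id = ? ORDER BY position",
            (cid,),
        )]
        if stored == hashes:
            name = conn.execute(
                "SELECT name FROM collections WHERE id = ?", (cid,)
            ).fetchone()[0]
            matches.append(name)
    return matches


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO collections (id, name) VALUES (1, 'The best of the doors')")
conn.executemany(
    "INSERT INTO songs (collection_id, position, sha1, title, genre) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "aaa", "Test", "Foo"), (1, 2, "bbb", "Test 2", "Foo")],
)
print(find_collections(conn, ["aaa", "bbb"]))
```

The hash values "aaa" and "bbb" are stand-ins for real SHA1 digests. Matching on the full ordered list means two discs that share a few tracks are still told apart.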

Querying against the database with a given collection of hashes should be trivial. The hard part is giving back useful output. Perhaps:

<collection>
  <name>The best of the doors</name>
  <artist>
    <name>The Doors</name>
  </artist>
  <artist>
    <name>The foo orchestra</name>
  </artist>
  <track>
    <sha1>xxxxx</sha1>
    <md5>xxx</md5>
    <title>Test</title>
    <genre>Foo</genre>
    <length>22:22</length>
    <lyrics>
Once upon a time
Far far away
there was a cd-rom

It was cold
It was shiny
It was silver
    </lyrics>
  </track>
</collection>
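
For what it is worth, output of this shape could be produced with Python's standard `xml.etree.ElementTree`. The sketch below reuses the element names from the sample and stores each value as element text so that the result is well-formed XML, which is an assumption about the intended format. The function name `collection_to_xml` and the dict-based track layout are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: these are the per-track fields, taken from the sample layout.
TRACK_FIELDS = ("sha1", "md5", "title", "genre", "length", "lyrics")


def collection_to_xml(name, artists, tracks):
    """Serialise one collection to an XML string.

    `artists` is a list of names; `tracks` is a list of dicts keyed by
    the names in TRACK_FIELDS. Missing fields are simply left out.
    """
    root = ET.Element("collection")
    ET.SubElement(root, "name").text = name
    for artist in artists:
        artist_el = ET.SubElement(root, "artist")
        ET.SubElement(artist_el, "name").text = artist
    for track in tracks:
        track_el = ET.SubElement(root, "track")
        for field in TRACK_FIELDS:
            if field in track:
                ET.SubElement(track_el, field).text = track[field]
    return ET.tostring(root, encoding="unicode")


xml_text = collection_to_xml(
    "The best of the doors",
    ["The Doors", "The foo orchestra"],
    [{"sha1": "xxxxx", "md5": "xxx", "title": "Test",
      "genre": "Foo", "length": "22:22"}],
)
print(xml_text)
```

Letting the library build the document means characters such as "&" in titles or lyrics are escaped automatically.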

Thoughts welcome. Pointers to already recognised schemas especially so.

 

Comments on this Entry

Posted by lee (193.82.xx.xx) on Wed 22 Mar 2006 at 14:34

In the words of Obi-Wan, "there is another". Musicbrainz is the system used by Sound Juicer for CD lookups - it falls back on FreeDB if it doesn't have an entry.

Posted by Steve (212.20.xx.xx) on Wed 22 Mar 2006 at 14:39

Well spotted, thanks.

I think the biggest difference in my (proposed) scheme is that "genre", "artist", and other basic information becomes per-track rather than per-disk.

The only potential issues are the number of expected collisions in the track hashing, and the fact that the hashes only work on the .wav files, not anything else like mp3/ogg.

Steve

Posted by Anonymous (213.164.xx.xx) on Wed 22 Mar 2006 at 14:47
> The problem is they suck.
I think they're great. A community of users submits track listings, so that when you enter a cd, you know what it is.

> There are three problems:
> 1. How do you identify a given audio CD-ROM
> 2. How do you store the data effectively.
> 3. How do you allow clients to retrieve it.
They were not intended to support that - but there's no reason they couldn't.
Each of the problems you list has already been solved by the sites you list; there's no reason to reinvent them.

What you want is to extend the existing databases to include more information.
Why not do that?

Posted by Steve (212.20.xx.xx) on Wed 22 Mar 2006 at 14:49
> so that when you enter a cd, you know what it is.

If only that worked. There are collisions which my suggested scheme would (I believe) avoid.

Whilst it is true that a large part of the system I describe relates to additional information, such as per-track artist + genre information, I can't easily see how to add that to the freedb.org system. And cddb is of course closed.

Steve

Posted by Anonymous (213.164.xx.xx) on Wed 22 Mar 2006 at 14:55
> There are collisions which my suggested scheme would (I believe) avoid.
But there are reasons why that scheme was chosen.

One reason is that audio discs aren't data discs, so an sha1sum of every track wouldn't necessarily be the same each time. It'd also be really slow, much slower than the current track length algorithm.

I still stand by my point though. Why re-invent it from scratch rather than talking to the freedb people? (This seems quite a contrast to your normal viewpoint)

Other problems:
* Who is going to submit all the extra data?
* Cover art copyright
* Song lyrics copyright.

Posted by Steve (212.20.xx.xx) on Wed 22 Mar 2006 at 14:58

I'm not averse to having somebody else do the coding + hosting etc, so I most probably would contact the freedb people - but only after I have a proof-of-concept to share.

Otherwise I doubt many people would be interested.

I am a little concerned that reading the audio data might be non-identical, just from the sight of cdparanoia doing "error correction".

As for copyright, yes a valid concern.

One mitigating factor is that, with this scheme, using the SHA1 hash does ensure the submitter of a disk actually has a legitimate local copy. Still, I can see there will most likely be challenges there if the system were to be adopted.

(As for people inputting the data, probably a subset of the same people that do now. The ones that care about per-track information, and decent handling of compilations / multi-disk albums.)

Steve

Posted by lee (193.82.xx.xx) on Wed 22 Mar 2006 at 15:06

MusicBrainz allocates a unique identifier for everything entered into its database, which contains the algorithmically generated IDs - e.g. "Ace of Spades"

Posted by Steve (212.20.xx.xx) on Wed 22 Mar 2006 at 15:10

That looks like a good GUID.

I'm curious how they decide that, and whether they can tell the difference between the "Studio" version of a song and one of the "Live" versions.

Knowing roughly how MusicBrainz works I'd guess they couldn't tell, but if I were honest in my music snobbery I'd want them both tagged differently.

Steve

Posted by Steve (212.20.xx.xx) on Wed 22 Mar 2006 at 16:40

There might be an issue with working with the server for some people, even though the data is "free":

Steve
