Most of my software is written in C or Perl5. For database (SQL) work I highly recommend PostgreSQL.
This is a project in the works. What I'm thinking of is writing some tools in Perl. I've currently got a song metadata persistence module, a CD ripping tool, and a CGI web browsing script. So far none of the tools here have reached a usable "release" stage.
The most usable portion is the persistence module for audio metadata, the AudioDBI interface. With some work, it could be turned into a general tool for persistence in Perl.
You might be able to view the web interface here if it is currently working (no promises).
I've put together some UML diagrams for AudioDBI as well. The class diagram shows most of the details. The main thing to know is that objects are created through an AudioDBI object, and the created objects are aggregated together. The system is very flexible, as shown in this object diagram. The aggregation can be made much less complex by using references to the same object, which is appropriate for a typical single-artist CD. A simpler object diagram shows this.
AudioDBI is very generic, and is close to being able to run on databases other than PostgreSQL. The only thing holding it back is that it currently uses a PostgreSQL-specific datatype (serial), along with a few other minor PostgreSQL-isms. These PostgreSQL centricities should be blatantly obvious in the code.
The database schema is also supplied. It must be loaded into the database before using the AudioDBI module. This is a visual representation of the database schema. A bit of clarification: AudioDBI was designed to allow a music collection to be heavily tweaked. A song has an artist and also a composer, which lets us easily store cover songs. Looking further, the song has a genre and the artist has a genre; using this, it's easy to model songs that don't fit an artist's status quo. With these tools you can easily construct a database for a various-artists compilation, or even an album of cover songs, and maintain accurate records of who the original artist was along with the actual performer's metadata. Using references you can set a song's genre to the same object as the artist's genre, so changing one will change both (if desired).
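The shared-genre trick is plain reference sharing. Here's a toy sketch of the idea in C (the structs and names are my own illustration, not AudioDBI's actual Perl API): the song's genre field is pointed at the same object as the performer's, so an update through either is seen by both.

```c
#include <string.h>

struct genre  { char name[32]; };

struct artist {
    const char  *name;
    struct genre *genre;
};

struct song {
    const char   *title;
    struct artist *artist;    /* the actual performer */
    struct artist *composer;  /* the original writer -- handy for covers */
    struct genre  *genre;
};

/* Illustrative data: a cover song performed by one artist, written by another. */
static struct genre  shared    = { "Rock" };
static struct artist performer = { "Cover Band", &shared };
static struct artist writer    = { "Original Artist", 0 };
static struct song   cover     = { "Some Cover", &performer, &writer, 0 };

/* Point the song's genre at the very same object the performer uses. */
static void share_genre(struct song *s)
{
    s->genre = s->artist->genre;
}
```

After `share_genre(&cover)`, renaming the shared genre object changes what both the artist and the song report, which is exactly the "changing one will change both" behavior described above.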
My solution wasn't elegant, and was a bit, errr, heavy handed. It was important to store all the data in memory, since disk access would kill access times. It also had to be memory efficient, since each "row" was 30 two-byte integers along with a four-byte key. Thus a million rows of raw data requires (2 * 30 + 4 bytes) * 1 million rows = 64 megs -- and that's without any usable structures for searching. The solution was simplified by the fact that all inserting was done once when the program began, and no further inserts or deletes would be performed.
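The raw-row arithmetic can be sanity-checked with a hypothetical C layout (the struct below is my assumption of what such a row looks like, not the original code):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical raw row: 30 two-byte values plus a four-byte key. */
struct row {
    uint16_t v[30];  /* 60 bytes of data */
    uint32_t id;     /* 4-byte key */
};

/* 60 + 4 = 64 bytes; the 60-byte array is already 4-byte aligned,
 * so the compiler adds no padding. */
static size_t raw_bytes(size_t nrows)
{
    return nrows * sizeof(struct row);
}
```

A million rows then come to 64,000,000 bytes -- the 64 megs quoted above, before any search structures.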
My solution involved inserting the data into 30 arrays, followed by a sort of each array once all the data had been entered. The memory requirement is then (2 bytes * 30 values + 4-byte id + 4-byte array pointer * 30) = 184 bytes per entry. Each array can be searched in one dimension in O(log n) time via binary search. The search was done on each individual array, then the results were merged. The merging takes significant time; I tested hashing methods and binary trees for it.
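A minimal sketch of the idea in C, shrunk to 3 dimensions and 5 rows (all names here are my own, and the merge is a naive intersection rather than the hashing or tree variants mentioned above):

```c
#include <stdint.h>
#include <stdlib.h>

#define NDIM 3  /* 30 in the real program */
#define NROW 5

struct entry {
    uint32_t id;       /* 4-byte key */
    uint16_t v[NDIM];  /* 2-byte values, one per dimension */
};

static struct entry rows[NROW] = {
    { 1, { 10, 7, 3 } },
    { 2, {  4, 7, 9 } },
    { 3, { 10, 2, 3 } },
    { 4, {  8, 7, 3 } },
    { 5, { 10, 7, 1 } },
};

/* One sorted array of entry pointers per dimension: this is the
 * 4-byte-pointer-per-dimension overhead in the 184-byte figure. */
static struct entry *dim_index[NDIM][NROW];
static int sort_dim;  /* dimension the comparator is currently sorting on */

static int cmp_dim(const void *a, const void *b)
{
    const struct entry *ea = *(struct entry *const *)a;
    const struct entry *eb = *(struct entry *const *)b;
    return (int)ea->v[sort_dim] - (int)eb->v[sort_dim];
}

/* Build every index once, after the single bulk load -- no later
 * inserts or deletes to worry about. */
static void build_indexes(void)
{
    for (int d = 0; d < NDIM; d++) {
        for (int i = 0; i < NROW; i++)
            dim_index[d][i] = &rows[i];
        sort_dim = d;
        qsort(dim_index[d], NROW, sizeof(struct entry *), cmp_dim);
    }
}

/* Binary-search dimension d for value; copy matching ids into out.
 * Returns the number of matches. */
static int search_dim(int d, uint16_t value, uint32_t *out)
{
    int lo = 0, hi = NROW;  /* lower-bound search on the sorted index */
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (dim_index[d][mid]->v[d] < value)
            lo = mid + 1;
        else
            hi = mid;
    }
    int n = 0;
    while (lo < NROW && dim_index[d][lo]->v[d] == value)
        out[n++] = dim_index[d][lo++]->id;
    return n;
}

/* The merge step: intersect two id lists. O(n*m) here for brevity;
 * this is the part where hashing or a binary tree pays off. */
static int merge_and(const uint32_t *a, int na,
                     const uint32_t *b, int nb, uint32_t *out)
{
    int n = 0;
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            if (a[i] == b[j])
                out[n++] = a[i];
    return n;
}
```

Each `search_dim` call is the per-array binary search; a multi-dimension query runs one per constrained dimension and then `merge_and`s the id lists together.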
I'll put this up here if there is any interest in it. Since I started reading Knuth's books I've been considering rewriting this, but I don't have the dataset anymore, so it's tough to test -- which pretty much means I've given up on it.