Software

I've been writing so much software I've got it coming out of my ears. Some of it is getting close to being fit for public consumption -- or, at least as close as it's going to get. If you find any of the software below usable let me know. If you fix bugs and provide patches that'd be even better!

Most of my software is written in C or Perl5. For database (SQL) work I highly recommend PostgreSQL.


Software Audio Manager (SAM)

I'm working on providing an audio manager of sorts. What does this encompass? An audio manager should (at least) be able to rip audio from CDs to MP3 or a similar format, store the disc metadata, provide classification tools, and allow a listener to traverse an audio collection by various dimensions.

This is a project in the works. What I'm thinking of is writing a some tools in perl. I've currently got a song metadata persistence module, a CD ripping tool, and a CGI web browsing script. So far none of the tools here have reached a usable "release" stage.

The most usable portion is the persistence module for audio metadata, the AudioDBI interface. With some work, it could be turned into a general tool for persistence in Perl.

You might be able to view the web interface here if it is currently working (no promises).


AudioDBI (Perl persistence module for SAM)

AudioDBI utilizes Perl's DBI to provide persistent storage for audio metadata. The cool thing about AudioDBI is it allows for various types of string matching including exact, case insensitive, keyword, and (in the future) regular expressions matching. To find persisted data, the caller fills in as much or as little information as they know and search based on that. Storing data is even simpler, just call the "persist" method and your hierarchy is stored.

I've put together some UML diagrams for AudioDBI as well. The class diagram shows most of the details. The main thing to know is creating objects is done through an AudioDBI object, and the created objects are aggregated together. The system is very flexible, as shown in this object diagram. The aggregation can be much less complex by using references to the same object, which is appropriate for a typical single artist CD. A simpler object diagram shows this.

AudioDBI is very generic, and is close to being able to run on databases outside of PostgreSQL. The only thing holding it back is it's currently using a PostgreSQL specific datatype (serial), along with some minor other PostgreSQL-ish things. These PostgreSQL centricities should be blatantly obvious in the code.

The database schema is also supplied. This must be entered into the database before using the AudioDBI module. This is a visual representation of the database schema. A bit of clarification: AudioDBI was designed to allow a music collection to be very tweaked. So a song has an artist and also a composer. This lets us easily store cover songs. Looking further, the song has a genre and the artist has a genre. Using this, it's easy to model songs that don't fit an artist's status quo. Using these tools you can easily construct a database for a various artist compilation or even an album of cover songs and maintain accurate records of who the original artist was along with the actual performer's metadata. Using references you can easily set a song's genre to the same as the artist's genre, so changing one will change both (if desired).

AudioDBI-0.93.tar.gz


Multi-dimensional range search in C

Awhile back I tried doing a 30 dimension range search in C. This was written on a bet at a previous job that it couldn't be easily done (who won the bet has yet to be determined). The software was written on my own time after I got laid off, so it's 100% mine and not the companies. We needed to perform this 30 dimensional search on millions of records, and I was trying to show we didn't need a $QL solution such as Oracle.

My solution wasn't elegant and a bit, errr, heavy handed. It was important to store all the data in memory, disk access would kill access times. It also had to be memory efficient, since each "row" was 30 two byte integers, along with a four byte key. Thus a million rows of raw data requires: (2 * 30 + 4 bytes) * million rows = 64 megs -- and that's without any usable structures for searching. The solution was simplified since inserting was only done once when the program began, and no further inserts or deletes would be performed.

My solution involved inserting the data on 30 arrays followed by a sort once all the data has been entered. The memory requirements are then (2 bytes * 30 values + 4 byte id + 4 byte pointer in array * 30) 184 bytes per entry. Each array can be searched in one dimension in log2 time via binary search. The search was done on each individual array then the results were merged. The merging takes significant time, I tested hashing methods and binary trees for this.

I'll put this up here if their is any interest for it. Since I've started reading Knuth's books I've been considering rewriting this. I don't have the dataset anymore so it's tough to test, which pretty much means I've given up on it.


Kyle (remove invalid)
Last modified: Thu May 3 10:11:14 PDT 2007