January 27, 2010

Ali Polatel

Batch tagging of audio files from the command line

As many of you know, MusicDNS is an acoustic fingerprinting service and a software development kit provided by MusicIP. The fingerprinting client library that looks up and identifies audio files based on existing fingerprints is called libofa. MusicBrainz has a great audio tagger called Picard, which can tag audio files by querying the MusicDNS service.

There is, however, a simple problem. Picard is a GUI application and thus doesn’t allow batch tagging of audio files from the command line.

Hence I decided to write my own tool for generating acoustic fingerprints and for querying the MusicDNS service. I’ve chosen libsndfile to do the decoding, as libofa expects raw audio data. libsndfile is a C library for reading and writing files containing sampled sound through one standard library interface. It’s pretty easy to use and its API hides most of the low-level details from the programmer.

The tool is named afprint and is released under GPLv2. Following the UNIX philosophy, it does just one thing: it calculates the acoustic fingerprint and the duration of the given audio file.

Usage is simple:

alip@harikalardiyari> afprint -h
afprint-0.1.0-7b17577 audio fingerprinting tool
Usage: afprint [-hVv0] <infile>

Options:
    -h, --help      Display usage and exit
    -V, --version   Display version and exit
    -v, --verbose   Be verbose
    -0, --print0    Delimit path and fingerprint by null character instead of space
If <infile> is '-' afprint reads from standard input.
alip@harikalardiyari> afprint -v sample.ogg
[dump_print.294] Format: OGG (OGG Container format)
[dump_print.295] Frames: 2188368
[dump_print.296] Channels: 1
[dump_print.297] Samplerate: 44100Hz
[dump_print.298] Duration: 49735ms
[dump_print.302] essential frames: 5953500 > frames: 2188368, adjusting
sample.ogg 49735 ARaJDAgL...

afprint decodes the audio data using libsndfile and feeds it to libofa. It also calculates the duration of the audio file and prints the result in the format: FILENAME DURATION FINGERPRINT
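A script that consumes this output has to split each line into its three fields. Here is a minimal sketch of how that could be done in plain shell. The helper name parse_afprint_line is made up for illustration and is not part of afprint, and it assumes the default space-delimited output, with no spaces inside the duration or the fingerprint.

```shell
# Hypothetical helper, not part of afprint: split one line of
# "FILENAME DURATION FINGERPRINT" into three shell variables.
# Assumes the default space-delimited output and that neither the
# duration nor the fingerprint contains a space.
parse_afprint_line() {
    line=$1
    fingerprint=${line##* }   # everything after the last space
    rest=${line% *}           # the line without the fingerprint
    duration=${rest##* }      # duration in milliseconds
    filename=${rest% *}       # whatever is left, spaces included
}

parse_afprint_line 'sample.ogg 49735 ARaJDAgL...'
printf 'file=%s duration=%sms\n' "$filename" "$duration"
```

Splitting from the right keeps a filename with spaces in one piece, but the null-delimited output is the more robust choice for unusual filenames.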

Reading from standard input is tricky because pipes aren’t seekable, so it’s not possible to calculate the duration of the audio file. For this reason, when the audio data is fed via standard input (that is, when <infile> is -), afprint saves the data into a temporary file and reads from that. This makes it easy to calculate acoustic fingerprints of MP3 files, which libsndfile doesn’t support.

alip@harikalardiyari> mpg123 -q --au - 01_san_francisco.mp3|afprint -v -
[wav.c:388] warning: Cannot rewind AU file. File-format isn't fully conform now.
[wav.c:388] warning: Cannot rewind AU file. File-format isn't fully conform now.
[dump_print.294] Format: AU (Sun/NeXT)
[dump_print.295] Frames: 8000111
[dump_print.296] Channels: 2
[dump_print.297] Samplerate: 44100Hz
[dump_print.298] Duration: 181820ms
/dev/stdin.au 181820 AQMZN...

Note the --au option passed to mpg123; --wav doesn’t work here.

So far so good. Now we need a tool to query the MusicDNS server to find out the PUID of the audio file, and to query MusicBrainz to get the audio tags.

I’ve written a simple Perl script to do the job. The script, named puidlookup, reads audio fingerprints from standard input and queries the MusicDNS server. Optionally it can query MusicBrainz as well to retrieve the tags.

Here are the requirements:

Usage is simple: just pipe afprint’s output to puidlookup.

alip@harikalardiyari> puidlookup -h
Usage: puidlookup [-hVv0]
    -h, --help          Display usage and exit
    -V, --version       Display version and exit
    -v, --verbose       Be verbose
    -0, --null          Expect input is null delimited
    -m, --musicbrainz   Look up PUIDs from MusicBrainz
                        (requires WebService-MusicBrainz)
    -l, --limit         Limit results to the given number
puidlookup reads filename, duration and audio fingerprint from standard input

The --null option corresponds to afprint’s --print0 option. These options are useful if filenames have spaces or other weird characters in them.

By default it only queries MusicDNS:

alip@harikalardiyari> afprint 04sheep.ogg | puidlookup
ARTIST='Pink Floyd'
TITLE='Sheep'
PUID=930806c1-e1e0-588a-b7de-2dacb1b8b11e

The --musicbrainz option can be used to query MusicBrainz:

alip@harikalardiyari> afprint 04sheep.ogg | puidlookup --musicbrainz
PUID=930806c1-e1e0-588a-b7de-2dacb1b8b11e
TRACKID=431a85dd-e22b-4626-91c9-c0abb8058d3f
ARTISTID=83d91898-7763-47d7-b03b-b92132375c47
ARTIST='Pink Floyd'
TITLE='Sheep'
TRACK=4
ALBUM='Animals'

The output is quoted so it’s safe to pass to eval, making it easy to integrate with shell scripts.
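As a small sketch of that integration: the function lookup_output below is a made-up stand-in that replays the sample output shown above, so the snippet runs without afprint or puidlookup installed. In a real script it would be replaced by the afprint | puidlookup pipeline.

```shell
# Hypothetical stand-in for `afprint 04sheep.ogg | puidlookup`;
# it just replays the sample output from this post.
lookup_output() {
    cat <<'EOF'
ARTIST='Pink Floyd'
TITLE='Sheep'
PUID=930806c1-e1e0-588a-b7de-2dacb1b8b11e
EOF
}

# Turn the KEY='value' lines into shell variables.
eval "$(lookup_output)"
printf '%s - %s\n' "$ARTIST" "$TITLE"
```

After the eval, ARTIST, TITLE and PUID are ordinary shell variables and can be handed to whatever tagger you prefer.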

The last step is writing a tagger script to tag audio files. I’ve written a shell script called ofatag which uses envtag. It recognizes MP3 files using the file command and decodes them using mpg123; other formats are fed directly to afprint.

Now, to tag your files using MusicBrainz web services just do
ofatag /path/to/music/*.mp3 /path/to/music/*.ogg
etc.
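The dispatch described above (MP3 files go through mpg123, everything else goes straight to afprint) could look roughly like the sketch below. This is not the actual ofatag source: the function names are made up, the MIME type check via file -b --mime-type is an assumption about how the detection might be done, and the envtag step is left out.

```shell
# Hypothetical sketch, not the actual ofatag source.

# Decide from a MIME type whether the file must be decoded by mpg123 first.
needs_mpg123() {
    case $1 in
        audio/mpeg) return 0 ;;
        *) return 1 ;;
    esac
}

# Print afprint's "FILENAME DURATION FINGERPRINT" line for one file.
fingerprint_file() {
    if needs_mpg123 "$(file -b --mime-type -- "$1")"; then
        # libsndfile can't read MP3, so decode to AU on the fly
        mpg123 -q --au - "$1" | afprint -
    else
        afprint "$1"
    fi
}
```

Note that afprint reports /dev/stdin.au as the filename when it reads from a pipe, so a wrapper like this has to keep track of the real filename itself.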

I haven’t released a version yet because it’s all pretty new and needs testing. So please test it and report back! Any comments, thoughts, patches are appreciated.

January 27, 2010, 08:00 UTC

January 23, 2010

Ali Polatel

sydbox-0.6.3

sydbox-0.6.3 is released. ( tarball, sign, sha1sum )

  • Resolve path of non-abstract UNIX sockets
  • Intercept dup family calls and fcntl calls to see if a socket descriptor we care about has been duplicated

January 23, 2010, 08:00 UTC

January 16, 2010

Ali Polatel

ptrace on BSD

ptrace is a system call which is used for process tracing and debugging. This system call is available on many operating systems. However, each operating system implements it differently.

I want to describe my efforts to port sydbox to FreeBSD. The ptrace implementation of FreeBSD is similar to that of Linux. The request PT_SYSCALL is available to stop the traced process at every system call entry and exit, similar to PTRACE_SYSCALL on Linux. In addition to that, FreeBSD has the requests PT_TO_SCE and PT_TO_SCX, which stop the traced process only at system call entry or only at system call exit, respectively. This is a feature I really miss on Linux.

There is, however, a big difference in ptrace on FreeBSD, which I’m inclined to call a bug. When a traced process is stopped at the entry of a system call, there’s no way to prevent the execution of this system call. On Linux this is done by changing the system call number to either something invalid like 0xbadca11 or something harmless like getpid.

Here is an example:

    /* denying system calls using ptrace on Linux
     */

    #include <assert.h>
    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/reg.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <sys/ptrace.h>
    #include <linux/ptrace.h>

    #if defined(__i386__)
    #define ORIG_ACCUM    (4 * ORIG_EAX)
    #elif defined(__x86_64__)
    #define ORIG_ACCUM    (8 * ORIG_RAX)
    #else
    #error unsupported architecture
    #endif

    int main(void)
    {
            int status;
            pid_t pid;

            if ((pid = fork()) < 0) {
                    perror("fork");
                    abort();
            }
            else if (pid == 0) {
                    ptrace(PTRACE_TRACEME, 0, NULL, NULL);
                    kill(getpid(), SIGSTOP);
                    /* O_CREAT requires a mode argument */
                    open("foo.bar", O_WRONLY | O_CREAT, 0644);
                    _exit(0);
            }

            if (waitpid(pid, &status, 0) < 0) {
                    perror("waitpid");
                    abort();
            }

            assert(WIFSTOPPED(status));
            assert(WSTOPSIG(status) == SIGSTOP);

            if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) < 0) {
                    perror("ptrace(PTRACE_SYSCALL, ...)");
                    ptrace(PTRACE_KILL, pid, NULL, NULL);
                    abort();
            }

            if (waitpid(pid, &status, 0) < 0) {
                    perror("waitpid");
                    ptrace(PTRACE_KILL, pid, NULL, NULL);
                    abort();
            }

            assert(WIFSTOPPED(status));
            assert(WSTOPSIG(status) == SIGTRAP);

            /* Change the system call to something invalid, so it will be denied.
             */
            if (ptrace(PTRACE_POKEUSER, pid, (void *)(long)ORIG_ACCUM, (void *)0xbadca11L) < 0) {
                    perror("ptrace(PTRACE_POKEUSER, ...)");
                    ptrace(PTRACE_KILL, pid, NULL, NULL);
                    abort();
            }

            /* Let the process continue */
            ptrace(PTRACE_CONT, pid, NULL, NULL);

            waitpid(pid, &status, 0);
            assert(WIFEXITED(status));
            exit(WEXITSTATUS(status));
    }

Although the traced process calls open() with O_CREAT on foo.bar, the file won’t be created, because the tracer process denies the system call.

Here is the same example for FreeBSD:

    /* denying system calls using ptrace on FreeBSD
     */

    #include <assert.h>
    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <sys/ptrace.h>
    #include <machine/reg.h>

    int main(void)
    {
            int status;
            pid_t pid;
            struct reg r;

            if ((pid = fork()) < 0) {
                    perror("fork");
                    abort();
            }
            else if (pid == 0) {
                    ptrace(PT_TRACE_ME, 0, NULL, 0);
                    kill(getpid(), SIGSTOP);
                    /* O_CREAT requires a mode argument */
                    open("foo.bar", O_WRONLY | O_CREAT, 0644);
                    _exit(0);
            }

            if (waitpid(pid, &status, 0) < 0) {
                    perror("waitpid");
                    abort();
            }

            assert(WIFSTOPPED(status));
            assert(WSTOPSIG(status) == SIGSTOP);

            if (ptrace(PT_SYSCALL, pid, (caddr_t)1, 0) < 0) {
                    perror("ptrace(PT_SYSCALL, ...)");
                    ptrace(PT_KILL, pid, (caddr_t)1, 0);
                    abort();
            }

            if (waitpid(pid, &status, 0) < 0) {
                    perror("waitpid");
                    ptrace(PT_KILL, pid, (caddr_t)1, 0);
                    abort();
            }

            assert(WIFSTOPPED(status));
            assert(WSTOPSIG(status) == SIGTRAP);

            /* Change the system call to something invalid, so it will be denied.
             */
            if (ptrace(PT_GETREGS, pid, (caddr_t)&r, 0) < 0) {
                    perror("ptrace(PT_GETREGS, ...)");
                    ptrace(PT_KILL, pid, (caddr_t)1, 0);
                    abort();
            }

            /* r_eax is the i386 register name; on amd64 the field is r_rax */
            r.r_eax = 0xbadca11;

            if (ptrace(PT_SETREGS, pid, (caddr_t)&r, 0) < 0) {
                    perror("ptrace(PT_SETREGS, ...)");
                    ptrace(PT_KILL, pid, (caddr_t)1, 0);
                    abort();
            }

            /* Let the process continue */
            ptrace(PT_CONTINUE, pid, (caddr_t)1, 0);

            exit(0);
    }

We expect the same to happen here: the file foo.bar shouldn’t be created. But it is. Even if you replace the PT_GETREGS and PT_SETREGS calls with a PT_KILL, which terminates the process with SIGKILL, the file will still be created! So there’s no way to deny a system call using ptrace, which makes it impossible to port sydbox to FreeBSD without patching the kernel.

None of the other BSDs (NetBSD, DragonFly BSD, OpenBSD) has the ptrace request PT_SYSCALL, so I haven’t checked whether the behaviour is the same on these systems.

January 16, 2010, 08:00 UTC

January 10, 2010

Wulf C. Krueger

Recognition

What keeps me doing things in my life are primarily two factors: Money and recognition. Not necessarily in that order.

In my job, I’m being paid to do what I do but I couldn’t ever be satisfied with just that. What really thrills me is being recognised for the professional I am. Receiving an email from a customer that simply said “Thank you. You’re one of the few persons I can always rely on.” made my day. I don’t get that from receiving my pay-cheque.

In my private life, I’m mostly a father, a husband and, last but not least, a guy who loves to work on Linux. A machine that just works is a boring machine. Thus, I really love working on Exherbo.

Working on Exherbo allows me to do and try everything, make things work exactly the way I want them to, give back to the FOSS community – and being recognised for the professional I am. :-)

Recognition, thus, is very important for me. Now, Bryan “kloeri” Østergaard has decided to remove Exherbo’s “Developers” page, which lists all the core developers, in favour of a simple list of all contributors ever.

This in itself is fine with me. What I really don’t like about it is the fact that those of us who do most of the work on Exherbo will be buried somewhere in that rather huge list (after all, Exherbo currently has about 95 contributors).

I have Exherbo in my CV as well but am I supposed to send recruiters to a list of everyone and their dog and find me in there with no indication of my level of involvement?

As much of a trifle as this may look, it annoys me and so I’m now using gitstats to create statistics and a list of authors myself. It’s not hosted on exherbo.org (linked from our “Resources” page, though) but on my own server:

http://www.mailstation.de/egitstats/

I’m going to add a few more graphs and stuff over time (like changes per package directory, category, etc.).

If you have any suggestions (preferably upstream-able ones), please let me know.

by Wulf C. Krueger at January 10, 2010, 17:19 UTC

January 09, 2010

Ciaran McCreesh

Paludis 0.44.0 Released


Paludis 0.44.0 has been released:

  • The ‘everything’ set is now called ‘installed-packages’. A new set named ‘installed-slots’ has been added, which is similar but includes slot restrictions matching installed slots.
  • kdebuild-1 support has been removed, following the Gentoo Council’s decision to remove all mention of it from the Package Manager Specification. Users with installed kdebuild-1 packages must remove them before upgrading.
  • Support for EAPI 4 (formerly known as EAPI 3) is present but not installed, since the specification has yet to be approved.
  • Support for the new EAPI 3 is present but not installed, since the specification has yet to be approved.
  • The [.key=value] syntax for user dep specs now works with sets, sequences and spec trees. If < is used instead of =, a less than comparison is used for numeric values, and for compound values, a match succeeds if any item of the key is equal to the specified pattern.
  • build_options: preserve_work can be used to avoid removing temporary working directories, and to force a non-destructive merge.
  • Profile updates (package and slot moves) are now enabled by default.
  • Workarounds for various interactivity abuses carried out by certain ebuilds have been added.
  • Various large code cleanups and build system cleanups.

by Ciaran McCreesh at January 09, 2010, 15:53 UTC

January 07, 2010

Bryan Østergaard

Half the solution..

For a long time I've wanted to replace the Developers listing on Exherbo's website with a list generated from git log showing all the authors.

A while ago this got much easier as Ingmar Vanhassel and others added .mailmap files to our repositories. This means that some of my commits that I've accidentally made as kloeri@localhost can be grouped with kloeri@exherbo.org commits.

The actual page generation is still missing, however, so I'm looking for a volunteer who wants to tackle this task.

Steps involved should be something like:
1. Clone git://git.exherbo.org/www.git
2. Read the Makefile to get an idea how our website is maintained. Reading my old blog post on our website setup is also useful
3. Figure out how to get a list of authors from git log. I just want a plain list containing the real names of all the contributors but without email addresses, commit count or other stats like that
4. Sort the list so it's easier to read
5. Make sure your list can be parsed by Maruku (a Markdown processor) and make sure it's processed along with all the static .mkd files

Limiting the author list to just cover the arbor repository is fine for now.
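Steps 3 to 5 could be sketched along these lines. The helper names and the output file name are made up for illustration, and how the result gets hooked into the website Makefile is left open.

```shell
# Hypothetical helpers, not part of the Exherbo website build.

# Read author names (one per line) on stdin and print a sorted,
# de-duplicated Markdown bullet list.
format_authors() {
    LC_ALL=C sort -u | sed 's/^/* /'
}

# List the authors of the git repository in directory $1.
# %aN is the author name after .mailmap has been applied.
list_authors() {
    (cd "$1" && git log --format='%aN') | format_authors
}
```

Something like list_authors /path/to/arbor > authors.mkd would then produce a file that a Markdown processor can pick up. Sorting in the C locale orders names by byte value, which may not be the order you want for names with non-ASCII characters.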

Finally, give me a git format-patch of your changes so I can push it and we can all enjoy the improved website. Of course, you're more than welcome to ask me for help as needed.

January 07, 2010, 15:38 UTC

Ali Polatel

sydbox-0.4

sydbox-0.4 is released.

What’s new?

  • Make network sandboxing on by default.
  • When bind’s port argument is zero, look up the actual port from /proc/net/tcp{,6} after the subsequent listen call for network_restrict_connect.
  • GObject isn’t a dependency anymore.
  • Try hard to restore errno after ptrace errors.
  • Moved all check based unit tests to gtest. dev-libs/check isn’t a dependency anymore.

Download

January 07, 2010, 08:00 UTC

January 06, 2010

Ali Polatel

Network sandboxing and /proc

As many of you know, sydbox can do network sandboxing, but for some reason we didn’t have it on by default on Exherbo.

For those who don’t know much about sydbox and network sandboxing let me explain it briefly. Network sandboxing has three modes:

  • allow: All network connections are allowed.
  • local: Only local network connections are allowed.
  • deny: No network connections are allowed.

In addition to that, there’s a restrict_connect option which disallows connections to all addresses except those that one of the parents has bind()’ed to.

There’s also a network white list which specifies the additional network addresses that are allowed in local and deny modes.

On Exherbo we use the mode local with restrict_connect option enabled.

One limitation of sydbox was that it couldn’t white list bind() addresses whose port was zero. The reason is simple: the only place we can look up the actual port is /proc/net/tcp, or /proc/net/tcp6 for IPv6, and we would need to do this before the bind() call has completed. The problem arises here: the /proc/net/tcp entry is only created after the bind() call has succeeded.

The solution isn’t entirely trivial. We have to note the file descriptor argument of bind() along with the socket family and socket address, and intercept the subsequent listen() call. Only then can we look up the actual port from /proc/net/tcp.

The sydbox master has a simple implementation to solve this problem. If the port argument of a bind() call is zero, we save the file descriptor and the corresponding socket family and address to a GHashTable. After that the subsequent listen() call is intercepted and if the file descriptor of the listen() call matches a file descriptor in the hash table, sydbox looks up the port from /proc/net/tcp, fills it in and white lists the address.
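To give an idea of what the lookup involves, here is a rough shell sketch, not sydbox’s actual C code. It reads a table in /proc/net/tcp format, finds the row for a given socket inode and prints the port in decimal. Matching rows by inode is an assumption made for this example (the inode of a descriptor shows up as socket:[INODE] in the /proc/PID/fd/FD symlink); the helper name is made up.

```shell
# Hypothetical helper, not sydbox code: print the local port (decimal) of
# the socket with inode $1, reading a /proc/net/tcp style table from stdin.
port_for_inode() {
    hexport=$(awk -v inode="$1" '
        NR > 1 && $10 == inode {   # field 10 is the socket inode
            split($2, addr, ":")   # field 2 is HEXADDR:HEXPORT
            print addr[2]
            exit
        }')
    # Convert the hexadecimal port to decimal; fail if no row matched.
    [ -n "$hexport" ] && printf '%d\n' "0x$hexport"
}
```

It would be called as port_for_inode 12345 < /proc/net/tcp, and it returns a non-zero status when no row matches, which is exactly the situation before the kernel has created the entry.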

With sydbox-0.4, which I’ll release after some testing, network sandboxing will be on by default again for the Paludis profile.

Just to be on the secure side ;)

January 06, 2010, 08:00 UTC

January 04, 2010

Ali Polatel

mpdcron-0.3

mpdcron-0.3 is released:

What’s new?

  • Added stats module to keep statistics of played songs in a sqlite database
  • Added notification module to send notifications via notify-send
  • Added scrobbler module to submit songs to Last.fm or Libre.fm
  • Added module support through GModule
  • Added initial manpage
  • Changed name to mpdcron

Download

January 04, 2010, 08:00 UTC

Bringing Last.fm home with mpdcron (part 3)

I wrote a script to import Last.fm data to mpdcron’s statistics database with the name homescrape.

It’s written in Ruby and requires nokogiri to parse HTML. Currently it can import play count and loved songs. By default it will import all your Last.fm tracks; if you don’t want that, you can pass a date using the --since option. Optionally homescrape can make use of chronic to parse dates in a huge variety of date and time formats.

With this, the statistics module is complete feature-wise and I’ll release mpdcron-0.3 after some testing.

January 04, 2010, 08:00 UTC

January 02, 2010

Ali Polatel

Bringing Last.fm home with mpdcron (part 2)

First of all, happy new year to everyone!

After my first post I’ve done many things to improve this statistics module of mpdcron. Here’s a list of major improvements:

Client/Server protocol

After discussing on IRC with qball, we’ve decided that it’s a better idea to build a network abstraction so that clients won’t access the sqlite database directly. The protocol will simply be like the mpd protocol with minor differences.

I used GIO to implement the server. GIO has a high level network API as of version 2.22.

Tagging

This was one of the things I really wanted to implement before releasing a new version. Tagging songs as you would do with mail is a really nice way to sort your music, in my opinion. Tags are implemented as a colon-delimited list and use just one TEXT column of a row. This makes removing a tag a slow operation, but after reading the sqlite optimization FAQ I managed to reduce this slowness a lot, to the point where it’s not noticeable.
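To illustrate why removal is the awkward operation with this layout, here is a small shell sketch of adding and removing a tag in a colon-delimited string. It assumes a plain a:b:c layout; the exact format mpdcron stores in the column may differ, and the function names are made up.

```shell
# Hypothetical helpers illustrating a colon-delimited tag list.
# Assumes a plain "a:b:c" layout, which may differ from what mpdcron stores.

# Append tag $2 to list $1 unless it is already present.
add_tag() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;      # already tagged
        "::")     printf '%s\n' "$2" ;;      # list was empty
        *)        printf '%s\n' "$1:$2" ;;
    esac
}

# Remove tag $2 from list $1: split, drop exact matches, join again.
remove_tag() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -Fxv -- "$2" | paste -sd: -
}

tags=$(add_tag 'rock:live' favourite)
tags=$(remove_tag "$tags" live)
printf '%s\n' "$tags"   # prints rock:favourite
```

Adding is a simple append, while removing means taking the whole string apart and putting it back together.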

The documentation on the website is up to date so if you want to give it a shot you can read the documentation and start using it right away. It’s very simple.

January 02, 2010, 08:00 UTC

December 26, 2009

Ali Polatel

Bringing Last.fm home with mpdcron

I’ve been working on an mpdcron module to save mpd song data, like play count, to a local sqlite database. Build mpdcron like:

$> ./configure --enable-gmodule --with-standard-modules=all
$> make
$> sudo make install

Then add:

[main]
modules = stats

to your configuration file.

Of course just saving the data isn’t enough. We need a client to manipulate the data and interact with mpd using this data.

Thus eugene was born.

Usage:

First create your database:

$> eugene update
Updating /
Successfully processed 12283 songs

Note this phase is optional. All other eugene commands will create the database if it doesn’t exist.

Basic interaction with the database:

# Love/Hate/Kill/Unkill the current playing song
$> eugene love/hate/kill/unkill
# Love/Hate/Kill/Unkill the current playing artist
$> eugene love/hate/kill/unkill --artist
# Love/Hate/Kill/Unkill the current playing album
$> eugene love/hate/kill/unkill --album
# Love/Hate/Kill/Unkill the current playing genre
$> eugene love/hate/kill/unkill --genre
# Give the current playing song a rating of 10
$> eugene rate 10
# Increase the rating of the current playing song by 5
$> eugene rate --add 5
# Decrease the rating of the current playing song by 10
$> eugene rate --substract 10

Advanced interaction with the database using the --expr switch:

# Love all songs whose artist includes the string Beatles
$> eugene love --expr 'artist like "%Beatles%"'
# Hate all songs whose genre is Pop
$> eugene hate --expr 'genre="Pop"'
# Kill all songs whose duration is less than 10 seconds
$> eugene kill --expr 'duration < 10'
# Unkill all songs whose play count is more than 10
$> eugene unkill --expr 'play_count > 10'

For more information about the expression syntax see:
http://www.sqlite.org/lang_expr.html
To learn more about the database layout see:
src/gmodule/stats/stats-sqlite.c

Loading songs to mpd queue:

# Load all loved songs, exclude killed ones
$> eugene load --expr 'love > 0 and kill = 0'
# Clear the playlist and load all songs with a duration less than 30 seconds
$> eugene load --clear --expr 'duration < 30'

This is all very basic right now and possibly buggy.
If you’re interested please check it out and tell me about it! I plan to release mpdcron-0.3 after some testing.

Last but not least:
Careful with that axe, Eugene!
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!

December 26, 2009, 08:00 UTC

December 24, 2009

Ciaran McCreesh

Paludis 0.42.3 Released


Paludis 0.42.3 has been released:

  • Updated environment filtering code for bash 4.1.
  • Fixed symlink handling under SELinux.

by Ciaran McCreesh at December 24, 2009, 16:14 UTC

December 18, 2009

Ali Polatel

mpdhooker-0.2

The initial release!

About

MpdHooker is a daemon that adds hook support to mpd. Upon certain events it executes hooks. It uses mpd’s idle mode. It sets environment variables to pass data to the hooks. Read the README for more information.

What’s new?

See NEWS

Download

edit: Fixed link names, thanks to filko.

December 18, 2009, 08:00 UTC

February 09, 2010, 12:40 UTC