Open Source computer vision “applications”

I’ve been having a bit of browse through the forest of Open Source computer vision “applications” as of late. I quote “application” to mean packages with a high level of integration — either end-user applications, or high-level libraries which are one step from a demo application — in contrast to say, OpenCV or ANN which are very much at the nuts and bolts level.

I’ve made no effort to evaluate these packages, just to call out that they exist. I’m ambivalent to academic work which leads to large monolithic packages — I guess I just want things to be a bit more atomic so the quality visualizer from this project can be mated to the descriptor code from that project etc without completely knocking the two apart.

But on the other hand, these are wonderful repositories for knowledge and code — code for constructing and decomposing fundamental matrices, bundle adjustment, etc. There’s already far too much re-re-re-engineering of algorithms — although I’m as guilty as others, as re-implementation is a great way to learn an algorithm inside and out.

Of course, at this point I should mention OpenSLAM which aims to be a repository (in the philosophical and literal sense) for SLAM algorithms and test datasets.

  • Insight3d is a full 3D reconstruction application. Images in one side, point cloud (and geometric model?) out the other.
  • PTAM (Parallel Tracking and Mapping) is a SLAM-like toolkit/application designed for augmented reality, coming out of the Machine Vision group at Oxford.

These are a bit more like libraries, but I’ll call them out:

  • Christopher Zach has published Simple Sparse Bundle Adjustment (SSBA). He’s also provided a full Structure for Motion toolkit.
  • SBA is another sparse bundle adjustment toolkit “from Manolis Lourakis at the Institute of Computer Science of the Foundation for Research and Technology – Hellas (FORTH) in Heraklion, Crete, Greece.”

New Scientist Enigma 1656

I was flipping through an issue of New Scientist over lunch, and came across their Enigma column. It seemed like a pretty simple computational problem, so I thought I’d give it a try. Here’s the problem (link to New Scientist site):


Five integers

I have written down five single-digit positive integers.

The sum of the five integers is divisible by the fifth integer, but not by any of the first four. The sum of the first four integers is divisible by the fourth but not by any of the first three. The sum of the first three integers is divisible by the third but not by either of the first two. The sum of the first two integers is divisible by the second but not by the first.

List the five integers in the order in which I have written them down.


My code follows. It does the problem backwards, starting with a list of all 2-digit combinations, then deleting any that don’t meet the criteria. Then it expands the survivors to three digits, then eliminates any that don’t pass the test, etc. up to five digits.

#!/usr/bin/env ruby

def divisible a,b
  p = a.to_f/b
  (a - p.to_i*b) == 0
end

def indivisible a,b
  not divisible a,b
end

# Given a list, computes the "enigma test": is the sum divisible by the
# last array but not by any of the other entries.
#
def enigma_test list
  s = list.inject(:+)
  p = list.slice(0, list.length-1).map { |m| indivisible s, m } << divisible( s, list[-1] )
  p.inject( true ) { |s,i| s and i }
end

# Given an array of N arrays, returns an array of 9N arrays, each of the
# original with the digits 1..9 tacked on the end.
#
def add_digit list
  list.map { |l| Array.new(9) { |i| [ *l, (i+1)] } }.flatten(1)
end

# Seed the original array...
#
a = (1..9).to_a.map { |i| [i] }

4.times { |j|
  a = add_digit a
  a.delete_if { |i| not enigma_test i }
  puts "For length #{j+2}"
  p a
}
view raw enigma1656.rb This Gist brought to you by GitHub.

And the output looks like:

For length 2
[[2, 1], [3, 1], [4, 1], [4, 2], [5, 1], [6, 1], [6, 2], [6, 3], [7, 1], [8, 1], [8, 2], [8, 4], [9, 1], [9, 3]]
For length 3
[[4, 2, 1], [4, 2, 3], [6, 2, 1], [6, 3, 1], [8, 2, 1], [8, 2, 5], [8, 4, 1], [8, 4, 2], [8, 4, 3], [8, 4, 6], [9, 3, 1], [9, 3, 2], [9, 3, 4]]
For length 4
[[8, 4, 2, 1], [8, 4, 2, 7], [8, 4, 6, 1], [8, 4, 6, 3], [8, 4, 6, 9], [9, 3, 4, 1]]
For length 5
[[8, 4, 6, 3, 1]]

IVCNZ 11

I’ve just returned from three days at the IVCNZ 2011 (that’s Image and Vision Computing NZ) conference up in Auckland. It’s my first academic conference in a while (10+ years) and I’d have to say a good time had by all. It’s quite a small conference (~120 participants, one track), so it’s almost more of a club gathering than a conventional conference.

I’m trying to be good, working through my notes and whatnot.

Not 100% sure about the protocol when it comes to reproducing published material, so I won’t. Perhaps I’ll just give a shout-out to the presenters and material that caught my eye.

Once IEEE has the papers on line I can back-annotate with DOIs.

Adding a MogileFS proxy

As noted previously, I’m using MogileFS as the data store for some high(-ish)-performance computing. So far, it does the job without complaints. I haven’t benchmarked it, but I’m not asking that much.

One issue is that it uses its own proprietary access protocol … sortof. A simple Mogile read goes something like:

  1. Computer A decides it wants a file from the Mogile cluster. It contacts one of the trackers using the Mogile protocol. [Computer A needs to know the hostnames of the tracker(s) in advance.]
  2. The tracker has a think, checks the database, and returns the path to the requested file on one of the storage hosts. This path is a normal URL. It passes this URL back to the client.
  3. Computer A uses HTTP to get the file from that URL on the storage host.

The fly in the ointment, of course is that the original client needs to speak Mogile to the tracker. I’m using a hacked version of SeattleRB’s Mogile-client Ruby gem in all of my various programs and clients to access Mogile directly.

One of those programs is a web front-end for browsing the data. When it wants to insert a Mogile asset into a web page (an image, typically), it will contact the tracker, get the storage URL, and insert it directly into the page, so you end up with HTML that contains code like:


<img src="http://my.mogile.storaged:7500/dev1/0/000/405/0000405859.fid">

The problem is that all of my Mogile trackers and storage nodes are behind a firewall (as they should be). When I’m on campus, this works fine, as my desktop can see both the web server and the storage nodes.

When I work off campus, I can use an SSH tunnel to get to the web server, but all of the storage node paths break terribly.

I’d like to add a web proxy, preferably running on the same web server as my web app, which can handle the Mogile requests. Instead of returning raw Mogile URLs, the web app can pass paths to the proxy, the proxy can get the data from Mogile, and send it back as HTTP. As my web app is mostly for browsing, the proxy can even be read-only.

How I did it

As a first attempt, I was more interested in getting a solution running than any sense of performance. As my web app is in Rails 3, I knew I could basically delegate a path to a separate, self-contained Rack server. The Resque control panel does this, for example.

As a sketch, then, I want any path of the form:

http://webserver.com/mogproxy/some/key/something

to return whatever Mogile has stored under the key /some/key/something

The first step was to add a route to the application in routes.rb:

mount MogProxy::Server, :at => "/mogproxy"

view raw routes.rb This Gist brought to you by GitHub.

I put my application code at lib/mogproxy/ which resulted in a bit of a search on autoloading files and require paths. The solution I arrived at looked something like this in application.rb

module AerialImageDb
  class Application < Rails::Application
    # ...snip...

    # Custom directories with classes and modules you want to be autoloadable.
    config.autoload_paths += %W(#{config.root}/lib)

    # ...snip...

  end
end

require 'mogproxy/mogproxy'

It may be possible to do away with the require through clever use of the autoload path, but I didn’t feel the automatic-ness was really worth it. Note the require has to go at the end after the autoload_path is set.

The application below is written in Sinatra because I’ve used it before and it’s handy. It’s pretty simple — take whatever path comes in and look it up in Mogile. Return the result.

require 'sinatra/base'

module MogProxy
  class Server < Sinatra::Base
    include MogileModule::Mogilable

    configure(:production, :development) do
      enable :logging
    end

    get %r{/([\w\d\/_]+)} do
        path = '/' + params[:captures].join
      begin
        logger.info "MogProxy fetching mog path #{path}"
        mog.get( path )
      rescue
        logger.info "Mogproxy: Hm, problem fetching #{path}"
        raise Sinatra::NotFound
      end
    end

  end
end

view raw mogproxy.rb This Gist brought to you by GitHub.

[Yes, I know, my regexp-fu is weak...]

As a side note, the MogileModule::Mogilable and mog bits are code I wrote to make a class (*cough* global) variable Mogile client instance, which can be configured once and used throughout the application.

It’s hardly efficient because it loads the asset from Mogile then spits it back out at the client. And the error checking is basically absent. But it works.

On the other side, I then rewrote my client code to return “/mogproxy/….” paths instead of looking up the mog paths themselves.

A small MSP430 board

I couple of months back I acquired a TI Launchpad, their loss-leader development board for their MSP430 family of microprocessors. I’ve had a long-term interest in the MSP430 as an alternative in the realm of 8-/16- bit micros.

Out of the box, I was able to the set up the mspgcc toolchain, and the other installation details were amalgamated from the interwebs. I believe the msp toolchain is now in the Ubuntu repositories, so all of the compilation la-di-da is a thing of the past.

Sadly, I was without an application.

The next week my coworker asked me where he could find a signal generator — I made the mistake of inquiring and it turned out he needed a 30Hz (later 25Hz) synchronization pulse for a cluster of computer vision cameras. Clearly a signal gen would be massive overkill (and would tie up the generator for months to come), so I offered to make a signal generator for him.

My first version was built on the Launchpad, and worked fine. Last week I finally got around to constructing a self-contained version.



Total time to first prototype was perhaps four hours, though I went through three iterations of the clocks — first using the internal very-low power oscillator (VLO), then the internal digitally-controlled oscillator (DCO), and finally using an external 32.768 kHz watch crystal. Each step was an attempt to find a little bit more timing accuracy.

For the curious, the schematics are here

It’s not the minimum number of components to run an MSP430, but close enough. Nor is it the smallest MSP that would do that job. But the G2211 came with the Launchpad.

The USB port only provides power (it’s convenient and there’s no chance of getting the voltage or polarity wrong). The C connecting reset to ground is recommended, but I’m not sure it’s mandatory. It should provide a little highpass filtering on the reset line. As built there’s no in-circuit programming, hence the socketed processor. It wouldn’t be too hard to add an ISP but I didn’t see the point.

The code is as follows (or here)

/******************************************************************************
 * 25Hz flasher.  Ripped unashamedly from the Launchpad demo app.
 * This version is targeted at the 430G2211 with a watch crystal
 ******************************************************************************/

//#include   <-taken care of by includeing io.h and setting -mmcu=msp430x2012 in cflags
#include 
#include 

#define     LED0                  BIT0
#define     LED1                  BIT6
#define     LED_DIR               P1DIR
#define     LED_OUT               P1OUT

// The Launchpad has two LEDs
// but the app board has just one
#undef      TWO_LEDS

#define     TA_OUT0               BIT2
#define     TA_DIR                P1DIR

void main(void)
{
  WDTCTL = WDTPW + WDTHOLD;                 // Stop WDT

  TA_DIR |= TA_OUT0;
  P1SEL  |= TA_OUT0;                        // Should set Pin 1.2 to TimerA out1

#ifdef TWO_LEDS
  LED_DIR |= LED0 + LED1;
  LED_OUT &= ~LED1;
#else
  LED_DIR |= LED0;
#endif
  LED_OUT |= LED0;                          // To enable the LED toggling effect
  // Use a 32.768 KHz watch crystal
  BCSCTL1 = XT2OFF;
  BCSCTL2 = SELM_3 | SELS;                   // MCLK = SMCLK = LFXT1CLK
  BCSCTL3 &= ~(LFXT1S_3);                    // Clear LFXT1Sx for 32678 crystal on LFXT1
  BCSCTL3 |= XCAP_3;                      // Set ~12 pF capacitance 

  // As set up, MC_1 means count to CCR0
  // OUTMOD_7 means reset at CCR1, set at CCR0
  // ISR make the LEDs behave similarly -- RED on at CCR1, GREEN on at CCR0
  //
  // Uses toggle mode, OUT1 toggles every time TimerA == TACCR1
  // And TimerA is in Up mode, so it counts 0->TACCR0, then resets
  // So the half-period of the output is set by TACCR0
  //
  // 32768 / 655 = 50.02
  //
  // So ~655 for 25MHz
  TACCR0 = 655;                             // By calibration
  TACTL = TASSEL_1 |  MC_1;                  // TACLK = SMCLK, Up Mode
  TACCTL0 = CCIE;
  TACCTL1 = OUTMOD_4;                // Toggle mode
  TACCR1 = 0;

  WRITE_SR(GIE);                     // Enable interrupts

  /* Main Application Loop */
  while(1) {
    __bis_SR_register(LPM3_bits + GIE);          // LPM0 with interrupts enabled
    //nop();
  }

}

interrupt(TIMERA0_VECTOR) ta0_isr(void)
{
  static unsigned int count = 0;

  TACCTL0 &= ~CCIFG;

  if( ++count >= 25 ) {
#ifdef TWO_LEDS
    LED_OUT ^= (LED0 | LED1);
#else
    LED_OUT ^= LED0;
#endif
    count = 0;
  }

}

Not the tidiest code in the world. The 25Hz square wave is driven by the TimerA, while the flashing LEDs (the Launchpad board has two blinkenlights, while my board has just one) are driven my a more conventional counter-in-an-ISR. I’m quite sure I could fix it up so the LED is controlled by a timer as well, but I haven’t bothered.

Resistor search in Ruby

Here’s a short snippet I wrote the other day. I think it’s a good illustration of Ruby’s potential for short scripts.

I needed to set the resistor divider to get 5.4V out of an adjustable LDO regulator (a REG102-A). The output voltage is set as:

V_{out} = \left( 1 + \frac{R1}{R2}\right) V_{ref}

For the REG102, V_{ref} = 1.26V, and the bias current was about right if R1 and R2 were both in the order of 1-10K.

On the other hand, I had a big bag of completely unsorted SMD resistors. Somewhere in there was a nice combination that would give me the voltage I wanted.

The script reads a list of resistor values (one per line) from a file (by default ‘r.txt’). The output looks like:

Have these resistors: 0.91, 2.2, 4.7, 6.81, 8.06, 10.0, 11.5, 15.0, 22.0, 24.9, 30.0, 33.0, 35.7, 40.2, 42.2, 47.0, 51.0, 53.6, 56.0, 68.0, 100.0, 215.0, 226.0
Searching for Vtarget = 5.4
R1	R2	vout	error
24.90	8.06	5.15	-0.05
68.00	22.00	5.15	-0.05
6.81	2.20	5.16	-0.04
35.70	11.50	5.17	-0.04
47.00	15.00	5.21	-0.04
215.00	68.00	5.24	-0.03
15.00	4.70	5.28	-0.02
22.00	6.81	5.33	-0.01
33.00	10.00	5.42	+0.00
226.00	68.00	5.45	+0.01
100.00	30.00	5.46	+0.01
51.00	15.00	5.54	+0.03
40.20	11.50	5.66	+0.05

The code loads the resistors from the file, and removes duplicates, then calculates Vout for every permutation. It then deletes any permutations which don’t get within 5% of the desired voltage, and sorts and prints the results.

As a side note, it doesn’t matter than the resistors are given in Kohms (or Ohms, or MOhms, etc) because they’re divided in the script, as long as they’re in the same units.

Yes, it’s non-optimized because it does build an enormous list of possibilities, then throws the vast majority away. Even a simple permutation like not calculating Vout if R2 > R1 would eliminate many possible pairings.

But, there you go.

#!/usr/bin/env ruby

TARGETV = 5.4

filename = ARGV.pop || "r.txt"
r = ($File.open( filename ) { |f| f.readlines.map { |l| l.to_f } }).sort.uniq

puts "Have these resistors: " + r.join(', ')
puts "Searching for Vtarget = #{TARGETV}"

vals =  r.permutation(2).map { |res|
  vout = 1.26 * (1 + res[0]/res[1])
  error = (vout-TARGETV)/TARGETV
  res+[vout, error]
}.delete_if { |res| (res[3]).abs > 0.05
}.sort { |a,b| a[2] <=> b[2] }

puts %w( R1 R2 vout error ).join("\t")
vals.each { |v|
  puts "%.2f\t%.2f\t%.2f\t%+.2f" % v
}

Disabling shutdown when the power button is pressed

I run a pretty sparse Linux installation on my desktop. It’s based on Xubuntu 11.04 (right now), but typically I use musca in preference to the full xfce4 experience.

By default, if no other power management software (like xfce4-power-manager) is running, pressing the power button will initiate a machine shutdown. I turn my desktop off most evenings, but when I do leave it on, I’d always stumble in the next morning, stab the power button … and watch it shutdown.

Here’s the quick fix. When the power button is pressed, acpid loads the event at /etc/acpi/events/powerbtn. By default this calls the script /etc/acpi/powerbtn.sh.

This script checks to see if any power management programs (again, gnome-power-manager, kpowersave, xfce4-power-manager, etc) are running. If they are, it just exits. If not, it gets to the last line:

# If all else failed, just initiate a plain shutdown.
/sbin/shutdown -h now "Power button pressed"

Well there’s the problem. I commented out the shutdown, reloaded acpid (this probably isn’t necessary), and away I went.

Allowing Ruby Benchmark reports to return values

I’ve recently started using Benchmark to record basic timings. It’s hardly profiling, more just a running record of where the time goes.

One annoyance (for me) was that Benchmark::report returns timing information, not the return value from the block being benchmarked. This means adding benchmarking to a function like:

def a_function
  a = an_expensive_function
  b = another_expensive_function
  c = a_third_function(a,b)
  puts c
end

Requires pre-defining a,b,c outside of the report blocks … otherwise they get scoped out:

require 'benchmark'

def a_function
  Benchmark.bm do |bm|
    a = b = c = nil

    bm.report("A") { a = an_expensive_function }
    bm.report("B") { b = another_expensive_function }
    bm.report("C") { c = a_third_function(a,b) }
    puts c
  end
end

which is an annoyance.

I’ve redefined Benchmark::report as:

module Benchmark
  class Report
    def report( label="", *format, &blk )
      retval = nil
      item( label, format ) { retval = blk.yield }
      retval
    end
  end
end

Which lets you do the (slightly more intuitive):

def a_function
  Benchmark.bm do |bm|
    a = bm.report("A") { an_expensive_function }
    b = bm.report("B") { another_expensive_function }
    c = bm.report("C") { a_third_function(a,b) }
    puts c
  end
end

Reconsidering my foolishness

I took a few minutes to try option B on this serial-to-network problem.

Here’s a much simpler option.

On the Ministation, run a script like (this is the lightly tarted up version):

#!/bin/sh

PORT=/dev/ttyACM0
SERVER=localhost
SERVER_PORT=1234

stty -F $PORT 57600

while [ true ];
do
cat $PORT | nc $SERVER $SERVER_PORT
sleep 1
done

The client netcat (nc) will die if it can’t connect to the server, hence the while loop. The sleep prevents flapping in that case. [Though as I discovered below, this only works as long as bits are coming into the serial port...]

And on the other end:

#!/bin/sh

PORT=1234
OFILE=/tmp/ofile.txt

nc -kl $PORT >> $OFILE

Time to try it out, I think…

Update:

Here’s an even more tarted up version which lets you send data back out the serial port by poking it to the server through a FIFO.

Client:

#!/bin/sh

PORT=/dev/ttyUSB0
SERVER=localhost
SERVER_PORT=1234

stty -F $PORT 57600

while [ true ];
do
cat $PORT | nc $SERVER $SERVER_PORT > $PORT
sleep 1
done

Server:

#!/bin/sh

PORT=1234
OFILE=/tmp/ofile.txt
OUTFIFO=/tmp/foofifo

if [ ! -p $OUTFIFO ]
then
mkfifo $OUTFIFO
fi
tail -f $OUTFIFO | nc -kl $PORT >> $OFILE

This seems pretty robust to most combinations of server or client problems except if the client can’t reach the server and there’s no data coming in the serial port. The client’s nc will die but the associated cat won’t die till it tries to write to the broken pipe.

Does this matter? Yes. If the client starts and can’t connect to the server, the nc will die, and if the widget is mis-configured to not output any data on the serial port, the (FIFO)->(server’s nc)->(client’s nc)->(serial port) redirection won’t work because the client’s nc is dead, but because the cat is still running, the while(1) won’t cycle. There’s probably a clever way around this, but I’ve definitely reached the limits of my shell scripting abilities.

Of course, if you have a network connection you could always SSH to the client and poke the serial port manually.

Ah, and these scripts will require old-skool logfile rotation:

cp ofile.txt ofile.txt.2 && cat /dev/null > ofile.txt

I assume logrotate can handle that elegantly.

2nd side-note. The Ministation SDK doesn’t build busybox’s stty by default. You’ll probably want that for setting the baud rate on the serial port. It’s easily added by changing the relevant line in conf/xs2/busybox.config to:

CONFIG_STTY=y

ser2syslog

For the weather buoy project we need a way to take data in over our Ministation’s serial port and push it upstream over the network. As the ministation is pretty limited in flash and ram, I assumed a compiled (as opposed to scripted) solution would be best.

I think my ideal system would be store-and-forward, with the serial port as a “push,” something along the lines of:

  1. Open the serial port.
  2. Forever:
    1. Read an buffer the data.
    2. When you have one “line” of data, try to poke it out over a network interface.
    3. If you succeed, forget that line of data.
    4. If you fail, save it to try next time. When you reach some maximum amount of backlog, start dropping the older messages.

Where the other end of the conversation would be a network-to-disk server:

  1. Open a network port
  2. As data comes in, write it to a file.

It sounds like a simple brief, but I couldn’t find a canned solution which does the job.

One obvious choice is ser2net, though it’s actually designed the other way around. As a “server,” it listens on a network connection (many, actually), opening serial ports as necessary when a connection comes in.

I’m a bit rusty on Linux network programming, so as an interim, I thought I’d try a serial-to-syslog gateway, operating as above but pushing data to syslog (who can then handle the network forwarding).

It’s not a terrible solution, with a couple of caveats:

  • It only really works because our data is read-only, and it’s NMEA-like strings, so it’s ASCII, nicely delimited into lines. Syslog wouldn’t work with binary data, I don’t think.
  • The Ministation SDK runs busybox’s syslogd, which only supports UDP forwarding, not TCP. OK, so it would only provide one additional ACK of security, not true store and forward, but it’s would be a start.

On the other end of the connection, I’ll have a full-fat computer running rsyslogd which can handle a bit of store and forward to servers off in the cloud.

In any case, I wrote a ser2syslog. I won’t pretend it’s polished, but works for me. As per the README, I was originally going to build it from the framework of ser2net, but given ser2net is a mature, featureful product, and arranged backwards (as above), I stripped 95% of the ser2net code out. I believe (hope!) the copyright and attribution are correct.

Next up is testing it on a Ministation.

Update:

Hm. I just realized I could do the same thing very simply on the command line. Something like:

$ cat /dev/ttyS0 | logger -p local7.info -t “/dev/ttyS0″

And I could probably achieve what I need network-wise with netcat. Sigh….