Friday, March 30, 2012

Reverse engineering a motion-capture file format (or, the answer to my prayers... a week ago)

So, in a surprising turn of events, I am posting about something that I actually did for my research.  Part of the work I do involves motion capture; I use cameras and strobes and markers affixed to bony landmarks on the rat hindlimb to record the motion of the limb in space during behavior.  One of the motion capture files that I recorded was corrupted with noise, and could not be un-corrupted using the programs and tools from the system manufacturer.  Being the clever and industrious fellow that I am (read: I didn't want to do the analysis that I actually had scheduled), I spent a day to completely reverse-engineer the motion capture data file format and use that knowledge to create a program which completely removed the corruption from the file in question and allowed normal data analysis to occur.

The Problem

As I said above, part of the analysis I am doing for my research involves recording the kinematics of hindlimb locomotion.  The system that the lab purchased to get this data is passive and camera-based; that is, bits of shiny stuff (markers) are affixed to the subject and illuminated, and the grayscale images of the shiny stuff are used to infer the location of bony landmarks in space over time.

Since the shiny stuff is so shiny, it's usually easy to set a threshold on the grayscale images to get rid of non-marker sources in the image.  However, sometimes there is something else similarly shiny in the image; in those cases you 'mask' the offending pixels (always setting them to zero).  Of course, that also means that, if a legitimate marker moves into the masked area, it will not be detected.

My situation was that I had masked the offending reflections in the image... but then had to shift the treadmill around a bit.  As a result, some large reflections were present in the data.  They were such that the post processing (converting the grayscale images into labeled 3-D trajectories) just wasn't working; bits of whiteness from the reflections were being erroneously labeled.

Unfortunately, the system we use does not have a native facility for re-masking data after it's been recorded. So, I needed to roll my own.  To do this, I needed to understand the native data file format.

DISCLAIMER: The system we use is the Vicon Nexus.  This is NOT RECOMMENDED by them.  DO NOT USE THIS INFORMATION TO DO ANYTHING.  In fact, stop reading now.  I make no guarantees as to the usability or safety of the software provided here.

The Solution

I opened my my trusty hex editor (HxD by Maël Hörz) and took a look at a couple of the raw data files.  Long story short, they all shared almost identical initial segments (the first 770 bytes, specifically) which I assume are header information and contain ASCII sub-strings with the camera type and specs in plain text.  There were also two 4-byte-long sections of this header which described A: the number of images frames in the file and B: the offset (number of bytes from the beginning of the file) at which the 'index' began.  This header was followed by the second section (the largest by far) which contained the grayscale image and blob center data.  The final section was the 'index', which contained a series of 12-byte-long records describing the frame numbers (first four bytes) and the offset at which each frame began in the file (last eight bytes).

Sections of the data segment were arranged hierarchically; each object on a given level started with two bytes of 'start sentinel', four bytes describing the length of the object, and four bytes giving some other important number (e.g., camera number, number of blobs in a frame, number of grayscale scan lines in a blob).  The top level for each frame was, well, the frame; that is, all of the data taken during one sample.  Below the frame level was the camera subframe level; each of those contained the data for the given frame from one of the cameras.

A bit of indirection below the camera subframe, and we come to the meat of the file: the grayscale image data.  Each subframe specifies how many bytes long it is, and how many 'blobs' it contains.  Blobs are just contiguous sections of non-zero in the grayscale image.  Each blob then specifies how many bytes it contains, and how many horizontal scan lines of grayscale data it is made up of.  These lines then specify the X- and Y-coordinates of their left-most pixel, how many pixels long they are, and then proceed to actually post the grayscale data.  Using the file read and write commands makes traversing this hierarchy simpler, because the file pointer helps to keep track of where you are.

I keep things vague, because I don't want to ruin the fun for anyone else, and because I am a coward.

The Goods

Using my detailed notes on the structure of the data file and its many headers and start sentinel codes, I implemented several useful functions to make it possible to quickly and painlessly re-mask my data.  These functions are included in this .zip archive; the files are MATLAB m-files and use mostly generic, easily-ported syntax.  One of the functions uses the MATLAB sparse matrix data type; I have kept the non-sparse version of the code in the comments, so porting should be straightforward.

I reiterate from above: DO NOT USE THIS UNLESS YOU KNOW WHAT YOU ARE DOING.  This is in no way endorsed by Vicon.  Always make a backup.  Et cetera.  Contact me if you have concerns and absolutely need to re-mask some bad Vicon data.  I post this only in the spirit of giving, in the hope that someone in the future, in the same spot that I was in a week ago, will be helped by my efforts.

The functions are as follows:
  • extractFrameIndices: This function extracts the offsets for all of the frames in the record.
  • makeSparseFrameRaster: This function extracts a specified frame from the record, for viewing.
  • remaskInPlace: This function dances through the record, zeroing out all of the pixels in the record the user desires
  • testFunc: An example masking function that I used to test this out (also, coincidentally, exactly the masking function that I needed applied to my data)
The re-masking function is designed to be as mutable as possible; the form of the masking can be as complicated as the user desires, since a handle (MATLAB version of pointer) to the masking function is passed to remaskInPlace, rather than parameters defining a restricted domain of masks.  The masking function receives as inputs from the calling function the X- and Y-location of the pixels in question as well as the index of the current frame, allowing the masks to be functions of time as well as space.  Additionally, user-specified parameters can be passed transparently through remaskInPlace to the masking function.

Sunday, March 11, 2012

Generating and outputting white, pink and red noise (or, all that effort just to generate signals you'd usually rather be without)

I've completed the design and verification of the noise generation software and hardware.  I have finalized the software for generating colored noise from white noise samples and I have mocked up the headphone amplifier circuit and verified that it works and performs as expected

CPU use by noise generation algorithms

One important last step in implementing the colored noise generators is determining how long they take to execute.  If it takes 300 cycles to generate a single sample of pink noise, and you've only got 200 cycles to do it per sample, you're going to fall behind.  Alternatively, the algorithm could take on average 100 cycles to complete, but 210 cycles in the worst case; what do you do to handle that worst case?  Additionally, how many extra cycles do you need per sample to do housekeeping tasks and run other subroutines?

In my situation, I need to generate a noise sample at regular intervals.  If the generator, on average, costs more cycles than you've got, you can increase your clock frequency, decrease your sample rat or try desperately to increase the efficiency of your code.  If the average execution time is good, but the worst-case takes too long, you can perform the above interventions or write some sort of wrapper function that keeps a sample or two generated ahead of time, to allow for the worst-case samples.

Looking at the actual execution statistics, however, leads me to believe that I have a more than acceptable margin even in the worst cases.  To get real, live cycle costs, I coded up a firmware that looks at a counter immediately before and after use of the algorithms, then transmits the difference over the serial port.  I then left the thing to run for about a second's worth of data (about 44100 samples for each of the three generators); the summary stats are:

  • White Noise: 10 cycles average, 10 cycles worst case, 10 cycles best case
  • Pink Noise: 90 cycles average, 99 cycles worst case, 88 cycles best case
  • Red Noise: 56 cycles average, 70 cycles worst case, 52 cycles best case

Since I'm running the CPU at 32MHz, and my sample rate is 44100Hz, that give me about 725 cycles per sample to get thing done; several times more than a comfortable margin even in the worst cases.  Since I am likely to only be generating the samples in a mode where other interactions/tasks are minimized (that is, not much else should be going on while the user has the device on), it may even be possible to reduce the CPU speed to improve battery life.

Verification of amplifier hardware design

I'm using Maxim's MAX9724B fixed-gain headphone amplifier IC.  I chose this part because:

  • It's monolithic: feedback resistors are internal and issues with stability and such have been taken care of by Maxim's engineers.
  • It contains a charge-pump voltage inverter to allow for bipolar output simply and easily from a positive rail; this allows for louder maximum output and removes the need for large output DC-blocking capacitors.  Additionally, if I end up needing a negative rail for other things, the device is designed to provide a little more current than it needs.
  • Click-and-pop suppression, RF noise rejection, very small package size.

Here's the chip, connected to the necessary capacitors and headphone jack.  I soldered onto that extra-tiny 12-pin TQFN by hand (this was a ridiculous experience).


I plugged everything in, and it worked.  The voltage on the charge pump output was -3.3V, and the headphone outputs sounded good.  I hooked up the scope to the outputs to see if the amplifier was rounding out the DAC switching transitions.  They were slightly rounded, but still very obvious.  However, as I just said, the outputs sounded okay, so I don't feel the need to alter the design to include lowpassing to get rid of the last of the switching transitions.

Next Steps

The only bits of hardware left to verify are the display and the REM detector; of course, these are likely going to be more difficult than the preceding hardware and software.  The display should be fairly straightforward in hardware, but coding the drivers will be more involved; It may be possible that someone has already implemented such software and I can crib off of that.

From the little bit I've already done prototyping the REM detector, it's likely to be somewhat temperamental.     The level of illumination provided by the IR LED has to be large enough to result in a big REM signal, but small enough that the zero-frequency current offset saturates the transimpedance amplifier.  It seems likely that the optimal illumination level will change between users and possibly even for single users between uses. As a result, I'll likely need to code in a feedback controller to optimize the illumination level for whatever the current operating conditions are.  I also need to nail down the properties of the REM differential current signal, to determine how often I need to sample the signal and whether it will be possible to gate the illumination signal in time with the sampling to minimize power usage.

(There is still the matter of the battery charge circuit; however, I trust the part data sheets and the IC has about a million pins, so manual soldering is going to suck if I try to mock it up.)

Thursday, March 8, 2012

A few things to know about using interrupts on the XMEGA (or, RTFM, seriously)

In my further adventures in XMEGA migration, I hit a couple of roadblocks (which will likely be hilariously obvious to most).  They relate to the use of interrupts, and they are as follows:

A: the XMEGA has a multi-level interrupt handler.  This is pretty neat; it means that you can set different interrupts to different priorities, with higher-priority interrupts able to interrupt lower-priority ones.  Of course, this means that enabling interrupts is slightly more involved than just setting the global interrupt enable bit (sei()); specifically, you need to enable each of the levels of interrupt (Low, Medium and High) by setting the appropriate bits in PMIC_CTRL.  This one is an obvious case of 'didn't read the manual'; I assumed that the default behavior of the interrupt controller would be identical to the TINY/MEGA controller.

B: don't EVER enable interrupts without specifying the corresponding interrupt routines.  This one is a little less certain than the last bit of advice, because it is borne out of my investigation of some flaky behavior I observed.

Specifically, I was trying to verify that I understood/could correctly use the real-time clock (RTC) peripheral.  My test would consist of initializing the RTC (and clocks, and ports, and USARTs) and transmitting a few bytes over the serial port every time the RTC overflow interrupt occurred. To make sure that the device was working otherwise, every 250ms it would output another character to prove that any problems were with the RTC implementation, and not the device setup code.

The check bytes were being sent... but the RTC interrupt was not being called.  To check that interrupts in general were working, I changed the interrupt routine vector to another counter's overflow, which I had not disabled from a previous bit of investigation.  It worked; and when I added another interrupt handler (one for both the RTC overflow and the counter overflow), both worked.  All I can figure is that the lack of a valid interrupt routine for the counter overflow (in the case where I had programmed a routine for the RTC overflow) meant that the interrupt handler was never apprised of the completion of the counter overflow interrupt routine (though a RETI instruction), so any subsequent interrupts were not handled because the controller thought that the original interrupt was still being executed.  This is predicated on the assumption that all unspecified interrupt vectors are by default handled by a simple RET instruction (which would leave the interrupt flag set, which makes a certain sort of sense as a default behavior); if true, it explains the behavior I observed.

The simplest solution is (at least for the AVR Studio/AVR-GCC environments) is to always specify a BADISR routine (that is, a default routine for otherwise unspecified interrupts).  This seems to take care of my problems, even with the counter overflow interrupt still enabled.


Saturday, March 3, 2012

Colored noise generation using a hardware CRC (or, 'it sounds right' is basically as good as mathematical rigor, right?)

Toward the 'noise maker' aspect of the sleep mask I'm designing, I've done some work toward generating white and colored noise on the AVR XMEGA hardware.  My goal was to generate white, pink and red/brown noise so that any of the three could be chosen by a user to aid their sleep.  So far, I have generated white, pink and red noise in simulation (using a model of the XMEGA's CRC generator) and have translated these simulations successfully to the actual microcontroller for the white and red cases. UPDATE: pink case also successfully implemented.

Background

Many devices and applications exist which generate various sounds/noises to aid sleep and drown out the irregular sounds which can interrupt deep sleep.  Three commonly provided sounds are white, pink and red noise; white noise has the most high-frequency content, pink less and red the least.

Colored noise

White noise refers, generally, to noise whose samples are totally independent of each other; knowledge of one sample gives no information about any other sample, previous or subsequent.  As a result, a white noise signal's autocorrelation is unity at zero lag and zero elsewhere.  Additionally, a white noise signal has equal power at all frequencies.  Different types of noise are commonly referred to by their 'color': white is so-called because it contains all frequencies, analogous to the way that white light contains all colors.  Pink noise is similar to white noise, except that it has a decreasing amount of power as frequency increases; where white noise has a constant power level (P(f) = 1, let's say), pink noise has 1/f power (P(f) = 1/f).  Red noise takes it a step further, having even less power at high frequencies (P(f) = 1/f^2).

Pink and red noise can be generated from white noise.  Red noise can be generated by integrating white noise (this is where its other name, brown/brownian noise, comes from: the path of the signal in time is a Brownian walk).  Pink is more difficult to generate; where red noise can be generated using integration or a first-order lowpass, generating pink noise would require some sort of half-order filter.  From my researches, there are two commonly-implemented methods for generating pink noise computationally from white noise: using a specially-specified higher-order filter, and the Voss-McCartney method.  These methods are quite capably and completely explained and expounded upon here and here; since the special filter method involved a lot of multiplication and tweaking and maybe even floating point math, I chose the Voss-McCartney method. It is very clever, and only relies on addition, subtraction and keeping track of a very small buffer of past-generated white noise samples.  I will go over it further when I describe my implementation below (I assure you, though, the sites above are quite top-notch).

Random and no-so-random numbers

The difficulty in generating white noise is finding a signal source whose output at one point in time is totally independent of its output at a subsequent time.  Or course, this is impossible with any normal computer system; its state at one clock cycle is deterministically derived from its state at the previous cycle.  It is possible to access randomness in the form of analog signals in the environment around the computer; if we have a bit of radioisotope, its decay will serve as a great random process.  A reverse-biased diode can also be used, as can the randomness in the timing of user interactions (if we happen to have a user close at hand).  The problems with the above hardware random number generators (and hardware random number generators in general) are twofold: first, the extra hardware can be expensive; second, the rate of random bit generation can be very slow (especially true in the case of the user input).

Faced with these costs, it would be nice if we could, deterministically within the computer, generate numbers which are random-like.  While the computer's state completely predicts the next pseudorandom number, the string of numbers in question may appear random enough for a specific purpose.  Since our success criterion is 'does it sound good', even a poor pseudorandom number generator (PRNG) will be good enough.  There are a variety of algorithms extant; Linear Feedback Shift Registers (LFSR) are one such class of PRNG.  They work by shifting a register and exclusive-OR-ing some of its members every clock cycle.  The XORshift algorithm is a cheap and capable method invented by George Marsaglia.  I implemented and tested a version of it (for the 8-bit platform, developed by William Donnelly) before I realized that the Cyclic Redundancy Check hardware on the XMEGA, intended to check the integrity of the program flash memory and USB transmissions, was also an LFSR and might afford me four bytes of decent pseudorandom numbers every clock cycle.

Hardware and software implementation

The central question going into this was 'will the hardware CRC peripheral be able to produce samples of data that sound sufficiently white?'.  I had already implemented the 8-bit Xorshift algorithm provided by William Donnelly, and it performed well.  However, it took over 70 cycles to generate one sample, so the possibility of ONE cycle samples was too good not to investigate.

Setting up and using the CRC to generate pseudorandom numbers

The CRC peripheral on the USB-capable XMEGAs is fairly straightforward: a control byte, a status byte, a data input byte and four output bytes.  It can take as input the flash memories, the DMA channels or a 'manual' single-byte IO channel, so that user applications can more easily take advantage of the hardware.  I had some difficulty with getting the thing to take my IO data and getting it to perform the full 32-bit CRC (instead of getting stuck in the 16-bit mode), so here's my sequence for initializing it for my nefarious purposes:

//set up the CRC thingy to accept data from the IO bus, CRC-32
//also set the current state of the CRC generator to all 1's (if //it resets with all 0's, this will go nowhere)
CRC_CTRL |= CRC_RESET_RESET1_gc;
//set up the input to be the data byte
CRC_CTRL &= CRC_SOURCE_gm;
CRC_CTRL |= CRC_SOURCE_IO_gc;
//set to 32 bit width; first set busy byte to allow the change
CRC_STATUS |= CRC_BUSY_bm;
CRC_CTRL |= CRC_CRC32_bm;


Every time I want to get a new random number set in the outputs, I feed all zeros into the CRC (other bytes may be valid), and read out the changed checksum a mere single cycle later.

//change the checksum
CRC_DATAIN = 0b00000000;
//get the bytes
uint8_t crc_out3 = CRC_CHECKSUM3;
uint8_t crc_out2 = CRC_CHECKSUM2;
uint8_t crc_out1 = CRC_CHECKSUM1;
uint8_t crc_out0 = CRC_CHECKSUM0;

Bear in mind that this set-up only works for my situation of steady-state generation of pseudo-random checksums by passing more and more constant bytes into the CRC hardware.  If the CRC hardware is required for other tasks (like, for example, checking out USB packets), it will be necessary to save the current state and reset the generator (including, possibly, changing the data source and bit width) before using it.  Before going back to using the CRC as a PRNG, it is necessary to pause the generator, load it with the saved state, and set the input and bit width as above.

To verify that I understood how the CRC hardware worked and how to access it, I created a firmware which generates a new CRC checksum (by adding an all-zero byte to the input) and transmits it over the serial peripheral.  The source is available here (that component is actually commented out in a big block in the middle).

Simulating a sequence of random numbers; implemented coloration

Using the data output over the serial channel above, I tested a MATLAB function which implemented the activity of the CRC as detailed in the device datasheet (available in this zip file, 'benCRC32guess.m').  Once I was satisfied that my model of its working was correct (bear in mind, that file assumes that the input to the CRC is all zeros), I used it to generate a sequence of bytes as the hardware would.  I then did the simplest test of white-ness possible: auto- and cross-correlation.

The first plot is the autocorrelation for the first byte of the CRC checksum; It is high for only one sample and low elsewhere (not sure why the scaling was all wonky).
The second test was cross-correlation between the bytes; even getting only one random byte per cycle would be great, but if I could use all four (that is, if they were also independent of each other) would be super-great.  As shown below, the cross-correlations were all nada (same weird scaling as above).
Note that the above two plots are exemplars (top was for byte 1, bottom was betwen bytes 1 and 2); the autocorrelations of the other 3 bytes and cross-correlations between the other 5 pairs were similarly acceptable.  The final (and most important) test was the by-ear test; does it sound right?  As linked above (and here), the white noise from the simulation sounded right.

Now that I was confident that the CRC PRNG method was acceptable, I moved on to implementing the red and pink noise generators.  The brown was the simplest; simply integrate the input from the white sample generated above (taking care to prevent values outside a finite range).  This first pass was okay... but exhibited some nasty clipping-ish atrifacts, due to the fact that I just put a hard limit on the valid range (that is... once you hit the ceiling/floor, you hit hard, and saturate).  To soften this, I set up 'buffer' ranges within the valid range.  The middle range allowed for integration as normal; however, progressively farther out ranges scale down input samples which would move the integrand away from the middle of the range.  This was implemented in the MATLAB file 'brownMakerSoftedge.m' in the zip file linked above (and here).  While I have no idea what this does to the statistics of the signal, it makes it sound better, and thus is awesome.

Implementing the Voss-McCartney algorithm for pink noise generation was slightly more involved (but only slightly).  While the links above explain in more detail the ins and outs of the algorithm and its statistical properties, I will describe it briefly here.  Basically, the output of the generator is the sum of 8 white noise generators.  Each generator is sampled progressively more slowly; the first generator is updated every sample, the second every two samples, the third every four samples...  Since each generator exhibits more and more low-frequency power, the sum of the set ends up having a more-or-less 1/f power spectrum.  By scheduling the updates appropriately, only two new white noise bytes are required every sample.  My implementation is in the MATLAB ZIP file, as 'pinkBuffer.m'; the sample sound is here.  As above, it sounds right, so it is right.

Porting the colorizers to the hardware

The MATLAB business above was more-or-less directly ported to C for compilation and loading onto the ATXMEGA32A4U.  The source is in these three files (various bits of code will need to be commented/uncommented).  I set up the generators to continuously transmit samples over the serial channel; these samples were read into MATLAB using the 'getSerialCRCtest.m' file in the ZIP archive.  They were then converted to floating point values, zero-meaned and scaled so that they could be exported as WAV files.  The hardware-generated white noise is here, and the hardware-generated red noise is here.  They sound good, so I'm satisfied.  Currently, the hardware pink output does not sound good (and, upon visual analysis, also does not look good), so I am still working on that.  UPDATE: I changed the source slightly, so the pink works now (and sounds right).  Modified source is here.

Next steps

The first step is to figure out whats going wrong with the XMEGA implementation of the Voss-McCartney algorithm.  Probably some sort of overflow issue.

The next step is to read up on the XMEGA DACs and start passing the generator outputs to the outside world.  After achieving that, I'll mock up the headphone amplifier and see if lowpassing is necessary to eliminate DAC switching transients in the output (if not, I can use the internal-resistor amplifiers, which would reduce overall parts count).

Once all of that is settled, I'll tidy up the code to make it more modular/portable, probably including a save/load state feature to allow CRC hardware use by other functions to be interleaved with the use here.

UPDATE: The pink noise generator works now.  Instead of updating the buffer sum, I completely re-calculate it with each iteration.  The modified source is here.

Thursday, March 1, 2012

Getting started with Atmel's XMEGA (or, making all the old mistakes all over again)

Normally, I wouldn't communicate such a small change, but I had a few little issues in the migration from MEGA/TINY to XMEGA, so this might conceivably help someone somewhere somehow.

Because the XMEGAs use the PDI interface for programming rather than ISP, it was necessary to get a new programmer.  I settled on the Zeptoprog; it was cheap (~25USD, including postage) and MADE IN AMERICA which, beyond jingoism, means that it arrived almost instantaneously.  Furthermore, the programmer/designer/manufacturer replied very quickly to my emails.  Furthermost, the device can also enumerate as an ISP/TPI programmer, GPIO/SPI/frequency counter and serial bridge as well as allowing for firmware upgrades over USB using Atmel's FLIP application.  As a bonus, the device shows up as an AVRISP mkII.

To test out the XMEGA, I soldered one onto a header and broke out the programming and power pins along with two GPIO pins, the two DACs and a USART.  It is pictured below in all its kludgy glory.

And now, to the whole point of this post: DON'T MAKE YOUR LEADS TOO LONG FOR PDI PROGRAMMING.  I had a good foot-and-a-half between programmer and device, and I wasn't able to talk to the device.  After a prompt set of emails with the Creator (of the Zeptoprog), I tried shortening the leads to the extra-shortness shown in the pic below, and it worked.  I could have made longer leads, but I wasn't taking any chances (I was in debug mode).  In fairness to the Zeptoprog, its drivers are stronger even than those in the Atmel-made AVRISP mkII; so any situation the official programmer can handle, the Zeptoprog can, and then some.

Regrading the XMEGA itself: I love it.  It's so beefy, and has so many peripherals, and everything is quadruply configurable.  My chosen device (ATXMEGA32A4U) has two 12-bit DACs, a whole mess of PWMs and ADC channels, a couple of SPI and USART ports AND a built-in USB peripheral.  Best of all, it's got a built-in hardware CRC peripheral, which I'll be using to great effect in generating pseudo-random numbers in almost no cycles at all.