Code Review: February 2010

Sunday, 28 February 2010

Backing Up Your Development Environment

Back in the good old bad old days, backing up an embedded project was simple. You put the source files and makefile on a floppy disk. And for good measure, you could put a copy of the compiler, assembler, linker, and any other tools on the floppy too. The complete toolchain needed to recreate the shipped code, neatly wrapped up in its own little time capsule.

And that toolchain is an important detail. The source code on its own is not enough to recreate a build, you have to factor in the tools too. The output from a compiler can vary by a surprising amount between versions.

These days things are not so straightforward. The compiler, assembler, and linker still exist, but they generally lurk behind the comforting facade of a glossy integrated development environment (IDE). You don't need to figure out the arcane invocations needed to convince the command line tools to do your bidding, you just tick some checkboxes in an options dialog. And I for one don't have a problem with this, I'm perfectly happy for such details to be sorted out behind the scenes.

However this ease of use can be a double-edged sword. That sparkly shiny IDE is likely a significant chunk of code, and will be squirreling away information in all sorts of nooks and crannies on your PC - in the registry, in initialisation files, in configuration files, and down in the depths of your Documents and Settings folder.

Add in some vendor patch releases and user modifications, and suddenly keeping a snapshot of the state of your tools starts getting tricky. Not only do you need to know that the code was built with v1.2.3.4 of the Acme Super Awesome IDE, but also that at that point you'd applied three vendor patches, and hand-modified a chip configuration file that you'd found a bug in. You'd reported it to Acme MegaCorp technical support, but there hadn't been a bug fix release yet.

Now what do you do?

My current solution is to store a simple text file called 'build instructions.txt' or similar with each project. It lists the IDE version, any applied patches, and any funnies such as the aforementioned modified chip configuration file. This approach is not exactly high tech or complicated, but works fine.

I can't help thinking that there must be a better way though - this is still a bit handraulic and error-prone. I suggested to our IT guy that we could create a virtual machine (VM) image containing a bare bones XP install, plus the IDE, plus any 'non-standard' files (patches and the like). That way we could simply specify which VM image to use with the source and project files hauled out of version control. But that made him go a bit green and start shaking. He was mumbling about licensing issues with that many Windows images.

So - if anybody knows of a better way of doing this, do please let me know in the comments.

Sunday, 14 February 2010

Zero Tolerance

This week I picked up some code from a colleague for review. To me, part and parcel of the review process is rebuilding the code - I want to check that I get the same hex (i.e., output) file as has been checked in to our version control system for release. And sure enough, this time I did.

But I also got 16 warning messages. The compiler spat out a variety of messages, including warnings of integer truncation, and what looked like notes the developer had written to himself - "This could be done faster with pointers" and the like.

What did this mean? Was I using a different compiler version than the developer? Was my version smarter about flagging potential coding errors? It seemed unlikely, as I got the same hex file. This almost certainly indicated that we were using the same compiler. Also, there was the matter or all those notes-to-self. Those were clearly there for a reason. How did he know it would be faster with pointers? Presumably he'd coded that up to test it, as otherwise he couldn't know that it would be faster? If he had, and it was, why hadn't he used it?

I spent a while scratching my head over this before going to see the developer. "Oh yeah, I had a look at those, the real warnings are all fine. Nothing to worry about. And the others are just reminders to myself.". And the comments about pointers and the like? "It's always faster to do that code with pointers, I just didn't have the time. That's just a reminder in case I look at this again."

Well, that was fine for him, as he'd written the code. But at some point somebody else was going to look at it. Maybe a customer would report a bug, and a colleague would be assigned to fixing it. Maybe we'd want to add some new features. Maybe we'd want to use this code as the basis for another product with similar functionality. Or maybe, as in this case, a reviewer would pick up the code.

In all of these scenarios, Joe Bloggs engineer would pick up some allegedly working code, hit build, be faced with a screenful of warnings, and freeze like a rabbit trapped in headlights. Hampered as he would be by a lack of omnipotence, he wouldn't know that all the warnings are benign, and that a colleague knows a better way to write his own code.

The warnings might indicate real problems. They might be the compiler being picky. But our hypothetical engineer is going to have to assume that each is a land mine requiring individual attention and possible defusing. All of which could have been avoided if the original developer had simply added some casts or whatever else was needed to placate the compiler.

The only acceptable number of warnings in a build is zero. Anything else is going to waste people's time. Any cost saving to the developer in not dealing with the issues as they crop up will be more than offset by the time wasted by a colleague (or possibly themselves) further down the line.

Possibly more importantly, a developer is fooling themselves if they think they can live with a raft of warning messages.

If you have 16 warnings every time you build, you are much less likely to notice the silent arrival of a seventeenth, this one a genuine harbinger of doom. If however your build has always been clean, and is suddenly polluted by a warning - well, you'll see that.

This is indicative of a professional attitude to development, as well as saving time in the long run. If all warnings are examined and addressed as they surface, the ongoing code quality and workmanship are high. If attending to such items is left as a last-minute operation before release, or even worse pencilled in for the often mythical post-release clean-up, then it will either (a) be a huge job, or (b) not get done.

The same comments apply to writing code that is MISRA-compliant, or that gets a clean bill of health from Lint. It's always faster and easier to do this as you go, rather than banging a keyboard for several days and then trying to impose some semblance of quality as an afterthought.

We are what we repeatedly do. Excellence, then, is not an act, but a habit. (I didn't say that, Aristotle did. And according to Wikipedia, it's actually a misattribution. But as misattributions go, it's got to be one of the best.)

Sunday, 7 February 2010

Generating Random Numbers using LFSRs

On a recent project I needed a simple random number generator. It didn't need to be anything fancy, nothing mathematically pure or statistically valid, just a quick'n'dirty little generator that would spit out numbers that looked random to the casual eye.

In a previous incarnation I worked for a mobile radio consultancy. One of the things I came across in that particular life was the concept of Pseudo Random Binary Sequences (PRBSs).

These are sequences of 0s and 1s that are generated deterministically, yet are random enough for many practical purposes. They are used extensively in communications to spread the bandwidth occupied by a signal. This makes the signal more immune to jamming, as the signal is smeared across a bigger chunk of spectrum than would otherwise be the case. It is also more difficult to intercept, since any receiver must know and be synchronised to the spreading sequence at the transmitter. A more subtle usage is that multiple signals can be spread across the same spectrum by multiple PRBSs, yet be teased apart at multiple receivers by correlation with the correct individual spreading sequences.

PRBSs are typically generated using Linear Feedback Shift Registers (LFSRs). These are cheap and easy to build in hardware, and easy to code in software.

At any time, a function of the LFSR contents is generated as the output bit value. This is fed back into the shift register as the next input. The output bit sequence is essentially random, as is the sequence of states formed by the LFSR contents.

The 'function of the LFSR contents' mentioned above is called a generator polynomial. It determines which LFSR cells are combined and fed back to form the LFSR input for the next state.

For instance, the Wikipedia article on LFSRs lists the following generator polynomial for an 8-bit PRBS:

x^8 + x^6 + x^5 + x^4 + 1

We now need to know how to convert this into code.

As we are generating an 8-bit polynomial, we start off with the 8-bit unsigned value 0x00u as our feedback function. If a power of x, say m, is used in the polynomial, assign a 1 to polynomial bit (m-1). The +1 in the polynomial corresponds to the output bit, and does not appear in the feedback settings. Here we have an 8-bit generator polynomial with bits 7, 5, 4, and 3 set, giving a value of 0xB8u.

Given this, and also using the example code in the Wikipedia article, we can now write a function that returns the next value in a pseudo-random 8-bit sequence.

#define LFSR_8_INITIAL_VALUE 0x01u
#define LFSR_8_POLYNOMIAL    0xB8u

uint8_t next_lfsr_8( void )
{
 /* seed LFSR */
 static uint8_t lfsr = LFSR_8_INITIAL_VALUE;

 /* get LFSR LSB */
 uint8_t lsb = (uint8_t)( lfsr & 0x01u );

 /* shift LFSR contents */
 lfsr = (uint8_t)( lfsr >> 1u );

 /* toggle feedback taps if we output a 1 */
 if( 1 == lsb )
 {
    lfsr ^= LFSR_8_POLYNOMIAL;
 }

 return lfsr;
}

An n-bit PRBS is said to be maximal length if the number of distinct output values it generates is 2^n - 1. That is, a maximal length 8-bit PRBS consists of 255 values. Our generator polynomial is indeed maximal length, and so the values returned will start to repeat after 255 calls.

In the above code I have set the initial value of the LFSR to 0x01u. The LFSR will generate a fixed sequence of output values, and the initial value merely determines where in the sequence the reported values will start.

Note that because the feedback taps uses the XOR operator, the LFSR will get stuck in the all-zeros state if it ever gets there. Care must be taken that an LFSR is not initialised with this value.

The corresponding 16-bit function is as follows.

#define LFSR_16_INITIAL_VALUE 0x0001u

#define LFSR_16_POLYNOMIAL 0xB400u

uint16_t next_lfsr_16( void )
{
 /* seed LFSR */
 static uint16_t lfsr = LFSR_16_INITIAL_VALUE;

 /* get LFSR LSB */
 uint8_t lsb = (uint8_t)( lfsr & 0x0001u );

 /* shift LFSR contents */
 lfsr = (uint16_t)( lfsr >> 1u );

 /* toggle feedback taps if we output a 1 */
 if( 1 == lsb )
 {
    lfsr ^= LFSR_16_POLYNOMIAL;
 }

 return lfsr;
}

Using the above approach, you can generate PRBSs of arbitrary length, given tables of generator polynomials (your search engine of choice will unearth plenty).

Code Review