29

I'm doing some data transmission from a dsPIC to a PC, and I'm applying an 8-bit CRC to every block of 512 bytes to make sure there are no errors. With my CRC code enabled I get about 33KB/s; without it I get 67KB/s.

What are some alternative error detection algorithms to check out that would be faster?

FigBug
  • 2,379

9 Answers

46

While there may be faster options than CRC, such as Fletcher, using them is likely to sacrifice some degree of error detection capability. Depending on your performance and error detection requirements, an alternative may be to use CRC code optimised for your application instead.

For a comparison of CRC with other options, see the excellent answer by Martin Thompson.

One option to help with this is pycrc, a tool (written in Python; version 2.6 or later is required) which can generate C source code for dozens of combinations of CRC model and algorithm. This allows you to optimise speed and size for your own application by selecting and benchmarking different combinations.

It supports the crc-8 model, but also supports crc-5, crc-16 and crc-32 amongst others. As for algorithms, it supports bit-by-bit, bit-by-bit-fast and table-driven.

For example (downloading the archive):

$ wget --quiet http://sourceforge.net/projects/pycrc/files/pycrc/pycrc-0.8/pycrc-0.8.tar.gz/download
$ tar -xf pycrc-0.8.tar.gz
$ cd pycrc-0.8
$ ./pycrc.py --model=crc-8 --algorithm=bit-by-bit      --generate c -o crc8-byb.c
$ ./pycrc.py --model=crc-8 --algorithm=bit-by-bit-fast --generate c -o crc8-bybf.c
$ ./pycrc.py --model=crc-8 --algorithm=table-driven    --generate c -o crc8-table.c
$ ./pycrc.py --model=crc-16 --algorithm=table-driven   --generate c -o crc16-table.c
$ wc *.c
   72   256  1790 crc8-byb.c
   54   190  1392 crc8-bybf.c
   66   433  2966 crc8-table.c
  101   515  4094 crc16-table.c
  293  1394 10242 total

You can even do funky things like specify dual nibble lookups (with a 16-byte look-up table) rather than a single byte look-up (with a 256-byte look-up table); a hand-written sketch of such nibble-indexed code appears after the example below.

For example (cloning the git repository):

$ git clone http://github.com/tpircher/pycrc.git
$ cd pycrc
$ git branch
* master
$ git describe
v0.8-3-g7a041cd
$ ./pycrc.py --model=crc-8 --algorithm=table-driven --table-idx-width=4 --generate c -o crc8-table4.c
$ wc crc8-table4.c
  53  211 1562 crc8-table4.c
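
For a sense of what nibble-indexed code looks like, here is a minimal hand-written sketch of a table-driven CRC-8 with a 16-entry table (polynomial 0x07, initial value 0, MSB first assumed; pycrc's generated code will differ in detail):

#include <stdint.h>
#include <stddef.h>

/* Entry v is the CRC register after clocking the nibble v through
 * four polynomial-division steps (polynomial x^8 + x^2 + x + 1). */
static const uint8_t crc8_table4[16] = {
    0x00, 0x07, 0x0E, 0x09, 0x1C, 0x1B, 0x12, 0x15,
    0x38, 0x3F, 0x36, 0x31, 0x24, 0x23, 0x2A, 0x2D
};

uint8_t crc8(const uint8_t *buf, size_t len)
{
    uint8_t crc = 0x00;
    while (len--) {
        uint8_t byte = *buf++;
        /* high nibble first, then low nibble */
        crc = (uint8_t)(crc << 4) ^ crc8_table4[((crc >> 4) ^ (byte >> 4)) & 0x0F];
        crc = (uint8_t)(crc << 4) ^ crc8_table4[((crc >> 4) ^ byte) & 0x0F];
    }
    return crc;
}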

Given your memory and speed constraints, this option may well be the best compromise between speed and code size. The only way to be sure would be to benchmark it though.


The pycrc git repository is on github, as is its issue tracker, but it can also be downloaded from sourceforge.

Mark Booth
  • 14,352
16

The Effectiveness of Checksums for Embedded Networks by Theresa C. Maxino is a really good paper comparing the performance of various checksums and CRCs in an embedded context.

Some quotes from the conclusions (based on studies of undetected error probabilities):

When burst errors dominate

XOR, two’s complement addition, and CRC checksums provide better error detection performance than one’s complement addition, Fletcher, and Adler checksums.

In other applications

a “good” CRC polynomial, whenever possible, should be used for error detection purposes

If computation cost is very constrained (as in your case), use (in order of effectiveness):

Other quotes:

The Fletcher checksum has lower computational cost than the Adler checksum and, contrary to popular belief, is also more effective in most situations.

and

There is generally no reason to continue the common practice of using an XOR checksum in new designs, because it has the same software computational cost as an addition-based checksum but is only about half as effective at detecting errors.
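
For reference, a straightforward Fletcher-16 in C might look like the sketch below (the naive formulation with a modulo per byte; real implementations defer the modulo across batches of bytes for speed):

#include <stdint.h>
#include <stddef.h>

/* Fletcher-16: two running sums modulo 255, concatenated into 16 bits. */
uint16_t fletcher16(const uint8_t *buf, size_t len)
{
    uint16_t sum1 = 0, sum2 = 0;
    while (len--) {
        sum1 = (sum1 + *buf++) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return (uint16_t)((sum2 << 8) | sum1);
}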

Mark Booth
  • 14,352
11

Simple parity (basically XORing the data over itself, byte by byte) is about as fast as you can get. You do lose a lot of the error-detection capability of a CRC, though.

In C:

/* XOR every byte into a running checksum while sending it. */
unsigned char checksum = 0;
for (size_t i = 0; i < length; ++i)
{
    checksum ^= buffer[i];
    SendToPC(buffer[i]);
}
SendToPC(checksum);  /* send the checksum byte last */
Billy ONeal
  • 8,083
5

I'm not aware of anything that's as effective at error detection as a CRC and faster - if there were, people would be using it instead.

You could try a simple checksum, but that's far less likely to detect errors.

Bob Murphy
  • 16,098
5

The Adler checksum should be sufficient for checking for transmission distortions. It's used by the Zlib compression library, and was adopted by the Java 3D Mobile Graphics Standard to provide a fast but effective data integrity check.

From the wikipedia page:

An Adler-32 checksum is obtained by calculating two 16-bit checksums A and B and concatenating their bits into a 32-bit integer. A is the sum of all bytes in the string plus one, and B is the sum of the individual values of A from each step.

At the beginning of an Adler-32 run, A is initialized to 1, B to 0. The sums are done modulo 65521 (the largest prime number smaller than 2^16 or 65536). The bytes are stored in network order (big endian), B occupying the two most significant bytes.

The function may be expressed as

 A = 1 + D1 + D2 + ... + Dn (mod 65521)
 B = (1 + D1) + (1 + D1 + D2) + ... + (1 + D1 + D2 + ... + Dn) (mod 65521)
   = n×D1 + (n-1)×D2 + (n-2)×D3 + ... + Dn + n (mod 65521)

Adler-32(D) = B × 65536 + A

where D is the string of bytes for which the checksum is to be calculated, and n is the length of D.
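
A direct C transcription of that definition might look like this (naive, with one modulo per byte; zlib's real adler32 defers the modulo for speed):

#include <stdint.h>
#include <stddef.h>

#define ADLER_MOD 65521u  /* largest prime below 2^16 */

uint32_t adler32(const uint8_t *buf, size_t len)
{
    uint32_t a = 1, b = 0;  /* A starts at 1, B at 0 */
    while (len--) {
        a = (a + *buf++) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;   /* B in the high half, A in the low half */
}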

Gnawme
  • 1,333
3

The checksum logic itself is sound, and others here can suggest faster algorithms.

If you want to improve the speed of your component, you might need to look at changing your overall technique to separate the transfer component from the validation component.

If you have these as two independent items (on different threads), then you can get your full transfer speed and only resend failed packets.

The algorithm would look something like this:

  • The server breaks the data up into known packet sizes (say, 1K chunks) and puts them in a "to be sent" queue.
  • Each packet is sent with a 16- or 32-bit ID and its checksum.
  • The client receives each packet and puts it in a queue to be processed.
  • On a separate thread, the client takes one packet at a time off the queue and validates it.
    • On success, it adds the packet to the final collection (in ID order).
    • On failure, it reports the failed ID back to the server, which queues up that packet to be resent.
  • Once you have received and validated the packets and have the IDs in the correct sequence (starting at 1), you can start writing them to disk (or do whatever is required).

This will let you transmit at the highest possible speed, and by playing with your packet size you can work out the optimum trade-off between failure rate and validation/resend overhead. A sketch of a possible packet layout follows.
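
As a sketch of what such a packet might carry (field names and sizes are purely illustrative, not a fixed protocol):

#include <stdint.h>

#define PAYLOAD_SIZE 1024u  /* the "1K chunk" from the send queue */

/* Illustrative wire format: a sequence ID so the receiver can reorder
 * and request resends, plus a per-packet checksum over the payload.
 * For a real link you would serialise the fields explicitly rather
 * than send the struct raw. */
typedef struct {
    uint16_t id;        /* monotonically increasing packet ID */
    uint16_t checksum;  /* CRC-16 (or similar) of the payload */
    uint8_t  payload[PAYLOAD_SIZE];
} packet_t;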

Robin Vessey
  • 1,587
2

Checksums are traditional:

(reduce #'+ stream)

XOR, as given above, would work as well (Common Lisp spells it logxor):

(reduce #'logxor stream)
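
For a C target like the dsPIC, the same two reductions might look like this (buf and len are illustrative names):

#include <stdint.h>
#include <stddef.h>

/* Additive checksum: the equivalent of (reduce #'+ stream). */
uint8_t sum_checksum(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;
    while (len--)
        sum += *buf++;  /* wraps modulo 256 */
    return sum;
}

/* XOR checksum: the equivalent of (reduce #'logxor stream). */
uint8_t xor_checksum(const uint8_t *buf, size_t len)
{
    uint8_t x = 0;
    while (len--)
        x ^= *buf++;
    return x;
}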

A slightly more elaborate (slower) scheme is the standard parity check for serial connections.

At this level, you are trading error-detection strength for speed; these schemes will occasionally let errors through.

At the next level of sophistication, you can use CRC or hash-based schemes.

Another design would be to increase the size of the block used for the stream.

You should have an estimate of the actual error rate to tune your algorithm selection and block-size parameters.

Paul Nathan
  • 8,560
0

Table-driven CRC algorithms are very fast, because the per-byte work reduces to a table lookup, a shift and an XOR rather than bit-by-bit computation. I would also suggest using at least a 16-bit CRC.
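
For illustration, a table-driven CRC-16 might look like the sketch below (CRC-16/CCITT parameters assumed: polynomial 0x1021, initial value 0xFFFF; the 512-byte table is built once at start-up):

#include <stdint.h>
#include <stddef.h>

static uint16_t crc16_table[256];

/* Build the table once: entry i is the CRC register after clocking
 * the byte i through eight polynomial-division steps. */
void crc16_init(void)
{
    for (unsigned i = 0; i < 256; i++) {
        uint16_t crc = (uint16_t)(i << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
        crc16_table[i] = crc;
    }
}

/* Per byte: one table lookup, one shift, one XOR. */
uint16_t crc16(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0xFFFF;
    while (len--)
        crc = (uint16_t)(crc << 8) ^ crc16_table[(crc >> 8) ^ *buf++];
    return crc;
}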

0

Many people think Adler32 is a very fast checksum because it is a simple one, but that is only partly true. Adler32 is certainly faster than CRC32, but some hash functions are even faster, like Murmur3F or FNVJ32/FNVJ64. See this comparison chart. Hash functions can also be used for checksumming.

Not only are some of these faster, they also produce far better results than Adler32, even better than CRC. See this answer to a similar question on Software Engineering. The more random the output, the less likely it is that two different blocks of data will produce the same checksum, and the more evenly the bits are distributed in the final result (which matters if you plan to truncate it).

Meanwhile, even much faster alternatives exist; see this answer on Stack Overflow. Probably the fastest hash available that is also a very good one is XXH3, though it is only that fast on CPUs with SSE2 support. The crazy thing is that it can be faster than a plain memory copy, though of course only if the data is already in the CPU cache (anything else would be magic).
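
On the PC side, for instance, hashing a block with the xxHash library is a one-liner (a sketch assuming xxHash's one-shot XXH3_64bits API; note this helps the desktop end, not the dsPIC):

#include <stdio.h>
#include <stdint.h>
#include "xxhash.h"  /* https://github.com/Cyan4973/xxHash */

int main(void)
{
    uint8_t block[512] = { 0 };                        /* one 512-byte block */
    XXH64_hash_t h = XXH3_64bits(block, sizeof block); /* one-shot 64-bit hash */
    printf("%016llx\n", (unsigned long long)h);
    return 0;
}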

So plenty of better alternatives exist; the question is which of them can be implemented on the dsPIC side, and whether they will still be faster there. Many of these alternatives are not fast in general; they are fast because they are optimized for the way modern desktop/server CPUs process data.

Mecki
  • 2,390