Tuesday, November 5, 2013

File Transfer Over Soundcard: Director's Cut

I've changed templates on this blog, a process which horribly mangled some of my older posts. This was necessary partially because the mobile device support of my old theme was awful.

I can't actually update my old posts, because they're in some form of you-have-an-ancient-blogger-account compatibility mode, which inserts random line-breaks everywhere. So I'm having to re-compose them entirely. This takes time, so I'm only going to do it for my most popular posts.

This one combines my two posts on how to build a simple software modem from back in 2009. The full sources are available on a github repo.

I'll soon be emptying the original articles, and replacing them with a link to this post. I'll still keep them up, because there is to this date occasionally comments going on in them.

File transfer over soundcard: Part 1
File transfer over soundcard: Part II

Part 1

Before I even start, a word of warning: Never try these programs with your headphones on. THEY MAKE LOUD NOISES! It is possible to configure these programs to make noises way louder than you ever imagined your headphones could make. You can damage your hearing when playing with audio programming. Tinnitus isn't fun.

This will be you if you don't take my warning seriously:

Background and problem

I have an old laptop laying about. A while back, I was in a bit of trouble getting it to work. It's so old that several parts of it has shut down. The USB system is fried, and it can't read modern burned DVD:s. I needed to move software to it to get the network up and running again. The only peripheral that was working (besides the keyboard and the screen) was the sound card. It was running an old version of Slackware.

Ah, I thought, and wrote a program that encoded data in sound. When I did it back then I used a bad algorithm. It was very noise sensitive and tried to do too much at the same time. As of then, I've improved (and simplified) the concept to using a sort of pulse width modulation (an idea I got when I read about the ZX Spectrum Tape Loader.)

The basic protocol is trivial:

  • For every character:
    • For every bit:
      • Send a short pulse if the bit is 1
      • Send a long pulse if the bit is 0
      • Send a silence
    • Send a very long pulse (four times as long as the shortest pulse)
  • Send a silence

This is nice and not very error prone. The end-of-byte signal means that errors don't taint their neighbors. 

The crux isn't the signal generation (which is laughably trivial), it is the analysis on the receiving end; or rather dealing with the noise in the signal. The naive implementation would be to sum the square of the signal amplitude over time periods--the presence of a sine wave would converge towards some value and a silent signal would converge towards 0. In a noisy signal, it almost always converges towards something non-zero, so no such luck.

So, the second approach would be to use a Fourier transform, to select the part of the spectrum where our signal resides (400 Hz is what I chose).

A simple implementation of such a function looks like this:

Where x_in is a series of numbers between -1 and 1, and n is the modified frequency (which is to say: length * frequency / rate). This function would give you a number corresponding to how much of a given frequency is in a sample. But you can do one better: Harmonics. Almost every loudspeaker will produce some level of harmonics even though the signal broadcasted is a plain sine wave with no harmonics.

So, to check if our signal is in a given segment, the following code can be used:

To check if the signal is present in a given signal, you must compare this against some form of threshold. What's a good threshold varies with noise. A bad threshold value may either cause the program to interpret random noise as meaningful data, or reject good data as random noise.

The protocol described above can be realized with the following code:

This does work. It's not just some crazy pipe dream. Following is all the code you need for transferring files from two computers using their soundcards.

Garbled transfers like this may soon arrive through a soundcard near you. 

Utility programs

Before I get deeper into to the main program, I'm going to contribute some utility programs. record and playback. They are both wrappers for OSS, and reads and digests data from the soundcard; or writes digested data to the soundcard at given sample rates. They deal in signed char arrays only, and may convert them for the soundcard. Exactly what they do and how they work is a bit off topic, so I'll just post the code listings.


The broadcasting end
As previously discussed, the broadcasting part of the program is pretty simple. The only real gotcha is the sample rate factor in the frequency. Since we're generating a signal with N bytes per second, we must decrease our frequencies by a factor 1/N. Beyond that, it's really quite trivial.


The math parts
Almost there now. We just need some fourier transforms and things of such nature. The analysis end of the program can also make a XPM file of the frequency spectrum of the input, which is why you see a bunch of XPM code.


Finally... the receiving end

Most of what this one does has already been discussed. The threshold is hard-coded. You may want to change it or whatever.


... one file to compile them all, and into objects link them


To compile, you just run
> make

Using the programs

Before I get to actually using the programs, I repeat my warning: Never try these programs with your headphones on. THEY MAKE LOUD NOISES! It is possible to configure these programs to make noises way louder than you ever imagined your headphones could make. You can damage your hearing when playing with audio programming. Tinnitus isn't fun.

Not there yet. It's quite an elaborate matter to use all these programs. First you'll want to generate your raw sound data. Let's transfer /etc/fstab (don't do this as root! Something might go horribly wrong!)

First, attach the microphone on the receiving end to the speakers on the transmitting end. Use duct tape or whatever. I had an old hands free headset that I wrapped so that the microphone was held in place by the earpiece.

Experimental set-up

On the transmitting computer, run the following command:
> ./generate -b 25 -r 48000 -o out.data /etc/fstab

Enter, but do not start the following on the transmitting computer:
> ./playback -r 48000 < out.data # don't press enter yet!

Now, on the receiving computer, run the following command:
> ./record -r 48000 -o out.recdata
Note that out.recdata will grow very fast. In this case, 48000 bytes/second.

Run the command pre-typed in the transmitting computer's terminal. Be very quitet, and listen to the noise coming from of the speaker. This may take quite some time. Practice your Zen. Find enlightenment. When the noise stops, press Ctrl+C on the receiving computer.

Run the following command on the receiving computer:
> ./analyze -b 25 out.recdata

Watch a semi-garbled /etc/fstab be printed across your screen. The -b switch is to be taken with a grain of salt. It is proportional to how fast the transfer is. It must (obviously) be the same on the receiving and the transmitting end. I've gotten it to work at -b 50 -r 48000. Sample rate (-r) increases the processing time, but it also allows faster transfers. There are a few fixed possible sample rates, 8000 always works, others that are supported by most soundcards are 11025,16000,22050,24000,32000,44100 and 48000.

So, in summary: -b determines how fast the transfer is, and the maximum possible transfer speed is limited by the sampling rate.

If it doesn't work, try playing what you recorded with playback. If you can hear and distinguish the beeps, then so should the computer be able to. If record or playback fails, chances are you don't have permissions to access /dev/dsp. If all you get is character salad, fiddle with threshold in analyze.c.

Part 2

I've played around further with the file transfer over sound card idea, and developed a more advanced method that uses a technique called Phase Shift Keying. Similar techniques are used in wireless network connections.

Instead of coding data in the amplitude or frequency of the carrier signal, phase shift keying (as the name indicates) encodes it in the phase. It is significantly faster, partially because it doesn't waste as much time with silences, but also because it's more reliant.

Flawless transmission

I must admit it's a good thing I wasn't drinking coffee when tinkering with this technique, because it's very likely I would have blown coffee through my nose and onto the keyboard when I first saw transfer rates over around 200 baud, with no random garbling. I'm sure it's possible to go higher with better equipment than my cheaper-than-dirt headsets that came with my webcams.

How it works

The mathematics of the method is as following, if the signal is expressed as

\[S(t) = A_0 \sin(\omega t + \phi)\]

Then the Fourier transform of S would give you

\[F_\omega(S) = A_0 e^{i\phi}\]

Normally, one would simply discard the phase factor by taking the norm of the frequency coefficient, but for this we're going to make use of it. You can't just yank the phase out of the expression as it is though (phase is always relative to something). But! You can compare this phase with the phase of the last sample you got (this is called differential phase-shift keying, by the way).

So, I propose the following scheme:
0 Still the same sample as last time
\(\frac{\pi}{2}\)  Next bit is 1
\(\pi\) Next bit is 0
\(\frac{3\pi}{2}\) New byte

This has several nice features. You can jump into the signal almost anywhere and at most one byte will be garbled. Furthermore, everything has an uniform length, so bit rate doesn't depend on how many ones and zeros is in the byte.

Modifying the fourier code from the last blog post on sonic file transfers, the following function will allow you to make use of the phase:

So how do we figure out the phase difference? Let φ be the phase of the current sample, and ψ be the phase of the previous sample.

\[e^{i\phi}e^{-i\psi} = e^{i(\phi - \psi)} = \cos(\phi - \psi) + i sin(\phi - \psi)\]

The real term will dominate if φ - ψ ~ nπ , and the imaginary term will dominate if φ - ψ ~ (n+1)π/2 and their sign will further tell you if n is odd or even.

The demodulation algorithm is fairly short:

For several reasons, it's a good idea to use a pretty high carrier frequency for this method. At some point, your speaker or microphone will not be able to process the information, so you'll want to stay under that, but a high frequency will result in less perturbation of the signal since fairly few objects have eigenfrequencies in the 5 kHz range or higher, and even oscillation at harmonic frequencies drops off pretty significantly at such frequencies.


The complete source code for the program:


You may have grabbed these the last time, but they are slightly altered now, so get them again:


You also need the playback and record programs I posted in the previous post. They haven't changed though.



Same old warning: Never try these programs with your headphones on. THEY MAKE LOUD NOISES! It is possible to configure these programs to make noises way louder than you ever imagined your headphones could make. You can damage your hearing when playing with audio programming. Tinnitus isn't fun.

To transfer a file (let's transfer /etc/fstab again), put the microphone next to the speaker, pre-type the following on the transmitting computer (without running it):

>./generate_psk -r 48000 -c 8000 -b 100 /etc/fstab | ./playback -r 48000

Type the following on the receiving computer:

>./record -r 48000 > mydata

Press enter on the transmitting computer. Be quiet (this should be fairly quick. Maybe 30 seconds?) When the high pitched noise stops, press Ctrl+C on the receiving computer's terminal.

./analyze_psk -r 48000 -c 8000 -b 100 mydata
on the receiving computer should retreive the message.

The parameters are
-r : Sample rate -- carrier frequency and signal quality
-c : Carrier frequency -- limits baud rate and signal quality
-b : Baud rate -- determines how fast the transfer is

What works and doesn't with the sampling rates and frequencies is a bit tricky. It all boils down to Nyquist-Shannon. That theorem is all innocent looking, until you make it mad. Then it turns green and grows three times it's size and goes on a furious rampage through your hopes and dreams.

Anyways, have fun experimenting with this technique.

No comments:

Post a Comment