## Wednesday, April 22, 2009

### File transfer over sound card II: Phase Shift Keying

I've played around further with the file transfer over sound card idea, and developed a more advanced method that uses a technique called Phase Shift Keying. Similar techniques are used in wireless network connections.

Instead of coding data in the amplitude or frequency of the carrier signal, phase shift keying (as the name indicates) encodes it in the phase. It is significantly faster, partially because it doesn't waste as much time with silences, but also because it's more reliant.

I must admit it's a good thing I wasn't drinking coffee when tinkering with this technique, because it's very likely I would have blown coffee through my nose and onto the keyboard when I first saw transfer rates over around 200 baud, with no random garbling. I'm sure it's possible to go higher with better equipment than my cheaper-than-dirt headsets that came with my webcams.

How it works

The mathematics of the method is as following, if the signal is expressed as

S(t) = A0 sin(ωt + φ)

Then the Fourier transform of S would give you

Fω(S) = A0 e

Normally, one would simply discard the phase factor by taking the norm of the frequency coefficient, but for this we're going to make use of it. You can't just yank the phase out of the expression as it is though (phase is always relative to something). But! You can compare this phase with the phase of the last sample you got (this is called differential phase-shift keying, by the way).

So, I propose the following scheme:
Δφ Meaning
0 Still the same sample as last time
π / 2 Next bit is 1
π Next bit is 0
3 &pi / 2 New byte

This has several nice features. You can jump into the signal almost anywhere and at most one byte will be garbled. Furthermore, everything has an uniform length, so bit rate doesn't depend on how many ones and zeros is in the byte.

Modifying the fourier code from the last blog post on sonic file transfers, the following function will allow you to make use of the phase:

double fourier1p(double x_in[], double n, int length, double* phase_r, double* phase_i) { double x_complex[2] = { 0, 0 }; int i; for(i = 0; i < length; i++) { x_complex[0] += x_in[i] * cos(M_PI * 2 * i * n / (double) length); x_complex[1] += x_in[i] * sin(M_PI * 2 * i * n / (double) length); } double norm = sqrt(x_complex[0]*x_complex[0] + x_complex[1]*x_complex[1]); *phase_i = x_complex[1] / norm; *phase_r = x_complex[0] / norm; return norm / length; }

So how do we figure out the phase difference? Let φ be the phase of the current sample, and ψ be the phase of the previous sample.

ee-iψ = ei(φ - ψ) = cos(φ - ψ) + i sin(φ - ψ)

The real term will dominate if φ - ψ ~ nπ , and the imaginary term will dominate if φ - ψ ~ (n+1)π/2 and their sign will further tell you if n is odd or even.

The demodulation algorithm is fairly short:

 double carrier_phase[2]; double carrier_strength = fourier1p(dbuffer, (float) length * carrier / (float)rate, length, &carrier_phase[0], &carrier_phase[1]); if(carrier_strength < threshold) continue; double delta_re = carrier_phase[0] * old_carrier_phase[0] + carrier_phase[1]*old_carrier_phase[1]; double delta_im = -carrier_phase[1]*old_carrier_phase[0] + carrier_phase[0] * old_carrier_phase[1]; if(delta_re * delta_re > delta_im * delta_im) { /* Phase difference is a multiple of pi */ if(delta_re > 0); /* No change */ else { bit_data = bit_data * 2; } } else { if(delta_im > 0) { bit_data = bit_data * 2 + 1; } else { if(isprint(bit_data)) printf("%c", bit_data); else printf("<%.2x>", bit_data); bit_data = 0; } } old_carrier_phase[0] = carrier_phase[0]; old_carrier_phase[1] = carrier_phase[1]; 

For several reasons, it's a good idea to use a pretty high carrier frequency for this method. At some point, your speaker or microphone will not be able to process the information, so you'll want to stay under that, but a high frequency will result in less perturbation of the signal since fairly few objects have eigenfrequencies in the 5 kHz range or higher, and even oscillation at harmonic frequencies drops off pretty significantly at such frequencies.

Sources

The complete source code for the program:

generate_psk.c
analyze_psk.c

You may have grabbed these the last time, but they are slightly altered now, so get them again:

fourier.h
fourier.c

You also need the playback and record programs I posted in the previous post. They haven't changed though.

playback.c
record.c

Using

Same old warning: Never try these programs with your headphones on. THEY MAKE LOUD NOISES! It is possible to configure these programs to make noises way louder than you ever imagined your headphones could make. You can damage your hearing when playing with audio programming. Tinnitus isn't fun.

To transfer a file (let's transfer /etc/fstab again), put the microphone next to the speaker, pre-type the following on the transmitting computer (without running it):

>./generate_psk -r 48000 -c 8000 -b 100 /etc/fstab | ./playback -r 48000

Type the following on the receiving computer:

>./record -r 48000 > mydata

Press enter on the transmitting computer. Be quiet (this should be fairly quick. Maybe 30 seconds?) When the high pitched noise stops, press Ctrl+C on the receiving computer's terminal.

Running
./analyze_psk -r 48000 -c 8000 -b 100 mydata
on the receiving computer should retreive the message.

The parameters are
-r : Sample rate -- carrier frequency and signal quality
-c : Carrier frequency -- limits baud rate and signal quality
-b : Baud rate -- determines how fast the transfer is

What works and doesn't with the sampling rates and frequencies is a bit tricky. It all boils down to Nyquist-Shannon. That theorem is all innocent looking, until you make it mad. Then it turns green and grows three times it's size and goes on a furious rampage through your hopes and dreams.

Anyways, have fun experimenting with this technique.

2011 update: Moved the sources to a github repo.

1. You could do lots better with FSK.

2. There's several ways of improving the PSK code as well. Huffman coding would obviously save a lot of time.
You can probably double the number of phases in the coding as well. Finally, it's possible to run several carrier signals in parallel.

4. This is really cool, but I would think that your baud rate could be a lot higher with the right signal processing. You may want to read up on heterodyne receivers and matched filters. I would also suggest some kind of forward error correction, but that can get very complicated very fast.

If it were me, I'd try to characterize the "channel" before selecting a modulation type. It sounds like you may have a lot of non-random noise (self-interference due to multipath and maybe some nonlinear effects). FSK, PSK, QAM, GMSK, and everything else are subject to the same theoretical limits, they just perform differently under different non-ideal conditions.

If you know Python, there's an open-source project called GNU-Radio that might be of interest; it's got all kinds of premade blocks for doing real-time processing on these types of signals.

Best of luck!

5. Have you ever looked into how 80's home computers used to load software from audio tapes? Speeds of 1000-2000 baud were typical.

see e.g. http://en.wikipedia.org/wiki/ZX_Spectrum_software#Tape

6. I gotta say, the comments of Part I got me down a little, so many people implying that you shouldn't even be bothering because 'it's been done before'.

7. Yeah, I try to not take it too personally. There's always people who take stuff like this too seriously, usually they're working with similar technologies professionally and can't distance themselves from it far enough to see why people would do it for enjoyment.

I can understand it because I have been known to get in similar states over physics.

8. I was thinking of realizing the same project for almost the same problem (no serial port on the eeepc, no floppy, no cd.. and my poor old p1 laptop hasn't got a NIC)...
Probably I'll take this as a starting point, and make it portable/faster.

Great project anyway, you rock!

9. Wow, that's neat!

I keep hearing "It's been done before" and I'm scratching my head going "Where?".

Anyway I look forward to seeing this type of system being integrated into a single program that would make it simple to transmit files via sound cards.

Good Luck!

10. Hi. I would like some more explanations on the the subject, particularly at generate_psk

for(i = 0; i < length; i++)
{
data[i] += (int)(64.0 * sin(2*M_PI*carrier*(i)/ (float) rate + car_phase));
}
car_phase += 2*M_PI*carrier *i / (float) rate;
car_phase -= 2*M_PI * (int) (car_phase / (2*M_PI));
fwrite(data, length, 1, outfile);
free(data);

is i meant to stand for time ? .. any explanations will help a lot.if you could leave me a message at jegunemilosu@yahoo.com I would be forever grateful

In short: Yes. Deconstructing the code

data[i] += (int)(64.0 * sin(2*M_PI*carrier*(i)/ (float) rate + car_phase));

This part can be generalized as following:

data(at time) = sin(frequency * time + phase)

First you need to translate the frequency to something the soundcard can understand. We're making "rate" samples per second, and the sine function completes an oscillation in 2pi (that is sin(x+2*pi) = sin(x)), so the frequency we need to feed to sin is 2*pi*frequency / sampling rate.

car_phase += 2*M_PI*carrier *i / (float) rate;

We need to save the value of (frequency * time + phase) so that when we generate more data, it matches up with the previous data. This phase is also modified by the algorithm to encode data.

car_phase -= 2*M_PI * (int) (car_phase / (2*M_PI));

Since sin(x+2*pi) = sin(x), we can round this off to remove any multiples of 2*pi that fits into car_phase.

I hope that helps.

12. I hate to be the only Linux n00b commenting but I really want to get this working and I can't seem to compile it:

cc generate_psk.c -o generate_psk
/tmp/ccErEN7b.o: In function generate':
generate_psk.c:(.text+0xb4): undefined reference to sin'
collect2: ld returned 1 exit status
make: *** [generate_psk] Error 1

Trying to compile fourier also gives me "undefined reference to cos'" and "undefined reference to sqrt'"

That's got me stumped. I got the first version of the program working ok.

Ubuntu 9.04

13. You need to link with the math library, -lm. So you do

cc foo.c -lm -o foo

14. Re: Viktor

That was just the ticket, thank you.

15. I've been considering an approach like this for a long time as a quick alternative to carrying a USB drive everywhere; I often need to transfer between web-enabled desktops in college and my laptop which has a lovely Mac microphone..

I'd love to see this stuff packaged as a single cross-platform tool! I'm fiddling with Processing at the moment though; if I manage to duplicate something like this I'll let you know.. (Small odds!)

Thanks though, this is cool!

16. Is there Windows versions of this?

17. Instead of a microphone and headphones, can I use a jack to jack cable to connect computers?

18. (I worry if it might short circuit or something)

19. No, that's generally quite safe unless there's an amplifier involved (well, there are no guarantees of course) and does obviously reduce the noise level, and I've since used such cables. It was just that at the time I was feeling cheap (and audio cables are ridiculously expensive).

20. hi /dev/dsp is not found in new ubuntu distro. Is there a work around for that?

Thanks
Ram

21. There's a program 'aoss' in the alsa-oss package that can make OSS programs use ALSA (you run "aoss ./theProgram").

22. hi i tried that.
I get an error like this

ramk@marvin:~/Workspace/file-transfer-over-soundcard\$ aoss ./record -r 48000 > mydata
SNDCTL_DSP_SETFMT: Invalid argument

23. It appears your sound card doesn't support 8 bit signed audio. Give me half an hour, I'll whip up a patch that records 16 bit audio instead.

24. Try re-downloading playback.c and record.c now. I added a flag '-w' that (hopefully) should force it to use 16 bit audio. But I don't have any computer with speakers around, so it's completely untested.

25. hey viktor,
Thanks a lot!

The new files also give the error. I tried doing this..
aplay -r 48000 generated_file |arecord -t raw -r 48000 > recorded_file

analyze did not work properly. Any thoughts on this ?

26. Analyze and generate expect raw 8 bit signed data. Try sending aplay and arecord the flags --format=S8

27. hi,
looks like format S8 is not supported in my card. I tried S16. It seems to work fine only that i get lots of junk Characters in the end.

28. Yeah, you need to convert that to 8 bit signed integers. But that's rather simple to do.

and it will build "upsample" and "downsample" for you. So you have to run

arecord -t raw -r 48000 --format=S16 | ./downsample > out

and

cat out | ./upsample | aplay -t raw -r 48000 --format S16

29. Really nice project man! and nice description!

I was thinking what if we the distance between sender speaker and receiver microphone is around 8 meters? will it work?

and which one is better PSK or FSK for such setup?

30. The problem with an actual speaker-mic setup is going to be sound reflection (i.e. echoes).

In a real world application, that's mainly going to mess with phase, so I imagine either frequency shift keying or some form of pulse modulation would be the best options.

31. Yep, i am trying frequency shift keying and it is working if i place it close to speaker.

Can we somehow increase the accuracy of fourier1 function?

Thanks!

32. You can exchange transfer rate for accuracy by using an error correction scheme. Hamming(7,4) will go a long way.

33. Ha can i compile this codes?
Online Compiler ( http://www.onlinecompiler.net/ccplusplus )says, that it can't compile.

34. Hi, can you please explain what does the fourier function exactly do ? because i couldn't understand how you did use it in the decoding class,
Thanks,

1. Let's say you have a signal generated through

S(t) = A * cos(10*2*pi*t) + B * sin(10*2*pi*t)

Then the Fourier function, given the frequency n=10, will give you A, B and sqrt(A*A+B*B).

I have an other question please, i don't understand how to synchronize the sending and recording if we use 2 computers, how can we know when exactly the playing starts because i couldn't play and at the same time record from the other PC,
Thanks

36. Thanks for this great post,
How can we choose bps, sampling rate and carrier frequency to not lose information?

37. Besides sampling rate which should be something your souncard approves of (44100 works for most cards, but 22050 and 8000 are super-safe), the easiest way is probably to hone in on working values by trial and error.

But if you want to use this technique in a practical application, I strongly recommend using Hamming code to be able to recover from single bit errors, rather than trying to eliminate them http://en.wikipedia.org/wiki/Hamming_code

1. Thanks a lot :)
Please I'm working it in matlab, so which is the optimal coding scheme, have you any idea ?

38. Is there any chance of embedding data in actual audio, perhaps at a frequency over 20,000hz (higher than the human ear can hear?

I would like to be able to play a soundfile (text to speech) and mix in some data at the same time. So the computer can speak to me and send data to other computer devices at the sametime.

1. i've been working on the same...for some time...did u find out any method to do so ???...its screwing my brain too much...i tried to generate a sound in the range 18 to 20 khz...so tht its not audible and still can be used to send data by superimposing over the required carrier...but then am not even able to generate an inaudible sound...as for given in this code if u put frequency of 18 or so and then adjust the sampling rate according to the nyquist shannon...its no way inaudible...if any one has even the slightest of idea pls do post...and my dear frnd viktor the geek...where are you...need your help dude...check sonic notify...even cheap speakers are able to generate sound in the frequency range till 20 and a cheap microphone can catch the frequency and analyze it...so pls do think something in this regard and help me out

39. Theoretically, I guess.

But in practice, you'd need specialist hardware for that (soundcard, speakers, microphone), as hardware manufacturers tend to cut corners in faithfully representing parts of the signal that are imperceptible to the human ear. Even then you'd probably still have issues with the scheme.

40. why don't you use Trellis modulation ??

--chris

41. Hello dear Viktor,
I like the idea that you tried to develop a software modem. :-)

I have done something also very interesting with a friend of mine when I was 18 years old.

I also had an idea lately and transformed it into something funnier...

So, I felt like sharing with you and all the friends here.

http://rev-eng.blogspot.gr/

Enjoy!

42. hello
interesting project !
just wanted to know if it is possible to simple toggle the audio pin high or low and implement a software UART over it ?

43. how can u make the sound inaudible...? what do think should be the arguments ?

44. I don't think you can make the sound inaudible, at least not without special hardware. Sound cards and speakers aren't designed to produce sound above the human hearing range.

If you want to have both voice and data transfer over the same line, you're going to have to use some form of filter to separate the two. That's how DSL works (see http://en.wikipedia.org/wiki/DSL_filter)

45. brother...yamaha and sonic notify are doing it...though they have also produced special hardware for extensive use but the microphone of a normal phone can catch it...check the concept of sonic notify and if possible do check the video where they have encoded data on their website...i tried putting the frequency pretty high and sample rate 192...the sound is pretty low...but then am making an android project and sample rates of a smartphone is 48 khz...i generated the sound at frequency between 17 and 20 khz with sample rate set to 48000...there was sound...so i recorded it(with least envirnmental noise) and then checked its spectogram...the frequecny was produced between the range...not pretty clear though...but then am using fsk(will change to psk soon as i see some light)...now the main problem it the the frequency is not only generated between the range 17 to 20 but also between the range around 10...its like the data is not only producing frequency at the specified range but also at frequency below the range....any advice...the next thing i'll be doing now is that i'll use a high pass filter before the playback...do u think the residual will have enough strength to carry data...and dude...thanks for waking up from ur cryosleep :)

46. If you're not using equipment designed to produce high frequency sounds, it's possible that it fails to accurately represent your signal. It's possible it's turning your sine wave into a square wave, which will add all sorts of harmonics.

Beyond that, any signal carrying data will create sidebands in adjacent frequencies to the carrier frequency (unless it's an unmodulated sine wave).

http://en.wikipedia.org/wiki/Sideband

47. yes dude...well its working now...been able to generate high frequency sound to carry data...and yes its inaudible now...as i am working on android i used sourcedataline and some other package...and its been working gr888...well now i can move to the analyze part of it...thanks again brother...wonderful blog...i cant explain how much it saved my time...

48. A few of your header files don't run on windows platform.Please, tell me how to run it on TURBO C++ for header files like getopt.h

49. It will likely never run on Windows without significant modification. It's based on OSS, which is a Linux sound interface.

You can probably write your own basic version of getopt pretty easily. It isn't a very complicated function.

50. I have functions like optarg,optind that i found in your program in GITHUB repository.Please, tell me what functions should they be replaced on windows??

51. what about soundcard.h in windows?

52. There is nothing similar to in windows. You'll have to write completely new functions for reading and writing to the soundcard. I do not know how to do this.

53. This seems to be a POSIX-compatible getopt-replacement for windows. https://gist.github.com/superwills/5815344

I haven't tested it myself, so your mileage may vary.

54. Hi, victor
you did a great job, i want to ask if i am making this kind of application for real time use, like that of sonic notify etc. so i should go with the psk/ hamming, and what else i had be needing. any help is very appreciable.
Thanks

55. Is it possible to build / run this on a mac osx? It is complaining about linux/soundcard.h not found.

56. Hi Viktor! First of all, thanks for all your work and support. I have just one question: How do I recover the fsbtab file from the receiver computer? I have mydata, and I do analyze_psk -o NEWDATA and all the other variables, but the output NEWDATA sizes 0 bytes. Please help me out since I'm doing this for a College proyect and it's very important for me. Thanks again.

57. Hm, hard to say. Can you recover the data if you just analyze the generated file directly, without sending it as sound?

You also need a pretty solid audio transmission for PSK to work (few echoes and other distorting effects, I basically put a microphone right next to a speaker). To that end, also ensure no features like "mic boost" or anything like that is enabled, as it may distort the sound transmission.

Does it work when you use the PWM-scheme from the previous blog post? That's probably more resistant to distortions than PSK.

I guess you could perhaps use a spectrum analyzer and see if you can make sense of what's going on with the signal (comparing the sent signal to the received one). I've used baudline (which open source) a couple of times, seems fairly powerful.

1. This comment has been removed by the author.

2. I should have said in my first question, I'm not using speaker-mic as channel, I'm using a 3.5 to 3.5 jack audio cable. Also, I have an oscilloscope and I can actually see the signal changing phases, so the transmitter is working properly afaik.
-- Further working with your suggestion about using analyze_psk locally, I did the following and take a look at the message I got:
http://i.imgur.com/5vno1lE.png
As I expected, I can't get analyze_psk to work as it should. I'm sure I'm missing something basic as the newbie I am. Any other suggestion here?
Again, thanks for let me steal your time.

3. Well I just found out that all that text is actually /etc/fstab. Feeling dumb right now. So the analyze_psk works as expected locally. But when I send it, in the other end, I receive a bunch of data not representing what fstab. Here is a photo of the output (I cannot do an screegrab because the UI is not started).

http://imgur.com/YszXF25

As you can see, there's a bunch of HEX data together with some random characters. My only guess is my wired channel is introducing some errors. What do you think?

4. This comment has been removed by the author.

5. Yeah, seems like the transmission isn't making it across.

Could be your soundcard doesn't support the playback or recording settings. Try messing around with them. Try different values for sample rate -r (44100, 22000, or 16000). Just make sure you use the same settings for generating as playing back (and recording/analyzing).

You could also test lowering transfer rate -b, or using the -w flag (16 bit).

6. I tried everything you said, even using a small (3 words) txt, still no luck. Thanks for your support, I really appreciate it.

7. Have you tried using the modulation scheme in part 1 of the blog series? Maybe try that, and hook it up to a speaker, to make sure your sound card actually encodes it properly (it's closer to morse code with beeps and pauses, so it's very obvious to the human ear if it's encoded properly -- you should expect a clear tone, not a distorted one).

58. This comment has been removed by the author.

59. Hi Viktor, it's me again. After trying different approaches to my problem, what I never told you it's basically transmit a 256 byte key to the other end via 3.5mm jack port, couldn't solve anything. So here I am again taking your time. Now I just want to use your modulation code, and try to send the data from a char* and not from a file. The channel has a lot of noise and we need to improve it. So, inside generate_psk and analyze_psk, can you please point exactly what are the codes for modulation and demodulation? Everything else I should not use. Thanks again Viktor.

60. I am 67, I run since about 40 years a private business in designing and producing various electronic controllers that my local market may need. But my MSs in electronics was in communications. Lately, I developed what we may call cosine raised signal to transmit bits via a voice-limited channel (as 300 to 3K Hz for example). Its technique is based on two points. First, it could be seen as PSK because if the 1's are sent as V*cos(wt), the 0's are sent as V*cos(wt+180) or -Vcos(wt). For instance, this modulation is also known as DSB-SC; the modulating signal is a square-wave and its frequecy fa equals fc/2. Second, to send a '1' then a '0', the amplitude +V at the end of bit-1 stays constant for a half cycle then drops to -V following the cosine function. Similarly, to send a '0' then a '1', the amplitude -V at the end of bit-0 stays constant for a half cycle then goes up to +V following the cos function. Therefore, the transmitted signal for 1010101010... looks like a square-wave, but its edges are now cosine raised. This eliminates the higher harmonies. So, if fc = 2450 Hz (chosen for MCUs, since 18*2450=44100 Hz of RIFF files), the bit rate is also 2400 bit/sec. I also designed a rather simple circuit to generate this cosine raised signal from a continuous square-wave (or triangular-wave) one than runs at fc.
Hope this helps.
Kerim (I live now in Aleppo city - Syria)