Sunday, June 21, 2009

Re-doon-danciy? What is this strange and alien concept you speak of puny earthling?!

Would it be so hard to add data redundancy to media files? My computer has just south of 1 terabyte of hard drive space. Most people have at least a couple of hundred gigabytes. Storage is so cheap, and transfer speeds so fast, that making all media files 10% bigger is a pretty painless cost for self-reconstructing media files.


Figure 1: Image corruption may well be the end of us all!


Take an image. Split it up in a bunch of quadratic sub-images. Let each of them store low resolution copies of it's Von Neuman neighborhood's contents. Give the blocks checksums, and compress them individually. If a block fails to pass the checksum, you can now reconstruct it using interpolation from it's neighbor's low-res copies. It wouldn't be picture perfect, but you'd be able to deduce the contents of the image even at pretty bad corruption (information recovery at 75% data loss at a best case scenario).

Do the same with video, while you're at it. Seriously, give me a video codec that does this, and you will make the world a better place.

Better yet, add redundancy to text. Add a plaintext Huffman coded copy of the textual contents of a document to it's metadata.

Wednesday, June 17, 2009

The virtue of documenting obscure behavior

In the hope that people in the same situation as me can find this through google: Some shells run signal(SIGINT, SIG_IGN) before they execute your program if you run it in the background. This interferes with sigwait.

I think the reason for this is dubious implementation of background processes. Instead of checking which processes are background process and foreground processes, and only sending SIGINT to the foreground processes, the background processes are told to ignore SIGINT, and the signal is sent to all processes when ^C is received.

You can clear this by putting signal(SIGINT, SIG_DFL) in the beginning of your program, if you want it to behave like a regular foreground process, despite being run in the background.

This is a quite poorly documented fact, that took major hair-pulling to deduce. There are two lessons to be learned:
  1. Don't make assumptions about the standard behavior of anything.
  2. Document obscure behavior.


Thus endeth the frustrated rant.

Monday, June 15, 2009

How to become a programmer

A lot of people aspire to learn programming, and it's not an easy task (there are countless pitfalls that aspiring programmers typically parade into like lemmings), so I figured it would be useful to outline a path acquire that goal.

1. Learn an imperative scripting language. Become confident in a simple language in which it is easy to write functional stuff fast. Not only does this serve as an introduction to imperative programming, it will also be useful down the line when you want to get stuff done quickly without all the red carpet other languages typically come with. Python is probably the best language to start off with, but there are other options as well (ruby, perl, lua, etc.), and they're all good choices for a first language. My motivation for suggesting python is the great community surrounding it, and the libraries it ships with that makes rewarding projects such as simple game programming relatively easy and quick.

(The order in which you take the following two doesn't really matter, do both simultaneously if you want.)

2a. Learn a functional language. Once you've gotten imperative programming down, it's time to learn something that will blow your mind, and fundamentally change the way you think about programming. Lisp or Haskell are good choices. Pick your poison.

2b. Learn a low level language. Low level programming will teach you what actually goes on under the hood. C is probably a good place to start. You may want to try assembly as well: It's a pretty easy language, but only really worth pursuing if you enjoy it.

3. Learn an object-oriented language. Now that you know a fair deal about programming, it's time how to write a lot of code while still keeping it manageable. That's where OOP comes into the picture. You'll probably want to learn some of the following: C++, C# or Java (the two are more or less identical).

Now that we've dealt with what to learn, it's time to discuss how to learn it. It is my opinion that tutorials are dangerous as hell to new programmers. There is no way to verify who wrote the tutorial (it may be some confused 14 year old, no offense to the 14 year old programmers out here), they come in no particular order and are generally inconsistent. The proper way to learn any language is to:

  • Buy a book. Check the reviews as well. You can also ask around the programming community as to what book is best on the subject. You may get away with official online documentation in some languages (e.g. python), but as a general rule, you want a book.

  • Practice. Write programs. Make them small, make them large, make them silly, make them serious.


Taking classes in programming may or may not be useful to you. It can be helpful, but what really makes a difference is how much you practice.

Something is also to be said about the company you keep. You can soak up a lot of programming skills by hanging out at places like the coding forum at www.forums.xkcd.com, and that goes for people at any skill level.

Finally, there are some horrible traps you can walk into.

  1. Don't start in web design. First you need to learn HTML (which has nothing to do with programming at all), and then some abomination like PHP (which teaches terrible coding practices.) A background in web design is salvageable, but it is a damn tar pit.
  2. Don't start in C++ or Visual Basic. Starting in C++ is like putting a 5 year old in the pilot's seat of an airliner: It isn't going to understand 1% of the features, and will probably crash and burn (badum-ching); and Visual Basic is a crime against the mind.


This of course represents my opinion. There's as many answers as to how to become a programmer as there are programmers.

TL;DR: Learn python first.

"Fast" division

This is mostly a note-to-self on how to divide a number in base b with some integer n using nothing but "fast" operations like addition, subtraction and shifting (occasionally multiplication) -- if you bake them together, you typically end up with something of linear complexity or slightly better:

To divide a number in base b with some integer n, find the recursive relationship

1/n = 1/b + (b-n)(1/b)(1/n)

Where 1/b of course can be expressed as a digit shift. So, to express division by 3 in a way that is convenient to a binary computer, you get the relationship

1/3 = 1/2 - 1/2 * (1/3) => 1/3 = 1/2 - 1/4 + 1/8 - 1/16 + 1/32 - 1/64 ...

Conversely, to express division by 2 that is convenient to a trinary computer, you get

1/2 = 1/3 + 1/3 * (1/2) => 1/2 = 1/3 + 1/9 + 1/27 + 1/81 + 1/243 + 1/729 + ...

So now you know... It's pretty easy to vectorize the algorithm you get out of this, but you need to tip-toe around the minefield that is the decimal parts (well, actually, calculate their carry and add that to the integer part) to avoid rounding errors.