irodata: August 2011

Friday, August 12, 2011

Autonomous Robots for War (seems like a bad idea)

IEEE Spectrum talking about the possibility of robots being used for war:
http://spectrum.ieee.org/robotics/military-robots/autonomous-robots-in-the-fog-of-war

The entire idea of war being a Bad Thing aside, why do we keep blindly rushing toward Robot Apocalypse (yes, this is a theme for me)? It would be easier to handle if the experts were considering the possibilities, but they are not. Text from the article linked above shows that the author is entirely too dismissive of the possibilities.

<<
I've been working with robots for more than two decades, starting with underwater vehicles, then moving to air and ground vehicles, and most recently addressing collaborations among robots like those we demonstrated at the Robotics Rodeo. I can attest that while robots are definitely getting smarter, it is no easy task to make them so smart that they need no adult supervision. And call me a skeptic, but I doubt they'll be cloning themselves anytime soon.
>>

OK, I get it. You write for IEEE Spectrum, you are smart. However, sometimes the consequences are so bad that they need to be taken seriously even if they are unlikely (think expected value and black swans).

Also, there is all kinds of research being done right now on machines that can repair themselves (I blogged about this here back in January), so I think that this is something that we really should start taking more seriously.

Good grief.

Doesn't anybody read Philip K. Dick anymore? Second Variety should scare you enough to know better, and that was published in May 1953.

Thursday, August 11, 2011

Quine in Java

I was doing some reading on self replicating code (the bad kind) and stumbled onto this, which is very cool in an old school sort of way.

From Wikipedia:
A quine is a computer program which takes no input and produces a copy of its own source code as its only output. The standard terms for these programs in the computability theory and computer science literature are self-replicating programs, self-reproducing programs, and self-copying programs.

Here is a nice walk through of the thought process of creating a quine in Java:
http://blogs.adobe.com/charles/2011/01/my-adventure-writing-my-first-quine-in-java.html

The finished product of this blog looks like this (This was tested in eclipse. Note that the main function is only two lines. The wrapping is just so that it is easier to read on this blog):

public class Quine {
    public static void main (String[] args) {
        String s = "public class Quine {%3$c%4$cpublic static void main (String[] args) {%3$c%4$c%4$cString s = %2$c%1$s%2$c;%3$c%4$c%4$cSystem.out.printf(s, s, 34, 10, 9);%3$c%4$c}%3$c}";
        System.out.printf(s, s, 34, 10, 9);
    }
}

Tuesday, August 9, 2011

Entropy

ent is a very cool tool to quickly determine the entropy of a given file or stream.

Here is a quick demonstration:

The entropy of 'AAAAA':

The entropy of '28183' (5 byte pseudorandom number generated by the rand program):

Now, let's consider these two outputs (much of the content below is from the ent man page):

Entropy
Entropy (according to ent) is the information density of the contents of the file expressed as a number of bits per character. In the first example we see that 'AAAAA' is reported as having 0 entropy and that '28183' has 1.92. The takeaway here is that although '28183' is more dense than 'AAAAA', it is much less dense than one might expect.

Chi-Square Test
The chi-square test is the most commonly used test for the randomness of data, and is extremely sensitive to errors in pseudorandom sequence generators. The chi-square distribution is calculated for the stream of bytes in the file and expressed as an absolute number and a percentage which indicates how frequently a truly random sequence would exceed the value calculated. We interpret the percentage as the degree to which the sequence tested is suspected of being non-random. If the percentage is greater than 99% or less than 1%, the sequence is almost certainly not random. If the percentage is between 99% and 95% or between 1% and 5%, the sequence is suspect.
Note: This test clearly shows that ent does not consider '28183' to be not very random at all.

Arithmetic Mean
This is simply the result of summing the all the bytes (bits if the -b option is specified) in the file and dividing by the file length. If the data are close to random, this should be about 127.5 (0.5 for -b option output). If the mean departs from this value, the values are consistently high or low.
Note: If a file is printable characters only (or some other subset of possible byte values), but still pseudorandom inside that set, then the value that represents the pseudorandom mean will be different. Additionally, it is unclear what general impact subsets of data have on the results reported by ent.

Monte Carlo Value for PI
Each successive sequence of six bytes is used as 24 bit X and Y coordinates within a square. If the distance of the randomly-generated point is less than the radius of a circle inscribed within the square, the six-byte sequence is considered a "hit". The percentage of hits can be used to calculate the value of Pi. For very large streams (this approximation converges very slowly), the value will approach the correct value of Pi if the sequence is close to random. A 32768 byte file created by radioactive decay yielded: Monte Carlo value for Pi is 3.139648438 (error 0.06 percent).

Serial Correlation Coefficient
This quantity measures the extent to which each byte in the file depends upon the previous byte. For random sequences, this value (which can be positive or negative) will, of course, be close to zero. A non-random byte stream such as a C program will yield a serial correlation coefficient on the order of 0.5. Wildly predictable data such as uncompressed bitmaps will exhibit serial correlation coefficients approaching 1.
Note: Interesting to see that '28183' ends up with a serial correlation somewhere between that of a C program and wildly predictable data.