irodata

Saturday, January 25, 2020

Hello, World.

Stay tuned, changes and new content to come.

Friday, August 12, 2011

Autonomous Robots for War (seems like a bad idea)

IEEE Spectrum talking about the possibility of robots being used for war:
http://spectrum.ieee.org/robotics/military-robots/autonomous-robots-in-the-fog-of-war

The entire idea of war being a Bad Thing aside, why do we keep blindly rushing toward Robot Apocalypse (yes, this is a theme for me)? It would be easier to handle if the experts were considering the possibilities, but they are not. Text from the article linked above shows that the author is entirely too dismissive of the possibilities.

<<
I've been working with robots for more than two decades, starting with underwater vehicles, then moving to air and ground vehicles, and most recently addressing collaborations among robots like those we demonstrated at the Robotics Rodeo. I can attest that while robots are definitely getting smarter, it is no easy task to make them so smart that they need no adult supervision. And call me a skeptic, but I doubt they'll be cloning themselves anytime soon.
>>

OK, I get it. You write for IEEE Spectrum, you are smart. However, sometimes the consequences are so bad that they need to be taken seriously even if they are unlikely (think expected value and black swans).

Also, all kinds of research is being done right now on machines that can repair themselves (I blogged about this here back in January), so I think this is something we really should start taking more seriously.

Good grief.

Doesn't anybody read Philip K. Dick anymore? Second Variety should scare you enough to know better, and that was published in May 1953.

Thursday, August 11, 2011

Quine in Java

I was doing some reading on self-replicating code (the bad kind) and stumbled onto this, which is very cool in an old-school sort of way.

From Wikipedia:
A quine is a computer program which takes no input and produces a copy of its own source code as its only output. The standard terms for these programs in the computability theory and computer science literature are self-replicating programs, self-reproducing programs, and self-copying programs.

Here is a nice walkthrough of the thought process of creating a quine in Java:
http://blogs.adobe.com/charles/2011/01/my-adventure-writing-my-first-quine-in-java.html

The finished product from that post looks like this (tested in Eclipse; note that the main function is only two lines, and any wrapping is just to make it easier to read on this blog):

public class Quine {
    public static void main (String[] args) {
        String s = "public class Quine {%3$c%4$cpublic static void main (String[] args) {%3$c%4$c%4$cString s = %2$c%1$s%2$c;%3$c%4$c%4$cSystem.out.printf(s, s, 34, 10, 9);%3$c%4$c}%3$c}";
        System.out.printf(s, s, 34, 10, 9);
    }
}
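The trick in the quine is the set of positional format specifiers. Here is a small sketch of how they expand, using a shortened template of my own rather than the full quine string: %1$s is the template itself, and %2$c, %3$c and %4$c turn the integers 34, 10 and 9 into a double quote, a newline and a tab. The class and method names are made up for illustration.

```java
// Sketch only: shows how the positional specifiers used by the quine expand.
final class QuineFormatDemo {
    // Expands a template the same way the quine does:
    // argument 1 is the template itself, 2 is '"' (34),
    // 3 is a newline (10) and 4 is a tab (9).
    static String expand(String template) {
        return String.format(template, template, 34, 10, 9);
    }

    public static void main(String[] args) {
        // Shortened template (an assumption for illustration, not the full quine).
        String template = "String s = %2$c%1$s%2$c;%3$c";
        // Prints the assignment with the template quoted inside itself.
        System.out.print(expand(template));
    }
}
```

Because the quote character is passed as the number 34, the string never has to contain an escaped quote, which is what lets it describe itself.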

Tuesday, August 9, 2011

Entropy

ent is a very cool tool to quickly determine the entropy of a given file or stream.

Here is a quick demonstration:

The entropy of 'AAAAA':

The entropy of '28183' (5 byte pseudorandom number generated by the rand program):

Now, let's consider these two outputs (much of the content below is from the ent man page):

Entropy
Entropy (according to ent) is the information density of the contents of the file expressed as a number of bits per character. In the first example we see that 'AAAAA' is reported as having 0 entropy and that '28183' has 1.92. The takeaway here is that although '28183' is more dense than 'AAAAA', it is much less dense than one might expect.
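To see where those two numbers come from, here is a minimal sketch of the usual Shannon entropy calculation over byte frequencies. I am assuming this is the definition ent uses (the class and method names are mine); it does reproduce the 0 and 1.92 figures for the two five-byte inputs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Minimal sketch: Shannon entropy of a byte array, in bits per byte.
final class EntropySketch {
    static double bitsPerByte(byte[] data) {
        if (data.length == 0) {
            return 0.0; // nothing to measure
        }
        // Count how often each of the 256 possible byte values occurs.
        long[] counts = new long[256];
        for (byte b : data) {
            counts[b & 0xFF]++;
        }
        // H = -sum(p * log2(p)) over the byte values that actually occur.
        double entropy = 0.0;
        for (long count : counts) {
            if (count == 0) {
                continue;
            }
            double p = (double) count / data.length;
            entropy -= p * (Math.log(p) / Math.log(2));
        }
        return entropy;
    }

    public static void main(String[] args) {
        for (String sample : new String[] {"AAAAA", "28183"}) {
            double h = bitsPerByte(sample.getBytes(StandardCharsets.US_ASCII));
            System.out.printf(Locale.ROOT, "%s: %.2f bits per byte%n", sample, h);
        }
    }
}
```

With only four distinct byte values in '28183' (the 8 appears twice), the result cannot exceed 2 bits per byte, which is why it lands so far below the maximum of 8.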

Chi-Square Test
The chi-square test is the most commonly used test for the randomness of data, and is extremely sensitive to errors in pseudorandom sequence generators. The chi-square distribution is calculated for the stream of bytes in the file and expressed as an absolute number and a percentage which indicates how frequently a truly random sequence would exceed the value calculated. We interpret the percentage as the degree to which the sequence tested is suspected of being non-random. If the percentage is greater than 99% or less than 1%, the sequence is almost certainly not random. If the percentage is between 99% and 95% or between 1% and 5%, the sequence is suspect.
Note: This test clearly shows that ent does not consider '28183' to be very random at all.
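Here is a rough sketch of the chi-square statistic itself, assuming 256 equally likely byte values as the expected distribution. It only computes the absolute number; turning that into the percentage that ent reports needs the chi-square distribution function, which I have left out. The names are mine, not ent's.

```java
// Sketch: chi-square statistic of observed byte counts against a
// uniform expectation over all 256 byte values.
final class ChiSquareSketch {
    static double statistic(byte[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        long[] counts = new long[256];
        for (byte b : data) {
            counts[b & 0xFF]++;
        }
        // Every byte value is expected equally often in random data.
        double expected = data.length / 256.0;
        double chiSquare = 0.0;
        for (long observed : counts) {
            double diff = observed - expected;
            chiSquare += diff * diff / expected;
        }
        return chiSquare;
    }

    public static void main(String[] args) {
        byte[] sample = {'2', '8', '1', '8', '3'};
        System.out.println("chi-square statistic: " + statistic(sample));
    }
}
```

Keep in mind that five bytes spread over 256 bins is far too little data for the statistic to mean much; it is really intended for long streams.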

Arithmetic Mean
This is simply the result of summing all the bytes (bits if the -b option is specified) in the file and dividing by the file length. If the data are close to random, this should be about 127.5 (0.5 for -b option output). If the mean departs from this value, the values are consistently high or low.
Note: If a file is printable characters only (or some other subset of possible byte values), but still pseudorandom inside that set, then the value that represents the pseudorandom mean will be different. Additionally, it is unclear what general impact subsets of data have on the results reported by ent.
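A quick sketch of the point about subsets, treating each byte as an unsigned value: every byte value once gives the 127.5 quoted above, while the ASCII digits '0' through '9' once each give 52.5. The class name is just for illustration.

```java
import java.nio.charset.StandardCharsets;

// Sketch: arithmetic mean of a byte array, treating bytes as 0..255.
final class MeanSketch {
    static double mean(byte[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        long sum = 0;
        for (byte b : data) {
            sum += b & 0xFF; // mask so that bytes above 127 are not negative
        }
        return (double) sum / data.length;
    }

    public static void main(String[] args) {
        // Every possible byte value exactly once.
        byte[] allValues = new byte[256];
        for (int i = 0; i < allValues.length; i++) {
            allValues[i] = (byte) i;
        }
        System.out.println("all byte values: " + mean(allValues));

        // Only the ASCII digits, each exactly once.
        byte[] digits = "0123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println("digits only:     " + mean(digits));
    }
}
```

So for digit-only data, a mean near 52.5 rather than 127.5 is what "looks random" within that subset.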

Monte Carlo Value for PI
Each successive sequence of six bytes is used as 24 bit X and Y coordinates within a square. If the distance of the randomly-generated point is less than the radius of a circle inscribed within the square, the six-byte sequence is considered a "hit". The percentage of hits can be used to calculate the value of Pi. For very large streams (this approximation converges very slowly), the value will approach the correct value of Pi if the sequence is close to random. A 32768 byte file created by radioactive decay yielded: Monte Carlo value for Pi is 3.139648438 (error 0.06 percent).
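Here is a minimal sketch of that procedure. I am assuming big-endian 24-bit coordinates and measuring the distance from the origin corner of the square (a quarter circle, which gives the same pi/4 hit ratio); I have not checked these details against the ent source. The test data comes from java.util.Random with a fixed seed, which is only a stand-in for a real byte stream.

```java
import java.util.Random;

// Sketch: Monte Carlo estimate of pi from six-byte groups,
// three bytes for X and three bytes for Y.
final class MonteCarloPiSketch {
    // Reads three bytes starting at offset as an unsigned 24-bit value.
    static long read24(byte[] data, int offset) {
        return ((data[offset] & 0xFFL) << 16)
                | ((data[offset + 1] & 0xFFL) << 8)
                | (data[offset + 2] & 0xFFL);
    }

    static double estimatePi(byte[] data) {
        final long radius = (1L << 24) - 1; // largest 24-bit coordinate
        long tries = 0;
        long hits = 0;
        // Any trailing bytes that do not fill a six-byte group are ignored.
        for (int i = 0; i + 6 <= data.length; i += 6) {
            long x = read24(data, i);
            long y = read24(data, i + 3);
            tries++;
            if (x * x + y * y < radius * radius) {
                hits++; // the point fell inside the quarter circle
            }
        }
        if (tries == 0) {
            throw new IllegalArgumentException("need at least six bytes");
        }
        // The fraction of hits approximates pi / 4.
        return 4.0 * hits / tries;
    }

    public static void main(String[] args) {
        byte[] data = new byte[600_000];
        new Random(42).nextBytes(data); // fixed seed so runs are repeatable
        System.out.println("Monte Carlo value for Pi: " + estimatePi(data));
    }
}
```

The slow convergence is easy to see here: even 100,000 points only pin the value down to a couple of decimal places.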

Serial Correlation Coefficient
This quantity measures the extent to which each byte in the file depends upon the previous byte. For random sequences, this value (which can be positive or negative) will, of course, be close to zero. A non-random byte stream such as a C program will yield a serial correlation coefficient on the order of 0.5. Wildly predictable data such as uncompressed bitmaps will exhibit serial correlation coefficients approaching 1.
Note: Interesting to see that '28183' ends up with a serial correlation somewhere between that of a C program and wildly predictable data.
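For reference, here is a sketch of one common way to compute a lag-1 serial correlation, where the last byte is paired with the first so every byte has a successor. Whether ent uses exactly this formulation is an assumption on my part, and returning NaN for constant data (where the coefficient is undefined) is my own choice.

```java
// Sketch: circular lag-1 serial correlation coefficient of a byte array.
final class SerialCorrelationSketch {
    static double coefficient(byte[] data) {
        int n = data.length;
        if (n == 0) {
            return Double.NaN;
        }
        double sum = 0.0;     // sum of x[i]
        double sumSq = 0.0;   // sum of x[i]^2
        double sumProd = 0.0; // sum of x[i] * x[i+1], wrapping at the end
        for (int i = 0; i < n; i++) {
            double current = data[i] & 0xFF;
            double next = data[(i + 1) % n] & 0xFF;
            sum += current;
            sumSq += current * current;
            sumProd += current * next;
        }
        double denominator = n * sumSq - sum * sum;
        if (denominator == 0.0) {
            return Double.NaN; // all bytes identical, correlation undefined
        }
        return (n * sumProd - sum * sum) / denominator;
    }

    public static void main(String[] args) {
        byte[] sample = {'2', '8', '1', '8', '3'};
        System.out.println("serial correlation: " + coefficient(sample));
    }
}
```

With only five bytes the value swings wildly from one input to the next, so I would not read much into it for inputs this short.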

Thursday, April 14, 2011

iRobot Create

Pretty cool...

http://store.irobot.com/shop/index.jsp?categoryId=3311368

Wednesday, April 13, 2011

Simple Trigonometric Functions with LabVIEW

The Unit Circle:

Needs no explanation.

The Block Diagram:
This is a simple loop that utilizes the trig functions that are in the Mathematics -> Elementary and Special Functions -> Trigonometric Functions palette.

Three Screen Shots:
Here are a few screen shots from the front panel that show this simple VI in action.

Monday, April 11, 2011

Named Entity Extraction with Java

In February I posted on embedding Weka in a Java application based on content from this book by Mark Watson.

Recently I have been doing some work that requires some entity extraction, and I found another great tool and example set from the same book.

Here is some simple code that uses the classes provided:

package com.irodata.entity_extraction;

import com.markwatson.nlp.propernames.Names;

public class EntityExtraction {
    public static void main(String[] args) {
        Names names = new Names();
        System.out.println("Hello World, Is New York a real place?");
        System.out.println("New York: " + names.isPlaceName("New York"));
        System.out.println("Hello World, Is Oz a real place?");
        System.out.println("Oz: " + names.isPlaceName("Oz"));
    }

}

The output is this:

Hello World, Is New York a real place?
New York: true
Hello World, Is Oz a real place?
Oz: false

I am going to be using an expansion of this to do some identification of places and proper names in some database text data to assist with analysis. If anything good and simple (and therefore appropriate for this blog) turns up I will be sure to share. For now, all I can say is that this is a good set of classes if you need to do some quick text work in Java.

I only had one small gotcha that I should reference for anyone that wants to try this out:

The text says this:

The “secret sauce” for identifying names and places in text is the data in the file test data/propername.ser – a serialized Java data file containing hash tables for human and place names.

When I first tried to build a test implementation, I kept getting this error:

java.io.FileNotFoundException: data/propername/propername.ser (No such file or directory)

After taking a look at the Names class, it appears that the location of the serialized data file was hard-coded. It is possible that I am missing a simple Java convention, but the easiest solution for me was to copy the .ser file to the location where Names was looking for it (/data/propername/) and then import this file system resource into the Eclipse project. There may be a better way to do this. If you know of one, please send me an email and I will update this post and give you credit. Otherwise, this worked. The resulting project looks like this:
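One likely explanation, assuming Names opens the file with the relative path shown in the error message: Java resolves relative file paths against the JVM's working directory (the user.dir system property), which for an Eclipse run is typically the project root. Here is a small sketch for checking where a relative path actually points; the class and method names are hypothetical and are not part of the book's code.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: shows which absolute location a relative data path resolves to.
final class RelativePathSketch {
    // Resolves a relative path against a given working directory.
    static Path resolveAgainst(String workingDir, String relativePath) {
        return Paths.get(workingDir).resolve(relativePath).normalize();
    }

    public static void main(String[] args) {
        // The JVM resolves relative file paths against this directory.
        String workingDir = System.getProperty("user.dir");
        // Relative path taken from the FileNotFoundException message.
        Path resolved = resolveAgainst(workingDir, "data/propername/propername.ser");
        System.out.println("Looking for: " + resolved);
        System.out.println("Exists:      " + Files.exists(resolved));
    }
}
```

Running this from the same project shows exactly where the .ser file has to live for the hard-coded path to work.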