Example of a large dataset re-processor (text file IO in Java)

With almost 100,000,000 records to process across two data files that do not share a format, I had to craft a processor tool that handles both formats and distills them down to a third, smaller and more concise format.

The following source code is an example of how I accomplished that task using Java.

/*
------------------------------------------------------------------------
 $Id: fileRepair.java,v 1.1 2012/03/21 18:31:47 ddemartini Exp $
 $Revision: 1.3 $
 $Date: 2012/03/21 18:31:47 $
 $Name:  $
------------------------------------------------------------------------
 Loader 
------------------------------------------------------------------------
    colname[0]   = "seq_num";	
    colname[1]   = "ip";  
    colname[2]   = "mode";   - drop
    colname[3]   = "property";  
    colname[4]   = "threat";    
    colname[5]   = "desc"; - drop
    colname[6]   = "meta";
    colname[7]   = "detected";  - drop
    colname[8]   = "det_method"; 
    colname[9]   = "reported"; - drop
    colname[10]  = "rpt_method";
    colname[11]  = "target"; - drop
    colname[12]  = "source";
------------------------------------------------------------------------
*/

import base.*;   /*  this contains base and util classes */
import java.io.UnsupportedEncodingException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;
import java.io.File;
import java.io.BufferedReader;
import java.nio.ByteBuffer;
import java.util.Hashtable;
import java.util.regex.Pattern;

public class fileRepair{

  private static int skip          = Integer.parseInt(Util.envOrProp("skip")); // number of lines to skip from top
  private static int tStep         = 600000;  // millisecond step for status update (600000 = 10 minutes)
  private static long totInserts   = 0;  // counter for total inserts
  private static int [] fuse       = { 0,1,3,4,6,8,10,12 }; // these are the only fields that will be loaded.
  private static String inFile     = Util.envOrProp("in");
  private static String outFile    = Util.envOrProp("out");
  private static Hashtable srcMap;
 
  public static void main (String[] args) throws Exception {
    System.out.println("inFile  "+inFile);
    System.out.println("outFile "+outFile);

    srcMap = new Hashtable();
    /* OK, this is likely the most rudimentary way to do this, but so be it... it's done */
    srcMap.put("19","SINKHOLE_KF3");
    srcMap.put("51","SINKHOLE_KQ11");
    srcMap.put("1000","EXTERN");
    srcMap.put("1009","INTERN_BN3");
    srcMap.put("1011","INTERN_BN9");
    srcMap.put("1012","INTERN_PD4");
    srcMap.put("1014","INTERN_PD1");
    srcMap.put("1025","EMAIL");
    srcMap.put("1027","CLIENT_AP82");
    srcMap.put("1033","CLIENT_AP49");
    srcMap.put("prop_id","source");
    srcMap.put("","UNKNOWN");
    srcMap.put(" ","UNKNOWN");
    srcMap.put("0","UNKNOWN");

    /* Process the file */
    processFile(inFile,outFile); 

  }  /* main */

  public static void processFile(String iFile, String oFile) throws Exception {
    // configure IO
    long tStart;         // get system clock time for a timer
    long tNext;          // get system clock time for a timer
    long tempIns   = 0;  // temporary insert counter
    long currline  = 0;  // line counter
    Pattern delim  = Pattern.compile("[\t]");  // pre-compile the splitter

    try {
      FileReader fRead      = new FileReader(iFile);
      BufferedReader bRead  = new BufferedReader(fRead);
      FileWriter fWrite     = new FileWriter(oFile);
      PrintWriter pWrite    = new PrintWriter(fWrite);

      String line; // used for temporary storage
      
      // loop through the intake file
      tStart = System.currentTimeMillis();   // get system clock time for a timer
      tNext  = tStart + tStep; // set next step unit.

      while ((line = bRead.readLine()) != null) {
        /* implement line skipper */
        if(currline++ < skip) {
          continue;
        }
        // Parse the line on tabs.
        String [] fields = delim.split(line);  // split records based on pre-compiled regex object
   
        // Count the columns  --  if only 11 columns, add the source to the stream. 
        if(fields.length < 13) {
          // re-parse with the extra required data dropped onto the end //
          fields = delim.split(line+"\t0\t"+srcMap.get(fields[2]).toString());
        }

        // Synthesize the desired output line
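        StringBuffer newLine = new StringBuffer(fields[fuse[0]]);   // list of strings in columns
        // NOTE: the text between '<' and '>' was lost in publishing; this part is reconstructed from context
        for(int f=1;f<fuse.length;f++) {
          newLine.append("\t").append(fields[fuse[f]]);
        }
        pWrite.println(newLine);
        tempIns++;
        totInserts++;

        // periodic status update
        if(System.currentTimeMillis() > tNext) {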
          System.out.println("\t"+tempIns);
          tempIns = 0;
          tNext += tStep;
        }
      }
      /* close the file handles (closing the outer wrappers also closes fRead and fWrite) */
      bRead.close();
      pWrite.close();

    } catch ( Exception e ) {
      System.out.println("Fatal Error ("+e+") Encountered - quitting");
      return; 
    }
    // Report total records written, and time consumed.
    long nanEnd = System.currentTimeMillis();
    System.out.println(totInserts+" lines processed");
    System.out.println("Completed loading data at "+nanEnd);
    System.out.println("Total Load Time ("+(nanEnd-tStart)+") - "+((nanEnd-tStart)/1000)+" seconds");    
    return;
  }  /* processFile */

} /* fileRepair */

The first run against a 71 million line file completed in under 10 minutes.

     [java] inFile  ~/workbench/all_ips.dta
     [java] outFile ~/workbench/ips.1.dta
     [java] 71,244,397 lines processed

Total time: 9 minutes 41 seconds
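
The column-selection step at the heart of the processor can be tried in isolation. What follows is a minimal sketch, not the code from the listing: the class name ColumnProjector and the method project are made up for illustration, and emitting an empty string for a column that is missing from a short line is an assumption. It splits with a limit of -1 so that trailing empty fields are kept; the plain split(line) call drops them, which changes the field count on lines that end in empty columns.

```java
import java.util.regex.Pattern;

/** Sketch: keep only selected tab-separated columns of one record. */
final class ColumnProjector {

    // Pre-compiled tab splitter, same idea as in the listing above.
    private static final Pattern TAB = Pattern.compile("\t");

    /**
     * Returns the columns named in keep, joined by tabs.
     * A limit of -1 keeps trailing empty fields, so column positions stay stable.
     * Columns missing from a short line come out as empty strings (an assumption).
     */
    static String project(String line, int[] keep) {
        String[] fields = TAB.split(line, -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < keep.length; i++) {
            if (i > 0) {
                out.append('\t');
            }
            if (keep[i] < fields.length) {
                out.append(fields[keep[i]]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int[] keep = { 0, 1, 3 };  // hypothetical selection, in the spirit of the fuse array
        System.out.println(project("42\t10.0.0.1\tdrop-me\tweb", keep));
    }
}
```

Keeping the projection in its own method makes it easy to check against a handful of sample lines before pointing the tool at a 71 million line file.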

Simple Java Utils Library source code.

Today I’m including a couple of general purpose Java Source code files.

/*
------------------------------------------------------------------------
 $Id: Util.java,v 1.1 2012/03/14 16:16:49 ddemartini Exp $
 $Revision: 1.1 $
 $Date: 2012/03/14 16:16:49 $
 $Name:  $
------------------------------------------------------------------------
 Utilities

 Added file utilities
------------------------------------------------------------------------
*/

package base;
import java.io.UnsupportedEncodingException;
import java.io.File;
import java.util.*;

public class Util {
      
  /* retrieve environment or -D options passed by implementer */
  public static String envOrProp(String name) {
    if (System.getenv(name) != null) {
      return System.getenv(name);
    }
    else if (System.getProperty(name) != null) {
      return System.getProperty(name);
    }
    else {
      return null;
    }
  } /* envOrProp */

  /* Open a file stream and test for read */
  private static File fhRead(String inFile) throws Exception {
    File rf = openFile(inFile);
    try {
      // verify file exists and is readable
      if(! rf.exists() || !rf.canRead()){
        System.out.println("Unable to read file "+inFile);
        return null;
      }
    } catch (Exception e) {
       System.out.println("Fatal error attempted to open stream from "+inFile+"\n"+e);
       return null;
    }
    return rf;
  } /* fhRead */

  /* Open a file stream for writing */
  private static File fhWrite(String outFile) throws Exception {
    File of = openFile(outFile);
    try {
      // verify file would be writable 
      if(!of.canWrite()){
        System.out.println("Unable to write file "+outFile);
        return null;
      }
    } catch (Exception e) {
       System.out.println("Fatal error attempted to open stream to "+outFile+"\n"+e);
       return null;
    }
    return of;
  } /* fhWrite */

  /* Attempt to open a file handle */
  private static File openFile(String inFile) throws Exception {
    File file;
    try {
      file = new File(inFile);  // open file handle
    } catch (Exception e) {
       System.out.println("Fatal error attempted to open stream to "+inFile+"\n"+e);
       return null;
    }
    return file;
  } /* openFile */

} /* Util */

Subsequent source code using this utility file will be documented later.

Java multi-get demonstrator for Cassandra NoSQL db

This simple demonstrator makes a connection to a Cassandra cluster, inserts a user-defined number of rows, then reads them back to demonstrate the performance boost from multiGet.

Its genesis was the result of working towards a Thrift API loader for a Cassandra evaluation implementation. You can read about that here: Cassandra – A Use case examined (IP data)

/*
------------------------------------------------------------------------
 $Id: useMultiGet.java,v 1.2 2012/03/15 12:56:45 ddemartini Exp $
 $Revision: 1.2 $
 $Date: 2012/03/15 12:56:45 $
 $Name:  $
------------------------------------------------------------------------
 MultiGet demonstrator 

 This class makes use of our custom Utility class
------------------------------------------------------------------------
*/

package c01;
import base.*;
import java.util.*;
import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.*;

public class useMultiGet {
  
  public static void main (String[] args) throws Exception {
    /* Use to specify host,port,keyspace,columnfamily and insert count */

    String host = Util.envOrProp("host");
    int port    = Integer.parseInt(Util.envOrProp("port"));
    int inserts = Integer.parseInt(Util.envOrProp("inserts"));
    String ks   = Util.envOrProp("ks");
    String cf   = Util.envOrProp("cf");
    int slice   = Integer.parseInt(Util.envOrProp("slice"));

    /* instance db connector and open connection */
    CassDB db   = new CassDB(host,port); 
    db.open();

    /* first check for the existence of the keyspace */
    

    /* use ColumnParent to insert data */
    ColumnParent parent = new ColumnParent();
    parent.setColumn_family(cf);
    
    /* use ColumnPath to get data */
    ColumnPath path = new ColumnPath();
    path.setColumn_family(cf);
    path.setColumn("acol".getBytes("UTF-8"));  /* RESEARCH */

    /* user-defined number of inserts to make in this demonstrator */
    Column col = new Column(); 
    db.getClient().set_keyspace(ks);
    col.setName("acol".getBytes());
    long timestamp = System.currentTimeMillis();
    for (int j = 0; j < inserts; j++) {
      ByteBuffer key = ByteBuffer.wrap((j+"").getBytes()); 
      col.setValue(key);
      col.setTimestamp(timestamp);  /* you MUST add a timestamp!! */
      col.setTtl(600);  /*  set a TTL 10 minutes into the future */
      db.getClient().insert(key, parent, col, ConsistencyLevel.ALL);  /* insert! */
      db.getClient().get(key, path, ConsistencyLevel.ALL);  /* get data back */
    }

    /* create a timer */
    long getNanos = System.nanoTime();
    for (int j = 0; j < inserts; j++) {
      ByteBuffer key = ByteBuffer.wrap((j+"").getBytes()); 
      col.setValue(key);
      db.getClient().get(key, path, ConsistencyLevel.ONE);
    }
    long endGetNanos = System.nanoTime()-getNanos;

    /* use MultiGet, requires a SlicePredicate which can either be a list of columns or a slice range */
    SlicePredicate pred = new SlicePredicate();
    pred.addToColumn_names(ByteBuffer.wrap("acol".getBytes()));
    long startMgetNanos = System.nanoTime();
    /* loop in batches of 5 */
    for (int j = 0; j < inserts; j=j+5){
      List wantedKeys = new ArrayList();
      for (int k=j; k
[...]
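
The listing is cut off inside the batching loop, so here is a minimal sketch of just that idea in plain Java: grouping keys into batches of at most five. The class name KeyBatcher and the method batches are hypothetical and not part of the original program. In the demonstrator each batch would presumably be handed to the Thrift client's multiget_slice call together with the ColumnParent and the SlicePredicate; that call is left out here so the sketch runs without a cluster.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch: build row keys "0".."total-1" and group them into fixed-size batches. */
final class KeyBatcher {

    static List<List<ByteBuffer>> batches(int total, int batchSize) {
        List<List<ByteBuffer>> result = new ArrayList<List<ByteBuffer>>();
        for (int j = 0; j < total; j += batchSize) {
            List<ByteBuffer> wanted = new ArrayList<ByteBuffer>();
            // Math.min guards the final, possibly shorter, batch.
            for (int k = j; k < Math.min(j + batchSize, total); k++) {
                byte[] raw = String.valueOf(k).getBytes(StandardCharsets.UTF_8);
                wanted.add(ByteBuffer.wrap(raw));
            }
            result.add(wanted);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<ByteBuffer>> b = batches(12, 5);
        System.out.println(b.size() + " batches, last has "
                + b.get(b.size() - 1).size() + " keys");
    }
}
```

Guarding the upper bound matters when the insert count is not a multiple of the batch size; without it the last batch would ask for keys that were never written.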

Cassandra DB Connector in Java, using Thrift API

This is a simple connector class for making a connection to a Cassandra cluster. Its genesis was the result of working towards a Thrift API loader for a Cassandra evaluation implementation. You can read about that here: Cassandra – A Use case examined (IP data)

/*
------------------------------------------------------------------------
 $Id: CassDB.java,v 1.1 2012/03/13 23:20:02 ddemartini Exp $
 $Revision: 1.1 $
 $Date: 2012/03/13 23:20:02 $
 $Name:  $
------------------------------------------------------------------------
 Sample connector program.  Requires two environment variables to be set up
 in order to execute:
    host:    hostname of Cassandra cluster node member
    port:    connector port - default is 9160, this should be default
------------------------------------------------------------------------
*/

package base;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.*;
import org.apache.thrift.transport.*;

public class CassDB {
      
  /* Declare private properties used for client server communications */
  private TTransport transport;
  private TProtocol  protocol; 
  private TSocket    socket;

  /* Constructor takes supplied host and port */
  public CassDB(String host, int port) throws Exception {
    try {
      socket    = new TSocket(host,port);
      transport = new TFramedTransport(socket);
      protocol  = new TBinaryProtocol(transport);
    } catch(Exception e){
      System.out.println("Exception "+e);
    }
  }

  /* Opener */
  public void open() throws Exception {
    try {
      transport.open();
    } catch(Exception e) {
      System.out.println("Exception "+e);
    }
  }

  /* Closer */
  public void close() throws Exception {
    try {
      transport.close();
    } catch(Exception e) {
      System.out.println("Exception "+e);
    }
  }

  /* getClient method */
  public Cassandra.Client getClient() {
    Cassandra.Client client = new Cassandra.Client(protocol);
    return client;
  }

 /* end */
}

Calling and using this class is pretty simple. Here are some excerpts from a program I’ll post later:

[...]
package c01;
import base.*;
[...]
    String host = Util.envOrProp("host");
    int port    = Integer.parseInt(Util.envOrProp("port"));
[...]
    /* instance db connector and open connection */
    CassDB db   = new CassDB(host,port); 
    db.open();
[...]
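
One rough edge in the excerpt: Integer.parseInt throws a NumberFormatException when the port variable is not set, because envOrProp returns null in that case. A minimal sketch of a more forgiving lookup follows. The class name PortSetting and the method parsePort are made up for illustration, and falling back to 9160 (the default named in the CassDB header comment) on bad input is an assumption about what you would want.

```java
/** Sketch: read a port setting with a fallback instead of failing on missing input. */
final class PortSetting {

    static final int DEFAULT_PORT = 9160;  // default named in the CassDB header comment

    /** Parses a port, falling back to the default when the value is missing or malformed. */
    static int parsePort(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_PORT;
        }
        try {
            int port = Integer.parseInt(raw.trim());
            // Anything outside the valid TCP port range is treated as malformed.
            return (port >= 1 && port <= 65535) ? port : DEFAULT_PORT;
        } catch (NumberFormatException e) {
            return DEFAULT_PORT;
        }
    }

    public static void main(String[] args) {
        // Same lookup order as Util.envOrProp: environment first, then -D property.
        String raw = System.getenv("port") != null
                ? System.getenv("port")
                : System.getProperty("port");
        System.out.println("port = " + parsePort(raw));
    }
}
```

Whether a silent fallback is better than a loud failure depends on the job; for a loader that runs unattended, failing fast may be the safer choice.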

Creating a simple Utils class

This simple Utils class is more of a demonstrator than a widely-reusable class. Its genesis was the result of working towards a Thrift API loader for a Cassandra evaluation implementation. You can read about that here: Cassandra – A Use case examined (IP data)

Here is the source code:

/*
------------------------------------------------------------------------
 $Id: Util.java,v 1.1 2012/03/14 16:16:49 ddemartini Exp $
 $Revision: 1.1 $
 $Date: 2012/03/14 16:16:49 $
 $Name:  $
------------------------------------------------------------------------
 Utilities
------------------------------------------------------------------------
*/

package base;
import java.io.UnsupportedEncodingException;
import java.util.*;
import org.apache.cassandra.thrift.*;

public class Util {
      
  /* Returns a list of keyspaces, good for keyspace lookup */
  public static List listKeySpaces(Cassandra.Client c) throws Exception {
    List results = new ArrayList();
    for (KsDef k : c.describe_keyspaces()) {
      results.add(k.getName());
    }
    return results;
  } /* listKeySpaces */

  /* create KsDef CfDef ready for use with system_add_keyspace() */
  public static KsDef createSimpleKSandCF(String ksname, String cfname, int replication) {
    KsDef newKs = new KsDef();
    newKs.setStrategy_class("org.apache.cassandra.locator.SimpleStrategy");
    newKs.setName(ksname);
    newKs.setReplication_factor(replication);
    CfDef cfDef = new CfDef();
    cfDef.setKeyspace(ksname);
    cfDef.setName(cfname);
    newKs.addToCf_defs(cfDef);
    return newKs;
  } /* createSimpleKSandCF */ 
  
  /* retrieve environment or -D options passed by implementer */
  public static String envOrProp(String name) {
    if (System.getenv(name) != null) {
      return System.getenv(name);
    }
    else if (System.getProperty(name) != null) {
      return System.getProperty(name);
    }
    else {
      return null;
    }
  } /* envOrProp */

} /* Util */

Java build env to prepare for Cassandra development

Getting it all ready…

First, I wanted to see how much of a system footprint 3 instances of Cassandra had on this little system. Here you can see the 3 instances patiently waiting for something to do. After sitting idle for about 24 hours (note, TIME+ is CPU time, not wall clock), memory utilization has crept up from 11% to 14% per process.


 PID USER     PR NI VIRT RES  SHR  S %CPU %MEM   TIME+ COMMAND
4554 bigdata  20  0 891m 134m 4388 S  7.6 14.4 7:47.96 java
4632 bigdata  20  0 917m 133m 4340 S  0.7 14.3 7:45.64 java
4593 bigdata  20  0 896m 133m 4168 S  0.3 14.3 7:40.37 java

Keep in mind this test box has a single core CPU with a whopping 1GB of memory. If I can get it to work on this box without pushing it over, you should be able to run a single instance on any box with a reasonable expectation of function.

The data model I wanted to use is pretty basic: IP traffic, consisting of the following elements:

* IPv4 address
* destination port
* timestamp
* TTL (this is a Cassandra construct to allow auto-tombstoning of data when its usefulness has expired)
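
The four elements above could be carried around in a small value class before they are written to Cassandra. This is a minimal sketch under stated assumptions: the class name IpEvent, its field names, and the choice to pack the IPv4 address into a 32-bit int are all hypothetical, not something settled in these notes.

```java
/** Sketch: one IP traffic record with the four elements listed above. */
final class IpEvent {
    final int address;     // IPv4 address packed into 32 bits
    final int port;        // destination port, 0-65535
    final long timestamp;  // milliseconds since the epoch
    final int ttlSeconds;  // seconds until the column may expire

    IpEvent(String dottedAddress, int port, long timestamp, int ttlSeconds) {
        this.address = pack(dottedAddress);
        this.port = port;
        this.timestamp = timestamp;
        this.ttlSeconds = ttlSeconds;
    }

    /** Converts "a.b.c.d" to a 32-bit value; rejects anything that is not four octets. */
    static int pack(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + dotted);
        }
        int value = 0;
        for (String p : parts) {
            int octet = Integer.parseInt(p);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("bad octet in: " + dotted);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Converts the packed value back to dotted notation. */
    static String unpack(int address) {
        return ((address >>> 24) & 0xFF) + "." + ((address >>> 16) & 0xFF) + "."
                + ((address >>> 8) & 0xFF) + "." + (address & 0xFF);
    }

    public static void main(String[] args) {
        IpEvent e = new IpEvent("192.168.1.10", 443, System.currentTimeMillis(), 600);
        System.out.println(unpack(e.address) + ":" + e.port + " ttl=" + e.ttlSeconds);
    }
}
```

Packing the address keeps keys compact and makes range comparisons cheap, at the cost of readability when poking at raw data.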

To get this data, I’m thinking of simply running tcpdump on a box, or possibly my laptop, to generate some traffic, then stream that into a program to insert into Cassandra as fast as the packets go by.

With the limited disk space on the box (see below) I can’t run it indefinitely, but I should be able to run it for an afternoon to load a keyspace, then start to figure out how to get the data back out!


Filesystem 1K-blocks    Used Available Use% Mounted on
/dev/sda1   75956320 4344788  67753152   7% /
none          470324     640    469684   1% /dev
none          478024     420    477604   1% /dev/shm
none          478024     108    477916   1% /var/run
none          478024       0    478024   0% /var/lock

One thing I could do is load the data into the database, then run a 2nd pass processor on it and mutate the data with reverse lookups. Sort of a poor-man’s Wireshark type of tool. Now, if I wire this into the RPZ-enabled DNS resolver I plan to set up eventually, I could track all data on my network, including all the requests from my Apple TV device. It might be interesting to see what it’s *really* doing on the network.


Downloading Support Packages for Development Environment

Before starting to code though, it looks like I need to ensure my JDK / Java libs are all up to date. To make it easier to follow the documentation I’m reviewing, Apache Ant will be installed too.

Java JDK – Java Software Development Kit

The JDK is a development environment for building applications, applets, and components using the Java programming language.
The JDK includes tools useful for developing and testing programs written in the Java programming language and running on the Java™ platform.

Package URL: http://download.oracle.com/otn-pub/java/jdk/7u3-b04/jdk-7u3-linux-x64.tar.gz


mkdir jdk
cd jdk
wget http://download.oracle.com/otn-pub/java/jdk/7u3-b04/jdk-7u3-linux-x64.tar.gz

Extract the package:

tar xvzf jdk-7u3-linux-x64.tar.gz

Although I could simply run the JDK from the local user location, I decided to go for the ‘System Install’ option, and created a jdk location in /usr/lib, then copied the parts there according to the info in the docs. In this case I just downloaded the JDK again… you could skip that step and copy the .gz file already downloaded above. Your call.


sudo mkdir /usr/lib/jdk
cd /usr/lib/jdk
sudo wget http://download.oracle.com/otn-pub/java/jdk/7u3-b04/jdk-7u3-linux-x64.tar.gz
sudo tar xvzf jdk-7u3-linux-x64.tar.gz
sudo rm jdk-7u3-linux-x64.tar.gz

Oracle’s page says that it’s now ‘installed’ but I suspect there are more than a few steps still required here! This is almost as good as Oracle technical support… I’ll try to be a little more helpful.

Setting the path in my ~/.bash_profile will resolve the path issue for Ant and JUnit. This is what I set in my file:

export JAVA_HOME=/usr/lib/jdk/jdk1.7.0_03
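
A quick way to confirm which Java installation actually gets picked up is to ask the JVM itself. This is a minimal sketch; the class name JdkCheck is made up for illustration. It prints the running version, the JVM's own home directory, and whatever JAVA_HOME is set to in the environment (null if the profile has not been re-read yet).

```java
/** Sketch: report which Java installation is running and what JAVA_HOME says. */
final class JdkCheck {

    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + " java.home=" + System.getProperty("java.home")
                + " JAVA_HOME=" + System.getenv("JAVA_HOME");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If java.home and JAVA_HOME point at different installations, the shell is most likely still finding an older java on the PATH.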

ANT – Apache Ant

Apache Ant is a Java library and command-line tool whose mission is to drive processes described in build files as targets and extension points dependent upon each other. The main known usage of Ant is the build of Java applications. Ant supplies a number of built-in tasks allowing to compile, assemble, test and run Java applications. Ant can also be used effectively to build non Java applications, for instance C or C++ applications. More generally, Ant can be used to pilot any type of process which can be described in terms of targets and tasks.

Package URL: http://www.carfab.com/apachesoftware//ant/binaries/apache-ant-1.8.3-bin.tar.gz


mkdir ant
cd ant
wget http://www.carfab.com/apachesoftware//ant/binaries/apache-ant-1.8.3-bin.tar.gz

Extract the package:


tar xvzf apache-ant-1.8.3-bin.tar.gz

Docs inside Ant say to go back to the web and read the installation instructions, located here: http://ant.apache.org/manual/install.html#installing. I happen to like where my Ant stuff was installed, so I’m going to set ANT_HOME in my ~/.bash_profile to the location where I extracted the stuff. Ideal? Probably not, but I’m doing this research on a perfectly good Saturday.. you get what you’re paying for.


export ANT_HOME=/home/bigdata/ant/apache-ant-1.8.3
export PATH=$PATH:$ANT_HOME/bin

Testing to see if the paths and parts are there worked. This error is actually expected (we’ll write the build.xml later).

$ ant
Buildfile: build.xml does not exist!
Build failed

JUnit – Test framework for test based development

JUnit is a simple framework to write repeatable tests. It is an instance of the xUnit architecture for unit testing frameworks.


mkdir junit
cd junit
wget https://github.com/downloads/KentBeck/junit/junit-4.10.jar
wget https://github.com/downloads/KentBeck/junit/junit4.10.zip

Extract the source package, in case I need it:


unzip junit4.10.zip

I can’t say this is the best way to do this; it’s a cookie-cutter implementation from the documentation. If you see something that does not make sense or is flat out stupid, post a comment and let me know!


Development Environment Setup

Create the primary development folder and expected sub-folders. Your naming conventions may vary:

mkdir cBuild
mkdir cBuild/src
mkdir cBuild/src/{java,test}
mkdir cBuild/lib

Populate the lib with libraries from the Cassandra distribution and JUnit.

cp cassA-1.0.8/lib/*.jar cBuild/lib/.
cp junit/*.jar cBuild/lib/.

To employ the JUnit testing harness via the Ant Java builder, a build.xml file is required in the cBuild base directory. Here are sample contents. Your paths may differ if you went your own way on the directories.


vi cBuild/build.xml

<project name="jCas" default="dist" basedir=".">
<property name="src" location="src/java"/>
<property name="test.src" location="src/test"/>
<property name="build" location="build"/>
<property name="build.classes" location="build/classes"/>
<property name="test.build" location="build/test"/>
<property name="dist" location="dist"/>
<property name="lib" location="lib"/>
<!-- Tags used by Ant to help build paths, most useful when multiple .jar files are required -->
<path id="jCas.classpath">
<pathelement location="${build.classes}"/>
<fileset dir="${lib}" includes="*.jar"/>
</path>
<!-- exclude test cases from the final .jar file, this defines that policy -->
<path id="jCas.test.classpath">
<pathelement location="${test.build}"/>
<path refid="jCas.classpath"/>
</path>
<!-- Define the 'init' target, used by other build phases -->
<target name="init">
<mkdir dir="${build}"/>
<mkdir dir="${build.classes}"/>
<mkdir dir="${test.build}"/>
</target>
<!-- 'compile' target -->
<target name="compile" depends="init">
<javac srcdir="${src}" destdir="${build.classes}">
<classpath refid="jCas.classpath"/>
</javac>
</target>
<!-- 'test compile' target -->
<target name="compile-test" depends="init">
<javac srcdir="${test.src}" destdir="${test.build}">
<classpath refid="jCas.test.classpath"/>
</javac>
</target>
<!-- setup policies that tell JUnit to execute tests on files in test that end with .class -->
<target name="test" depends="compile-test,compile">
<junit printsummary="yes" showoutput="true">
<classpath refid="jCas.test.classpath"/>
<batchtest>
<fileset dir="${test.build}" includes="**/Test.class"/>
</batchtest>
</junit>
</target>
<!-- on a good build, dist target creates final JAR jCas.jar -->
<target name="dist" depends="compile">
<mkdir dir="${dist}/lib"/>
<jar jarfile="${dist}/lib/jCas.jar" basedir="${build.classes}"/>
</target>
<!-- run target allows execution of the built classes -->
<target name="run" depends="dist">
<java classname="${classToRun}">
<classpath refid="jCas.classpath"/>
</java>
</target>
<!-- clean target gets rid of all the left over files from builds -->
<target name="clean">
<delete dir="${build}"/>
<delete dir="${dist}"/>
</target>
</project>

Testing this build environment
Having created the build.xml file, it needs to be tested to make sure it even works.

Create a test case and build case

cd cBuild/src
vi test/Test.java

import junit.framework.*;
public class Test extends TestCase {
public void test() {
assertEquals( "Equality Test", 0, 0);
}
}

Create a really simple program..

vi java/X1.java

public class X1 {
public static void main (String [] args) {
System.out.println("This is Java.... drink up!");
}
}

Now the rubber meets the road: if everything is set up properly, we can build a file!

Run ant with target set to ‘test’

~/cBuild$ ant test
Buildfile: /home/bigdata/cBuild/build.xml

init:

compile-test:
[javac] /home/bigdata/cBuild/build.xml:33: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds

compile:
[javac] /home/bigdata/cBuild/build.xml:27: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 1 source file to /home/bigdata/cBuild/build/classes

test:

BUILD SUCCESSFUL
Total time: 7 seconds

Run ant with target set to ‘dist’


~/cBuild$ ant dist
Buildfile: /home/bigdata/cBuild/build.xml

init:

compile:
[javac] /home/bigdata/cBuild/build.xml:27: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds

dist:
[jar] Building jar: /home/bigdata/cBuild/dist/lib/jCas.jar

BUILD SUCCESSFUL
Total time: 1 second

It’s a good idea to check your .jar to make sure your class is actually in it. Ant, for some reason beyond understanding or logic, WON’T let you know if your lib was skipped (had it happen in my first build.. exceptionally ungood).


~/cBuild$ jar -tf dist/lib/jCas.jar
META-INF/
META-INF/MANIFEST.MF
X1.class

As you can see, there is no Whiskey, but X1 is in the jar.
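
The same check can be scripted, which is handy once the jar holds more than one class. Here is a minimal sketch using the JDK's own java.util.jar classes; the class name JarContents is made up for illustration, and the default path is an assumption taken from the build output above.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Sketch: list the entries of a jar and check that an expected class is present. */
final class JarContents {

    /** Returns the entry names of a jar, in archive order. */
    static List<String> entries(File jar) throws IOException {
        List<String> names = new ArrayList<String>();
        JarFile jf = new JarFile(jar);
        try {
            Enumeration<JarEntry> e = jf.entries();
            while (e.hasMoreElements()) {
                names.add(e.nextElement().getName());
            }
        } finally {
            jf.close();  // release the file handle even if listing fails
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Default path is an assumption based on the dist target's output.
        File jar = new File(args.length > 0 ? args[0] : "dist/lib/jCas.jar");
        List<String> names = entries(jar);
        System.out.println(names.contains("X1.class") ? "X1.class found" : "X1.class MISSING");
    }
}
```

Since Ant stays quiet when a class is left out, a check like this could be run right after the dist target.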

RUN!!!


~/cBuild$ ant -DclassToRun=X1 run
Buildfile: /home/bigdata/cBuild/build.xml

init:

compile:
[javac] /home/bigdata/cBuild/build.xml:27: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds

dist:

run:
[java] This is Java.... drink up!

BUILD SUCCESSFUL
Total time: 1 second

SUCCESS!!!

All told it took me about 3 1/2 hours to get this set up, parts installed, these notes written up and a SIMPLE Java program executed. So.. set your own expectations accordingly. Hopefully you’ll save a lot of time with the build.xml file.. I typed that in char for char. You could just do a cut-paste, fix up anything you don’t like in my path names and let it rip.

Good luck.. more to follow on Cassandra!!! (even though this post was more about getting ready to write code to access it).
