Hadoop Cluster Sizing Wizard by Hortonworks

Anyone who does any Hadoop development or systems engineering eventually runs into the “how should I size my cluster” question.

Hortonworks has a very nice cluster sizing calculator that takes into account the basic use-cases and data profile to help get you started with your hardware requirements.
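As a rough cross-check on the wizard's output, the core storage arithmetic is simple: multiply the data you plan to keep by the HDFS replication factor (3 by default), then leave headroom for temporary and intermediate data. The sketch below is not the Hortonworks wizard's formula. The 25% headroom, the 48 TB of usable disk per node, and the function names raw_capacity_tb and nodes_needed are illustrative assumptions, and it ignores compression and data growth.

```shell
#!/bin/bash
# Back-of-envelope raw HDFS capacity (illustrative sketch, not the wizard's formula).
#   $1 = logical data size in TB
#   $2 = HDFS replication factor (HDFS default is 3)
#   $3 = fraction of disk reserved for temp/intermediate data (assumed, e.g. 0.25)
raw_capacity_tb() {
  awk -v data="$1" -v rep="$2" -v tmp="$3" \
    'BEGIN { printf "%.1f\n", data * rep / (1 - tmp) }'
}

# Data nodes needed, rounded up.
#   $1 = raw capacity in TB, $2 = usable TB per node (assumed value)
nodes_needed() {
  awk -v raw="$1" -v per="$2" \
    'BEGIN { n = raw / per; if (n > int(n)) n = int(n) + 1; print n }'
}
```

For example, raw_capacity_tb 100 3 0.25 prints 400.0 (TB of raw disk), and nodes_needed 400 48 prints 9.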

Generate a Random String of a Specified Size with a Shell Script

The following one-liner generates a random string of a fixed size in bash, using only digits, letters, and the newline character.

Including the newline makes it very likely that the output is broken into several lines instead of one long line of text.

< /dev/urandom tr -dc '[:digit:][:alpha:]\n' | head -c1000 > file.out
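If you need the string in a variable rather than a file, a variant without the newline works well. This is a sketch; the function name random_string and its default length of 16 are illustrative choices. [:alnum:] covers the same characters as [:digit:][:alpha:], and LC_ALL=C makes tr treat the bytes from /dev/urandom as single-byte characters, which avoids the "Illegal byte sequence" error tr can raise in multibyte locales (commonly seen on macOS).

```shell
#!/bin/bash
# Print a random string of exactly N letters and digits (no embedded newlines).
#   $1 = length (defaults to 16, an arbitrary choice)
random_string() {
  local length="${1:-16}"
  # LC_ALL=C: treat input as raw bytes so tr does not choke on invalid UTF-8.
  LC_ALL=C tr -dc '[:alnum:]' < /dev/urandom | head -c "$length"
  # Trailing newline for interactive use; $(...) strips it when capturing.
  echo
}
```

Usage: token=$(random_string 32)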

Connecting To a Test Kitchen Instance Via SFTP, SSH, or SCP

If you are using Chef and Test Kitchen to test your cookbooks, you may need to connect to the Test Kitchen VM by some means other than $ kitchen login instance-name.

To do so:

Run $ kitchen list to see the running VMs:

kitchen list
Instance                      Driver   Provisioner  Verifier  Transport  Last Action
default-centos-66             Vagrant  ChefSolo     Busser    Ssh        Converged

Then, in the directory where you ran your $ kitchen command, look in the .kitchen directory for the corresponding …
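Once you have the instance's host, port, user name, and private key, the standard OpenSSH tools can connect directly. The helper below is a hypothetical sketch that only prints the commands; the values passed to it (127.0.0.1, port 2222, user vagrant, /path/to/private_key) and the local file name are placeholders to replace with your instance's details. Note the easy-to-miss difference: ssh takes the port with -p, while scp and sftp use -P.

```shell
#!/bin/bash
# Print ssh, scp, and sftp commands for a Test Kitchen instance.
#   $1 = host, $2 = port, $3 = user, $4 = private key path
# ssh uses lowercase -p for the port; scp and sftp use uppercase -P.
kitchen_connect_cmds() {
  local host="$1" port="$2" user="$3" key="$4"
  echo "ssh -i $key -p $port $user@$host"
  echo "scp -i $key -P $port ./local-file $user@$host:/tmp/"
  echo "sftp -i $key -P $port $user@$host"
}

# Placeholder values; substitute the ones for your instance.
kitchen_connect_cmds 127.0.0.1 2222 vagrant /path/to/private_key
```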

Setting the Compiler Version for Maven from the Command Line

By default, Maven's compiler plugin chooses the source and target Java versions for you. Of course, you can always set them in the pom, but there are cases where you cannot modify the pom, or you might want to compile and run the tests against different versions of Java.

Following are the specific arguments to pass the compiler version to maven from the command line:

mvn clean install -Dmaven.compiler.source=1.7 -Dmaven.compiler.target=1.7
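To run the same build against several language levels in one go, you can loop over the versions. This is a minimal sketch; the function name build_for_versions and the MVN override (set MVN=echo for a dry run) are illustrative additions, not Maven features. Keep in mind that these properties only set the -source and -target levels; the JDK running Maven must support each level you request.

```shell
#!/bin/bash
# Command used to invoke Maven; override with MVN=echo for a dry run.
MVN="${MVN:-mvn}"

# Build once per Java language level without touching the pom.
# Stops at the first version whose build fails.
build_for_versions() {
  local v
  for v in "$@"; do
    echo "=== Building with source/target $v ==="
    "$MVN" clean install \
      -Dmaven.compiler.source="$v" \
      -Dmaven.compiler.target="$v" || return 1
  done
}
```

Usage: build_for_versions 1.7 1.8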

Writing a BASH Script to Read from STDIN to a Variable

Let’s say some program writes its output to STDOUT, and you want a script that reads that output from STDIN (through a pipe) and stores it in a variable.

To do so:

#!/bin/bash

SOME_VAR=$(cat)
echo "SOME_VAR = $SOME_VAR"
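Two details are worth knowing: $(cat) strips trailing newlines from the input, and if nothing is piped in, cat simply waits for you to type at the terminal. The sketch below (the function names are illustrative) guards against the second case with [ -t 0 ], and shows a line-by-line alternative using while IFS= read -r, which preserves leading whitespace and backslashes and still counts a final line that has no trailing newline.

```shell
#!/bin/bash
# Read all of STDIN into a variable and print it back.
# Refuses to run when STDIN is a terminal, so it never hangs waiting for input.
read_all_stdin() {
  local data
  if [ -t 0 ]; then
    echo "No input on STDIN" >&2
    return 1
  fi
  data=$(cat)            # note: trailing newlines are stripped here
  printf '%s' "$data"
}

# Count lines on STDIN, processing them one at a time.
# IFS= keeps leading/trailing whitespace; -r keeps backslashes literal;
# the [ -n "$line" ] test handles a last line with no trailing newline.
count_lines() {
  local line n=0
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
  done
  echo "$n"
}
```

For example, printf 'a\nb\nc' | count_lines prints 3.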


How To Publish Artifacts to the Maven Central Repository

I have just finished releasing my first project to the Maven Central Repository and wanted to capture my notes on setting up the project and all of the steps required.

Resources

Create an account on OSSRH

You will need an account on the Sonatype JIRA for OSSRH. From there, you can request the creation of a new project. See http://central.sonatype.org/pages/ossrh-guide.html for details.

Project/POM Setup and Prerequisites

PGP keys

http://central.sonatype.org/pages/working-with-pgp-signatures.html

Create a set of PGP keys

gpg2 …

Debugging MapReduce MRv2 Code in Eclipse

Following is how to set up your environment so that you can set breakpoints, step through, and debug your MapReduce code in Eclipse.

All of this was done on a machine running Linux, but it should work just fine on any *nix machine, and perhaps on Windows under Cygwin (assuming that you can get Hadoop and its native libraries compiled under Windows).

This also assumes that you are building your project with Maven.

Install a pseudo-distributed Hadoop cluster on your development box. (Yes, …

Unit Testing Private Static Methods With Primitive Array Arguments

When writing unit tests to cover your entire program, you will undoubtedly need to test private methods. Some argue that these methods should be tested only through integration tests, but sometimes it makes more sense to test all of the permutations in a unit test. You can do this with reflection in Java JUnit tests.

What is a little tricky, and was not completely obvious, was how to use reflection to test a private …

One-Liner for Converting CRLF to LF in Text Files

If you have text files created under DOS/Windows and need to convert their CRLF (carriage return + line feed) line endings to LF (line feed), here is a quick one-liner.

cat file.txt | perl -ne 's/\x0D\x0A/\x0A/g; print' > file.txt.mod

You can also use dos2unix; however, especially under Cygwin, I have seen dos2unix fail without giving any meaningful information about why it was unable to complete the task. In that case, you can just do it by hand.