Remove lines from the top and bottom of a file

September 21, 2016 / 0 comments

If you need to remove the first X lines from the top and the last Y lines from the bottom of a file, try the following. Remove first X lines: tail -n +K file > file.2, where K = X + 1 (tail -n +K starts output at line K, so skipping X lines means starting at line X + 1). Remove last Y lines from bottom of file. NOTE: this may not work on BSD Unix, e.g. Mac OS X. Try using ghead via ‘brew…
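The two operations can be sketched as follows. This is a minimal illustration, not the post's exact commands: the sample file, the values of x and y, and the output names file.2 and file.3 are assumptions. The bottom-trimming step counts lines with wc so that it does not depend on GNU-only head options.

```shell
# Hypothetical demo values: drop the first 2 and the last 3 lines.
x=2
y=3

# Build a 10-line sample file in a scratch directory.
demo=$(mktemp -d)
file="$demo/file"
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$file"

# Remove the first X lines: tail -n +K starts output at line K,
# so K = X + 1 skips exactly X lines.
tail -n +"$((x + 1))" "$file" > "$file.2"

# Remove the last Y lines portably: keep the first (total - Y) lines.
# With GNU coreutils, head -n -"$y" "$file" does the same thing.
total=$(wc -l < "$file")
head -n "$((total - y))" "$file" > "$file.3"
```

Here file.2 holds the input without its first two lines and file.3 holds the input without its last three.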


Create, share, and contribute to a Git repository without GitHub

October 26, 2015 / 0 comments

Description: Have you found yourself in a scenario where you can’t use GitHub or a shared server for exchanging Git contributions among a team? Well, you’re in luck, because Git will easily allow you to work with a team of developers using old-fashioned SSH or e-mail exchanges of code. The steps below will show you…
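As a rough sketch of the idea, and not necessarily the steps the full post describes, a bare repository can act as the shared exchange point. The example uses a local path so that it runs anywhere; over SSH the same commands would take a URL of the form user@host:/path/to/shared.git. The names alice, bob, and shared.git are hypothetical.

```shell
# Scratch area standing in for a machine the team can reach.
work=$(mktemp -d)

# A bare repository has no working tree; it only receives pushes.
git init --quiet --bare "$work/shared.git"

# Alice creates a repository, commits, and publishes her branch.
git init --quiet "$work/alice"
echo "hello" > "$work/alice/README"
git -C "$work/alice" add README
git -C "$work/alice" -c user.name=Alice -c user.email=alice@example.com \
    commit --quiet -m "Initial commit"
git -C "$work/alice" remote add origin "$work/shared.git"
git -C "$work/alice" push --quiet origin HEAD

# Bob clones from the same location and receives Alice's commit.
git clone --quiet "$work/shared.git" "$work/bob"
```

For e-mail exchange, git format-patch and git am play the roles of push and pull.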


Configuring Zeppelin to work with Spark

October 20, 2015 / 0 comments

Here are the difficulties I faced getting Zeppelin to work with Spark; if any are relevant to you, you’ll find this post useful: Zeppelin on Spark running via Mesos, but with the Zeppelin interpreter returning a JsonMappingException every time a cluster map/reduce operation is invoked; Zeppelin on Spark Standalone, but with the Zeppelin app…


ExecutorLostFailure when trying to run Apache Spark on Apache Mesos

August 24, 2015 / 0 comments

I was running into errors when trying to run Spark on Mesos. Here’s the setup I had: 4 nodes, Linux; clustered Mesos 0.21.0; clustered Hadoop 2.4; (to-be) clustered Spark 1.4.1. I’d followed directions and deployed Spark onto Mesos via a Spark binary placed on HDFS, and was successfully able to launch a spark-shell pointing at…


Kafka log4j file maintenance

July 13, 2015 / 0 comments

By default, the Kafka 2.8.0 log directory gets populated with rolling hourly files. For significant real-time applications, the server-side log files contained within this directory can quickly fill up an entire disk. The log4j configuration file ($KAFKA_HOME/config/ uses the DailyRollingFileAppender to generate log files. Unfortunately, this particular appender does not have a clean-up mode that can…
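One common workaround, sketched here as an assumption and not as the fix the full post arrives at, is to prune rolled files by age with find, typically from a scheduled job. The helper name prune_old_logs, the '*.log.*' file-name pattern, the 7-day retention window, and the demo directory are all hypothetical.

```shell
# Delete rolled log files older than a given number of days.
# Usage: prune_old_logs DIR DAYS
prune_old_logs() {
    # '*.log.*' matches rolled files such as server.log.2015-07-01-10
    # but not the live server.log that is still being written.
    find "$1" -type f -name '*.log.*' -mtime +"$2" -delete
}

# Demo in a scratch directory, not a real Kafka installation.
demo_dir=$(mktemp -d)
touch "$demo_dir/server.log"                                # live file
touch -t 201507011000 "$demo_dir/server.log.2015-07-01-10"  # old rolled file
touch "$demo_dir/server.log.recent"                         # fresh rolled file

prune_old_logs "$demo_dir" 7
```

Only the rolled file whose modification time is past the retention window is removed; the live file and the fresh rolled file stay in place.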


Creating and applying a patch in Git

April 23, 2015 / 0 comments

Creating a patch: navigate to the root directory of your codebase, then create a patch, e.g. git diff > /tmp/mypatch.patch. Applying the patch: navigate to the root directory of the new codebase, then apply the patch, e.g. git apply /tmp/mypatch.patch. NOTE: it’s best to create a new branch in the new codebase before applying the patch.
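The whole round trip can be tried in a scratch directory. This is a minimal sketch: the repository names old and new, the file notes.txt, and the branch name apply-mypatch are assumptions made for the demo, and the patch is written under a temporary directory instead of /tmp directly.

```shell
work=$(mktemp -d)
patch="$work/mypatch.patch"

# "Old" codebase with one committed file.
git init --quiet "$work/old"
printf 'line one\n' > "$work/old/notes.txt"
git -C "$work/old" add notes.txt
git -C "$work/old" -c user.name=Demo -c user.email=demo@example.com \
    commit --quiet -m "Initial commit"

# "New" codebase: a copy that does not yet have the change.
cp -R "$work/old" "$work/new"

# Make an uncommitted change in the old codebase and capture it.
printf 'line one\nline two\n' > "$work/old/notes.txt"
git -C "$work/old" diff > "$patch"

# In the new codebase, create a branch first, then apply the patch.
git -C "$work/new" checkout --quiet -b apply-mypatch
git -C "$work/new" apply "$patch"
```

Note that git diff with no arguments captures only unstaged changes; git apply modifies the working tree without creating a commit.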

What does Unix `kill -0 PID` mean?

March 24, 2015 / 0 comments

I ran across this earlier today at work and couldn’t understand the purpose of the following statement: kill -0 $PID. I consulted the kill man page and online references but was left without a clear answer. Finally, I found a reference: “Linux / UNIX: Kill Command Examples.” Linux Unix Tutorial for Beginners and…
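For context, POSIX specifies that signal 0 sends no signal but still performs error checking, so kill -0 is commonly used to test whether a process exists and can be signalled by the caller. A minimal sketch, in which is_running is a hypothetical helper name and the sleep process is only a stand-in:

```shell
# Succeeds if the PID exists and we are allowed to signal it.
is_running() {
    kill -0 "$1" 2>/dev/null
}

# Start a throwaway background process to probe.
sleep 30 &
pid=$!

if is_running "$pid"; then
    status_before="running"
else
    status_before="gone"
fi

# Terminate it and reap it, then probe again.
kill "$pid"
wait "$pid" 2>/dev/null || true

if is_running "$pid"; then
    status_after="running"
else
    status_after="gone"
fi

echo "before: $status_before, after: $status_after"
```

A failing kill -0 can mean either that the process does not exist or that the caller lacks permission to signal it, so scripts that need to tell the two apart must inspect the error message.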


Spark Tips and Tricks

February 25, 2015 / 0 comments

Introduction. In this post, I provide tips on how best to use Spark for specific use cases, along with some lessons learned (tricks) that may be useful. The content is divided by logical group. Enjoy! Data Loading. Text File Data. In cases where your data is split between many different text files, and…


Apache Kafka

January 29, 2015 / 0 comments

Introduction. In this blog post, I document useful notes and guidelines on Apache Kafka. If you have questions or comments, please don’t hesitate to e-mail me. Background. Kafka is a solution… to deal with real-time volumes of information and route it to multiple consumers quickly (Garg, Nishant. “Introducing Kafka.” Apache Kafka. Birmingham: Packt, 2013. Print.)…


Apache Tika

January 21, 2015 / 0 comments

Introduction. The purpose of this post is to provide notes and useful tips on Apache Tika. A lot of the content in this post comes from Chris Mattmann and Jukka Zitting’s Tika in Action, so really credit goes to them and their book. Notes. Introduction to Tika. Tika strives to offer the necessary functionality required for…
