Dec 28 12

Common Java Networking Exceptions

by Vaibhav Aggarwal
Exception Explanation
BindException Signals that an error occurred while attempting to bind a socket to a local address and port. Typically, the port is in use, or the requested local address could not be assigned.
ClosedChannelException Checked exception thrown when an attempt is made to invoke or complete an I/O operation upon a channel that is closed, or at least closed to that operation. That this exception is thrown does not necessarily imply that the channel is completely closed. A socket channel whose write half has been shut down, for example, may still be open for reading.
ConnectException Signals that an error occurred while attempting to connect a socket to a remote address and port. Typically, the connection was refused remotely (e.g., no process is listening on the remote address/port) or the connection timed out.
InterruptedIOException Signals that an I/O operation has been interrupted. An InterruptedIOException is thrown to indicate that an input or output transfer has been terminated because the thread performing it was interrupted. The field bytesTransferred indicates how many bytes were successfully transferred before the interruption occurred.
NoRouteToHostException Signals that an error occurred while attempting to connect a socket to a remote address and port. Typically, the remote host cannot be reached because of an intervening firewall, or if an intermediate router is down.
PortUnreachableException Signals that an ICMP Port Unreachable message has been received on a connected datagram.
ProtocolException Thrown to indicate that there is an error in the underlying protocol, such as a TCP error.
SocketException Thrown to indicate that there is an error creating or accessing a socket. It is the parent class of BindException, ConnectException, NoRouteToHostException and PortUnreachableException.
SocketTimeoutException Signals that a timeout has occurred on a socket read or accept.
SSLException Indicates some kind of error detected by an SSL subsystem. This class is the general class of exceptions produced by failed SSL-related operations.
UnknownHostException Thrown to indicate that the IP address of a host could not be determined.
UnknownServiceException Thrown to indicate that an unknown service exception has occurred. Either the MIME type returned by a URL connection does not make sense, or the application is attempting to write to a read-only URL connection.
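
To make the table more concrete, here is a minimal sketch that maps a few of these exceptions to short hints and provokes two of them locally. The class name NetworkErrorDemo, its method names and the hint strings are made up for illustration; only the java.net classes are real. Subclasses are checked before their parent SocketException, otherwise the more specific hints would never be reached.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.NoRouteToHostException;
import java.net.ServerSocket;
import java.net.SocketException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

// Hypothetical helper class, not part of any library.
class NetworkErrorDemo {

    /** Maps an exception to a short hint; subclasses are tested before SocketException. */
    static String describe(IOException e) {
        if (e instanceof BindException) return "local port in use or address not assignable";
        if (e instanceof ConnectException) return "connection refused or timed out";
        if (e instanceof NoRouteToHostException) return "remote host unreachable";
        if (e instanceof SocketTimeoutException) return "read or accept timed out";
        if (e instanceof UnknownHostException) return "host name could not be resolved";
        if (e instanceof SocketException) return "socket-level error";
        return "other I/O error";
    }

    /** Provokes a BindException by binding two server sockets to the same local port. */
    static String bindSamePortTwice() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket first = new ServerSocket(0, 50, loopback);
             ServerSocket second = new ServerSocket(first.getLocalPort(), 50, loopback)) {
            return "no error";
        } catch (IOException e) {
            return describe(e);
        }
    }

    /** Provokes a SocketTimeoutException: accept() waits for a client that never connects. */
    static String acceptWithTimeout() {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            server.setSoTimeout(100); // timeout in milliseconds
            server.accept().close();
            return "no error";
        } catch (IOException e) {
            return describe(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bindSamePortTwice());
        System.out.println(acceptWithTimeout());
    }
}
```

Note that SocketTimeoutException extends InterruptedIOException rather than SocketException, so a catch block for SocketException alone will not see timeouts.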

 

Dec 21 12

AWS Data Pipeline – Our service launch

by Vaibhav Aggarwal

http://aws.amazon.com/datapipeline/

The public launch of AWS Data Pipeline: http://channelnomics.com/2012/12/21/aws-data-pipeline-high-storage-instances/

http://aws.typepad.com/aws/2012/11/the-new-amazon-data-pipeline.html

Nov 30 12

Amazon Data Pipeline – Pre launch

by Vaibhav Aggarwal

Finally, I can talk about the awesome Data Pipeline service.

http://gigaom.com/cloud/amazons-super-duper-data-pipeline-is-now-ready-for-its-close-up/

http://techcrunch.com/2012/11/29/amazon-web-services-launches-data-pipeline-an-orchestration-service-for-data-driven-workflows/

Nov 29 12

Issues with Google Guice

by Vaibhav Aggarwal
  1. It is very hard to track who is creating an instance of an object.
  2. With child injectors it is even harder to track the origin of an object, as you might be fooled into believing that the top level injector is doing all the injections.
  3. Guice makes it awkward to specify two different bindings for a class in the same scope. You cannot, for example, inject two different Gson instances with different type adapters unless every injection point carries a distinguishing binding annotation.
  4. Some classes create their own injectors (different from child injectors) and hence break the injection chain completely at an arbitrary spot in the code.
Apr 18 12

A great link to Java NIO

by Vaibhav Aggarwal

http://javanio.info/filearea/nioserver/NIOServerMark2.pdf

Mar 5 12

NextGen MapReduce

by Vaibhav Aggarwal

Hadoop 0.23 delivers the new version of MapReduce, called YARN. It has some significant differences from the old MapReduce.

1. Introduction of a Resource Manager, responsible for managing and assigning global cluster resources.

2. Introduction of a per-application Application Master. The Application Master interacts with the Resource Manager to request compute resources.

3. Introduction of a Node Manager, responsible for managing user processes on each node.

In earlier versions of Hadoop the JobTracker was closely tied to the MapReduce framework. It was responsible for both resource management and application management, and it could run Hadoop MapReduce jobs only. The new Resource Manager allows running other services, like MPI, within the same cluster via an Application Master.

In old Hadoop, map and reduce slots could not be used interchangeably. This meant that the cluster would be largely underutilized during a pure map or pure reduce phase. In the newer avatar of Hadoop the same capacity can be reused for either kind of task, so resource utilization is much better.
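
The utilization argument can be illustrated with a little arithmetic. The following is a toy model only, not Hadoop code: the class SlotUtilizationDemo, its methods and the numbers (10 map slots, 10 reduce slots, 40 waiting map tasks) are assumptions chosen for illustration.

```java
// Toy model with hypothetical numbers; it does not use any Hadoop API.
class SlotUtilizationDemo {

    /** Fraction of the cluster in use when map and reduce slots are separate pools. */
    static double fixedSlots(int mapSlots, int reduceSlots, int pendingMaps, int pendingReduces) {
        int used = Math.min(mapSlots, pendingMaps) + Math.min(reduceSlots, pendingReduces);
        return (double) used / (mapSlots + reduceSlots);
    }

    /** Fraction in use when any free unit of capacity can run any kind of task. */
    static double sharedPool(int capacity, int pendingMaps, int pendingReduces) {
        int used = Math.min(capacity, pendingMaps + pendingReduces);
        return (double) used / capacity;
    }

    public static void main(String[] args) {
        // Pure map phase: 40 map tasks are waiting and no reduce task can start yet.
        System.out.println(fixedSlots(10, 10, 40, 0)); // prints 0.5
        System.out.println(sharedPool(20, 40, 0));     // prints 1.0
    }
}
```

In this model the reduce slots sit idle during the map phase, so only half of the cluster is doing work, while the shared pool is fully used.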

 

Feb 17 12

Design patterns

by Vaibhav Aggarwal

Adapter pattern

The adapter pattern exposes a different interface for an object than the one its original implementation provides. It makes an object compatible with the interface expected by the calling object.
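
A minimal sketch of an adapter, assuming a caller that expects an Iterator while the object at hand only offers the older Enumeration interface. The class name EnumerationIteratorAdapter is made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Iterator;

// Hypothetical adapter: makes an Enumeration usable where an Iterator is expected.
class EnumerationIteratorAdapter<T> implements Iterator<T> {
    private final Enumeration<T> adaptee;

    EnumerationIteratorAdapter(Enumeration<T> adaptee) {
        this.adaptee = adaptee;
    }

    @Override
    public boolean hasNext() {
        return adaptee.hasMoreElements(); // translate the call to the old interface
    }

    @Override
    public T next() {
        return adaptee.nextElement();
    }

    public static void main(String[] args) {
        Enumeration<String> legacy = Collections.enumeration(Arrays.asList("a", "b"));
        Iterator<String> it = new EnumerationIteratorAdapter<>(legacy);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```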

Proxy pattern

A proxy pattern is used to control access to a particular object, to delegate calls to remote methods in a way that is seamless to the calling object, and to do lazy loading.
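
As a sketch of the lazy loading case: the proxy below implements the same interface as the real object and creates it only on first use. The names Report and LazyReportProxy are hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical interface shared by the real object and its proxy.
interface Report {
    String text();
}

// Proxy that defers creating the (possibly expensive) real Report until it is needed.
class LazyReportProxy implements Report {
    private final Supplier<Report> loader;
    private Report target;     // stays null until the first call
    private int loadCount = 0; // only here to make the laziness observable

    LazyReportProxy(Supplier<Report> loader) {
        this.loader = loader;
    }

    @Override
    public String text() {
        if (target == null) {
            target = loader.get();
            loadCount++;
        }
        return target.text();
    }

    int loadCount() {
        return loadCount;
    }
}
```

The caller only sees a Report and cannot tell whether it is talking to the proxy or to the real object.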

Decorator pattern

A decorator pattern is used to add functionality to an object, such as logging more information for each method call or collecting metrics.
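
A minimal sketch of a decorator that collects a simple metric (a call count) around an existing object. Greeter, SimpleGreeter and CountingGreeter are hypothetical names.

```java
// Hypothetical interface implemented by both the original object and the decorator.
interface Greeter {
    String greet(String name);
}

class SimpleGreeter implements Greeter {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}

// Decorator: wraps any Greeter and counts calls without changing the result.
class CountingGreeter implements Greeter {
    private final Greeter inner;
    private int calls = 0;

    CountingGreeter(Greeter inner) {
        this.inner = inner;
    }

    @Override
    public String greet(String name) {
        calls++;                  // the added functionality
        return inner.greet(name); // the original behaviour is untouched
    }

    int calls() {
        return calls;
    }
}
```

Because the decorator implements the same interface, several decorators can be stacked around one object.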

Feb 9 12

Dynamic Proxy in Java

by Vaibhav Aggarwal

Dynamic proxy is part of the java.lang.reflect package and was added to the JDK in version 1.3. It allows a program to create a proxy object that implements a list of interfaces specified at runtime. The proxy intercepts calls to the interface methods and passes them to a handler, which can add functionality dynamically before delegating to the real object. The difference between dynamic proxy objects and other proxy objects is that a dynamic proxy dispatches calls to interface methods reflectively instead of relying on the built-in virtual method dispatch. This makes dynamic proxy classes a lot more reusable, without writing boilerplate code for each implemented interface.
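
A minimal sketch of a dynamic proxy, using java.lang.reflect.Proxy and InvocationHandler. The Calculator interface and the CallRecorder handler are hypothetical names; the handler records the name of every method called and then delegates reflectively to the real object.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface to be proxied.
interface Calculator {
    int add(int a, int b);
}

// One handler works for any interface: no per-interface boilerplate is needed.
class CallRecorder implements InvocationHandler {
    private final Object target;
    final List<String> calls = new ArrayList<>();

    CallRecorder(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        calls.add(method.getName()); // the added functionality
        try {
            return method.invoke(target, args); // reflective dispatch to the real object
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow what the real method threw
        }
    }

    /** Creates a proxy that implements iface and routes every call through the recorder. */
    static <T> T wrap(Class<T> iface, CallRecorder recorder) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, recorder);
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        Calculator real = (a, b) -> a + b;
        CallRecorder recorder = new CallRecorder(real);
        Calculator calc = wrap(Calculator.class, recorder);
        System.out.println(calc.add(2, 3)); // prints 5
        System.out.println(recorder.calls); // prints [add]
    }
}
```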

They are particularly useful in the following:

1. Implementing a common logging class.
2. Implementing a metrics collection class.
3. Implementing access control to certain objects, like your database access objects.
4. Adding more methods to the original class without having to modify the proxy class.
5. Lazy loading of expensive objects.
6. Restricting access to the functionality of an object, as the caller cannot cast the proxy back to the original class. So you can use this pattern to limit a caller to calling getter methods only, while the original object might also have setter methods.

It is also used in remote method invocation to make a remote object appear available locally.
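
The access restriction case from the list above can be sketched as follows. AccountView, Account and ReadOnlyViews are hypothetical names: the proxy implements only the getter interface, so the caller cannot reach the setter by casting.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical getter-only interface handed out to callers.
interface AccountView {
    String getOwner();
}

// The original object also has a setter that callers should not see.
class Account implements AccountView {
    private String owner;

    Account(String owner) {
        this.owner = owner;
    }

    @Override
    public String getOwner() {
        return owner;
    }

    void setOwner(String owner) {
        this.owner = owner;
    }
}

class ReadOnlyViews {
    /** Returns a proxy that implements AccountView only; it is not an Account. */
    static AccountView readOnly(Account target) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause();
            }
        };
        return (AccountView) Proxy.newProxyInstance(
                AccountView.class.getClassLoader(),
                new Class<?>[] {AccountView.class},
                handler);
    }
}
```

Casting the returned object to Account fails with a ClassCastException, because the proxy class only implements AccountView.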

Some nice articles about Dynamic Proxy in java:

http://docs.oracle.com/javase/1.3/docs/guide/reflection/proxy.html
http://userpages.umbc.edu/~tarr/dp/lectures/DynProxies-2pp.pdf
http://www.ibm.com/developerworks/java/library/j-jtp08305/index.html

Jan 7 12

Pig Over Hive

by admin

Hive is one of my favorite tools to crunch data using Hadoop. Its SQL-like interface makes it really easy to get started. It supports many input file formats. It is pretty robust and has excellent features like dynamic partitions and filter pushdown.

Pig is another tool you can use to run analytics on Hadoop. I never quite knew why one would ever use Pig over Hive. I recently got a chance to explore Pig and I used this opportunity to find reasons which favor Pig over Hive. I did manage to put together a small list:

  1. Pig allows committing data at arbitrary points in the script. Unless and until you call store or dump, it will not process data. Hive processes data and produces output for every query executed. The output is either stored or streamed to stdout.
  2. Pig supports unstructured/semi-structured data. Hive always requires imposing a schema on the input.
  3. For a series of steps which form an ETL process, Pig’s procedural syntax looks cleaner than Hive’s declarative syntax. It becomes complex to express ETL processes as either a series of Hive queries or one huge composite query.
Jan 4 12

Differences between Ivy and Maven

by Vaibhav Aggarwal

I have been reading quite a few documents and blog posts listing differences between Maven and Ivy, and I could not find the full answer in one place.

My understanding is:

1. Ivy is a dependency management tool, while Maven is a build and dependency management tool. It seems that there also exists something called Maven Ant Tasks, which allows us to use Maven with Ant.

2. Maven imposes a fixed structure on the project. Maven also has a fixed set of scopes based on which dependencies are defined. Ivy, on the other hand, does not require a structure. It allows users to use an arbitrary code layout and arbitrary scopes. Ivy is based on module configuration, which defines a module and its dependencies.

3. It is easier to exclude dependencies in Ivy than in Maven.

4. Ant gives you a lot more control and flexibility than Maven. This means that different projects in an enterprise can have different structures and targets, which often leads to confusion. Maven has a standard way of downloading dependencies, executing tests and generating javadocs. When using Maven it is easier to migrate from one project to another.