Programming Efficient Code: Loops and Locks
Article by Sachin Mehra (sachinweb@hotmail.com)
One of the worst things a programmer can assume is that the compiler and
middleware will do the optimizations for you. Most applications are targeted
at 50-200 concurrent users, which is why we need to keep the performance of
our code constantly in mind.
Say you have a search component (which will be on everybody’s desktop) that
takes 3-5 seconds to load and ties up the database while it does so. Imagine
what will happen when 200 people try to load it at the same time. If the
database ends up serving those requests one after another, simple math (using
an average of 4 seconds) gives (4 x 200) / 60 = 13.3. THAT’S MORE THAN 13
MINUTES before the last user is served! And under high contention you cannot
assume 100% efficiency, so you could realistically be dealing with something
in the range of 20-25 minutes of processing time.
There is no ‘silver bullet’ for writing fast and efficient code. Middleware
will not solve the problem for you, and databases will not solve the problem
for you. It is up to you as a computational process engineer (how’s that for a
title?) to understand and deal with the underlying inefficiencies in the
software you design. There are many things to consider, and you need to think
your logic out carefully. One thing you should always be asking yourself is
“could this be done better?”
In programming, we find ourselves in loops a lot. In Java, we find ourselves
looping through Collection objects especially often. This is one area where
many of us need some improvement. When you use a Collection object, how do you
decide what type of Collection to use, and how to apply it? It seems to me
that most Java programmers just use “whatever works”, or the one they
“believe” to be the fastest. The fact of the matter is that different types of
collections suit different kinds of applications. Do you truly know the
differences between Vector and ArrayList? The most common misconception I have
heard is that a Vector will automatically grow in size and an ArrayList will
not. This is simply not true: both grow as elements are added. The difference
that matters in practice is that Vector’s methods are synchronized
(thread-safe) and ArrayList’s are not. And what does this mean?
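To check the point about growth, here is a minimal sketch. The class name GrowthDemo and the helper fill() are made up for illustration; only ArrayList, Vector and List come from java.util. Both lists are created with room for just 2 elements and then filled well past that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

// Hypothetical demo class, not part of the article's code.
class GrowthDemo {
    // Adds n elements to the given list and returns its final size.
    static int fill(List<Integer> list, int n) {
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list.size();
    }

    public static void main(String[] args) {
        // Initial capacity of 2 for both; each grows on demand.
        System.out.println(fill(new ArrayList<>(2), 1000)); // prints 1000
        System.out.println(fill(new Vector<>(2), 1000));    // prints 1000
    }
}
```

Neither call throws, so the initial capacity is only a starting point for either class.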
Being thread-safe is not always a good thing. When a class is thread-safe in
this way, each call must acquire a lock on the object before touching it, so
that two threads cannot modify it at the same time. In many cases this
additional locking is unnecessary and costs performance. On the other hand,
there are situations where thread-safe operations are very necessary. Many
people understand the gist of synchronization, but don’t truly understand how
to take advantage of it properly. One thing I have seen people do a lot is
apply synchronized in places where they should not. Consider the following:
Example 1A:
    synchronized void addUser(User user) {
        this.list.add(user);
    }
Another common misconception is that the synchronized keyword protects only
that particular method. If you think this, you should read on. The method
above does ensure that only one Thread at a time can add to list through
addUser(). But a synchronized instance method takes its lock on the whole
object (this), which means that no other synchronized instance method, and no
block synchronized on the same object, can run on that object while addUser()
is executing. In most cases, this is inefficient. Other threads may need
access to other, unaffected items.
The
following example addresses this problem.
Example 2A:
    void addUser(User user) {
        synchronized (this.list) {
            this.list.add(user);
        }
    }
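In this example, we lock only the instance of list, and only for the duration
of the add() call. Threads using other parts of the object are not held up,
which makes this more efficient than Example 1A under contention. Note that
this protects list only if every other piece of code that touches it
synchronizes on list as well.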
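The difference between Examples 1A and 2A can be made visible with Thread.holdsLock(), which reports whether the calling thread holds a given object's monitor. This is only a sketch: the class LockScopeDemo and its method names are invented for illustration, and a list of String stands in for the User objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the article's code.
class LockScopeDemo {
    private final List<String> list = new ArrayList<>();

    // Method-level lock, as in Example 1A: the monitor taken is `this`.
    synchronized String addWithMethodLock(String user) {
        list.add(user);
        return describeLocks();
    }

    // Block-level lock, as in Example 2A: the monitor taken is `list`.
    String addWithBlockLock(String user) {
        synchronized (list) {
            list.add(user);
            return describeLocks();
        }
    }

    // Reports which of the two monitors the calling thread holds right now.
    private String describeLocks() {
        return "this=" + Thread.holdsLock(this) + ", list=" + Thread.holdsLock(list);
    }

    public static void main(String[] args) {
        LockScopeDemo demo = new LockScopeDemo();
        System.out.println(demo.addWithMethodLock("alice")); // this=true, list=false
        System.out.println(demo.addWithBlockLock("bob"));    // this=false, list=true
    }
}
```

The first form blocks every other synchronized method on the same object. The second leaves the object's own monitor free and guards only the list.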
Now what does this have to do with picking ArrayList or Vector? Well, a lot
really. Where we are dealing with temporary sets of data or method-scoped
instances, using a Vector is inefficient. In situations where there is no
chance of there being concurrent access, you should most certainly choose an
ArrayList; a Vector would serve no useful purpose there and would only add
unnecessary locking. We’ll leave hashed collections for later :).
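As a small illustration of a method-scoped list, consider the sketch below. The class LocalListDemo and the method nameLengths() are hypothetical. The list is created inside the method and no other thread can reach it while it is being filled, so a plain ArrayList is enough.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the article's code.
class LocalListDemo {
    // `result` is visible only to the calling thread until it is returned,
    // so there is nothing for a synchronized collection to protect here.
    static List<Integer> nameLengths(List<String> names) {
        List<Integer> result = new ArrayList<>(names.size());
        for (String name : names) {
            result.add(name.length());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(nameLengths(Arrays.asList("Ann", "Rajesh"))); // prints [3, 6]
    }
}
```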
Loop Iteration and Tail Recursion
As we said earlier, our programs spend a lot of time in loops, and
unfortunately lose a lot of their performance in them as well. I will try to
cover a few pointers which may, in certain situations, help you shave some
unnecessary computational cycles off your code.
People tend to think from beginning to end, and they tend to program in this
forward direction as well. But this can be inefficient. Sometimes the computer
can find its way from the end to the beginning faster. Consider the code in
Example 1B.
Example 1B:
    // Task is a placeholder for whatever type declares doSomething()
    for (int i = 0; i < arrayList.size(); i++) {
        Task obj = (Task) arrayList.get(i);
        obj.doSomething();
    }
This is a fairly straightforward for-loop that iterates through an entire
collection to do something. But consider Example 2B:
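Example 2B:
    // Start at the last valid index, size() - 1, and stop after index 0.
    // Task is a placeholder for whatever type declares doSomething().
    for (int i = arrayList.size() - 1; i >= 0; i--) {
        Task obj = (Task) arrayList.get(i);
        obj.doSomething();
    }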
This example can be more efficient than Example 1B. In Example 1B, we call
arrayList.size() on every iteration through the loop, which is unnecessary
when the list does not change; here it is called once. The loop test is also a
direct comparison against zero, which is typically a cheaper check than
comparing against a second value. By looping backwards through the ArrayList
the gain can be as much as 50% for a very tight loop on a simple runtime,
although a JIT compiler will often inline size() and narrow the gap
considerably, so measure before relying on it.
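The two loop shapes discussed above can be sketched in runnable form as follows. The class LoopDemo and its method names are made up for illustration, and summing integers stands in for the doSomething() call. The forward version also shows that size() can be read once without reversing the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the article's code.
class LoopDemo {
    // Forward loop that reads size() once instead of on every pass.
    static int sumForward(List<Integer> values) {
        int sum = 0;
        for (int i = 0, n = values.size(); i < n; i++) {
            sum += values.get(i);
        }
        return sum;
    }

    // Backward loop: starts at the last valid index and compares against zero.
    static int sumBackward(List<Integer> values) {
        int sum = 0;
        for (int i = values.size() - 1; i >= 0; i--) {
            sum += values.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        System.out.println(sumForward(values));  // prints 10
        System.out.println(sumBackward(values)); // prints 10
    }
}
```

Both methods visit every element exactly once. Which one is faster on a given JVM is something to measure rather than assume.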
Another method for performing efficient loops has long since been forgotten.
Yes, I am talking about “tail recursion”. This is one of the more elegant ways
to do mathematical sums on lists. It also works well with Java’s Iterator and
Enumeration interfaces. Consider the following example:
Example 1C:
    public int getRecordsSum(Iterator iter) {
        return _getRecordsSum(iter, 0);  // a sum starts from 0
    }

    public int _getRecordsSum(Iterator iter, int counter) {
        if (iter.hasNext()) {
            return _getRecordsSum(iter, counter + ((Integer) iter.next()).intValue());
        }
        else {
            return counter;
        }
    }
Now for those of you who are keen, you might be thinking StackOverflowError
here. A compiler that performs tail-call elimination, as many C and C++
compilers do, will see the optimization opportunity: _getRecordsSum() does
nothing after the recursive call returns and carries all of its state in its
arguments, so the call can be turned into a jump that reuses the current stack
frame. That gives you a very efficient way of processing numbers with no
run-away stack. Be aware, though, that neither javac nor the HotSpot JVM
guarantees this optimization, so in Java a very long list can still overflow
the stack; when in doubt, write the same logic as a loop.
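Since a tail call can always be rewritten as a loop, here is a sketch that puts the two forms side by side. The class SumDemo and its method names are hypothetical; generics replace the casts of Example 1C. The accumulator plays the same role in both versions.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical demo class, not part of the article's code.
class SumDemo {
    // Tail-recursive form: the recursive call is the last action taken,
    // and the running total travels in the `total` argument.
    static int sumRecursive(Iterator<Integer> iter, int total) {
        if (!iter.hasNext()) {
            return total;
        }
        return sumRecursive(iter, total + iter.next());
    }

    // Loop form: same accumulator, but the stack depth stays constant.
    static int sumIterative(Iterator<Integer> iter) {
        int total = 0;
        while (iter.hasNext()) {
            total += iter.next();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumRecursive(Arrays.asList(1, 2, 3).iterator(), 0)); // prints 6
        System.out.println(sumIterative(Arrays.asList(1, 2, 3).iterator()));    // prints 6
    }
}
```

The loop form is the one to reach for when the input may be very long, because it does not depend on the runtime reusing stack frames.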
Final Words
Programming is all about problem solving. And as with other kinds of problem
solving, there are always many different ways to solve the problem. However,
some ways are certainly better than others. You should take the time to
understand how the underlying components you are using actually work, and why
they work the way they do.