
In a classical three-tier architecture (database, application server, client), a choice always has to be made: should the database and the application server reside on separate servers, or be co-located on a shared server?
Often, I see recommendations for separation, backed by vague claims about performance improvements. But I assert that the main argument for separation is bureaucratic (license-related), and that separation may well hurt performance. Sure, if you put the application and the database on separate servers, you gain an easy scale-out effect, but you also end up with a less efficient communication path.
If a database and an application are running on the same operating system instance, you may use very efficient communication channels. With DB2, for example, you may use shared-memory-based inter-process communication (IPC) when the application and the database are co-located. If they are on separate servers, TCP must be used. TCP is a nice protocol, offering reliable delivery, congestion control, etc., but it also entails several protocol layers, each contributing overhead.
But let’s try to quantify the difference, focusing as narrowly as possible on the query round-trips themselves.
I wrote a little Java program, LatencyTester, which I used to measure differences between a number of database<>application setups. Java was chosen because it’s a compiled, statically typed language; that way, the test program should have as little internal execution overhead as possible. This can be important: I have sometimes written database benchmark programs in Python (which is a much nicer programming experience), but as Python can be rather inefficient, I ended up benchmarking Python instead of the database.
The program connects to a DB2 database in one of two ways:
- If you specify a servername and a username, it will communicate using “DRDA” over TCP.
- If you leave out the servername and username, it will use a local, shared-memory-based channel.
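
For illustration, here is a minimal sketch of how such a dual-mode connection choice could look with the IBM JDBC driver (a type-4 URL for DRDA over TCP, a type-2 style URL for the local channel). The class name, port number, and database name are assumptions for this example; the actual LatencyTester source may well differ:

```java
import java.sql.Connection;
import java.sql.DriverManager;

// Hypothetical helper illustrating the two connection modes.
// Class name, port number (50000) and database name (SAMPLE) are assumptions.
public class Db2ConnectionModes {
    static Connection connect(String server, String user, String password)
            throws Exception {
        if (server != null && user != null) {
            // Remote mode: type-4 JDBC URL, i.e. DRDA over TCP.
            String url = "jdbc:db2://" + server + ":50000/SAMPLE";
            return DriverManager.getConnection(url, user, password);
        } else {
            // Local mode: type-2 style URL; when the database runs on the
            // same OS instance, this can use shared-memory IPC instead of TCP.
            return DriverManager.getConnection("jdbc:db2:SAMPLE");
        }
    }
}
```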
After connecting to the database, the program issues 10000 queries which shouldn’t result in any I/O, because no data in the database is referenced. The timer starts after connection setup, just before the first query; it stops immediately after the last query.
The application issues statements like VALUES(…), where … is a value set by the application. Note the lack of a FROM clause: no table is referenced, so the round-trips measure communication overhead rather than actual query work.
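
As a rough illustration of the measurement loop described above, here is a sketch in Java. The class and method names are invented for this example, and the exact statement text and timing details are assumptions rather than the actual LatencyTester code:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Hypothetical sketch of the timing loop; details are assumptions.
public class QueryLoop {
    static long runQueries(Connection conn, int iterations) throws Exception {
        // DB2 requires a typed parameter marker in a VALUES statement,
        // hence the CAST.
        PreparedStatement stmt =
                conn.prepareStatement("VALUES(CAST(? AS INTEGER))");
        long start = System.nanoTime();          // timer starts just before the first query
        for (int i = 0; i < iterations; i++) {
            stmt.setInt(1, i);
            ResultSet rs = stmt.executeQuery();  // one round-trip to the database
            rs.next();
            rs.close();
        }
        long stop = System.nanoTime();           // timer stops right after the last query
        stmt.close();
        return (stop - start) / 1_000_000;       // elapsed time in milliseconds
    }
}
```

The prepared statement is reused across iterations, so each round-trip consists of little more than sending a parameter value and receiving a single-row result, which keeps the focus on the communication path rather than on query processing.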