The newest feature in the software I’m developing at Just Software is a workstream, which is our umbrella term for microblog and activity stream functionality. Since we’re using a relational DB, we’ve modeled this module so that every workstream message is attached to a user’s stream via a simple m:n relation. To reduce the number of INSERT statements, the list of streams is assembled in the business service and then handed over to the database service, which does something like this:
INSERT INTO workstream(msgid, userid) VALUES (5, 1), (5, 2), ...;
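For illustration, here’s a minimal sketch of how such a statement might be built over JDBC. The table and column names match the snippet above; everything else (class and method names) is hypothetical, not our actual code:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class WorkstreamDao {

    /** Attaches one message to the streams of all given users with a single INSERT. */
    public void insertMessage(Connection conn, long msgId, List<Long> userIds) throws SQLException {
        if (userIds.isEmpty()) {
            return; // nothing to insert
        }
        // Build "INSERT INTO workstream(msgid, userid) VALUES (?, ?), (?, ?), ..."
        StringBuilder sql = new StringBuilder("INSERT INTO workstream(msgid, userid) VALUES ");
        for (int i = 0; i < userIds.size(); i++) {
            sql.append(i == 0 ? "(?, ?)" : ", (?, ?)");
        }
        try (PreparedStatement ps = conn.prepareStatement(sql.toString())) {
            int idx = 1;
            for (Long userId : userIds) {
                ps.setLong(idx++, msgId);   // same message id for every row
                ps.setLong(idx++, userId);  // one row per target stream
            }
            ps.executeUpdate();
        }
    }
}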
This, however, can result in a huge number of VALUES(…) parameters when the platform has many users. It didn’t strike me as a problem while implementing it; PostgreSQL can handle large data, no? Well, at least it has its limits. When you use a JDBC prepared statement, the number of parameters has a hard limit: 32767. If you don’t obey this limit you’ll see something like this:
java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 40000
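To put that number into context: every (msgid, userid) row contributes two parameters, so the 40000 in the message corresponds to a fan-out of 20,000 users. Here’s a hypothetical reproduction using the sketch above (the connection details are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class LimitRepro {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; adjust to your environment.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/app", "app", "secret")) {
            // 20,000 users * 2 parameters per row = 40,000 parameters, which
            // overflows the 2-byte parameter count when the driver sends the statement:
            // java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 40000
            List<Long> userIds = LongStream.rangeClosed(1, 20_000)
                    .boxed().collect(Collectors.toList());
            new WorkstreamDao().insertMessage(conn, 5L, userIds);
        }
    }
}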
Not a very helpful message, right? When I first saw this exception I was baffled and a little disgruntled. But after narrowing the problem down in a debug session and looking at the PostgreSQL JDBC driver’s source code, the cause was obvious: the PostgreSQL client/backend protocol dictates that the number of parameters be sent from the client to the Postgres backend as a 2-byte integer, and a signed 2-byte integer tops out at 2^15 - 1 = 32767 (aaah, now the above message actually makes sense). You’ll find details of the protocol here if you’re brave (my 2-byte friend is defined in the Parse message).
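For illustration only, here’s roughly what writing such a 2-byte value looks like; this sketch is modeled on the driver’s range check, not copied from its source:

import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class TwoByteDemo {
    // A 2-byte field holds at most 32767 as a signed value, hence the hard limit.
    static void sendInteger2(ByteArrayOutputStream out, int value) throws IOException {
        if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
            throw new IOException(
                "Tried to send an out-of-range integer as a 2-byte value: " + value);
        }
        out.write((value >>> 8) & 0xFF); // high byte first (network byte order)
        out.write(value & 0xFF);         // low byte
    }

    public static void main(String[] args) throws IOException {
        sendInteger2(new ByteArrayOutputStream(), 40_000); // throws: 40000 > 32767
    }
}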
All right, got the bugger, what’s the solution now? Easy: just split the parameter list up into chunks of, say, 30,000 parameters per INSERT statement.
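Here’s a sketch of that chunking, building on the hypothetical insertMessage from above. Note that the limit counts parameters, not rows: with two parameters per row, 30,000 parameters means at most 15,000 rows per statement.

import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;

public class ChunkedWorkstreamDao extends WorkstreamDao {

    // Stay comfortably below the driver's hard limit of 32767 parameters.
    private static final int MAX_PARAMS_PER_STATEMENT = 30_000;
    private static final int PARAMS_PER_ROW = 2; // (msgid, userid)
    private static final int MAX_ROWS_PER_STATEMENT = MAX_PARAMS_PER_STATEMENT / PARAMS_PER_ROW;

    /** Splits the user list so that no single INSERT exceeds the parameter limit. */
    public void insertMessageChunked(Connection conn, long msgId, List<Long> userIds)
            throws SQLException {
        for (int from = 0; from < userIds.size(); from += MAX_ROWS_PER_STATEMENT) {
            int to = Math.min(from + MAX_ROWS_PER_STATEMENT, userIds.size());
            insertMessage(conn, msgId, userIds.subList(from, to));
        }
    }
}

If the whole operation needs to be atomic, wrap the loop in a single transaction (disable auto-commit and commit after the last chunk).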