
I am tuning our production servers for a portal. We have 4 servers, 2 for web and 2 for app, and there is a firewall before and after the web servers (so yes, there is a firewall between the app and web servers). The trouble started with the firewall dropping idle connections between the app servers and the web servers. I tried a lot of solutions, and the issue now seems to have moved: it used to be stuck, broken connections on the app servers (caused by the firewall drops), it appeared when the portal was under low load, and to clear it I had to restart all app servers. Now I have the issue on high-load days instead, and the urgent workaround is simply a quick restart of the Apache web servers. How can I solve this?

I made the changes with the help of the JBoss load-balancing configuration generator: http://lbconfig.appspot.com/?lb=mod_jk&mjv=1.2.28&nca=64&ncj=64&nai=2&nji=2&njips=6&f=true&c=false&lr=false&lrl=&mpm=Prefork

Monitoring connections on both sides with the netstat command and with the Google Analytics Real-Time overview, I got the following stats with ~40 visitors, 3 days after the last restart:

Web side (2 servers, but the connection counts here are per server, not totals):

ESTABLISHED: ~700-750
TIME_WAIT: 100-200 (big jumps from one second to the next: 150, then 200, then 170, then 120, and so on)

App side (here I counted all connections; most of them are ESTABLISHED, with a few CLOSE_WAIT, 0-5 each time I check):

S1 (4 instances running): 900-950
S2 (5 instances running): 1000-1100

Server details:

  • On the 2 web servers: Apache 2.2.14 / mod_jk 1.2.37
  • On the 2 app servers: clustered GlassFish 2.1.1 with ajp13 (6 instances per server)
  • All servers: Solaris SPARC, 64 vCPUs, 32 GB RAM.

My configuration is mostly what the generator gave me (you can see it in the link above):

httpd.conf:

KeepAlive On
ServerLimit         12800
StartServers        5
MinSpareServers     5
MaxSpareServers     20
MaxClients          12800
MaxRequestsPerChild 5000

ExtendedStatus Off
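
The mod_jk wiring itself is not shown above; for completeness, a minimal sketch of what that part of httpd.conf typically looks like, assuming the generator's defaults (the file paths and the mount pattern below are placeholders, not values from this setup):

# hypothetical paths; JkWorkersFile points at the worker.properties shown below
JkWorkersFile /etc/apache2/worker.properties
JkLogFile     /var/log/apache2/mod_jk.log
JkLogLevel    info
JkShmFile     /var/log/apache2/mod_jk.shm
# send everything to the load-balancer worker defined in worker.properties
JkMount       /* loadbalancer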

worker.properties:

worker.maintain=30
worker.template.type=ajp13
worker.template.session_cookie=JSESSIONID
worker.template.lbfactor=1
worker.template.ping_timeout=10000
worker.template.connection_pool_timeout=10
worker.template.socket_keepalive=True
worker.template.socket_timeout=600
worker.template.connect_timeout=10000
worker.template.prepost_timeout=10000
worker.template.connection_ping_interval=20
worker.template.ping_mode=A
worker.template.socket_connect_timeout=600000
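
The template above only sets defaults; the generator also emits the per-instance workers and the load-balancer worker that reference it, which are omitted here. A minimal sketch, assuming just one AJP instance per app host and placeholder host names and ports (the real setup has 6 instances per server):

worker.list=loadbalancer,status
# hypothetical instance workers inheriting the template settings
worker.app1i1.reference=worker.template
worker.app1i1.host=app1.example.com
worker.app1i1.port=8009
worker.app2i1.reference=worker.template
worker.app2i1.host=app2.example.com
worker.app2i1.port=8009
# the lb worker spreads requests over the instances, sticky on JSESSIONID
worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=app1i1,app2i1
worker.loadbalancer.sticky_session=True
# optional status worker for monitoring the balancer
worker.status.type=status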

On the GlassFish side the timeouts are 10 seconds; from the cluster configuration I have:

HTTP Service property:

  • connectionTimeout = 10000

Request Processing:

  • Thread Count: 2133
  • Initial Thread Count : 20
  • Thread Increment : 10

Keep Alive (enabled):

  • Thread Count: 400
  • Max Connections: 256
  • Timeout: 10 seconds

Connection Pool:

  • Max Pending Count: 4096 connections
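
For reference, in GlassFish 2.1.1 these admin-console values live in the cluster configuration's <http-service> element in domain.xml. A rough sketch of how the settings above map there (the element and attribute names are my assumption of the v2 schema, so check them against your own domain.xml):

<http-service>
  <!-- Request Processing -->
  <request-processing initial-thread-count="20" thread-count="2133" thread-increment="10"/>
  <!-- Keep Alive -->
  <keep-alive max-connections="256" thread-count="400" timeout-in-seconds="10"/>
  <!-- Connection Pool -->
  <connection-pool max-pending-count="4096"/>
  <!-- HTTP Service property -->
  <property name="connectionTimeout" value="10000"/>
</http-service>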

So:

  • Is my configuration correct?
  • How can I deal with the high number of established connections, or is it safe? I don't want downtime for Apache again if there is high load.

1 Answer


Regarding mod_jk / mod_ajp: we used this in a slightly bigger setup and stumbled upon bugs and errors every now and then, connections getting dropped, but we never found a real solution to any of our problems (we did find some bugs, though, that still exist).

My advice: build an alternate setup and run perf tests, mod_jk vs. proxy_http, and if proxy_http is within acceptable ranges, skip mod_jk. I have done this in 2 different setups now (and, additionally, was able to replace Apache with nginx -> BIG WIN) and do not regret it. A rough config sketch follows after the pros/cons below.

pros

  • easier to debug
  • more variety of possible lb/frontend gateways (haproxy, nginx, varnish)
  • fewer heisenbugs

cons

  • haven't found any so far
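
To make the proxy_http suggestion concrete: a minimal sketch with mod_proxy_balancer on Apache 2.2, assuming each GlassFish instance also exposes a plain HTTP listener; the host names, ports and route suffixes are placeholders, and route-based stickiness only works if each instance appends its route to the JSESSIONID:

# needs mod_proxy, mod_proxy_http and mod_proxy_balancer loaded
<Proxy balancer://portal>
    BalancerMember http://app1.example.com:8080 route=i1
    BalancerMember http://app2.example.com:8080 route=i2
</Proxy>
ProxyPass        / balancer://portal/ stickysession=JSESSIONID
ProxyPassReverse / balancer://portal/

Run the same netstat comparison against this setup under load before deciding; plain HTTP between the tiers is also much easier to trace when the firewall drops something.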