Terracotta Java heap space

Wessel, Keith William kwessel at illinois.edu
Tue Nov 20 01:57:08 EST 2012

Hi, all,

I've got a (hopefully) very simple puzzle related to my Terracotta clustering and am looking for some wisdom from others here. Every couple of months, when we have a network glitch between our two datacenters and thus a Terracotta hiccup, we run into a heap space issue. Network glitches that happen during the day aren't a problem; we run the run-dgc-if-active cron job, as recommended on the Shib wiki, at 2:30 AM when activity is down on our IDP cluster. The problem occurs when the hiccup comes after a busy day but before this cron job runs, say, after 11:30 PM.

When the passive node again figures out it should be passive, it restarts to wipe its database, then attempts to re-initialize. It enters passive-uninitialized and, a few minutes later, fails with an out-of-memory exception because it has used up all 1 GB of its heap space. My solution for now has been to manually run the run-dgc-if-active script on the active node and then restart Terracotta on the passive node, which always works. So it's definitely an objectdb size issue from the busy day.

I have three solutions in mind, but I'm not sure which is the best way to go.

1.      We're still on Terracotta 3.2.1. I plan to upgrade us to the latest version during winter break. Since this only happens every couple of months, we'll just hope for the best for the next few weeks. But if the newer version makes no difference here, and I doubt it will, I might as well go ahead with option 2 or 3.

2.      I could run the run-dgc-if-active script more often. This seems unwise, though, as I don't want lengthy pauses in my IDP activity during the day if, say, I run this every six or twelve hours.

3.      The computer scientist in me says to just increase the heap space to 1.5 GB or even 2 GB. Since this is hard-coded in Terracotta's start-tc-server.sh, I'd have to remember to adjust it on upgrades, and the fact that there's no external way to change it makes me think it shouldn't need to be changed. It would most likely fix the problem, or at least put it off until the DB gets really big, but would it introduce other issues caused by a much larger heap space needing to be searched?

I'm sure others here have encountered this. What's the best way to go?
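
For option 3, one way to confirm how a changed heap setting is picked up is to ask a JVM for its own heap limits. The following is only a minimal sketch and not part of Terracotta: the class name HeapCheck is hypothetical, and it reports on the JVM it runs in, so it shows how a given -Xmx value is interpreted rather than inspecting the running Terracotta server.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not shipped with Terracotta): prints the heap
// limits of the JVM that runs it, e.g. to sanity-check an -Xmx value.
public class HeapCheck {

    private static final long MIB = 1024L * 1024L;

    /** Maximum heap this JVM will attempt to use, in MiB (reflects -Xmx). */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently in use by this JVM, in MiB. */
    public static long usedHeapMiB() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed() / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB):  " + maxHeapMiB());
        System.out.println("used heap (MiB): " + usedHeapMiB());
    }
}
```

Run with something like `java -Xmx1536m HeapCheck`, the reported maximum should come out close to 1536 MiB, though some JVMs report slightly less than the configured value.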

