Posts:
7
From:
US
Registered:
5/30/07
RGM Question - Forcing a resource offline after repeated failures
Posted:
Oct 6, 2009 9:19 AM
To: Communities » ha-clusters » discuss
Cc: OpenSolaris » help
Hi All,
I have an RGM question. It has been some time and I can't remember exactly how best to do this.
On a single-node cluster, I have a resource. What I want is this: if the application monitored by the resource cannot stay online (after the designated retry count is exhausted), Sun Cluster should take the resource offline rather than leave it online in a faulted state and stop probing it. The problem I face when the resource stays online is that a dependent resource doesn't go offline when this failure happens.
Am I missing an easy way to do that?
Sorry for the trouble and thanks in advance.. - Shubho.
Posts:
14
From:
Registered:
6/25/07
Re: [ha-clusters-discuss] RGM Question - Forcing a resource offline after repeated failures
Posted:
Oct 6, 2009 9:44 AM
in response to: deepvrce
Hi Subhadeep,
I think you can use an offline-restart dependency for this. See this blog entry for details:
http://blogs.sun.com/SC/entry/disabling_a_depended_on_resource
thanks -venkat
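For readers unfamiliar with the suggestion above, here is a minimal sketch of how an offline-restart dependency might be declared. It only builds (and by default prints) a `clresource set` command that sets the `Resource_dependencies_offline_restart` property on the dependent resource. The resource names `app-rs` and `db-rs` and the helper functions are hypothetical, and the exact invocation should be checked against the clresource(1CL) man page for your release.

```python
import subprocess


def offline_restart_dependency_cmd(dependent, depended_on):
    """Build a clresource command declaring that `dependent` has an
    offline-restart dependency on `depended_on`.

    Both resource names are supplied by the caller; nothing here is
    taken from the original thread.
    """
    return [
        "clresource", "set",
        "-p", f"Resource_dependencies_offline_restart={depended_on}",
        dependent,
    ]


def apply(cmd, dry_run=True):
    """Print the command (dry run) or execute it on a cluster node."""
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Hypothetical names: app-rs depends on db-rs.
    apply(offline_restart_dependency_cmd("app-rs", "db-rs"))
```

As the rest of the thread shows, this dependency only takes the dependent offline when the depended-on resource itself goes offline, so it does not help while the faulted resource is still reported as online.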
_______________________________________________
ha-clusters-discuss mailing list
ha-clusters-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/ha-clusters-discuss
Re: [ha-clusters-discuss] RGM Question - Forcing a resource offline after
Posted:
Oct 6, 2009 10:02 AM
in response to: vchennu
To: Communities » ha-clusters » discuss
Hi Venkat,
I am using an offline-restart dependency right now. But I guess that since the faulty resource is not put in an offline state after the retries are exhausted (it's left online in a faulted state and probing is stopped on it), the dependent resource doesn't go offline either and stays online.
Thanks, - Shubho.
Re: [ha-clusters-discuss] RGM Question - Forcing a resource offline after
Posted:
Oct 6, 2009 11:02 AM
in response to: deepvrce
Hi Subhadeep,
Sorry. I did not realize that you were using this already.
thanks -venkat
Posts:
75
From:
Registered:
3/9/05
Re: [ha-clusters-discuss] RGM Question - Forcing a resource offline after repeated failures
Posted:
Oct 6, 2009 10:45 AM
in response to: deepvrce
Subhadeep Sinha wrote:
> Hi All,
>
> I have a RGM question. Been some time and I can't exactly remember how best to do this..
>
> On a single node cluster, I have a resource. What I want to achieve effectively is that if the application monitored by the resource is not able to stay online (after the designated retry counts), then Sun Cluster offline the resource, rather than leaving it online in a faulted state and quitting to probe it. The problem that I am facing if the resource stays online is that a dependent resource doesn't go offline when this failure happens.
>
> Am I missing an easy way to do that ?
Hmmm... I thought what you describe above is actually the DEFAULT out-of-the-box behaviour for SC, no tweaking needed, because that is the "most sensible thing" (tm) to do. :-)
If a resource fails to start, it goes into the START_FAILED state. That is what you want, right? Unless you have tweaked things, that is what should happen.
Am I missing something obvious?
-ashu
Re: [ha-clusters-discuss] RGM Question - Forcing a resource offline after
Posted:
Oct 6, 2009 11:49 AM
in response to: ashu
To: Communities » ha-clusters » discuss
Hey Ashu,
In my case the resource never fails to start. The probe fails because of some rough weather the application runs into after it has started, and SC then restarts the resource retry_count times. Only after that does the resource enter a faulted/unmonitored state where probing is stopped. But the final state of the resource is still "online".
Thanks, - Shubho.
Posts:
267
From:
US
Registered:
11/22/06
Re: [ha-clusters-discuss] RGM Question - Forcing a resource offline after
Posted:
Oct 6, 2009 12:37 PM
in response to: deepvrce
Shubho,
What is the value of the Failover_mode property for the resource?
Thanks, Nick
Re: [ha-clusters-discuss] RGM Question - Forcing a resource offline after
Posted:
Oct 6, 2009 1:14 PM
in response to: nsolter
To: Communities » ha-clusters » discuss
Hey Nick,
I have tried SOFT, HARD, and NONE. Is there anything else that I should set it to?
Thanks, - Shubho.
Re: [ha-clusters-discuss] RGM Question - Forcing a resource offline after
Posted:
Oct 6, 2009 1:41 PM
in response to: deepvrce
Hi Subhadeep,
I see. Strange situation indeed, sounds like either a badly designed probe or a badly behaving app or both.
The strange part is that starting the application always succeeds, even though SC (or rather, the agents built by Agent Builder) would typically probe the app at start time as well, to make sure it really did start. It must be related to load, as you suggested.
Badly designed app/probe or not, it would be good to have a knob in SC that puts the application into an errored/offline state if it keeps failing after starting successfully.
I do not believe we have such a knob in SC today, but perhaps someone else on the list will happily prove me wrong.
-ashu
Posts:
18
From:
US
Registered:
7/23/08
Re: [ha-clusters-discuss] RGM Question - Forcing a resource offline after
Posted:
Oct 6, 2009 2:00 PM
in response to: ashu
Hi Shubho,
You can't just set a knob to make this happen. However, if you are able to modify the code of the resource monitor, you could have it call scha_control with the RESOURCE_DISABLE tag when this case occurs. This stops the monitor, stops the resource itself, and persistently disables the resource (the same as running the "clrs disable" command). The cluster administrator can re-enable the resource once the problem is fixed.
The scha_control RESOURCE_DISABLE tag is currently used in the ScalDeviceGroup and ScalMountPoint resource monitors when an unrecoverable error occurs. These resource types usually have offline-restart dependents, so disabling them takes the dependents offline too. I think this is the sort of thing you're trying to do.
This is documented in the scha_control(1HA) and scha_control(3HA) man pages.
--Marty
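The approach Marty describes could be sketched roughly as follows. This is not the code of any real monitor: it is a small Python illustration that wraps the scha_control(1HA) command line, assuming a monitor that restarts the resource on each probe failure and disables it once more than `retry_count` failures fall within `retry_interval` seconds. The `FailureTracker` class, the `on_probe_failure` function, and the names `app-rs` and `app-rg` are hypothetical; the restart branch uses the RESOURCE_RESTART tag, which is my choice for the sketch and is not mentioned in the thread.

```python
import subprocess


def scha_control_cmd(tag, resource, group):
    """Command line for scha_control(1HA): -O tag -G group -R resource."""
    return ["scha_control", "-O", tag, "-G", group, "-R", resource]


class FailureTracker:
    """Hypothetical helper: counts probe failures in a sliding window."""

    def __init__(self, retry_count, retry_interval):
        self.retry_count = retry_count
        self.retry_interval = retry_interval
        self.failures = []

    def record_failure(self, now):
        """Record a failure at time `now` (seconds).

        Returns True once more than retry_count failures have occurred
        within the last retry_interval seconds (an assumed policy).
        """
        self.failures = [t for t in self.failures
                         if now - t < self.retry_interval]
        self.failures.append(now)
        return len(self.failures) > self.retry_count


def on_probe_failure(tracker, now, resource, group, run=subprocess.call):
    """Restart the resource, or disable it when retries are exhausted.

    `run` is injectable so the logic can be exercised off-cluster.
    """
    if tracker.record_failure(now):
        # Retries exhausted: stop and persistently disable the resource.
        return run(scha_control_cmd("RESOURCE_DISABLE", resource, group))
    return run(scha_control_cmd("RESOURCE_RESTART", resource, group))
```

Off-cluster, the decision logic can be exercised by passing a stand-in for `run` that records the command instead of executing it. With offline-restart dependents in place, the RESOURCE_DISABLE call is the step that should take them offline as well.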
Re: [ha-clusters-discuss] RGM Question - Forcing a resource offline after
Posted:
Nov 4, 2009 3:27 PM
in response to: mrattner
To: Communities » ha-clusters » discuss
Thanks, Marty! That worked perfectly.
- Shubho.