Child process dies, nfs locks not released, webserver hangs...

December 10
Hi,
I have Sun ONE Web Server 6.1 SP11 on a Solaris 10 LDom.
The server is configured to write its access and error logs to /logs, which is an NFS mount from a separate Solaris 10 box. Logging to an NFS mount is a business requirement.
Sun JWS is configured to run two httpd processes, with the watchdog restarting them if one should fail.
Every now and then, about once a day (it varies), one of the child processes dies with messages like these in the error log (1949 is the wdog pid):
[09/Dec/2009:14:19:06] failure ( 1949): CORE3107: Child process closed admin channel
[09/Dec/2009:14:19:06] fine ( 1949): CORE3061: signal_handler_thread: received signal 18
[09/Dec/2009:14:19:06] fine ( 1949): CORE3049: Primordial process detected child 1950 died: status 37
[09/Dec/2009:14:19:06] fine ( 1949): CORE3050: Is our child, will spawn replacement
[09/Dec/2009:14:19:06] fine ( 1949): CORE3062: Unlinking of /tmp/https-wv2-819e4c2d/.cgistub_1950 returned -1
[09/Dec/2009:14:19:06] fine ( 1949): CORE3047: Server spawned worker process 2011
[09/Dec/2009:14:19:06] fine ( 2011): HTTP5169: User authentication cache entries expire in 120 seconds.
[09/Dec/2009:14:19:06] fine ( 2011): HTTP5170: User authentication cache holds 200 users
[09/Dec/2009:14:19:06] fine ( 2011): HTTP5171: Up to 4 groups are cached for each cached user.
[09/Dec/2009:14:19:06] fine ( 2011): HTTP4207: file cache module initialized (API versions 2 through 2)
[09/Dec/2009:14:19:06] fine ( 2011): HTTP4302: file cache has been initialized
[09/Dec/2009:14:19:06] fine ( 2011): HTTP3066: MaxKeepAliveConnections set to 256
[09/Dec/2009:14:19:06] fine ( 2011): Installed configuration 1
[09/Dec/2009:14:19:06] fine ( 2011): HTTP4193: flex-rotate-init: rotate start time is 0h, 0m
At this point the webserver stops responding. The processes (2 x httpd, 1 x wdog) are still running but do not answer requests. Running pfiles against them shows a weird lock on the access log:
21: S_IFREG mode:0777 dev:340,10 ino:34988 uid:111 gid:102 size:0
O_RDWR|O_APPEND|O_CREAT|O_LARGEFILE FD_CLOEXEC
advisory write lock set by system 0x2 process 280
which I think means the new httpd process is waiting for the lock to be released, but the lock is never freed.
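One way to test that reading from outside the web server is to try taking the same kind of lock without blocking. Below is a minimal sketch in Python, assuming the access log is locked with fcntl-style advisory locks (which is what pfiles reports) and that Python is available on the host. The function name `lock_is_free` is hypothetical and not part of the web server.

```python
import fcntl
import os


def lock_is_free(path):
    """Return True if an exclusive advisory lock on `path` can be taken right now.

    Uses a non-blocking request, so the probe itself never hangs the way
    a blocking lock request would.
    """
    # Append mode matches how a log file is normally opened and does not truncate it.
    with open(path, "a") as f:
        try:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            # EAGAIN/EACCES: some other process currently holds a conflicting lock.
            return False
        # We got the lock, so nobody else held it; release it again immediately.
        fcntl.lockf(f, fcntl.LOCK_UN)
        return True
```

Run it as a separate script against the log file under /logs (the exact file name is not given in this post), not from inside the server process: advisory locks never conflict within one process, and closing the file drops any locks that process already holds on it. If the probe keeps returning False after the old child has died, that would be consistent with a stale lock.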
But what I'm really curious about is why the process is dying in the first place. Has anyone seen "status 37" before, or know where I can look it up? I couldn't find any reference on Google to what it might mean...
Any help appreciated.
cheers
Kristin.

Answers

I found the following in http://docs.sun.com/app/docs/doc/816-4555/rfsrefer-134?l=ja&a=view :
In this situation, the SIGLOST signal is posted to the process. The default action for the SIGLOST signal is to terminate the process.
For you to recover from this state, you must restart any applications that had files open at the time of the failure. Note that the following can occur.
- Some processes that did not reopen the file could receive I/O errors.
- Other processes that did reopen the file, or performed the open operation after the recovery failure, are able to access the file without any problems.
Thus, some processes can access a particular file while other processes cannot.
Edited by: Arvind_Srinivasan on Dec 10, 2009 12:33 AM
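To connect the SIGLOST explanation to the "status 37" in the log, here is a minimal sketch that decodes a raw wait-style status word. It rests on two assumptions that are not confirmed by the thread: that the number logged by CORE3049 is the unmodified status returned by wait(2), and that Solaris 10 numbers SIGCLD as 18 and SIGLOST as 37 (worth confirming against signal.h on the affected host). The function name and the small signal table are illustrative only.

```python
# Assumed Solaris 10 signal numbers; verify on the actual system.
ASSUMED_SOLARIS_SIGNALS = {18: "SIGCLD", 37: "SIGLOST"}


def decode_wait_status(status):
    """Decode a traditional Unix wait status into (kind, number, core_dumped)."""
    low = status & 0x7F
    if status & 0xFF == 0x7F:
        # Low byte 0x7f marks a stopped child; the stop signal is in the high byte.
        return ("stopped", (status >> 8) & 0xFF, False)
    if low == 0:
        # Normal exit: the exit code is in the high byte.
        return ("exited", (status >> 8) & 0xFF, False)
    # Killed by a signal: low 7 bits are the signal, bit 0x80 flags a core dump.
    return ("signaled", low, bool(status & 0x80))


kind, number, core = decode_wait_status(37)
print(kind, number, ASSUMED_SOLARIS_SIGNALS.get(number, "unknown"), core)
```

Under those assumptions, status 37 reads as "killed by signal 37, no core dump", which would match a child terminated by SIGLOST, and the "received signal 18" line would be the watchdog being told that a child died.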
