Using Solaris SMF | O'Reilly Media

Using Solaris SMF

by Chris Josephes
04/13/2006

In most Unix environments, the startup process consists of a handful of autonomous boot scripts. They act independently of one another, unaware of which scripts have already run or which will run after them. When they are invoked, there is no serious error checking and no recourse if a script fails.

For Solaris 10, Sun introduced the Service Management Facility. SMF is a framework that handles system boot-up, process management, and self-healing. It addresses the shortcomings of startup scripts and creates an infrastructure to manage daemons after the host has booted.

A System V Unix host will start the sendmail daemon with the script S80sendmail from either the /etc/rc2.d or /etc/rc3.d directory. The script contains commands to start or stop sendmail, depending on its invocation. The S portion of the filename denotes that this is a startup script, and the 80 is a sequence number that says when the script should run.

When S80sendmail runs, it won't be aware of any previous problems such as a NIS failure or /var never properly mounting. You could write tests into the script, but that increases startup time and the complexity of each script.

In the SMF environment, sendmail is a service. Solaris 10 defines a service as a persistent program that handles system or user requests. Services are expected to be fault tolerant and manageable by the operating system.

Services are identified by a URI known as a Fault Management Resource Identifier. The FMRI is broken up in a category hierarchy to help identify the service and what it is responsible for.

Here are the FMRIs for sendmail, ssh, and other services running on a host:

svc:/network/smtp:sendmail
svc:/network/ssh:default
svc:/system/filesystem/local:default
lrc:/etc/rc2_d/S99audit

Here is the breakdown of the FMRI structure:

Schema  Service name (category/name)  Instance
svc:    /network/smtp                 sendmail
svc:    /network/ssh                  default
svc:    /system/filesystem/local      default
lrc:    /etc/rc2_d/S99audit           (none)
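
The breakdown above can be sketched in a few lines of Python. This is only an illustration, not part of SMF: the parse_fmri helper and the Fmri field names are hypothetical, and the sketch handles only the plain scheme:/service[:instance] forms listed in this article, not FMRIs that include a hostname.

```python
from typing import NamedTuple, Optional


class Fmri(NamedTuple):
    scheme: str              # "svc" or "lrc"
    service: str             # category path plus name, e.g. "network/smtp"
    instance: Optional[str]  # e.g. "sendmail"; None when no instance is given


def parse_fmri(text: str) -> Fmri:
    """Split a full FMRI of the form scheme:/service[:instance].

    Hypothetical helper for illustration; hostname forms are not handled.
    """
    scheme, sep, rest = text.partition(":/")
    if not sep or not scheme:
        raise ValueError(f"not a full FMRI: {text!r}")
    service, _, instance = rest.partition(":")
    return Fmri(scheme, service, instance or None)


if __name__ == "__main__":
    for fmri in ("svc:/network/smtp:sendmail",
                 "svc:/system/filesystem/local:default",
                 "lrc:/etc/rc2_d/S99audit"):
        print(parse_fmri(fmri))
```

Keeping the category path and the name together in one field mirrors how the article treats them: the categories are simply the leading segments of the service name.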

Each service has a manifest that describes the service and its management needs. It lists the service dependencies, the control scripts, and the actions to take when the service fails. The manifest starts out as an XML file that SMF imports into a central repository, which records the properties of all the services.

Sendmail will not run without the following dependencies:

  • Local filesystems are mounted
  • Basic network services are up
  • The host is aware of its domain name
  • The /etc/nsswitch.conf file exists
  • The /etc/mail/sendmail.cf file exists
  • Any nameservices in use (NIS, LDAP) are running
  • The auto filesystem, if in use, is running
  • Syslog, if in use, is running

Services in the SMF environment start up in parallel, but each service will become available only when all its listed dependencies are. This means the host will have a faster boot-up, and it will reduce the chances of a cascading failure of services. There is no explicit order to service startup, so sendmail or its dependencies could start up at any time.
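
To make the idea of dependency-gated parallel startup concrete, here is a minimal Python sketch. It is a simplification and not how svc.startd is implemented: it groups services into rounds, whereas SMF reacts as each dependency comes online. The startup_waves function and the shortened service names in the example are assumptions made for illustration.

```python
from typing import Dict, List, Set


def startup_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group services into waves.

    A service joins a wave only when every dependency it lists came
    online in an earlier wave. Services in the same wave have no
    ordering between them, so they could start in parallel.
    """
    online: Set[str] = set()
    pending = dict(deps)
    waves: List[List[str]] = []
    while pending:
        ready = sorted(name for name, needs in pending.items()
                       if needs <= online)
        if not ready:
            # Nothing can start: a dependency is missing or circular.
            raise ValueError(f"unsatisfiable dependencies: {sorted(pending)}")
        waves.append(ready)
        online.update(ready)
        for name in ready:
            del pending[name]
    return waves


if __name__ == "__main__":
    # Hypothetical, trimmed-down dependency table loosely based on sendmail.
    example = {
        "filesystem/local": set(),
        "network/loopback": set(),
        "system-log": {"filesystem/local"},
        "smtp:sendmail": {"filesystem/local", "network/loopback", "system-log"},
    }
    for number, wave in enumerate(startup_waves(example), start=1):
        print(number, wave)
```

In this toy table, the two services without dependencies form the first wave, and sendmail lands in the last wave because it waits on everything else.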

Almost all services under the SMF are controlled by one service known as the restarter. The restarter controls the svc.startd daemon, which in turn starts the other services, tests their dependencies, and restarts them if they fail. When Solaris 10 boots up, svc.startd is one of the first programs spawned from /sbin/init.

It's still possible to use rcN.d scripts under Solaris 10; however, the programs started from these scripts will not be under SMF control. These are referred to as legacy run scripts. They have an FMRI, like normal services do, but the schema prefix is lrc:. Legacy run scripts are not initialized until all SMF services are up and running. When the host shuts down, they are the first stop scripts run before the SMF services are disabled.


Administering SMF

The two most common commands used to administer services are svcs and svcadm. The svcs command reports on the state of configured services, while the svcadm command controls the services.



$ svcs
STATE STIME FMRI
...
legacy_run Sep_22 lrc:/etc/rc2_d/S99audit
...
online Sep_22 svc:/system/svc/restarter:default
online Sep_22 svc:/system/filesystem/autofs:default
online Sep_22 svc:/system/system-log:default
online Sep_22 svc:/network/smtp:sendmail
online Sep_22 svc:/system/filesystem/local:default
online Sep_22 svc:/network/ssh:default
online Sep_22 svc:/system/dumpadm:default
online Sep_22 svc:/network/loopback:default
...

Running svcs without arguments lists all enabled services, most of which will be online, along with any legacy run scripts. The STATE column reports the service status; STIME shows when the service state last changed; and FMRI identifies the service. To list every service, including disabled ones, use the -a option.
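
If you want to post-process this three-column listing in a script, a small parser is enough. The Python sketch below is an assumption-laden illustration: parse_svcs is a hypothetical helper, it expects exactly the STATE, STIME, and FMRI columns shown above, and it reads saved text rather than invoking svcs itself.

```python
from typing import Dict, List

# Sample text copied from the listing above.
SAMPLE = """\
STATE STIME FMRI
legacy_run Sep_22 lrc:/etc/rc2_d/S99audit
...
online Sep_22 svc:/system/svc/restarter:default
online Sep_22 svc:/network/smtp:sendmail
"""


def parse_svcs(output: str) -> List[Dict[str, str]]:
    """Turn default svcs output into a list of records.

    Lines that are not three whitespace-separated fields (for example
    the elided "..." lines) and the header line are skipped.
    """
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[0] == "STATE":
            continue
        state, stime, fmri = fields
        rows.append({"state": state, "stime": stime, "fmri": fmri})
    return rows


if __name__ == "__main__":
    for row in parse_svcs(SAMPLE):
        print(row["state"], row["fmri"])
```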

The svcs command can also examine a single service by using either a full or partial FMRI. You can add the -v or -x options for extended output on the service. The -d option will list all the dependencies of a service.

$ svcs svc://localhost/network/ssh:default
STATE STIME FMRI
online Sep_22 svc:/network/ssh:default

$ svcs -v svc:/network/ssh
STATE NSTATE STIME CTID FMRI
online - Sep_22 52 svc:/network/ssh:default

$ svcs -x network/ssh
svc:/network/ssh:default (SSH server)
State: online since Thu Sep 22 07:51:15 2005
See: sshd(1M)
See: /var/svc/log/network-ssh:default.log
Impact: None.

$ svcs -d ssh
STATE STIME FMRI
online Sep_22 svc:/network/loopback:default
online Sep_22 svc:/network/physical:default
online Sep_22 svc:/system/cryptosvc:default
online Sep_22 svc:/system/filesystem/local:default
online Sep_22 svc:/system/utmp:default
online Sep_22 svc:/system/filesystem/autofs:default

You can add the hostname localhost to an FMRI, or you can abbreviate it by removing the instance name and/or the categories. If the abbreviation results in multiple matches, they will all be listed. Here are two services that each have the name local in the last segment of the service name:

$ svcs local
STATE STIME FMRI
online Sep_22 svc:/system/device/local:default
online Sep_22 svc:/system/filesystem/local:default

You can also perform basic glob matching on service names:

$ svcs "*network*"
STATE STIME FMRI
disabled Sep_22 svc:/network/rpc/keyserv:default
disabled Sep_22 svc:/network/rpc/nisplus:default
disabled Sep_22 svc:/network/nis/client:default
.....
online Sep_22 svc:/network/nfs/client:default
online Sep_22 svc:/network/security/ktkt_warn:default
online Sep_22 svc:/network/telnet:default
online Sep_22 svc:/network/nfs/rquota:default
$

Services can manage a running process or an OS state. By using the -p option with svcs, you can identify the processes associated with a service.

$ svcs -p svc:/network/ssh
STATE STIME FMRI
online Sep_22 svc:/network/ssh:default
Sep_22 345 sshd

The time the process started is listed under the STIME column.

In some cases, services do not have running processes associated with them. Tasks such as bringing a network interface up or mounting a disk partition do not require continuously running processes. The svc:/system/filesystem/local:default service runs the mount command once to mount all local filesystems, and then the script exits. SMF refers to these as transient services.

$ svcs -p svc:/system/filesystem/local:default
STATE STIME FMRI
online Sep_22 svc:/system/filesystem/local:default

Finally, there are services that have running processes only when they are in use. When Sun designed the Service Management Facility, it absorbed the role of inetd and the way inetd handles network daemons. All the daemons that previously appeared in the /etc/inetd.conf file are now SMF-managed services. The difference is that these services use the inetd daemon as their restarter, instead of svc.startd.

$ svcs -p rlogin
STATE STIME FMRI
online Sep_22 svc:/network/login:rlogin

$ rlogin localhost
Password:
Last login: Sun Feb 19 23:49:56 from localhost
Sun Microsystems Inc. SunOS 5.10 Generic January 2005

$ svcs -p rlogin
STATE STIME FMRI
online Sep_22 svc:/network/login:rlogin
23:50:41 23833 in.rlogind
23:50:41 23836 bash
23:50:48 23840 svcs

$ exit
logout
Connection to localhost closed.
$ svcs -p rlogin
STATE STIME FMRI
online Sep_22 svc:/network/login:rlogin

If you kill a process under the control of service management, the program that originally started it will restart it. Here's an example of an Apache2 service that has been running since January 5. First, I double-checked the service by grepping for the process IDs, which match the ones listed with the service. Then, I sent the TERM signal to the parent of all of the child processes.

# svcs -p http
STATE STIME FMRI
online Jan_05 svc:/application/http:apache2
Jan_05 12377 httpd
Jan_05 12378 httpd
Jan_05 12379 httpd
Jan_05 12380 httpd

# ps -ef | grep http
root 12377 1 0 Jan 05 ? 2:14 /opt/apache2/bin/httpd -DPERL
root 23521 23520 0 20:33:01 pts/1 0:00 grep http
http 12378 12377 0 Jan 05 ? 0:00 /opt/apache2/bin/httpd -DPERL
http 12380 12377 0 Jan 05 ? 0:00 /opt/apache2/bin/httpd -DPERL

# kill -TERM 12377

# ps -ef | grep http
root 23527 23520 0 20:33:25 pts/1 0:00 grep http
root 23580 1 0 20:33:09 ? 0:01 /opt/apache2/bin/httpd -DPERL
http 23581 23580 0 20:33:10 ? 0:00 /opt/apache2/bin/httpd -DPERL
http 23582 23580 0 20:33:12 ? 0:00 /opt/apache2/bin/httpd -DPERL
http 23583 23580 0 20:33:12 ? 0:00 /opt/apache2/bin/httpd -DPERL

# svcs -p svc:/application/http:apache2
STATE STIME FMRI
online 20:33:09 svc:/application/http:apache2
20:33:09 23580 httpd
20:33:10 23581 httpd
20:33:11 23582 httpd
20:33:11 23583 httpd

I then rechecked for the httpd processes and found that the svc.startd daemon had started new Apache servers. Then I examined the http service. It reported that the state time had changed, and listed the new process IDs.

The following table lists some SMF services, their associated processes, and their restarter FMRI:

Service                              Processes     Restarter
svc:/system/svc/restarter:default    svc.startd    none
svc:/network/smtp:sendmail           sendmail      svc:/system/svc/restarter:default
svc:/network/ssh:default             sshd          svc:/system/svc/restarter:default
svc:/system/sac:default              sac, ttymon   svc:/system/svc/restarter:default
svc:/network/inetd:default           inetd         svc:/system/svc/restarter:default
svc:/network/telnet:default          in.telnetd    svc:/network/inetd:default

If you want to know the restarter for a service, use svcs -l. Use svcs -R with a full FMRI to list all of the services a restarter service controls.

$ svcs -l network/ssh
fmri svc:/network/ssh:default
name SSH server
enabled true
state online
next_state none
state_time Thu Sep 22 07:51:15 2005
logfile /var/svc/log/network-ssh:default.log
restarter svc:/system/svc/restarter:default
contract_id 52
dependency require_all/none svc:/system/filesystem/local (online)
dependency optional_all/none svc:/system/filesystem/autofs (online)
dependency require_all/none svc:/network/loopback (online)
dependency require_all/none svc:/network/physical (online)
dependency require_all/none svc:/system/cryptosvc (online)
dependency require_all/none svc:/system/utmp (online)
dependency require_all/restart file://localhost/etc/ssh/sshd_config (online)

$ svcs -R svc:/system/svc/restarter:default
STATE STIME FMRI
disabled Sep_22 svc:/system/metainit:default
disabled Sep_22 svc:/network/rpc/keyserv:default
online Sep_22 svc:/system/svc/restarter:default
online Sep_22 svc:/network/pfil:default
online Sep_22 svc:/milestone/name-services:default
online Sep_22 svc:/network/loopback:default
....

Controlling Services



Enable or disable a service using the svcadm command:







# svcs -x telnet
svc:/network/telnet:default (Telnet server)
State: online since Thu Sep 22 07:51:11 2005
See: in.telnetd(1M)
See: telnetd(1M)
Impact: None.

# svcadm disable svc:/network/telnet:default

# svcs -x telnet
svc:/network/telnet:default (Telnet server)
State: disabled since Sun Feb 19 23:32:40 2006
Reason: Disabled by an administrator.
See: http://sun.com/msg/SMF-8000-05
See: in.telnetd(1M)
See: telnetd(1M)
Impact: This service is not running.


The configuration state of a service is recorded in the service
repository, so changes to that state persist across reboots. If you
disable telnet, rebooting the host won't bring it back
up. You must explicitly reenable it from the command line. Make a
temporary change to the state of a service by adding the -t option to svcadm:



# svcadm disable -t network/telnet


There are six different service states for configured SMF services.



online
The service is enabled and is running or available to run, or the tasks associated with this service are complete.
offline
The service is enabled but has not yet reached the online state. It
is either in the process of starting up, or the dependencies of the
service are not yet online.
disabled
The service is not enabled and should not be running.
degraded
The service is running but in a limited capacity. The Sun
documentation is very vague about what "degraded" means, and suggests
that the programs associated with the service are responsible for
making that determination.
maintenance
The service has a problem, and it cannot continue to run or
complete a task. A service in this state usually requires
administrative intervention. The restarter for the service won't try to
bring the service online until it has been cleared.
legacy_run
This is the state reported for legacy run scripts.


The process of starting or stopping a service is listed in the
service manifest. Most services have a method script associated with
them that handles starting and stopping the service, just like an rc
script. The restarter service runs this script to bring the service
online or offline.



The svcadm command gives administrators a standard interface for controlling services. svcadm recognizes several service management commands:



enable
Brings the service online.
disable
Stops the service and places it in the disabled state.
restart
Restarts the service process, either by performing a disable
followed by an enable, or a specific programmed method to restart the
service.
refresh
The refresh method rereads the service properties from the
repository. This is useful if someone made configuration changes to the
service definition. If that service is controlled by svc.startd and
that service also defines an internal refresh method, then the refresh
method runs. A program that is refreshed usually rereads its
configuration file.
clear
Resets a service that is in the maintenance state.
mark (degraded or maintenance)
Deliberately sets the state of a service to either degraded or maintenance. This is usually used for debugging a service.


Unlike svcs, the svcadm command is picky about wildcards. You can still use abbreviated FMRIs and wildcards, as long as they match only one full FMRI.



# svcadm refresh svc:/network/login
svcadm: Pattern 'svc:/network/login' matches multiple instances:
svc:/network/login:rlogin
svc:/network/login:klogin
svc:/network/login:eklogin

# svcadm refresh "svc:/*rlogin"

# svcs "*rlogin"
STATE STIME FMRI
online 23:24:17 svc:/network/login:rlogin

Boot-up and Runlevels



Because rc scripts are no longer the preferred method used to manage
programs, Sun has enhanced the runlevel model with service milestones.







In Unix, runlevel one is single user mode, two is multiuser mode,
and three is multiuser mode with file sharing or network services. In
each runlevel, there is a core set of services that must be brought
online.



For example, levels one, two, and three all require a minimum number
of local filesystems to be mounted, and network interfaces to be
online. Runlevel two requires all internet services to be online, and
users must be able to log on to the host. Runlevel three requires
everything level two does, plus the ability to share files by NFS.



Milestones are services that don't run any applications but do have
a dependent list of services. Once those services are online, the
milestone is marked online. The milestone ensures an expected group of
services are up and running, so you don't have to check each individual
service.







Here is a list of milestones currently online. In this case, seven
milestones are online because they all had their dependencies met.



$ svcs "svc:/milestone/*"
online Sep_22 svc:/milestone/name-services:default
online Sep_22 svc:/milestone/network:default
online Sep_22 svc:/milestone/devices:default
online Sep_22 svc:/milestone/single-user:default
online Sep_22 svc:/milestone/sysconfig:default
online Sep_22 svc:/milestone/multi-user:default
online Sep_22 svc:/milestone/multi-user-server:default


Here is a list of milestones and their equivalent rc levels.



Milestone                                 RC Level  Description
svc:/milestone/devices:default            -         Devices
svc:/milestone/network:default            -         Network interfaces online
svc:/milestone/single-user:default        1         Single-user mode
svc:/milestone/sysconfig:default          -         Basic system configuration
svc:/milestone/name-services:default      -         Any one of the NIS, NIS+, DNS, or LDAP services
svc:/milestone/multi-user:default         2         Multiuser mode
svc:/milestone/multi-user-server:default  3         Multiuser server mode


Consider the dependencies for svc:/milestone/multi-user:default:



$ svcs -d milestone/multi-user
STATE STIME FMRI
disabled Sep_22 svc:/network/smtp:sendmail
online Sep_22 svc:/milestone/name-services:default
online Sep_22 svc:/milestone/single-user:default
online Sep_22 svc:/system/filesystem/local:default
online Sep_22 svc:/network/rpc/bind:default
online Sep_22 svc:/milestone/sysconfig:default
online Sep_22 svc:/system/utmp:default
online Sep_22 svc:/network/inetd:default
online Sep_22 svc:/network/nfs/client:default
online Sep_22 svc:/system/system-log:default


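Milestones are checkpoints in the operating system. Before multiuser mode can be online, milestone/name-services, milestone/single-user, rpc/bind, and the other required services listed must be online as well. Note that network/smtp is shown as disabled even though the milestone is online; the likely explanation is that it is declared as an optional dependency, which only has to be online if it is enabled.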



One of the dependent services listed is milestone/single-user, which has its own list of dependencies:



$ svcs -d milestone/single-user
STATE STIME FMRI
disabled Sep_22 svc:/system/metainit:default
online Sep_22 svc:/network/loopback:default
online Sep_22 svc:/milestone/network:default
online Sep_22 svc:/milestone/devices:default
online Sep_22 svc:/system/filesystem/minimal:default
online Sep_22 svc:/system/manifest-import:default
online Feb_21 svc:/system/identity:node


Instead of making all milestones dependent on common services, the
milestones are set up as cascading checkpoints. When you change the
dependency list for milestone/single-user, you don't need to change the dependencies for milestone/multi-user-server.
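
The cascading arrangement can be illustrated by walking the dependency lists transitively. The Python sketch below is a hedged example: all_dependencies is a hypothetical helper, and the DEPS table is a trimmed copy of the two svcs -d listings above, not a complete picture of either milestone.

```python
from typing import Dict, List, Set

# Trimmed from the two svcs -d listings above (not the complete lists).
DEPS: Dict[str, List[str]] = {
    "milestone/multi-user": [
        "milestone/name-services",
        "milestone/single-user",
        "system/filesystem/local",
    ],
    "milestone/single-user": [
        "milestone/network",
        "milestone/devices",
        "system/filesystem/minimal",
    ],
}


def all_dependencies(service: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Return every direct and indirect dependency of a service."""
    seen: Set[str] = set()
    stack = [service]
    while stack:
        for dep in deps.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


if __name__ == "__main__":
    for name in sorted(all_dependencies("milestone/multi-user", DEPS)):
        print(name)
```

Because milestone/multi-user names milestone/single-user rather than repeating its list, anything added to the single-user entry is picked up by the higher milestone without editing it.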



To change the milestone level of the host, use the svcadm command:



$ svcadm milestone -d [milestone FMRI]


The -d option makes your choice the default milestone, so the setting persists across reboots.



For shutting down the host, the shutdown and init commands are still the preferred way to perform a safe shutdown or reboot.



Debugging Problems with Services



Sometimes services fail due to unavoidable circumstances. For
example, a bad configuration file will prevent the Apache process from
starting. If the service fails, it will usually end up in the
maintenance state. To correct the problem, you need to know where
to look for the cause.



# svcs http
STATE STIME FMRI
maintenance 20:51:31 svc:/application/http:apache2

# svcs -x http
svc:/application/http:apache2 (Apache2 Server)
State: maintenance since Mon Feb 20 20:51:31 2006
Reason: Method failed.
See: http://sun.com/msg/SMF-8000-8Q
See: httpd(8)
See: /var/svc/log/application-http:apache2.log
Impact: This service is not running.


Each service keeps a log with the output from the method script.
Most errors will appear in this file, as long as the program writes out
errors to stdout or stderr.
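
The log file names shown by svcs -x follow a visible pattern: the svc:/ prefix is dropped and the slashes in the service name become hyphens. The Python sketch below encodes that pattern; it is inferred only from the ssh and Apache examples in this article, and log_path is a hypothetical helper, so treat the See: line from svcs -x as the authoritative location.

```python
LOG_DIR = "/var/svc/log"


def log_path(fmri: str) -> str:
    """Guess the SMF log file for a full svc:/ FMRI.

    Assumption: the naming rule is inferred from the two examples in
    the article (network-ssh:default.log, application-http:apache2.log).
    """
    prefix = "svc:/"
    if not fmri.startswith(prefix):
        raise ValueError(f"expected a full svc:/ FMRI, got {fmri!r}")
    name = fmri[len(prefix):].replace("/", "-")
    return f"{LOG_DIR}/{name}.log"


if __name__ == "__main__":
    print(log_path("svc:/application/http:apache2"))
    print(log_path("svc:/network/ssh:default"))
```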



# tail /var/svc/log/application-http:apache2.log
Syntax error on line 23 of /etc/opt/apache2/httpd.conf:
Invalid command 'Kisten', perhaps mis-spelled or defined by a module not included in the server configuration
[ Feb 20 20:50:30 Method "stop" exited with status 0 ]
[ Feb 20 20:51:31 Method or service exit timed out. Killing contract 957 ]
[ Feb 20 20:51:31 Rereading configuration. ]


Another option is to check the log of svc.startd, as it is the restarter process for the Apache service.



# tail /var/svc/log/svc.startd.log
Feb 20 20:51:31/3: svc:/application/http:apache2: Method or service exit timed out. Killing contract 957.
Feb 20 20:51:31/520: application/http:apache2 failed


After you have corrected the error, use the svcadm command to clear the maintenance state.



# svcadm clear application/http:apache2

# svcs -x http
svc:/application/http:apache2 (Apache2 Server)
State: online since Mon Feb 20 21:00:22 2006
See: httpd(8)
See: /var/svc/log/application-http:apache2.log
Impact: None.


The important thing to remember is that the Service Management
Facility isn't designed to block normal access to programs or
processes. If you really need to perform serious testing of Apache httpd
or other programs, it's still possible to invoke these commands from
the command line. If a service is in the maintenance state, then go
ahead and run httpd -t, or sendmail -bD, or whatever command you need to run. SMF will not interfere with processes that were not started by its own restarter.




Chris Josephes
works as a system administrator for Internet Broadcasting.

