Wednesday, July 23, 2008

Change SHMMAX without rebooting

The shmmax parameter is supposed to be the maximum size of a single shared memory segment (and the Oracle SGA is built out of these shared memory segments).

New to the Solaris 8 release is the modular debugger, mdb(1), which is unique among
available Solaris debuggers because it is easily extensible. Mdb(1) also includes a number of desirable usability features, including command-line editing, command history, a built-in output pager, syntax checking, and command pipelining. This is the recommended post-mortem debugger for the kernel.

To change the value of the integer variable shminfo_shmmax without rebooting the server (the transcript below changes it from 81920 to 102400), do the following:

# cp /etc/system /etc/system_old

# grep shminfo_shmmax /etc/system
set shmsys:shminfo_shmmax=81920
# mdb -k
Loading modules: [ unix krtld genunix ip usba s1394 ipc nfs ptm logindmux random ]
> shminfo_shmmax /D
shminfo_shmmax:
shminfo_shmmax: 1
> shminfo_shmmax /E
shminfo_shmmax:
shminfo_shmmax: 81920
> $q

As we can see, “shminfo_shmmax” uses a 64-bit value, so let’s change the value using mdb’s 8-byte write format (/Z):

# mdb -kw
Loading modules: [ unix krtld genunix ip usba s1394 ipc nfs ptm logindmux random ]
> shminfo_shmmax /Z 0t102400
shminfo_shmmax: 0x5f5e10000 = 0x19000
> shminfo_shmmax /E
shminfo_shmmax:
shminfo_shmmax: 102400
> $q

After the change succeeds, set the “shminfo_shmmax” parameter in /etc/system to the same value you set with mdb, so that it persists across reboots:

# vi /etc/system
set shmsys:shminfo_shmmax=102400
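
As a quick cross-check without entering mdb, you can also filter the sysdef output (sysdef lists the kernel IPC tunables, including SHMMAX, on Solaris; the grep is just a convenience and should show the new value):

# sysdef | grep -i shmmax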


Taken from: http://sysinfo.bascomp.org/2008/02/21/change-shmmax-without-rebooting/

Thursday, June 19, 2008

How to submit dump/snap file to IBM

Open a case with IBM.
Log in as root.

At the command line, enter:
# sysdumpdev -L

Look at the dump size, then run "df -Im" to find a filesystem with enough space to proceed with packaging. These directions assume /tmp has enough space.

# snap -gfkGLDN
# cd /tmp/ibmsupt/dump
# ls

Ensure that unix.Z, dump.snap, and dump.Z (or dump.BZ) are present.
# cd /tmp/ibmsupt
# snap -c

This will create a snap.pax.Z file in the /tmp/ibmsupt directory.

The snap file will need to be renamed to pmr#.branch#.countrycode.snap.pax.Z (US country code = 000):
# mv snap.pax.Z pmr#.branch#.countrycode.snap.pax.Z

After the snap files have been renamed and you have a PMR number, ftp it to IBM:
ftp testcase.software.ibm.com
login: anonymous
password:
ftp> cd /toibm/aix
ftp> bin
ftp> put pmr#.branch#.countrycode.snap.pax.Z
ftp> quit

How to replace mirrored hard disk in Sun Solaris 8 Server

Failed hard disk: c1t1d0

If the hard disk is hot-swappable, it can be replaced without server downtime; otherwise, the server must be shut down at the replacement step below. In either case, first break the mirrors on the failed disk before replacing it.

# metastat -p
d10 -m d11 d12 1
d11 1 1 c1t0d0s0
d12 1 1 c1t1d0s0
d23 -m d22 d21 1
d22 1 1 c1t1d0s1
d21 1 1 c1t0d0s1
d30 -m d31 d32 1
d31 1 1 c3t8d0s0
d32 1 1 c4t8d0s0
d40 -m d41 d42 1
d41 1 1 c3t8d0s1
d42 1 1 c4t8d0s1
d50 -m d51 d52 1
d51 1 1 c1t0d0s4
d52 1 1 c1t1d0s4
d60 -m d61 d62 1
d61 1 1 c1t0d0s3
d62 1 1 c1t1d0s3
d90 -m d91 d92 1
d91 1 1 c3t8d0s5
d92 1 1 c4t8d0s5
d100 -m d101 d102 1
d101 1 2 c3t9d0s0 c3t11d0s0 -i 32b
d102 1 2 c4t9d0s0 c4t11d0s0 -i 32b
d110 -m d111 d112 1
d111 1 1 c3t12d0s0
d112 1 1 c4t12d0s0


# metadetach -f d50 d52
d50: submirror d52 is detached


# metadetach -f d60 d62
d60: submirror d62 is detached


# metadetach -f d23 d22
d23: submirror d22 is detached


# metadetach -f d10 d12
d10: submirror d12 is detached


# metastat -p
d10 -m d11 1
d11 1 1 c1t0d0s0
d23 -m d21 1
d21 1 1 c1t0d0s1
d30 -m d31 d32 1
d31 1 1 c3t8d0s0
d32 1 1 c4t8d0s0
d40 -m d41 d42 1
d41 1 1 c3t8d0s1
d42 1 1 c4t8d0s1
d50 -m d51 1
d51 1 1 c1t0d0s4
d60 -m d61 1
d61 1 1 c1t0d0s3
d90 -m d91 d92 1
d91 1 1 c3t8d0s5
d92 1 1 c4t8d0s5
d100 -m d101 d102 1
d101 1 2 c3t9d0s0 c3t11d0s0 -i 32b
d102 1 2 c4t9d0s0 c4t11d0s0 -i 32b
d110 -m d111 d112 1
d111 1 1 c3t12d0s0
d112 1 1 c4t12d0s0
d12 1 1 c1t1d0s0
d22 1 1 c1t1d0s1
d52 1 1 c1t1d0s4
d62 1 1 c1t1d0s3


# metaclear d12
d12: Concat/Stripe is cleared


# metaclear d22
d22: Concat/Stripe is cleared


# metaclear d52
d52: Concat/Stripe is cleared


# metaclear d62
d62: Concat/Stripe is cleared


# metadb -i
flags first blk block count
a m p luo 16 1034 /dev/dsk/c1t0d0s7
a p luo 1050 1034 /dev/dsk/c1t0d0s7
a p luo 2084 1034 /dev/dsk/c1t0d0s7
W p l 16 1034 /dev/dsk/c1t1d0s7
W p l 1050 1034 /dev/dsk/c1t1d0s7
W p l 2084 1034 /dev/dsk/c1t1d0s7

a p luo 16 1034 /dev/dsk/c3t9d0s7
a p luo 1050 1034 /dev/dsk/c3t9d0s7
a p luo 2084 1034 /dev/dsk/c3t9d0s7
a p luo 16 1034 /dev/dsk/c3t11d0s7
a p luo 1050 1034 /dev/dsk/c3t11d0s7
a p luo 2084 1034 /dev/dsk/c3t11d0s7
a p luo 16 1034 /dev/dsk/c3t12d0s7
a p luo 1050 1034 /dev/dsk/c3t12d0s7
a p luo 2084 1034 /dev/dsk/c3t12d0s7
a p luo 16 1034 /dev/dsk/c4t8d0s7
a p luo 1050 1034 /dev/dsk/c4t8d0s7
a p luo 2084 1034 /dev/dsk/c4t8d0s7
a p luo 16 1034 /dev/dsk/c4t9d0s7
a p luo 1050 1034 /dev/dsk/c4t9d0s7
a p luo 2084 1034 /dev/dsk/c4t9d0s7
a p luo 16 1034 /dev/dsk/c4t11d0s7
a p luo 1050 1034 /dev/dsk/c4t11d0s7
a p luo 2084 1034 /dev/dsk/c4t11d0s7
a p luo 16 1034 /dev/dsk/c4t12d0s7
a p luo 1050 1034 /dev/dsk/c4t12d0s7
a p luo 2084 1034 /dev/dsk/c4t12d0s7
o - replica active prior to last mddb configuration change
u - replica is up to date
l - locator for this replica was read successfully
c - replica's location was in /etc/lvm/mddb.cf
p - replica's location was patched in kernel
m - replica is master, this is replica selected as input
W - replica has device write errors
a - replica is active, commits are occurring to this replica
M - replica had problem with master blocks
D - replica had problem with data blocks
F - replica had format problems
S - replica is too small to hold current data base
R - replica had device read errors


# metadb -d c1t1d0s7


# metadb -i
flags first blk block count
a m p luo 16 1034 /dev/dsk/c1t0d0s7
a p luo 1050 1034 /dev/dsk/c1t0d0s7
a p luo 2084 1034 /dev/dsk/c1t0d0s7
a p luo 16 1034 /dev/dsk/c3t9d0s7
a p luo 1050 1034 /dev/dsk/c3t9d0s7
a p luo 2084 1034 /dev/dsk/c3t9d0s7
a p luo 16 1034 /dev/dsk/c3t11d0s7
a p luo 1050 1034 /dev/dsk/c3t11d0s7
a p luo 2084 1034 /dev/dsk/c3t11d0s7
a p luo 16 1034 /dev/dsk/c3t12d0s7
a p luo 1050 1034 /dev/dsk/c3t12d0s7
a p luo 2084 1034 /dev/dsk/c3t12d0s7
a p luo 16 1034 /dev/dsk/c4t8d0s7
a p luo 1050 1034 /dev/dsk/c4t8d0s7
a p luo 2084 1034 /dev/dsk/c4t8d0s7
a p luo 16 1034 /dev/dsk/c4t9d0s7
a p luo 1050 1034 /dev/dsk/c4t9d0s7
a p luo 2084 1034 /dev/dsk/c4t9d0s7
a p luo 16 1034 /dev/dsk/c4t11d0s7
a p luo 1050 1034 /dev/dsk/c4t11d0s7
a p luo 2084 1034 /dev/dsk/c4t11d0s7
a p luo 16 1034 /dev/dsk/c4t12d0s7
a p luo 1050 1034 /dev/dsk/c4t12d0s7
a p luo 2084 1034 /dev/dsk/c4t12d0s7
o - replica active prior to last mddb configuration change
u - replica is up to date
l - locator for this replica was read successfully
c - replica's location was in /etc/lvm/mddb.cf
p - replica's location was patched in kernel
m - replica is master, this is replica selected as input
W - replica has device write errors
a - replica is active, commits are occurring to this replica
M - replica had problem with master blocks
D - replica had problem with data blocks
F - replica had format problems
S - replica is too small to hold current data base
R - replica had device read errors


# cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 CD-ROM connected configured unknown
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected configured unknown
c1::dsk/c1t1d0 disk connected configured unknown
c1::dsk/c1t2d0 disk connected configured unknown
c1::dsk/c1t3d0 disk connected configured unknown
c2 scsi-bus connected unconfigured unknown
c3 scsi-bus connected configured unknown
c3::dsk/c3t11d0 disk connected configured unknown
c3::dsk/c3t12d0 disk connected configured unknown
c3::dsk/c3t8d0 disk connected configured unknown
c3::dsk/c3t9d0 disk connected configured unknown
c3::es/ses0 processor connected configured unknown
c4 scsi-bus connected configured unknown
c4::dsk/c4t11d0 disk connected configured unknown
c4::dsk/c4t12d0 disk connected configured unknown
c4::dsk/c4t8d0 disk connected configured unknown
c4::dsk/c4t9d0 disk connected configured unknown


# cfgadm -c unconfigure c1::dsk/c1t1d0


# cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 CD-ROM connected configured unknown
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected configured unknown
c1::dsk/c1t1d0 unavailable connected unconfigured unknown
c1::dsk/c1t2d0 disk connected configured unknown
c1::dsk/c1t3d0 disk connected configured unknown
c2 scsi-bus connected unconfigured unknown
c3 scsi-bus connected configured unknown
c3::dsk/c3t11d0 disk connected configured unknown
c3::dsk/c3t12d0 disk connected configured unknown
c3::dsk/c3t8d0 disk connected configured unknown
c3::dsk/c3t9d0 disk connected configured unknown
c3::es/ses0 processor connected configured unknown
c4 scsi-bus connected configured unknown
c4::dsk/c4t11d0 disk connected configured unknown
c4::dsk/c4t12d0 disk connected configured unknown
c4::dsk/c4t8d0 disk connected configured unknown
c4::dsk/c4t9d0 disk connected configured unknown


Shut down the server and replace the hard disk at this point if the server does not support hot-swapping.


# devfsadm

# cfgadm -c configure c1::dsk/c1t1d0
or
# cfgadm -x replace_device c1::sd1
Replacing SCSI device: /devices/pci@1c,600000/scsi@2/sd@1,0
This operation will suspend activity on SCSI bus: c1
Continue (yes/no)? yes
SCSI bus quiesced successfully.
It is now safe to proceed with hotplug operation.
Enter y if operation is complete or n to abort (yes/no)? y


# cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 CD-ROM connected configured unknown
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected configured unknown
c1::dsk/c1t1d0 disk connected configured unknown
c1::dsk/c1t2d0 disk connected configured unknown
c1::dsk/c1t3d0 disk connected configured unknown
c2 scsi-bus connected unconfigured unknown
c3 scsi-bus connected configured unknown
c3::dsk/c3t11d0 disk connected configured unknown
c3::dsk/c3t12d0 disk connected configured unknown
c3::dsk/c3t8d0 disk connected configured unknown
c3::dsk/c3t9d0 disk connected configured unknown
c3::es/ses0 processor connected configured unknown
c4 scsi-bus connected configured unknown
c4::dsk/c4t11d0 disk connected configured unknown
c4::dsk/c4t12d0 disk connected configured unknown
c4::dsk/c4t8d0 disk connected configured unknown
c4::dsk/c4t9d0 disk connected configured unknown


# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@1c,600000/scsi@2/sd@0,0
1. c1t1d0
/pci@1c,600000/scsi@2/sd@1,0
2. c1t2d0
/pci@1c,600000/scsi@2/sd@2,0
3. c1t3d0
/pci@1c,600000/scsi@2/sd@3,0
4. c3t8d0
/pci@1d,700000/pci@1/scsi@4/sd@8,0
5. c3t9d0
/pci@1d,700000/pci@1/scsi@4/sd@9,0
6. c3t11d0
/pci@1d,700000/pci@1/scsi@4/sd@b,0
7. c3t12d0
/pci@1d,700000/pci@1/scsi@4/sd@c,0
8. c4t8d0
/pci@1d,700000/pci@1/scsi@5/sd@8,0
9. c4t9d0
/pci@1d,700000/pci@1/scsi@5/sd@9,0
10. c4t11d0
/pci@1d,700000/pci@1/scsi@5/sd@b,0
11. c4t12d0
/pci@1d,700000/pci@1/scsi@5/sd@c,0
Specify disk (enter its number): ^D


# prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
fmthard: New volume table of contents now in place.

# metadb -a c1t1d0s7

# metainit d12 1 1 c1t1d0s0
d12: Concat/Stripe is setup

# metainit d22 1 1 c1t1d0s1
d22: Concat/Stripe is setup

# metainit d52 1 1 c1t1d0s4
d52: Concat/Stripe is setup

# metainit d62 1 1 c1t1d0s3
d62: Concat/Stripe is setup

# metattach d60 d62
d60: submirror d62 is attached

# metattach d50 d52
d50: submirror d52 is attached

# metattach d23 d22
d23: submirror d22 is attached

# metattach d10 d12
d10: submirror d12 is attached

# metastat -p
d10 -m d11 d12 1
d11 1 1 c1t0d0s0
d12 1 1 c1t1d0s0
d23 -m d22 d21 1
d22 1 1 c1t1d0s1
d21 1 1 c1t0d0s1
d30 -m d31 d32 1
d31 1 1 c3t8d0s0
d32 1 1 c4t8d0s0
d40 -m d41 d42 1
d41 1 1 c3t8d0s1
d42 1 1 c4t8d0s1
d50 -m d51 d52 1
d51 1 1 c1t0d0s4
d52 1 1 c1t1d0s4
d60 -m d61 d62 1
d61 1 1 c1t0d0s3
d62 1 1 c1t1d0s3
d90 -m d91 d92 1
d91 1 1 c3t8d0s5
d92 1 1 c4t8d0s5
d100 -m d101 d102 1
d101 1 2 c3t9d0s0 c3t11d0s0 -i 32b
d102 1 2 c4t9d0s0 c4t11d0s0 -i 32b
d110 -m d111 d112 1
d111 1 1 c3t12d0s0
d112 1 1 c4t12d0s0

Check metastat for the mirror resync status.

# metastat
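
While the resync is running, the full metastat output shows the newly attached submirrors with a "Resync in progress" percentage. A quick filter such as this (just a convenience, not part of the original procedure) shows only those lines:

# metastat | grep -i resync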

Friday, May 16, 2008

Setting Up a Solaris DHCP Client

Introduction

One of the problems that can arise when trying to use a Solaris box as a DHCP client is that by default, the server is expected to supply a hostname, in addition to all the other stuff (like IP address, DNS servers, etc.). Most cable modems and home routers don't supply a (usable) hostname, so it gets set to "unknown". This page describes how to get around that. (Where this page says "cable modem", "DSL modem" can be substituted.)

This page assumes that le0 is the interface you're using for your DHCP connection. Substitute hme0 or whatever interface you're actually using in the examples below.

Setting up DHCP

There are two ways of using DHCP:

  • DHCP has limited control
  • DHCP has full control

The first case may be where you want to use your own /etc/resolv.conf and so on, with a minimum of hassle.

The second case would be the normal situation, especially if your cable modem provider has a habit of changing DNS name server IP addresses on you (like mine does!), so I'll concentrate on that here. I have a script to automate the first method, should you want to use it. You'll need to change the DEFAULT_ADDR and INTERFACE variables as required.

The first thing to do is to create an empty /etc/hostname.le0, like this:

> /etc/hostname.le0

Creating this file ensures that the interface gets plumbed, ready for the DHCP software to do its stuff.

Next, you create /etc/dhcp.le0. This file can be empty if you want to accept the defaults, but may also contain one or both of these directives:

  • wait time, and
  • primary

By default, ifconfig will wait 30 seconds for the DHCP server to respond (after which time the boot will continue, while the interface gets configured in the background). Specifying the wait directive tells ifconfig not to return until the DHCP server has responded. time can be set to the special value forever, with the obvious meaning. I use a time value of 300, which seems to be long enough for my cable provider.

The primary directive indicates to ifconfig that the current interface is the primary one, if you have more than one interface under DHCP control. If you only have one interface under DHCP control, then it is automatically the primary one, so primary is redundant (although it's permissible).
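
Putting the two directives together, an /etc/dhcp.le0 using the values discussed above would contain something like this (the file's contents are handed to ifconfig's dhcp handling, so the layout is just whitespace-separated tokens):

wait 300 primary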

With these files in place, subsequent reboots will place le0 under DHCP control: you're ready to go!

Unknown hostname

Actually, there's one snag: most (if not all) cable modem DHCP servers don't provide you with a hostname (even if they did, odds are it won't be one you want anyway!). This wouldn't be a problem, except that the boot scripts (/etc/init.d/rootusr in particular) try to be clever, and set your hostname to "unknown" in this case, which is not at all useful!

The trick is to change your hostname back to the right one, preferably without changing any of the supplied start-up scripts, which are liable to be stomped on when you upgrade or install a patch. You also have to do it early enough in the boot process that rpcbind, sendmail, and friends don't get confused by using the wrong hostname. To solve this problem, put a little script into /etc/init.d/set_hostname, with a symbolic link to it from /etc/rc2.d/S70set_hostname.
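
The original page links to that script rather than including it; a minimal sketch of what such a script could look like (a hypothetical reconstruction, assuming the desired name is kept in /etc/nodename and using uname -S to set the running hostname) is:

#!/sbin/sh
# /etc/init.d/set_hostname -- hypothetical reconstruction, not the original script.
# Restore the hostname that the DHCP boot scripts reset to "unknown".
HOSTNAME=`cat /etc/nodename`
if [ -n "$HOSTNAME" -a "`uname -n`" = "unknown" ]; then
        uname -S "$HOSTNAME"
fi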

Starting with Solaris 10, the preceding paragraph can be ignored. Instead, just make sure that the hostname you want to use is in /etc/nodename; the contents of that file will then be used to set the hostname. (Note that it is essential that the hostname you put into /etc/nodename is terminated with a carriage return. Breakage will happen if this is not the case.) Also, from Solaris 8 it is possible to tell the DHCP software not to request a hostname from the DHCP server. To do this, remove the token 12 from the PARAM_REQUEST_LIST line in /etc/default/dhcpagent. (/etc/default/dhcpagent describes what the default tokens are; 12 is the hostname, 3 is the default router, 6 is the DNS server, and so on.)
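
As a concrete sketch of those two changes (myhost is a placeholder, and the PARAM_REQUEST_LIST value shown is illustrative only; check your own /etc/default/dhcpagent for the real default):

# echo myhost > /etc/nodename        (echo supplies the required trailing newline)
# vi /etc/default/dhcpagent
...
PARAM_REQUEST_LIST=1,3,6,15,28,43    (token 12 removed, so no hostname is requested)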

With these modifications in place, reboot, and you'll be using your cable modem in no time!


Taken from: http://www.rite-group.com/rich/solaris_dhcp.html

Wednesday, May 14, 2008

Quick Tips to Find Files on Linux File System

One of the first hurdles that every Linux newbie working on the command line interface (CLI) bumps into is finding files on the file system. Administrators who switch from a Windows environment are so used to the click-and-find mentality that discovering files via the Linux CLI is painful for them. This tutorial is written for those friends who work on Linux and don't have the luxury of a graphical user interface (GUI).

I started playing with Linux during my internship, working with Snort (intrusion detection system), Nessus (vulnerability scanner) and IPTables (firewall). Like most programs, these tools have quite a few configuration files. Initially, it was difficult for me to remember the path to each file, so I started to use the power of the 'find' and 'locate' commands, which I will share with you in this tutorial.

Method 1: LOCATE
Before we start playing around with the locate command, it's important to learn about "updatedb". Every day, your system automatically runs the updatedb command via cron to create or update a database that keeps a record of all filenames. The locate command then searches through this database to find files.

This database is by default stored at /var/lib/mlocate/mlocate.db. Obviously we are curious about what this database looks like, so first I do ls -lh to find the size of this file.

Since this is in db format, I doubt we would see anything legible with a "cat" command. So instead I used the strings command, which threw a lot of filenames onto the screen (132516 to be exact). Hence, I used grep to only see filenames containing lighttpd - a web server installed on my system.


But of course this is not the right way to do searches; we did this just to see what updatedb is doing. Now let's get back to "locate". Remember that locate reads the database created by updatedb, so your results will only be as fresh as the last run of updatedb. You can always run updatedb manually from the CLI and then use the locate command.

Let's start exercising this command. I start by looking for the PDF documentation files for "snort". If I just type in "locate snort", it gives me 1179 file names in the result.

[root@localhost:~] locate snort | less
/etc/snort
/etc/snort/rules
/etc/snort/rules/VRT-License.txt
/etc/snort/rules/attack-responses.rules
/etc/snort/rules/backdoor.rules
/etc/snort/rules/bad-traffic.rules
/etc/snort/rules/cgi-bin.list
/etc/snort/rules/chat.rules
/etc/snort/rules/classification.config
/etc/snort/rules/ddos.rules
/etc/snort/rules/deleted.rules
....

But I want the documentation files, which I already know are in PDF format. So now I will use the power of regular expressions to further narrow down my results.

The "-r" option is used to tell the "locate" command to expect a regular expression. In the above case, I use pdf$ in the regex to only show files which end with pdf.
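
One way to write that search (a sketch; exact paths will vary from system to system):

[root@localhost:~] locate -r 'snort.*\.pdf$'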

Remember that updatedb excludes temporary folders, so locate may not give you the results you expect. To get around these limitations, there is the "find" command.

Method 2: Find
The find command is the most useful of all the commands I have used in my few years of managing Linux machines. Still, this command is not fully understood and utilized by many administrators. Unlike the "locate" command, the "find" command actually goes through the file system and looks for the pattern you define while running the command.

The most common usage of the "find" command is to search for a file with a specific file name.
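
For example, a minimal sketch (snort.conf is just an illustrative name here):

[root@localhost:~] find /etc -name snort.conf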

Like "-name", the find command has other qualifiers based on time, as shown below. These are also very helpful if you are doing forensic analysis on your Linux machine.

-iname = same as -name, but case insensitive
-atime n = true, if file was accessed n days ago
-amin n = true, if file was accessed n minutes ago
-mtime n = true, if file contents were changed n days ago
-mmin n = true, if file content were changed n minutes ago
-ctime n = true, if file attributes were changed n days ago
-cmin n = true, if file attributes were changed n minutes ago

To help the reader understand these qualifiers, I created a file named "foobar.txt" four minutes earlier and then ran "find /root -mmin -5" to show all files in the /root folder whose last modification time is within the last 5 minutes; it shows me the foobar.txt file. However, if I change the value of -mmin to less than 2 minutes, it shows me nothing.


There is another very useful qualifier, -size, which searches on file size.
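
For instance, with GNU find (a sketch):

[root@localhost:~] find /var/log -size +10M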

Some other qualifiers that I always use while administering Linux servers are:

-regex expression = select files which match the regular expression
-iregex expression = same as above but case insensitive
-empty = select files and directories which are empty
-type filetype = Select file by Linux file types
-user username = Select files owned by the given user
-group groupname = Select files owned by the given group
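
Combining a couple of these qualifiers might look like this (a sketch; the username is hypothetical):

[root@localhost:~] find /home -type f -user john -empty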

There are few more qualifiers, but I leave those as homework for you to read the manpage and enhance your knowledge.

NOTE: One thing you will notice is that "locate" runs super fast; that's because it reads from a database file rather than actually traversing the file system.

This was a very short and crisp introduction to the find and locate commands, but these are the most important commands for any administrator. Once you get used to them, you will wish there was something similar and as powerful in Windows.

Taken from: http://www.secguru.com/article/quick_tips_find_files_linux_file_system

Configuring sar for your system

Before you can tune a system properly, you must decide which system characteristics are important, and which ones are less so. Once you decide your priorities, you then need to find a way
to measure the system performance according to those priorities. In fact, the system activity reporter programs are a good measuring tool for many aspects of system performance. In this article, we'll introduce you to the sar utility, which can give you detailed performance information about your system.


What does sar measure?

Since system tuning involves the art of finding acceptable compromises, you need the ability to see the impact of your changes on multiple subsystems. System activity reporter (sar) programs
collect system-performance information in distinct groups. Table A shows how sar groups the performance information. The first column shows the switch you give to sar in order to request that particular information group, and the second column briefly describes the information group.

Table A

Switch Performance Monitoring Group
A All monitoring groups
a File access statistics
b Buffer activity
c System call activity
d Block device activity
g Paging out activity
k Kernel memory allocation
m Message and semaphores
p Paging in activity
q CPU Run queue statistics
r Unused memory and disk pages
u CPU usage statistics (default)
v Report status of system tables
w System swapping and switching
y TTY device activity



One way you can run sar is to specify a sampling interval and the number of times
you want it to run. So, if you want to check the file-access statistics every
20 seconds for the next five minutes, you'd run sar like this (the first line is
the command; the rest is its output):


$ sar -a 20 15

SunOS Devo 5.5.1 Generic_103641-08 i86pc    11/05/97
01:06:02  iget/s namei/s dirbk/s
01:06:22     270     397     278
01:06:42     602     785     685
01:07:02     194     238     215

Configuring sar to collect data

Notice that you can't just run sar right now. If you try to run the sar command without first configuring
it, it gives you an error message like this:

$ sar -a 20 15
sar: can't open /var/adm/sa/sa03
No such file or directory

Sure enough, if you look at the /var/adm/sa directory, you won't see any files in it, much less that
sa03 file it's complaining about. If you create a blank file, using touch, for example, sar will start to work. However, why must you do something so strange to make sar work? And if you try to run sar tomorrow, you'll get a similar error, but this time it will complain about a different file, such as sa04.


It turns out that the sar program is only one part of the performance monitoring package. Three commands in the /usr/lib/sa directory also contribute to the whole. The sadc command collects system data and stores it to a binary file, suitable for sar to use. The shell script sa1 is a wrapper for sadc, suitable for use in cron jobs, so it can be run automatically. The sa2 script is a wrapper for sar that forces it to print a report in ASCII format from the binary information in the files sadc creates.
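
Incidentally, you don't have to wait for cron: running the sa1 wrapper once by hand (as root) will create today's binary file and make the file-open error go away.

# /usr/lib/sa/sa1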


If you run the sa1 script as intended, it creates a binary file containing all the performance statistics for the day. This file allows sar to read the data and report on it without forcing you to wait and collect it. Since you may want to investigate the data a bit later, or compare one day's worth of information against another, the sar, sa1, and sa2 programs name the data file using the
same format: /var/adm/sa/saX, where X is the day number. Therefore, when you run sar, one of the first things it does is look for today's binary file. When it doesn't find the file, it prints the error.


The best way to run sa1 and sa2 is from a cron job. Sun provides an example of how to create the cron job instead of forcing you to figure it out for yourself. Thus, if you edit the crontab for the account sys, you'll see commented-out sample cron schedules for sa1 and sa2, as shown in Figure A.


Figure A: The sys account already has prototype entries for running sa1 and sa2, which you can uncomment and use.


#ident  "@(#)sys        1.5     92/07/14 SMI"   /* SVr4.0 1.2   */
#
# The sys crontab should be used to do performance collection. See cron
# and performance manual pages for details on startup.
#
#0 * * * 0-6 /usr/lib/sa/sa1
#20,40 8-17 * * 1-5 /usr/lib/sa/sa1
#5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A

The first cron schedule uses sa1 to take a snapshot of system performance at the beginning of every hour, every day. The second cron schedule adds a snapshot at 20 minutes (:20) and 40
minutes (:40) after the hour between 8:00 A.M. and 5:00 P.M., every Monday through Friday. As a result, you get more detail during business hours, and less during the evenings and weekends.

The final line schedules sa2 to run at 6:05 P.M. every Monday through Friday to create an ASCII report from the data collected by sa1. This ASCII data is stored using a similar filename convention: /var/adm/sa/sarX, again where X is the day number.


The simplest way to configure sar to run is to edit the sys account's crontab and remove the # signs from the start of the sa1 and sa2 command lines. However, you may want to customize the cron schedules to suit your own preferences. For example, your company might run multiple shifts, and you may want more detailed data. Thus, you can modify the cron job to run sa1 at 15-minute intervals, every business day.


You can't just log into the sys account and edit the cron job, though, because the sys account is usually locked. Instead, you must log in as root, then su to the sys account, like so:


$ su
Password:
# su sys
#
At this point, be sure to set the EDITOR environment variable to your favorite editor, and edit the crontab file, like this:

# EDITOR=vi
# export EDITOR
# crontab -e

Now, your favorite editor (vi, in this case) comes up, and you can edit the cron schedules. For our example, we just want to run sa1 every 15 minutes every day, and the sa2 program should generate ASCII versions of the data just before midnight. So we'll change the cron schedule to look like this:

0,15,30,45 * * * 0-6 /usr/lib/sa/sa1
55 23 * * 0-6 /usr/lib/sa/sa2 -A

Next, we save the file and exit, and crontab will start the appropriate cron jobs for us. That's all you
must do to configure sar. Once you do so, you can use sar without worrying about the file open errors any more.

Using the binary data files

Once the system is creating the binary data files, you can use sar without specifying the interval between samples and the number of samples you want to take. You can simply specify the
data sets you want to see, and sar will print all that's accumulated thus far for the day. Therefore, if you're interested in CPU use and paging activity, you'd run sar as shown in Figure B. Since we ran sar near the end of the day, and we're sampling every 15 minutes, we're inundated with details. That's the major problem with detail--it's easy to get swamped.

Figure B: The sar -up command reports detailed information about the CPU and paging use up to the current time.


$ sar -up
SunOS Devo 5.5.1 Generic_103641-08 i86pc 11/04/97
00:00:01    %usr    %sys    %wio   %idle
00:15:00       0       0       0      99
00:30:00       0       0       0      99
00:45:00       0       1       0      99
22:15:00       0       0       0      99
22:30:00       0       0       0      99
22:45:00       1       1       3      95
Average        3       1       4      92
 
00:00:01  atch/s  pgin/s ppgin/s  pflt/s  vflt/s slock/s
00:15:00    0.00    0.02    0.03    1.82    2.93    0.00
00:30:00    0.00    0.00    0.00    4.35    6.15    0.00
00:45:00    0.00    0.02    0.02   38.95   44.79    0.00

Getting the bigger picture

While getting a detailed picture of your system is wonderful, you probably don't need or want such a detailed report very often. After all, your job is to manage the system, not micromanage it. Do you think the president of your company monitors the details of the day-to-day operations of the company? Of course not--the president is happy to see the weekly reports showing that the business is chugging along smoothly. It's only when the business is having problems that the president starts to examine and analyze details. Your role as system administrator is similar to that of the company president: As long as the system is running smoothly, you merely want to glance at a report to see that everything is going nicely. You don't want to delve into a morass of details unless something's awry. Consequently, what we usually want from sar isn't a detailed report on
all the system statistics, but rather a simple summary.

The sar command provides three command-line switches to let you control how you want sar to summarize its data. The -s and -e options allow you to select the starting and ending times of the report, and the -i option allows you to specify the reporting interval. So you can see an hourly summary of CPU usage during working hours by using sar like this:


$ sar -s 08 -e 18 -i 3600 -u
SunOS Devo 5.5.1 Generic_103641-08 i86pc    11/03/97
08:00:00    %usr    %sys    %wio   %idle
09:00:01       0       1       2      97
10:00:00       3       3       1      94
11:00:00       0       0       0     100
12:00:00       0       0       0     100
13:00:00       0       0       0     100
14:00:00       0       0       0     100
15:00:00       5      56      30       8
16:00:01       3      68      24       5
17:00:00       0      11      10      79
18:00:00       0       0       0     100
 
Average        1      14       7      78

If we had a performance problem during the day, we could quickly tell when it occurred using this summary report. Then, we'd adjust our s, e, and i options to focus on the details we're actually interested in seeing. Instead of wading through pages of data, we can be selective.
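
For instance, to zoom in on the 3:00-4:00 P.M. spike in the summary above with five-minute samples, we could run something like this:

$ sar -u -s 15 -e 16 -i 300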

Conclusion

Once you get sar configured, it can capture all the performance statistics for your machine. It's a good idea to browse through the man page for sar a few times to get acquainted with the values it can capture. You don't have to understand all of it, especially at the beginning. To start with, it's a good policy to become familiar with the numbers when your system is operating normally, because then you'll be able to pinpoint which system characteristics are degrading, and begin addressing the problems.

Taken from: http://members.tripod.com/Dennis_Caparas/Configuring_sar_for_your_system.html

Friday, May 9, 2008

How to Expand a Solaris File System

Note

Solaris Volume Manager volumes can be expanded. However, volumes cannot be reduced in size.

· A volume can be expanded whether it is used for a file system, application, or database. You can expand RAID-0 (stripe and concatenation) volumes, RAID-1 (mirror) volumes, RAID-5 volumes, and soft partitions.

· You can concatenate a volume that contains an existing file system while the file system is in use. As long as the file system is a UFS file system, the file system can be expanded (with the growfs command) to fill the larger space. You can expand the file system without interrupting read access to the data.

· Once a file system is expanded, it cannot be reduced in size, due to constraints in the UFS file system.

· Applications and databases that use the raw device must have their own method to expand the added space so that they can recognize it. Solaris Volume Manager does not provide this capability.

· When a component is added to a RAID-5 volume, it becomes a concatenation to the volume. The new component does not contain parity information. However, data on the new component is protected by the overall parity calculation that takes place for the volume.

· You can expand a log device by adding additional components. You do not need to run the growfs command, as Solaris Volume Manager automatically recognizes the additional space on reboot.

· Soft partitions can be expanded by adding space from the underlying volume or slice. All other volumes can be expanded by adding slices.

Taken from: http://docs.huihoo.com/opensolaris/solaris-volume-manager-administration-guide/html/ch20s06.html


Steps for expanding a file system on a soft partition:

1. Check Prerequisites

# df -k /local
Filesystem kbytes used avail capacity Mounted on
/dev/md/dsk/d46 62992061 58223588 4138553 94% /local

# metastat -p d46
d46 -p d50 -o 763363392 -b 117440512 -o 922747008 -b 10485760
d50 -m d49 1
d49 2 2 c8t60060E8004EAEA000000EAEA000027FFd0s6 c8t60060E8004EAEA000000EAEA0000273Bd0s6 -i 32b \
1 c8t60060E8004EAEA000000EAEA000027F7d0s6

# metarecover -n -v /dev/md/rdsk/d50 -p -m | grep FREE
NONE 0 FREE 0 31
NONE 0 FREE 763363360 31
NONE 0 FREE 880803904 31
NONE 0 FREE 922746976 31
NONE 0 FREE 933232768 31
NONE 0 FREE 1008730272 391510367

2. Expand the soft partition

# metattach d46 24gb
d46: Soft Partition has been grown

3. Expand the filesystem

# growfs -M /local /dev/md/rdsk/d46
Warning: 2560 sector(s) in last cylinder unallocated
/dev/md/rdsk/d46: 178257920 sectors in 23211 cylinders of 15 tracks, 512 sectors
87040.0MB in 1658 cyl groups (14 c/g, 52.50MB/g, 6400 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
32, 108064, 216096, 324128, 432160, 540192, 648224, 756256, 864288, 972320,
Initializing cylinder groups:
................................
super-block backups for last 10 cylinder groups at:
177192992, 177301024, 177409056, 177517088, 177625120, 177733152, 177841184,
177949216, 178057248, 178165280,

4. Verify filesystem status

# df -k /local
Filesystem kbytes used avail capacity Mounted on
/dev/md/dsk/d46 87775990 58226660 28919410 67% /local

NFS Mount Point Permission Issue (ls: ..: Permission denied)

This is to fix a mount point permission issue seen by an NFS user.

# su - nfsuser
$ cd /var
$ ls -la
ls: ..: Permission denied


For Example:

1. Boot to OK prompt
# init 0

2. Boot to maintenance mode
OK> boot -s

3. Make sure /var is not mounted
# mount | grep var

4. If it is mounted:

Solaris 8 has an option:
# umount -f /var

which will forcibly unmount any partition from its mount point. This should only be used in extreme circumstances, since anyone accessing a file from that partition will now get an error (EIO).

In other releases of Solaris:
If you get the error "umount: /var busy", you can use the command "fuser -f /var".

This will return the following:
# fuser -f /var
/var: 2885c 2857c

Next perform the following:
# ps -ef | grep 2857
root 2890 2857 0 13:40:30 pts/2 0:00 -sh
root 2857 2855 0 13:30:10 pts/2 0:00 -sh

If you kill the offending shell with kill or kill -9, you should be able to unmount the partition.
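
For the processes shown above, that would be something like this (2857 being the shell whose current directory is on /var):

# kill -9 2857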

5. Set permissions to 755 for /var mount point
# chmod 755 /var

6. Exit to multiuser mode
# exit

7. Verify that permission has been corrected:
# su - nfsuser
$ cd /var
$ ls -la


8. The user should have no permission issue with ls now.

Saturday, May 3, 2008

Managing swap in the Solaris OS

Amit Dixit, October 2006

Installation of the Solaris OS creates swap space and allocates 512 Mbytes by default. The Solaris OS supports applying swap to raw disk partitions and to file systems, and it also uses physical RAM as a swap area. Usually physical memory is more efficient, but we are always restricted by the amount of physical memory installed on the system.

It's always a good idea to apply swap to a raw partition, as compared to a file system, because a raw partition doesn't involve the overhead of the file system.

(Note: I've written this for Solaris versions 7, 8, 9, and 10. That said, I am pretty sure this is applicable to all the versions.)


Adding Raw Partition swap Space

To add a raw swap partition you need to perform the following steps on your system:

1. Identify a free disk partition on your system.

2. Add an entry to /etc/vfstab for the new raw partition as a swap partition:
/dev/dsk/c0t1d0s0 - - swap - no -

3. To enable this swap partition, issue the following command:
#swap -a /dev/dsk/c0t1d0s0

4. To view the current swap details, use the following command:
#swap -l


Adding File System swap

The Solaris OS supports applying swap to a file. To enable a file system swap you need to perform the following tasks:

1. Create a file using mkfile:
#mkfile 250m /opt/myswapfile

This will create a 250 Meg file, which the Solaris OS can use for swap.

2. To use this swap file, enable it with the following command:
#swap -a /opt/myswapfile

3. Check your change:
#swap -l

Note: To enable the new swap file at the next system boot, add the following entry to /etc/vfstab:
/opt/myswapfile - - swap - no -


Disabling swap Space

The Solaris OS provides the ability to disable a swap file while the system is running. This is done with the -d option for swap. All allocated blocks are copied to other swap areas.

solaris# swap -d /opt/myswapfile

To check your change, type this:

solaris# swap -l


Monitoring swap

It's always important to configure the right amount of swap space: Too little will result in poor performance and too much will waste disk space.

The Solaris OS starts using swap if it's running out of physical memory. This is called paging.

Here's how to get a summary of swap space:

solaris# swap -s
total: 3500744k bytes allocated + 3048720k reserved = 6549464k used, 23869824k available

And here's how to get details on the individual device or file that constitutes swap space:

solaris# swap -l
swapfile dev swaplo blocks free
/dev/md/dsk/d1 85,1 16 41945456 41945456

If your system is running out of swap space you will see the following errors:

Not Enough Space

or

WARNING /tmp: File system full, swap space limit exceeded

To see if the system is running short of physical memory you can use vmstat and iostat.

solaris# vmstat
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr m0 m1 m3 m4 in sy cs us sy id
0 0 0 24137360 6421168 70 179 21 14 14 0 0 0 0 0 0 472 3363 1776 4 2 94
0 0 0 23869912 5953040 11 13 0 0 0 0 0 0 0 0 0 430 1071 1545 7 1 92
0 0 0 23870896 5953904 58 313 0 2 2 0 0 0 0 0 0 578 2369 1798 20 1 78
0 0 0 23874712 5957216 11 11 0 0 0 0 0 0 0 0 0 417 1325 1648 0 0 100
0 0 0 23874744 5957248 22 64 0 3 3 0 0 0 0 0 0 423 1578 1629 1 2 97

Watch the column sr (Scan Rate) in the vmstat output.

solaris# iostat -Pxn

extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.1 2.7 1.1 5.6 0.0 0.1 0.2 25.5 0 2 c1t0d0s0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 9.7 0 0 c1t0d0s1
0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.2 0 0 c1t0d0s2

Watch the r/s and w/s columns in the iostat output for the device that is configured as the swap device. If the values are high, this means that a large amount of I/O is being generated to free up pages.

If physical memory is too low, the system will be busy paging to the swap device with a heavy I/O on the swap device. In this state the system's CPU utilization will also increase.


Summary

For improved system performance it's important that you have allocated sufficient swap space to the system. To start with, configure 1.5 times the physical memory installed on the system. If required, allocate more swap space.

Taken from: http://www.sun.com/bigadmin/content/submitted/manage_swap.html

Understanding and setting up Solstice DiskSuite in Solaris

About Solstice DiskSuite:

Solstice DiskSuite 4.2.1 is a software product that manages data and disk drives.
Solstice DiskSuite 4.2.1 runs on all SPARC systems running Solaris 8, and on all x86 systems running Solaris 8.

DiskSuite's diskset feature is supported only on the SPARC platform edition of Solaris. This feature is not supported on x86 systems.

1. Advantages of DiskSuite
Solstice DiskSuite provides three major functionalities:
1. Overcoming the disk size limitation by allowing multiple disk slices to be joined into a bigger volume.
2. Fault tolerance, by allowing data to be mirrored from one disk to another and by keeping parity information in RAID5.
3. Performance enhancement, by allowing the data space to be spread over multiple disks.


2. DiskSuite terms
Metadevice: A virtual device composed of several physical devices (slices/disks). All operations are carried out using the metadevice name and are transparently implemented on the individual devices.

RAID: A group of disks used for creating a virtual volume is called an array, and depending on the disk/slice arrangement these are called various types of RAID (Redundant Array of Independent Disks):
RAID 0 Concatenation/Striping
RAID 1 Mirroring
RAID 5 Striped array with rotating parity.

Concatenation: Concatenation is the joining of two or more disk slices to add up the disk space. Concatenation is serial in nature, i.e. sequential data operations are performed on the first disk, then the second disk, and so on. Due to this serial nature, new slices can be added without having to take a backup of the entire concatenated volume, add the slice, and restore the backup.

Striping: Spreading data over multiple disk drives, mainly to enhance performance by distributing data in alternating chunks (a 16 KB interleave by default) across the stripes. Sequential data operations are performed in parallel on all the stripes by reading/writing 16 KB data blocks alternately from the disk stripes.
Mirroring: Mirroring provides data redundancy by simultaneously writing data onto two submirrors of a mirrored device. A submirror can be a stripe or a concatenated volume, and a mirror can have up to three submirrors. The main concern here is that a mirror needs as much space as the volume to be mirrored.

RAID 5: RAID 5 provides data redundancy plus the advantage of striping, and uses less space than mirroring. A RAID 5 volume is made up of at least three disks, which are striped with parity information written alternately on all the disks. In case of a single disk failure, the data can be rebuilt using the parity information from the remaining disks.



3. DiskSuite Packages:

Solstice DiskSuite is part of the server edition of the Solaris OS and is not included with the desktop edition. The software is in pkgadd format and can be found at the following locations on CD:
Solaris 2.6 - “Solaris Server Intranet Extensions 1.0” CD
Solaris 7 - “Solaris Easy Access Server 3.0”
Solaris 8 - “Solaris 8 Software 2 of 2”

For Solaris 2.6 and 7, the Solstice DiskSuite version is 4.2. The following packages are part of it, but "SUNWmd" plus a patch is the minimum requirement:
SUNWmd - Solstice DiskSuite
SUNWmdg - Solstice DiskSuite Tool
SUNWmdn - Solstice DiskSuite Log Daemon
Patch No. 106627-04 (obtain latest revision)

The Solaris 8 DiskSuite version is 4.2.1. The following are the minimum required packages:
SUNWmdr Solstice DiskSuite Drivers (root)
SUNWmdu Solstice DiskSuite Commands
SUNWmdx Solstice DiskSuite Drivers (64-bit)


4. Installing DiskSuite 4.2.1 in Solaris 8

# cd /cdrom/sol_8_401_sparc_2/Solaris_8/EA/products/DiskSuite_4.2.1/sparc/Packages

# pkgadd -d .
The following packages are available:
1 SUNWmdg Solstice DiskSuite Tool
(sparc) 4.2.1,REV=1999.11.04.18.29
2 SUNWmdja Solstice DiskSuite Japanese localization
(sparc) 4.2.1,REV=1999.12.09.15.37
3 SUNWmdnr Solstice DiskSuite Log Daemon Configuration Files
(sparc) 4.2.1,REV=1999.11.04.18.29
4 SUNWmdnu Solstice DiskSuite Log Daemon
(sparc) 4.2.1,REV=1999.11.04.18.29
5 SUNWmdr Solstice DiskSuite Drivers
(sparc) 4.2.1,REV=1999.12.03.10.00
6 SUNWmdu Solstice DiskSuite Commands
(sparc) 4.2.1,REV=1999.11.04.18.29
7 SUNWmdx Solstice DiskSuite Drivers(64-bit)
(sparc) 4.2.1,REV=1999.11.04.18.29
Select packages 1, 3, 4, 5, 6, and 7.

Enter ‘yes’ to the questions asked during installation and reboot the system afterwards.

Put /usr/opt/SUNWmd/bin in root's PATH, as the DiskSuite commands are located in this directory.


5. Creating the State Database:

The state metadevice database, metadb, keeps information about the metadevices and is needed for DiskSuite operation. DiskSuite cannot function without the metadb, so replica copies of the database are placed on different disks to ensure that a copy is available in case of a complete disk failure.

The metadb needs a dedicated disk slice, so create partitions of about 5 MB on the disks for the metadb. If there is no space available, the slice can be taken from swap. Having the metadb on only two disks can create problems: DiskSuite requires more than 50% of the total replicas to be available, and if one of the two disks crashes the available replicas fall to exactly 50%. On the next reboot the system will go to single-user mode, and you will have to recreate additional replicas to correct the metadb errors.

The following command creates three replicas of metadb on three disk slices.

#metadb -a -f -c 3 /dev/dsk/c0t1d0s6 /dev/dsk/c0t2d0s6 /dev/dsk/c0t3d0s6


6. Creating Metadevices:
Metadevices can be created in two ways:
1. Directly from the command line.
2. By editing the /etc/opt/SUNWmd/md.tab file as per the examples given in it, then initializing the devices on the command line using metainit.

6.1 ) Creating a concatenated Metadevice :
#metainit d0 3 1 /dev/dsk/c0t0d0s4 1 /dev/dsk/c0t1d0s4 1 /dev/dsk/c0t2d0s4

d0 - metadevice name
3 - total number of slices
1 - number of slices to be added, followed by the slice name

6.2 ) Creating a stripe with a 32k interleave
# metainit d10 1 2 c0t1d0s2 c0t2d0s2 -i 32k

d10 - metadevice name
1 - total number of stripes
2 - number of slices to be added to the stripe, followed by the slice names
-i - the interleave size; chunks of data are written alternately on the stripes

6.3 ) Creating a Mirror :
A mirror is a metadevice composed of one or more submirrors. A submirror is made of one or more striped or concatenated metadevices.
Mirroring data provides you with maximum data availability by maintaining multiple copies of your data. The system must contain at least three state database replicas before you can create mirrors. Any file system, including root (/), swap, and /usr, or any application such as a database, can use a mirror.
6.3.1 ) Creating a simple mirror from new partitions

1. Create two stripes for the two submirrors, d21 and d22:

# metainit d21 1 1 c0t0d0s2
d21: Concat/Stripe is setup
# metainit d22 1 1 c1t0d0s2
d22: Concat/Stripe is setup

2. Create a mirror device (d20) using one of the submirrors (d21):

# metainit d20 -m d21
d20: Mirror is setup

3. Attach the second submirror (d22) to the main mirror device (d20):

# metattach d20 d22
d20: Submirror d22 is attached

4. Make a file system on the new metadevice:

#newfs /dev/md/rdsk/d20
Edit /etc/vfstab to mount /dev/md/dsk/d20 on a mount point.

6.3.2.) Mirroring a Partition with data which can be unmounted

# metainit -f d1 1 1 c1t0d0s0
d1: Concat/Stripe is setup
# metainit d2 1 1 c2t0d0s0
d2: Concat/Stripe is setup
# metainit d0 -m d1
d0: Mirror is setup
# umount /local
(Edit the /etc/vfstab file so that the file system references the mirror)
#mount /local
#metattach d0 d2
d0: Submirror d2 is attached

6.3.3 ) Mirroring Partitions with data which cannot be unmounted - root and /usr
· /usr mirroring
# metainit -f d12 1 1 c0t3d0s6
d12: Concat/Stripe is setup
# metainit d22 1 1 c1t0d0s6
d22: Concat/Stripe is setup
# metainit d2 -m d12
d2: Mirror is setup
(Edit the /etc/vfstab file so that /usr references the mirror)
# reboot
...
...
# metattach d2 d22
d2: Submirror d22 is attached
· root mirroring
# metainit -f d11 1 1 c0t3d0s0
d11: Concat/Stripe is setup
# metainit d12 1 1 c1t3d0s0
d12: Concat/Stripe is setup
# metainit d10 -m d11
d10: Mirror is setup
# metaroot d10
# lockfs -fa
# reboot


# metattach d10 d12
d10: Submirror d12 is attached

6.3.4 ) Making Mirrored disk bootable
a.) # installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0

6.3.5 ) Creating an alternate name for the mirrored boot disk

a.) Find the physical path name for the second boot disk:
# ls -l /dev/rdsk/c1t3d0s0
lrwxrwxrwx 1 root root 55 Sep 12 11:19 /dev/rdsk/c1t3d0s0 ->../../devices/sbus@1,f8000000/esp@1,200000/sd@3,0:a

b.) Create an alias for booting from disk2
ok> nvalias bootdisk2 /sbus@1,f8000000/esp@1,200000/sd@3,0:a
ok> boot bootdisk2

6.4 ) Creating a RAID 5 volume :

The system must contain at least three state database replicas before you can create RAID5 metadevices.

A RAID5 metadevice can only handle a single slice failure. A RAID5 metadevice can be grown by concatenating additional slices to the metadevice. The new slices do not store parity information; however, they are parity protected. The resulting RAID5 metadevice continues to handle a single slice failure. Creating a RAID5 metadevice from a slice that contains an existing file system will erase the data during the RAID5 initialization process. The interlace value is key to RAID5 performance. It is configurable at the time the metadevice is created; thereafter, the value cannot be modified. The default interlace value is 16 Kbytes, which is reasonable for most applications.

6.4.1.) To set up RAID5 on three slices of different disks:

# metainit d45 -r c2t3d0s2 c3t0d0s2 c4t0d0s2
d45: RAID is setup

6.5.) Creating a Trans Meta Device :

Trans metadevices enable UFS logging. There is one logging device and one master device; all file system changes are written into the logging device and then posted onto the master device. This greatly reduces the fsck time for very large file systems, as fsck has to check only the logging device, which is usually of 64 MB maximum size. The logging device should preferably be mirrored and located on a different drive and controller than the master device.

UFS logging cannot be done for the root partition.

6.5.1) Trans Metadevice for a File System That Can Be Unmounted
· /home2
1. Setup metadevice

# umount /home2
# metainit d63 -t c0t2d0s2 c2t2d0s1
d63: Trans is setup
Logging becomes effective for the file system when it is remounted

2. Change vfstab entry & reboot

from
/dev/md/dsk/d2 /dev/md/rdsk/d2 /home2 ufs 2 yes -
to
/dev/md/dsk/d63 /dev/md/rdsk/d63 /home2 ufs 2 yes -
# mount /home2

The next reboot displays the following message for the logging device:
# reboot
...
/dev/md/rdsk/d63: is logging

6.5.2 ) Trans Metadevice for a File System That Cannot Be Unmounted
· /usr
1.) Setup metadevice
# metainit -f d20 -t c0t3d0s6 c1t2d0s1
d20: Trans is setup

2.) Change vfstab entry & reboot:
from
/dev/dsk/c0t3d0s6 /dev/rdsk/c0t3d0s6 /usr ufs 1 no -
to
/dev/md/dsk/d20 /dev/md/rdsk/d20 /usr ufs 1 no -
# reboot

6.5.3 ) TransMeta device using Mirrors

1.) Setup metadevice

#umount /home2
#metainit d64 -t d30 d12
d64: Trans is setup

2.) Change vfstab entry & reboot:
from
/dev/md/dsk/d30 /dev/md/rdsk/d30 /home2 ufs 2 yes -
to
/dev/md/dsk/d64 /dev/md/rdsk/d64 /home2 ufs 2 yes -

6.6 ) HotSpare Pool

A hot spare pool is a collection of slices reserved by DiskSuite to be automatically substituted in case of a slice failure in either a submirror or a RAID5 metadevice. A hot spare cannot be a metadevice, and it can be associated with multiple submirrors or RAID5 metadevices. However, a submirror or RAID5 metadevice can only be associated with one hot spare pool. Replacement is based on a first fit for the failed slice, and hot spares need to be replaced later with repaired or new slices. Hot spare pools may be allocated, deallocated, or reassigned at any time, unless a slice in the hot spare pool is being used to replace a damaged slice of its associated metadevice.

6.6.1) Associating a Hot Spare Pool with Submirrors

# metaparam -h hsp100 d10
# metaparam -h hsp100 d11
# metastat d0
d0: Mirror
Submirror 0: d10
State: Okay
Submirror 1: d11
State: Okay
...
d10: Submirror of d0
State: Okay
Hot spare pool: hsp100
...
d11: Submirror of d0
State: Okay
Hot spare pool: hsp100

6.6.2 ) Associating or changing a Hot Spare Pool with a RAID5 Metadevice

#metaparam -h hsp001 d10
#metastat d10
d10:RAID
State: Okay
Hot spare Pool: hsp001

6.6.3 ) Adding a Hot Spare Slice to All Hot Spare Pools

# metahs -a -all /dev/dsk/c3t0d0s2
hsp001: Hotspare is added
hsp002: Hotspare is added
hsp003: Hotspare is added

6.7 ) Disksets

Few important points about disksets :
A diskset is a set of shared disk drives containing DiskSuite objects that can be shared exclusively (but not concurrently) by one or two hosts. Disksets are used in high-availability failover situations, where the ownership of the failed machine's diskset is transferred to the other machine. Disksets are connected to two hosts for sharing and must have the same attributes (controller/target/drive) on both machines, except for the ownership.
DiskSuite must be installed on each host that will be connected to the diskset. There is one metadevice state database per shared diskset and one on the "local" diskset. Each host must have its local metadevice state database set up before you can create disksets. Each host in a diskset must have a local diskset besides the shared diskset. A diskset can be created separately on one host and then added to the second host later.
A drive should not be in use by a file system, database, or any other application when it is added to a diskset.
When a drive is added to a diskset it is repartitioned so that the metadevice state database replica for the diskset can be placed on the drive. Drives are repartitioned when they are added to a diskset only if Slice 7 is not set up correctly. A small portion of each drive is reserved in Slice 7 for use by DiskSuite; the remainder of the space on each drive is placed into Slice 0. After adding a drive to a diskset, it may be repartitioned as necessary, provided that no changes are made to Slice 7. If Slice 7 starts at cylinder 0 and is large enough to contain a state database replica, the disk is not repartitioned.
When drives are added to a diskset, DiskSuite re-balances the state database replicas across the remaining drives. Later, if necessary, you can change the replica layout with the metadb(1M) command.
To create a diskset, root must be a member of group 14, or the /.rhosts file on each host must contain an entry for the other host.

6.7.1 ) Creating Two Disksets

host1# metaset -s diskset0 -a -h host1 host2
host1# metaset -s diskset1 -a -h host1 host2
host1# metaset
Set name = diskset0, Set number = 1
Host Owner
host1
host2
Set name = diskset1, Set number = 2
Host Owner
host1
host2

6.7.2 ) Adding Drives to a Diskset

host1# metaset -s diskset0 -a c1t2d0 c1t3d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0

host1# metaset
Set name = diskset0, Set number = 1
Host Owner
host1 Yes
host2

Drive Dbase
c1t2d0 Yes
c1t3d0 Yes
c2t2d0 Yes
c2t3d0 Yes
c2t4d0 Yes
c2t5d0 Yes

Set name = diskset1, Set number = 2
Host Owner
host1
host2

6.7.3 ) Creating a Mirror in a Diskset

# metainit -s diskset0 d51 1 1 /dev/dsk/c0t0d0s2
diskset0/d51: Concat/Stripe is setup

# metainit -s diskset0 d52 1 1 /dev/dsk/c1t0d0s2
diskset0/d52: Concat/Stripe is setup

# metainit -s diskset0 d50 -m d51
diskset0/d50: mirror is setup

# metattach -s diskset0 d50 d52
diskset0/d50: Submirror d52 is attached

7.0 Troubleshooting

7.1 ) Recovering from Stale State Database Replicas

Problem : State database corrupted or unavailable.
Causes : Disk failure, disk I/O errors.
Symptoms : Error messages at boot time when half or more of the state database replicas are unavailable; the system comes up in single-user (maintenance) mode:
ok boot
...
Hostname: host1
metainit: host1: stale databases
Insufficient metadevice database replicas located.
Use metadb to delete databases which are broken.
Ignore any "Read-only file system" error messages.
Reboot the system when finished to reload the metadevice database.
After reboot, repair any broken database replicas which were deleted.
Type Ctrl-d to proceed with normal startup,
(or give root password for system maintenance):
Entering System Maintenance Mode.

1.) Use the metadb command to look at the metadevice state database and see which state database replicas are not available; they are marked by the M flag and unknown block values.
# /usr/opt/SUNWmd/metadb -i
        flags           first blk       block count
     a m  p  lu         16              1034            /dev/dsk/c0t3d0s3
     a    p  l          1050            1034            /dev/dsk/c0t3d0s3
       M  p             unknown         unknown         /dev/dsk/c1t2d0s3
       M  p             unknown         unknown         /dev/dsk/c1t2d0s3

2.) Delete the state database replicas on the bad disk using the -d option to the metadb(1M) command.
At this point, the root (/) file system is read-only. You can ignore the mddb.cf error messages:

# /usr/opt/SUNWmd/metadb -d -f c1t2d0s3
metadb: demo: /etc/opt/SUNWmd/mddb.cf.new: Read-only file system

Verify deletion
# /usr/opt/SUNWmd/metadb -i
        flags           first blk       block count
     a m  p  lu         16              1034            /dev/dsk/c0t3d0s3
     a    p  l          1050            1034            /dev/dsk/c0t3d0s3

3.) Reboot.

4.) Use the metadb command to add back the state database replicas and verify that they are correct.

# /usr/opt/SUNWmd/metadb -a -c 2 c1t2d0s3
# /usr/opt/SUNWmd/metadb
        flags           first blk       block count
     a m  p  luo        16              1034            /dev/dsk/c0t3d0s3
     a    p  luo        1050            1034            /dev/dsk/c0t3d0s3
     a        u         16              1034            /dev/dsk/c1t2d0s3
     a        u         1050            1034            /dev/dsk/c1t2d0s3

7.2 ) Metadevice Errors

Problem : Submirrors out of sync, in "Needs maintenance" state.
Causes : Disk problem or failure, improper shutdown, communication problems between the two mirrored disks.
Symptoms : "Needs maintenance" errors in metastat output:
# /usr/opt/SUNWmd/metastat
d0: Mirror
    Submirror 0: d10
      State: Needs maintenance
    Submirror 1: d20
      State: Okay
...

d10: Submirror of d0
    State: Needs maintenance
    Invoke: "metareplace d0 /dev/dsk/c0t3d0s0 <new device>"
    Size: 47628 blocks
    Stripe 0:
        Device              Start Block  Dbase  State        Hot Spare
        /dev/dsk/c0t3d0s0   0            No     Maintenance

d20: Submirror of d0
    State: Okay
    Size: 47628 blocks
    Stripe 0:
        Device              Start Block  Dbase  State        Hot Spare
        /dev/dsk/c0t2d0s0   0            No     Okay

Solution :

1.) If the disk is all right, enable the failed metadevice with the metareplace command.
If the disk has failed, replace it, create partitions similar to those on the failed disk, and enable the new device with the metareplace command:
# /usr/opt/SUNWmd/metareplace -e d0 c0t3d0s0
Device /dev/dsk/c0t3d0s0 is enabled
2.) If the disk has failed and you want to move the failed device to a new disk with a different ID (c#t#d#), add the new disk,
format it to create a partition scheme similar to the failed disk's, and use the metareplace command with the old and new device names:
# /usr/opt/SUNWmd/metareplace d0 c0t3d0s0 <new device>

The metareplace command above can also be used for concat or stripe component replacement in a volume, but that would involve restoring from backup if the volume is not mirrored.
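
After a metareplace, the affected submirror resynchronizes in the background; progress can be checked with metastat. The output below is illustrative:

# /usr/opt/SUNWmd/metastat d0
d0: Mirror
    Submirror 0: d10
      State: Resyncing
    Resync in progress: 15 % done
...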


Taken from: http://www.adminschoice.com/docs/solstice_disksuite.htm

Monday, April 28, 2008

Manage Linux log files with Logrotate

Log files are the most valuable tools available for Linux system security. The logrotate program is used to provide the administrator with an up-to-date record of events taking place on the system. The logrotate utility may also be used to back up log files, so copies may be used to establish patterns for system use. In this Daily Drill Down, I’ll cover the following topics:
  • The logrotate configuration
  • Setting defaults for logrotate
  • Using the include option to read other configuration files
  • Setting rotation parameters for specific files
  • Using the include option to override defaults

The logrotate program
The logrotate program is a log file manager. It is used to regularly cycle (or rotate) log files by removing the oldest ones from your system and creating new log files. It may be used to rotate based on the age of the file or the file's size, and usually runs automatically through the cron utility. The logrotate program may also be used to compress log files and to mail copies of cycled log files to users.

The logrotate configuration
The logrotate program is configured by entering options in the /etc/logrotate.conf file. This is a text file, which may contain any of the configuration options listed in the table below. The options entered in /etc/logrotate.conf may be used to set configuration parameters for any log file on the system. These options may also be used to allow logrotate to read configuration parameters from other log files, by using the include parameter.


Option and its function:

compress: Compresses the rotated log file with gzip.
nocompress: Does not compress rotated log files.
copytruncate: Used when processes are still writing to open log files; copies the active log file to a backup and truncates the active log file in place.
nocopytruncate: Copies the log file to a backup, but the open log file is not truncated.
create mode owner group: Rotates the log file and creates a new log file with the specified permissions, owner, and group. The default is to use the same mode, owner, and group as the original file.
nocreate: Prevents the creation of a new log file.
delaycompress: When used with the compress option, the rotated log file is not compressed until the next time it is cycled.
nodelaycompress: Overrides delaycompress; the log file is compressed when it is cycled.
errors address: Mails logrotate errors to the given address.
ifempty: Rotates the log file even if it is empty; this is the logrotate default.
notifempty: Does not rotate the log file if it is empty.
mail address: Mails log files that cycle out of existence (that is, are removed from the system) to the given address.
nomail: Does not mail a copy of log files that cycle out of existence.
olddir directory: Keeps cycled log files in the specified directory, which must be on the same filesystem as the current log files.
noolddir: Keeps cycled log files in the same directory as the current log files.
prerotate/endscript: Enclose commands to be executed before a log file is rotated. The prerotate and endscript keywords must appear on lines by themselves.
postrotate/endscript: Enclose commands to be executed after a log file has been rotated. The postrotate and endscript keywords must appear on lines by themselves.
daily: Rotates log files daily.
weekly: Rotates log files weekly.
monthly: Rotates log files monthly.
rotate count: Specifies the number of rotated copies to retain before files are deleted. A count of 0 (zero) means no copies are retained; a count of 5 retains five copies.
tabootext [+] list: Directs logrotate not to rotate files with the listed extensions. The default list of extensions is .rpm-orig, .rpmsave, v, and ~.
size size: Rotates the log file when the specified size is reached. Size may be specified in bytes (default), kilobytes (sizek), or megabytes (sizem).
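
As an illustration of how these options combine, the following hypothetical stanza rotates a log file once it reaches 100 KB, keeps five compressed copies, and mails errors to an example address (the file name and address are placeholders):

/var/log/example.log {
    size 100k
    rotate 5
    compress
    delaycompress
    notifempty
    errors admin@example.com
}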

The /etc/logrotate.conf file
The /etc/logrotate.conf file is the default configuration file for logrotate. The default /etc/logrotate.conf file installed with Red Hat Linux is shown below:
# see "man logrotate" for details
# rotate log files weekly
weekly

# keep 4 weeks worth of backlogs
rotate 4

# send errors to root
errors root
# create new (empty) log files after rotating old ones
create

# uncomment this if you want your log files compressed
#compress
# RPM packages drop log rotation information into this directory
include /etc/logrotate.d

# no packages own lastlog or wtmp --we'll rotate them here
/var/log/wtmp {
monthly
create 0664 root utmp
rotate 1
}

/var/log/lastlog {
monthly
rotate 1
}

# system-specific logs may be configured here


Setting defaults for logrotate
Default configuration settings are normally placed close to the beginning of the logrotate.conf file. These settings are usually in effect system-wide. The default settings for logrotate on this system are established in the first 12 lines of the file.

The third line
weekly

specifies that all log files will be rotated weekly.

The fifth line
rotate 4

specifies that four copies of old log files are retained before the files are cycled. Cycling refers to removing the oldest log files and replacing them with new copies.

The seventh line
errors root

sends all logrotate error messages to root.

The ninth line
create

configures logrotate to automatically create new log files. The new log files will have the same permissions, owner, and group as the file being rotated.

The eleventh line
#compress

prevents logrotate from compressing log files when they are rotated. Compression is enabled by removing the comment (#) from this line.

Using the include option
The include option allows the administrator to take log file rotation information, which may be installed in several files, and use it in the main configuration file. When logrotate finds the include option on a line in logrotate.conf, the information in the file specified is read as if it appeared in /etc/logrotate.conf.

Line 13 in /etc/logrotate.conf
include /etc/logrotate.d

tells logrotate to read in the log rotation parameters stored in the files contained in the /etc/logrotate.d directory. The include option is very useful when RPM packages are installed on a system, because RPM packages' log rotation parameters typically install into the /etc/logrotate.d directory.

The include option is important. Some of the applications that install their log rotation parameters to /etc/logrotate.d by default are apache, linuxconf, samba, cron, and syslog. The include option allows the parameters from each of these files to be read into logrotate.conf.

Using the include option in /etc/logrotate.conf allows the administrator to configure a rotation policy for these packages through a single configuration file.
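
On a typical Red Hat system, the drop-in directory might contain entries such as the following (the exact contents depend on the packages installed):

# ls /etc/logrotate.d
apache  cron  linuxconf  samba  syslog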

Using include to override defaults
When a file is read in via the include option, the rotation parameters it specifies override the defaults set in /etc/logrotate.conf. An example of /etc/logrotate.conf being overridden is shown below:
#Log rotation parameters for linuxconf
/var/log/htmlaccess.log
{ errors jim
notifempty
nocompress
weekly
prerotate
/usr/bin/chattr -a /var/log/htmlaccess.log
endscript
postrotate
/usr/bin/chattr +a /var/log/htmlaccess.log
endscript
}
/var/log/netconf.log
{ nocompress
monthly
}


In this example, when the /etc/logrotate.d/linuxconf file is read by /etc/logrotate.conf, the following options will override the defaults specified in /etc/logrotate.conf:
notifempty
errors jim


The nocompress and weekly options do not override any options contained in /etc/logrotate.conf.

Setting parameters for a specific file
Configuration parameters for a specific file are often required. A common example would be to include a section in the /etc/logrotate.conf file to rotate the /var/log/wtmp file once per month and keep only one copy of the log. When configuration is required for a specific file, the following format is used:
#comments
/full/path/to/file
{
option(s)
}


The following entry would cause the /var/log/wtmp file to be rotated once a month, with one backup copy retained:
#Use logrotate to rotate wtmp
/var/log/wtmp
{
monthly
rotate 1
}

Although the opening bracket may appear on a line with other text or commands, the closing bracket must be on a line by itself.
Using the prerotate and postrotate options
The section of code below shows a typical script in /etc/logrotate.d/syslog. This section applies only to /var/log/messages. On a production server, /etc/logrotate.d/syslog would probably contain similar entries.
/var/log/messages
{
prerotate
/usr/bin/chattr -a /var/log/messages
endscript
postrotate
/usr/bin/killall -HUP syslogd
/usr/bin/chattr +a /var/log/messages
endscript
}


The format for this script uses the following methods:
  • The first line, /var/log/messages, declares the file for which this script will be used.
  • The curly braces,{ }, are used to enclose the entire script. All commands contained within these braces will be run on the /var/log/messages file.
  • The prerotate command specifies actions to be taken prior to the file being rotated by logrotate.
  • The command /usr/bin/chattr -a is run to remove the append-only attribute from /var/log/messages.
  • The endscript command marks the end of the prerotate portion of this script.
  • The next line, postrotate, specifies the following commands are to be run on /var/log/messages after the file has been rotated by logrotate.
  • The command /usr/bin/killall -HUP syslogd is run to reinitialize the system logging daemon, syslogd.
  • The next command, /usr/bin/chattr +a /var/log/messages, reassigns the append-only attribute to the /var/log/messages file. This means the file may only be opened in append mode, which prevents it from being overwritten by any other program or user.
  • The endscript command appears on a line by itself and marks the end of the postrotate portion of this script.
  • The last curly brace,}, marks the end of commands to be applied to the /var/log/messages file.

Running logrotate
There are three steps involved in running logrotate:
  1. Identify the log files on your system.
  2. Create rotation schedules and parameters for the log files.
  3. Run logrotate through the cron daemon.

The code below shows the default cronjob shipped with Red Hat Linux to allow logrotate to run daily:
#!/bin/sh
# /etc/cron.daily/logrotate

/usr/sbin/logrotate /etc/logrotate.conf


This cronjob allows logrotate to run daily with the rotation parameter specified in /etc/logrotate.conf.
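
Before relying on the cron job, the configuration can be exercised by hand. The -d option performs a dry run that reports what logrotate would do without touching any files, and -f forces an immediate rotation:

# /usr/sbin/logrotate -d /etc/logrotate.conf
# /usr/sbin/logrotate -f /etc/logrotate.conf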

Conclusion
Log rotation is the first step in log file management. The logrotate utility provides the Linux administrator with the ability to maintain a log file rotation policy and to retain copies of log files to assist in establishing patterns related to system usage. In this Daily Drill Down, we looked at the configuration of logrotate, used the include option to read configuration files installed by RPM packages, used the prerotate and postrotate options to run commands around each rotation, and ran logrotate as a cronjob.

Taken from: http://articles.techrepublic.com.com/5100-6345-1052474.html?tag=rbxccnbtr1

Solaris Soft Partitioning

Solstice DiskSuite / Solaris Volume Manager

Soft Partitioning


A Primer for Understanding Soft Partitioning,
a new feature in Solstice DiskSuite (Solaris Volume Manager)

The intent of this document is to describe Soft Partitioning within Solstice DiskSuite (soon-to-be-renamed Solaris Volume Manager), and offer a short primer/tutorial on how to create, use, and delete them.

Until now, Solaris, without any volume management software, has only ever allowed a fixed number of partitions on a physical disk (seven (7) on SPARC platforms). With the increase in capacity of disks, this limitation has become a severe restriction.

SDS/SVM uses these slices for its metadevices (sub-mirrors, trans, stripes, and RAID5) and hence is faced with the same limitation, whereas Veritas Volume Manager (VxVM) allows for the logical partitioning of disks into a virtually unlimited number of subdisks.

Soft Partitioning allows for a disk to be subdivided into many partitions which are controlled and maintained by software, thereby removing the limitation of the number of partitions on a disk. A soft partition is made up of one or more "extents". An extent describes the parts of the physical disk that make up the soft partition. While the maximum number of extents per soft partition is 2147483647, the majority of soft partitions will use only one (1) extent.


What is new?

Soft Partitioning was not in the original Solstice DiskSuite 4.2.1 Release, which coincided with the release of Solaris 8. However, the soft partitioning functionality was released in patch 108693-06 for SDS 4.2.1.

When Solaris 9 gets released, the "Solstice DiskSuite" name will change to "Solaris Volume Manager" ("SVM") and it will be bundled in with Solaris 9. Soft Partitioning will, of course, be part of the base functionality of that release.

Soft Partitions are implemented by a new kernel driver: md_sp.

   # modinfo | grep md_sp
228 78328000 4743 - 1 md_sp (Meta disk soft partition module)
There are new options to the metainit command:
   metainit softpart -p [-e] component size
metainit softpart -p component -o offset -b size
The metattach command has been modified to allow for growing of soft partitions:
   metattach softpart size 
There is a new command... metarecover:
   metarecover [-n] [-v] component -p [-d|-m] 

NOTE: the -p option means that the command refers to soft partitions.


Creating Soft Partitions

There are three methods to create a soft partition using the metainit command:
  1. Specifying an unused disk and size (with the -e option). For example:

       # metainit d0 -p -e c1t0d0 200m 

    The -e option requires that the name of the disk supplied be in the form c#t#d#.

    The last parameter (200m) specifies the initial size of the soft partition. The sizes can be specified in blocks, kilobytes, megabytes, gigabytes, and terabytes.

    The -e option causes the disk to be repartitioned such that slice 7 has enough space to hold a replica (although no replica is actually created on this disk) and slice 0 contains the rest of the space. Slice 2 is removed from the disk. The soft partition that is being created is put into slice 0. Further soft partitions can be created on slice 0 by the next method of creating a soft partition.

    After this command is run, the layout of the disk would look similar to this example:

       Part      Tag   Flag   Cylinders     Size           Blocks
    0 unassigned wm 5 - 2035 999.63MB (2031/0/0) 2047248
    1 unassigned wm 0 0 (0/0/0) 0
    2 unassigned wm 0 0 (0/0/0) 0
    3 unassigned wm 0 0 (0/0/0) 0
    4 unassigned wm 0 0 (0/0/0) 0
    5 unassigned wm 0 0 (0/0/0) 0
    6 unassigned wm 0 0 (0/0/0) 0
    7 unassigned wu 0 - 4 2.46MB (5/0/0) 5040

    This command (with the -e) can only be run on an empty disk (one that is not used in any other metadevice). If another metadevice or replica already exists on this disk, one of the following messages will be printed, and no soft partition will be created.

       metainit: hostname: c#t#d#s0: has appeared more than once in the specification of d#
    or
       metainit: hostname: c#t#d#s#: has a metadevice database replica
  2. Specifying an existing slice name and size (without the -e option). This will be the most common method of creation. For example:

        # metainit d1 -p c1t0d0s0 1g 

    This will create a soft partition on the specified slice. No repartitioning of the disk is done. Provided there is space on the slice, additional soft partitions could be created as required. The device name must include the slice number (c#t#d#s#).

    If another soft partition already exists in this slice, this one will be created immediately after the existing one. Therefore, no overlap of soft partitions can occur by accident.

  3. Specifying an existing slice and absolute offset and size values. For example:

       # metainit d2 -p c1t0d0s0 -o 2048 -b 1024 
    The -o parameter signifies the offset into the slice, and the -b parameter is the size for the soft partition. All numbers are in blocks (a block is 512 bytes). The metainit command ensures that extents and soft partitions do not overlap. For example, the following is an attempt to create overlapping soft partitions.

       # metainit d1 -p c1t0d0s0 -o 1 -b 2024
    d1: Soft Partition is setup
    # metainit d2 -p c1t0d0s0 -o 2000 -b 2024
    metainit: hostname: d2: overlapping extents specified

    An offset of 0 is not valid, as the first block on a slice containing a soft partition contains the initial extent header. Each extent header consumes 1 block of disk and each soft partition will have an extent header placed at the end of each extent. Extent headers are explained in more detail in the next section.

    NOTE: This method is not documented in the man page for metainit and is not recommended for manual use. It is here because a subsequent metastat -p command will output information in this format.
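
For example, once d2 exists, metastat -p reports it in exactly this offset/size form (the output shown is what would be expected for the d2 created above):

   # metastat -p d2
d2 -p c1t0d0s0 -o 2048 -b 1024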



Extent Headers

Whenever a soft partition is created in a disk slice, an "extent header" is written to disk. Internally to Sun, these are sometimes referred to as "watermarks".

An extent header is a consistency record and contains such information as the metadevice (soft partition) name, its status, its size, and a checksum. Each extent header is 1 block (512 bytes) in size.

The following diagram shows an example 100MB slice (c1t0d0s0) and the extent headers (watermarks) that have been created on it. The command to make the soft partition shown was

   # metainit d1 -p c1t0d0s0 20m 
[Diagram: the 20MB soft partition d1 and its extent headers (watermarks) on the 100MB slice c1t0d0s0]

There is always an extent header on the first and last blocks in the slice. Note that the 80MB of space left over from the creation of the soft partition can be used to make one or more additional soft partitions. Each additional soft partition will cause an additional extent header to be created as well.
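
To make the layout concrete, here is the block-level arithmetic for the 20MB soft partition above on a 100MB (204800-block) slice; the exact block numbers are illustrative:

   block 0                  extent header for d1
   blocks 1 - 40960         d1 data (20MB = 40960 x 512-byte blocks)
   block 40961              extent header preceding the free space
   blocks 40962 - 204798    free space (approx. 80MB)
   block 204799             end extent header (last block of the slice)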


Mirroring Soft Partitions

Once you have created soft partitions, what can you do with them? Well, one thing to do is to create mirrors out of them. Unfortunately, even though a soft partition is a metadevice, it cannot serve directly as a submirror. For example:

   # metainit d10 -p c1t11d0s4 100m
d10: Soft Partition is setup
# metainit d20 -m d10
metainit: hostname: d10: invalid unit
Instead, you must first take the soft partition and create a simple concat/stripe out of it. For example:

   # metainit d10 -p c1t0d0s0 100m
d10: Soft Partition is setup
# metainit d20 1 1 d10
d20: Concat/Stripe is setup
# metainit d30 -m d20
d30: Mirror is setup

# metainit d11 -p c2t0d0s0 100m
d11: Soft Partition is setup
# metainit d21 1 1 d11
d21: Concat/Stripe is setup
# metattach d30 d21
d30: submirror d21 is attached

Once done, the resulting metastat output of the mirror will look like this:

   # metastat d30

d30: Mirror
Submirror 0: d20
State: Okay
Submirror 1: d21
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 204624 blocks

d20: Submirror of d30
State: Okay
Size: 204624 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
d10 0 No Okay

d10: Soft Partition
Component: c1t0d0s0
State: Okay
Size: 204800 blocks
Extent Start Block Block count
0 1 204800

d21: Submirror of d30
State: Okay
Size: 204624 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
d11 0 No Okay

d11: Soft Partition
Component: c2t0d0s0
State: Okay
Size: 204800 blocks
Extent Start Block Block count
0 1 204800


Combining Soft Partitions Together into a RAID5 Device

RAID5 devices can be made up of soft partitions directly. This example shows 4 soft partitions (from 4 separate slices) striped together to make a RAID5 device:

   # metainit d1 -p c1t0d0s0 10m
d1: Soft Partition is setup
# metainit d2 -p c2t0d0s0 10m
d2: Soft Partition is setup
# metainit d3 -p c3t0d0s0 10m
d3: Soft Partition is setup
# metainit d4 -p c4t0d0s0 10m
d4: Soft Partition is setup
# metainit d10 -r d1 d2 d3 d4
d10: RAID is setup

Once done, the resulting metastat output of the RAID5 device will look like this:

   # metastat d10

d10: RAID
State: Okay
Interlace: 32 blocks
Size: 59472 blocks
Original device:
Size: 60384 blocks
Device Start Block Dbase State Hot Spare
d1 330 No Okay
d2 330 No Okay
d3 330 No Okay
d4 330 No Okay

d1: Soft Partition
Component: c1t0d0s0
State: Okay
Size: 20480 blocks
Extent Start Block Block count
0 1 20480

d2: Soft Partition
Component: c2t0d0s0
State: Okay
Size: 20480 blocks
Extent Start Block Block count
0 1 20480

d3: Soft Partition
Component: c3t0d0s0
State: Okay
Size: 20480 blocks
Extent Start Block Block count
0 1 20480

d4: Soft Partition
Component: c4t0d0s0
State: Okay
Size: 20480 blocks
Extent Start Block Block count
0 1 20480


Using Soft Partitions for MetaTrans (UFS Logging) Devices

MetaTrans devices (UFS logging) can be built on top of soft partitions. Soft partitions can be used for the master device, the logging device, or both. In the following example, soft partitions are used for both the master and the logging device:

   # metainit d1 -p c1t0d0s0 500m
d1: Soft Partition is setup
# metainit d2 -p c2t0d0s0 50m
d2: Soft Partition is setup
# metainit d10 -t d1 d2
d10: Trans is setup

Once done, the resulting metastat output of the metatrans device will look like this:

   # metastat d10
d10: Trans
State: Okay
Size: 1024000 blocks
Master Device: d1
Logging Device: d2

d1: Soft Partition
Component: c1t0d0s0
State: Okay
Size: 1024000 blocks
Extent Start Block Block count
0 1 1024000

d2: Logging device for d10
State: Okay
Size: 102142 blocks

d2: Soft Partition
Component: c2t0d0s0
State: Okay
Size: 102400 blocks
Extent Start Block Block count
0 1 102400


Layering

Most of the time, soft partitions are made on a disk slice. However, there are certain situations where it can be beneficial to make a soft partition on top of an existing metadevice. This is referred to as layering.

For example, say you have a 90GB RAID5 device made up of 6 18GB disks. You can then take that 90GB device and "split it up" into many soft partitions. These many soft partitions then can be accessed as separate simple metadevices, although the data in them is protected by the RAID5 parity in the underlying device.

Soft partitions can be layered only on top of concat/stripes, mirrors, and RAID5 devices. Soft partitions cannot be layered on top of a metatrans device or directly on top of another soft partition.

Here is an example of layering soft partitions on top of an existing RAID5 metadevice. First, we create the RAID5 device, then soft partition that device into 3 100MB partitions (obviously, we could create more than just 3 soft partitions).

   # metainit d0 -r c1t0d2s0 c1t0d4s0 c1t1d1s0 c1t1d3s0
d0: RAID is setup

# metainit d1 -p d0 100m
d1: Soft Partition is setup
# metainit d2 -p d0 100m
d2: Soft Partition is setup
# metainit d3 -p d0 100m
d3: Soft Partition is setup

Each of the resulting soft partitions (d1, d2, and d3) can be accessed individually (i.e., newfs and mount).

Soft partitions can be built on top of an existing mirror device as well, just like we did above on the RAID5 device. In the following example, the mirror device (d0) is "carved up" into 3 smaller soft partitions.

   # metainit d10 1 1 c1t0d2s0
d10: Concat/Stripe is setup
# metainit d20 1 1 c2t0d0s0
d20: Concat/Stripe is setup
# metainit d0 -m d10 d20
d0: Mirror is setup

# metainit d1 -p d0 100m
d1: Soft Partition is setup
# metainit d2 -p d0 100m
d2: Soft Partition is setup
# metainit d3 -p d0 100m
d3: Soft Partition is setup

Soft partitions are not allowed to be parented by other soft partitions directly. For example:

   # metainit d1 -p c1t0d0s0 100m
d1: Soft Partition is setup
# metainit d2 -p d1 10m
metainit: hostname: d1: invalid unit
Soft partitions also cannot be built on top of trans (UFS logging) devices. For example:

   # metainit d1 -t d10 d20
d1: Trans is setup
# metainit d2 -p d1 100m
metainit: hostname: d1: invalid unit


Growing Soft Partitions

A soft partition can be grown by the use of the metattach command. There is no mechanism to shrink a soft partition.

   # metattach d0 10m
d0: Soft Partition has been grown

When additional space is added to an existing soft partition, the additional space is taken from any available space on the same device and might not be contiguous with the existing soft partition. A soft partition can only be grown using free space on the same device (slice or metadevice) that holds it.

The following example shows how growing a soft partition will increase the size of the current extent:

   # metainit d1 -p c1t0d2s0 100m
d1: Soft Partition is setup
# metastat d1
d1: Soft Partition
Component: c1t0d2s0
State: Okay
Size: 204800 blocks
Extent Start Block Block count
0 1 204800

# metattach d1 50m
d1: Soft Partition has been grown
# metastat d1
d1: Soft Partition
Component: c1t0d2s0
State: Okay
Size: 307200 blocks
Extent Start Block Block count
0 1 307200

Note how after the metattach is run, there is still only one extent, but the block count has grown from 204800 (100MB) to 307200 (150MB).

In the following example, the extent cannot be grown, as it was above, because another soft partition is "in the way". Therefore, a second extent is created in the same slice.

   # metainit d1 -p c1t0d2s0 100m
d1: Soft Partition is setup
# metainit d2 -p c1t0d2s0 10m
d2: Soft Partition is setup
# metastat
d1: Soft Partition
Component: c1t0d2s0
State: Okay
Size: 204800 blocks
Extent Start Block Block count
0 1 204800

d2: Soft Partition
Component: c1t0d2s0
State: Okay
Size: 20480 blocks
Extent Start Block Block count
0 204802 20480

# metattach d1 50m
d1: Soft Partition has been grown
# metastat
d1: Soft Partition
Component: c1t0d2s0
State: Okay
Size: 307200 blocks
Extent Start Block Block count
0 1 204800
1 225283 102400

d2: Soft Partition
Component: c1t0d2s0
State: Okay
Size: 20480 blocks
Extent Start Block Block count
0 204802 20480

Note how d1 now has two non-contiguous extents that together make up the 307200 (150MB) blocks.

NOTE: Growing the metadevice does not modify the data or the filesystem inside the metadevice. If the metadevice contains a filesystem, you must use the appropriate command(s) to grow that filesystem after the metadevice has been grown.
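
For a UFS filesystem, the growfs command supplied with SDS/SVM can expand the filesystem into the newly added space, even while it is mounted; a minimal sketch (the mount point is an example):

   # metattach d1 50m
d1: Soft Partition has been grown
   # growfs -M /export/data /dev/md/rdsk/d1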


Deleting Soft Partitions

This is achieved by using the metaclear command in the normal way:

   # metaclear d0
d0: Soft Partition is cleared
If other metadevices are using the soft partition, the metaclear will error with:

   metaclear: hostname: d0: metadevice in use


Using Soft Partitions with Disksets

There are no differences with soft partitioning in a diskset, other than having to specify the -s option on the commandline to specify the diskset name.

The only potential problem occurs when dealing with did disk devices that are in a SunCluster configuration. Unfortunately, the naming convention of the did devices is similar to that of SDS/SVM in that the disks are referred to as d#. This means that SDS/SVM could confuse a did disk with a metadevice when creating a soft partition.

The simple workaround to this problem is to use the full path to the did device on the metainit commandline in order to prevent any confusion.

For example, the following command to create a 1GB soft partition on /dev/did/rdsk/d7s0 would be invalid:

   # metainit -s set2 d0 -p d7s0 1g 
Instead, the correct command to run would be:

   # metainit -s set2 d0 -p /dev/did/rdsk/d7s0 1g 


How to list the soft partitions in a given slice

The metarecover command, with the -n and -v options, will display information about the soft partitions existing in a given slice.

The metarecover command actually scans the given slice for extent headers and prints the information that it finds about those headers.

In each slice/device, there are also 2 additional extent headers: one which precedes the free space in the slice, and one on the last block of the slice. These are printed as well. This is an easy way to determine how much free space is available in a slice for additional soft partitions.

   # metarecover -v -n /dev/rdsk/c1t0d0s0 -p
Verifying on-disk structures on c1t0d0s0.
The following extent headers were found on c1t0d0s0.
Name Seq# Type Offset Length
d0 0 ALLOC 0 20481
d1 0 ALLOC 20481 40961
NONE 0 END 17674901 1
NONE 0 FREE 61442 17613459
Found 2 soft partition(s) on c1t0d0s0.

In the above example, there were 2 soft partitions (d0 and d1) found on c1t0d0s0, as well as 17613458 blocks (approx 8.4GB) of unallocated free space.

IMPORTANT NOTE: The information printed by this command is relative to the extent header, not the soft partition itself. Therefore, the 'offset' is the starting location of the extent header, not the extent itself. Also, the 'length' given is the length of the extent plus the header. Therefore, in the example above, there are only 17613458 free blocks, not 17613459 blocks.

Because soft partitions can be layered above metadevices like mirrors or RAID5 devices (see layering, above), this command can also be run on them to determine the locations and sizes of the extent headers. In the example below, d0 is a RAID5 metadevice which has 4 soft partitions in it. There is no free space left in this device.

   # metarecover -v -n d0 -p
Verifying on-disk structures on d0.
The following extent headers were found on d0.
Name Seq# Type Offset Length
d1 0 ALLOC 0 204801
d2 0 ALLOC 204801 204801
d3 0 ALLOC 409602 204801
d99 0 ALLOC 614403 7573580
NONE 0 END 8187983 1
Found 4 soft partition(s) on d0.


Fragmentation

Fragmentation of free space will occur on a slice when there has been activity in creating, deleting, and possibly growing soft partitions. At this time, there is no method to defragment a disk.

For example, the following sequence of commands can result in some amount of fragmentation. First, create 2 10MB soft partitions on a slice.

   # metainit d1 -p c1t0d0s0 10m
d1: Soft Partition is setup
# metainit d2 -p c1t0d0s0 10m
d2: Soft Partition is setup

Then, remove the first 10MB soft partition and then create a 20MB soft partition.

   # metaclear d1
d1: Soft Partition is cleared
# metainit d3 -p c1t0d0s0 20m
d3: Soft Partition is setup
When the d3 metadevice is created, the 10MB of free space at the beginning of the slice is not used, because there is a contiguous 20MB space available further out that can be used instead. Therefore, the 10MB of free space is skipped over in favor of the first 20MB of contiguous space. The metarecover command will show the fragmentation (multiple free spaces):

   # metarecover -v -n c1t0d0s0 -p
Verifying on-disk structures on c1t0d0s0.
The following extent headers were found on c1t0d0s0.
Name Seq# Type Offset Length
d2 0 ALLOC 20481 20481
d3 0 ALLOC 40962 40961
NONE 0 END 2047247 1
NONE 0 FREE 81923 1965324
NONE 0 FREE 0 20481
Found 2 soft partition(s) on c1t0d0s0.


Recovering Soft Partitions

The 'metarecover' command is run when something has gone wrong. It should not be run except to recover from a catastrophic problem. There are two main functions that this command performs. It can
  1. scan through the given slice and recreate, in the replica, the soft partitions that it finds there. This is useful when moving a disk with soft partitions to a new machine. The option to use on the metarecover command is -d.
  2. read through the current replica and recreate the extent headers for the soft partitions on the given slice. This is useful after a disk fails and is replaced with a new one. The option to use on the metarecover command is -m.

Recreating Information in the Replica from the Extent Headers

Here is a very simple example showing a disk which had soft partitions created on it (in slice 0) on another host, which is being moved to a new machine. We wish to extract the soft partitions on this new machine. Currently, there are no metadevices created.

   # metastat 
The metarecover command below scans the given slice (in this case, c0t0d0s0) and, for each soft partition it finds in that slice, it puts an entry into the current replica. The data on the disk is not modified, and nothing on the slice specified is modified. All that happens is that the extent headers are read and information is written to the replica.

   # metarecover c0t0d0s0 -p -d
The following soft partitions were found and will be added to
your metadevice configuration.
Name Size No. of Extents
d1 61440 1
d2 20480 1
WARNING: You are about to add one or more soft partition
metadevices to your metadevice configuration. If there
appears to be an error in the soft partition(s) displayed
above, do NOT proceed with this recovery operation.

Are you sure you want to do this (yes/no)? yes

c0t0d0s0: Soft Partitions recovered from device.
Now, we can see the soft partition metadevices have been created for us:

   # metastat
d1: Soft Partition
Component: c0t0d0s0
State: Okay
Size: 61440 blocks
Extent Start Block Block count
0 120836 61440

d2: Soft Partition
Component: c0t0d0s0
State: Okay
Size: 20480 blocks
Extent Start Block Block count
0 20482 20480

Recreating Soft Partitions from Information in the Replica

This example essentially does the opposite of the previous one. In this case, the actual extent headers on the disk have been lost, either because something wrote over them, or because the disk hosting the soft partitions had to be replaced with a new disk drive. Although the replica shows the soft partitions to be "Okay":

   # metastat
d1: Soft Partition
Component: c0t0d0s0
State: Okay
Size: 61440 blocks
Extent Start Block Block count
0 120836 61440

d2: Soft Partition
Component: c0t0d0s0
State: Okay
Size: 20480 blocks
Extent Start Block Block count
0 20482 20480
there are no extent headers on the disk, so I/O to the disk will error out.

   # dd if=/dev/zero of=/dev/md/rdsk/d2
dd: /dev/md/rdsk/d2: open: I/O error
To check whether any extent headers exist on the disk, you can run the command

   # metarecover -n c0t0d0s0 -p
found incorrect magic number 0, expected 20000127.
No extent headers found on c0t0d0s0.
c0t0d0s0: On-disk structures invalid or no soft partitions found.
metarecover: hostname: d0: bad magic number in extent header
The above command confirms that there are no extent headers on the disk. To have the extent headers written out to the disk, according to the information currently in the replica, run the command

   # metarecover c0t0d0s0 -p -m
c0t0d0s0: Soft Partition metadb configuration is valid

WARNING: You are about to overwrite portions of c0t0d0s0
with soft partition metadata. The extent headers will be
written to match the existing metadb configuration. If
the device was not previously setup with this
configuration, data loss may result.

Are you sure you want to do this (yes/no)? yes

c0t0d0s0: Soft Partitions recovered from metadb
Now the extent headers have been written to the disk, so I/O will work correctly. Running the verify command again, we see

   # metarecover -n c0t0d0s0 -p
c0t0d0s0: Soft Partition metadb configuration is valid
c0t0d0s0: Soft Partition metadb matches extent header configuration

Taken from: http://www.sysunconfig.net/unixtips/soft-partitions.html