OpenVMS Alpha Galaxy Guide

Document revision date: 28 June 1999

8.4 Step 3: Upgrade the Firmware

To upgrade the firmware, use the Alpha Systems Firmware Update Version 5.4 CD-ROM that is included in the OpenVMS Version 7.2-1 CD-ROM package. Be sure to read the release notes that are included in the package before installing the firmware.

8.5 Step 4: Set Environment Variables

Configure the primary console for instance 0.

CPU0 is the primary for instance 0.

Create the Galaxy environment variables. For descriptions of the Galaxy environment variables and common values for them, refer to Chapter 6.

The following example is for an AlphaServer 4100 with three CPUs and 512MB of memory divided into 256MB + 192MB + 64MB.

P00>>> create -nv lp_count 2
P00>>> create -nv lp_cpu_mask0 1
P00>>> create -nv lp_cpu_mask1 6
P00>>> create -nv lp_io_mask0 10
P00>>> create -nv lp_io_mask1 20
P00>>> create -nv lp_mem_size0 10000000
P00>>> create -nv lp_mem_size1 c000000
P00>>> create -nv lp_shared_mem_size 4000000
P00>>> set auto_action halt

If you have four CPUs and you want to assign all secondary CPUs to instance 1, the lp_cpu_mask1 variable will be E. If you split the CPUs between both instances, CPU 0 must be the primary for instance 0, and CPU 1 must be the primary CPU for instance 1.
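
The mask values above are consistent with a hexadecimal bitmask in which bit n stands for CPU n (CPU 0 gives 1; CPUs 1 and 2 give 6; CPUs 1, 2, and 3 give E). As an illustration only, the following Python sketch computes a mask under that assumption. The helper name cpu_mask is hypothetical and is not part of any Galaxy tool.

```python
def cpu_mask(cpus):
    """Return the hexadecimal mask string for an iterable of CPU numbers.

    Assumption (inferred from the examples in this section): bit n of the
    mask represents CPU n.
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    return format(mask, "X")


print(cpu_mask([0]))        # instance 0 in the three-CPU example
print(cpu_mask([1, 2]))     # instance 1 in the three-CPU example
print(cpu_mask([1, 2, 3]))  # instance 1 when a fourth CPU is added
```

The three calls print 1, 6, and E, matching the lp_cpu_mask values discussed above.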

The mem_size variables depend on your configuration and how you want to split it up.
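
The memory values in the example are byte counts written in hexadecimal: 256MB is 10000000, 192MB is c000000, and 64MB is 4000000. The following Python sketch shows the conversion for the 512MB example. The helper name mem_size_hex is hypothetical, and the sketch assumes 1MB = 1024 * 1024 bytes.

```python
MB = 1024 * 1024  # assumption: binary megabytes


def mem_size_hex(megabytes):
    """Return the byte count for a size in MB as a lowercase hex string."""
    if megabytes < 0:
        raise ValueError("size must be non-negative")
    return format(megabytes * MB, "x")


# Sizes from the AlphaServer 4100 example (values in MB).
sizes_mb = {
    "lp_mem_size0": 256,
    "lp_mem_size1": 192,
    "lp_shared_mem_size": 64,
}

for name, mb in sizes_mb.items():
    print(name, mem_size_hex(mb))

# The three regions together account for the 512MB in the example.
total_mb = sum(sizes_mb.values())
print("total MB:", total_mb)
```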

lp_io_mask0 must be set to 10
lp_io_mask1 must be set to 20

You must set the console environment variable AUTO_ACTION to HALT. This ensures that the system does not boot automatically and that you can enter the GALAXY command.

8.6 Step 5: Initialize the System and Start the Console Devices

Initialize the system and start the Galaxy firmware by entering the following commands:
P00>>> init
P00>>> galaxy
After the self-test completes, the Galaxy command will start the console on instance 1.
The first time that the Galaxy starts, it might display several messages like the following:
CPU0 would not join
IOD0 and IOD1 did not pass the power-up self-test
This happens because there are two sets of environment variables, and the galaxy variables are not present initially on instance 1.
Note that when the I/O bus is divided between the two Galaxy partitions, the port letter of a device might change. For example, a disk designated as DKC300 when the AlphaServer 4100 is a single system could become DKA300 when it is configured as partition 0 of the OpenVMS Galaxy.

Configure the console for instance 1.
Use the same commands from step 2 to create the same Galaxy environment variables.

P01>>> create -nv lp_cpu_mask0 1
P01>>> create -nv lp_cpu_mask1 6
P01>>> create -nv lp_io_mask0 10
P01>>> create -nv lp_io_mask1 20
P01>>> create -nv lp_mem_size0 10000000
P01>>> create -nv lp_mem_size1 c000000
P01>>> create -nv lp_count 2
P01>>> create -nv lp_shared_mem_size 4000000
P01>>> set auto_action halt

Initialize the system and restart the Galaxy firmware by entering the following command:
P00>>> init
When the console displays the following confirmation prompt, type Y:
Do you REALLY want to reset the Galaxy (Y/N)

Configure the system root, boot device, and other related variables.
The following example settings are from an OpenVMS Engineering system. Change these variables to meet the needs of your own environment.

P00>>> set boot_osflags 12,0
P00>>> set bootdef_dev dka0
P00>>> set boot_reset off !!! must be OFF !!!
P00>>> set ewa0_mode twisted
P01>>> set boot_osflags 11,0
P01>>> set bootdef_dev dkb200
P01>>> set boot_reset off !!! must be OFF !!!
P01>>> set ewa0_mode twisted

Boot instance 1 as follows:
P01>>> boot
Once instance 1 is booted, log in to the system account and edit the SYS$SYSTEM:MODPARAMS.DAT file to include the following line:
GALAXY=1
Confirm that the lines for the SCS node and SCS system ID are correct. Run AUTOGEN as follows to configure instance 1 as a Galaxy member, and leave the system halted:
$ @SYS$UPDATE:AUTOGEN GETDATA SHUTDOWN INITIAL
Boot instance 0 as follows:
P00>>> boot
Once instance 0 is booted, log in to the system account and edit the SYS$SYSTEM:MODPARAMS.DAT file to include the following line:
GALAXY=1
Confirm that the lines for the SCS node and SCS system ID are correct. Run AUTOGEN as follows to configure instance 0 as a Galaxy member, and leave the system halted:
$ @SYS$UPDATE:AUTOGEN GETDATA SHUTDOWN INITIAL
Prepare the Galaxy to come up automatically upon initialization or power cycle of the system. Set the AUTO_ACTION environment variable on both instances to RESTART.
P00>>> set auto_action restart
P01>>> set auto_action restart
Initialize the Galaxy again by entering the following command at the primary console:
P00>>> init
When the console displays the following confirmation prompt, type Y:
Do you REALLY want to reset the Galaxy (Y/N)
Alternatively, you could power-cycle your system, and the Galaxy with both instances should bootstrap automatically.

Congratulations! You have created an OpenVMS Galaxy.

Chapter 9
Using a Single-Instance Galaxy on Any Alpha System

With OpenVMS Alpha Version 7.2-1, you can run a single-instance Galaxy on any Alpha platform. This capability allows early adopters to evaluate OpenVMS Galaxy features and, most important, to develop and test Galaxy-aware applications without incurring the expense of setting up a full-scale Galaxy computing environment on a system capable of running multiple instances of OpenVMS (for example, an AlphaServer 8400).

A single-instance Galaxy running on any Alpha system is not an emulator. It is OpenVMS Galaxy code with Galaxy interfaces and underlying operating system functions. All Galaxy APIs are present in a single-instance Galaxy (for example, resource management, shared memory access, event notification, locking for synchronization, and shared memory for global sections).

Any application that runs on a single-instance Galaxy exercises the same operating system code that it would exercise on a multiple-instance Galaxy system. This is accomplished by creating the configuration file SYS$SYSTEM:GLX$GCT.BIN, which OpenVMS reads into memory. On a Galaxy platform (for example, an AlphaServer 8400), the console places configuration data in memory for OpenVMS to use. Once the configuration data is in memory, regardless of its origin, OpenVMS boots as a Galaxy instance.

To use the Galaxy Configuration Utility (GCU) to create a single-instance Galaxy on any Alpha system, use the following procedure:

Run the GCU on the OpenVMS Alpha system on which you want to use the single-instance Galaxy.

If the GCU is run on a non-Galaxy system, it will prompt as to whether you want to create a single-instance Galaxy. Click on OK.

The GCU next prompts for the amount of memory to designate as shared memory. Enter any value that is a multiple of 8MB. Note that you must specify at least 8MB of shared memory if you want to boot as a Galaxy instance.

When the GCU has displayed the configuration, it will already have written the file GLX$GCT.BIN to the current directory. You can exit the GCU at this point. If you made a mistake or want to alter the configuration, you can close the current model and repeat the process.
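
The shared memory rule in the procedure above (a multiple of 8MB, and at least 8MB to boot as a Galaxy instance) can be checked before you answer the GCU prompt. The following Python sketch is only an illustration of that rule. The function name valid_shared_mem_mb is hypothetical and is not part of the GCU.

```python
GRANULE_MB = 8  # shared memory must be a multiple of 8MB, minimum 8MB


def valid_shared_mem_mb(megabytes):
    """Return True if the size is at least 8MB and a whole multiple of 8MB."""
    return megabytes >= GRANULE_MB and megabytes % GRANULE_MB == 0


for candidate in (0, 8, 12, 64):
    print(candidate, valid_shared_mem_mb(candidate))
```

Of the four candidates, only 8 and 64 satisfy the rule.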

To reboot the system as a Galaxy instance:

Copy the GLX$GCT.BIN file to SYS$SYSROOT:[SYSEXE]GLX$GCT.BIN.
Shut down the system.
Reboot with a conversational boot command (>>> B -FL 0,1 device).
SYSBOOT> SET GALAXY 1
SYSBOOT> CONTINUE
Add GALAXY=1 to SYS$SYSTEM:MODPARAMS.DAT

Chapter 10
OpenVMS Galaxy Tips and Techniques

This chapter contains operating hints that OpenVMS Engineering has found useful for dealing with known issues and in creating and supporting OpenVMS Galaxy environments. These operating hints resolve issues that fall into the following categories:

Hardware and firmware issues affecting all platforms
Hardware and firmware issues affecting AlphaServer 8400 and 8200 systems
Hardware and firmware issues affecting AlphaServer 4100 systems
OpenVMS software issues that affect all platforms

10.1 Hardware and Firmware Issues Affecting All Platforms

The following sections describe known OpenVMS Galaxy issues that affect AlphaServer 8400, 8200, and 4100 platforms.

10.1.1 Console Tips

Because AlphaServer GS140, GS60, 8400, 8200, and 4100 systems were designed prior to the Galaxy Software Architecture, OpenVMS Galaxy console firmware and system operations must handle a few restrictions.

The following list briefly describes some things you should be aware of and some things you should avoid doing:

Do not set the BOOT_RESET environment variable to 1. This causes each secondary console to reset the bus before booting, thus resetting all previously booted partitions. Remember that OpenVMS Galaxy partitions share the hardware.
Be patient. Console initialization and system rebooting can take several minutes.
Do not attempt to abort a firmware update process!
This can leave your system seriously hung.
When updating console firmware, update ALL CPUs at the same time.
You cannot run two different types of CPUs or two different firmware revisions. If you fail to provide consistent firmware revisions, the system will hang on power-up.
Never issue the GALAXY command from a secondary console. This will reinitialize the system, and you will need to start over from the primary console.

10.1.2 System Auto-Action

Upon system power-up, if the AUTO_ACTION console environment variable is set to BOOT or RESTART for instance 0 and the LP_COUNT environment variable is set to 2 or more, then the GALAXY command will automatically be issued and instance 0 will attempt to boot.

The setting of AUTO_ACTION in the console environment variables for the other instances will dictate their behavior upon the issuing of the GALAXY command (whether it is issued automatically or by the user from the console).

To set up your system for this feature, you must set the console environment variable "AUTO_ACTION" to "RESTART" or "BOOT" on each instance, and be sure to specify appropriate values for the BOOT_OSFLAGS and BOOTDEF_DEV environment variables for each instance.

10.1.3 Disparate Environment Variables

Version 7.2-1 Note

For OpenVMS Version 7.2-1, lp* console environment variables on secondary instances are ignored.

Setting disparate environment variables is no longer cause for concern.

Be careful when setting Galaxy environment variables at secondary consoles.

It is very easy to cause problems in a Galaxy configuration by setting the Galaxy (lp*) console environment variables differently on the main console from the consoles of additional instances.

If you boot a secondary instance of a Galaxy system with the LP_COUNT console environment variable set to zero, OpenVMS will hang after the banner is displayed. Additionally, on an AlphaServer 4100 Galaxy system, the system will display an EISA configuration error; if the primary instance is already booted with the LP_COUNT environment variable set to two, it then crashes with a machine check.

OpenVMS Engineering expects to change these rules in a future release. For now, you must manually set all the environment variables to be the same in all instances and then INIT.
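
On versions where the secondary lp* variables still matter, one way to catch a mismatch is to record the values shown at each console and compare them. The following Python sketch assumes that you have transcribed the variables into dictionaries by hand. The function names and the dictionary layout are hypothetical, and nothing here reads values from a console.

```python
def lp_variables(env):
    """Keep only lp_* variables, with names and values lowercased."""
    return {
        name.lower(): str(value).lower()
        for name, value in env.items()
        if name.lower().startswith("lp_")
    }


def lp_mismatches(primary, secondary):
    """Return the sorted names of lp_* variables that differ or are missing."""
    a, b = lp_variables(primary), lp_variables(secondary)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


# Values transcribed from the AlphaServer 4100 example in Chapter 8.
p00 = {
    "lp_count": "2", "lp_cpu_mask0": "1", "lp_cpu_mask1": "6",
    "lp_io_mask0": "10", "lp_io_mask1": "20",
    "lp_mem_size0": "10000000", "lp_mem_size1": "c000000",
    "lp_shared_mem_size": "4000000", "auto_action": "halt",
}
p01 = dict(p00, lp_count="0")  # a secondary console left with LP_COUNT at zero

print(lp_mismatches(p00, p01))
```

The comparison ignores variables such as AUTO_ACTION, which are allowed to differ between instances.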

10.1.4 Console INIT Command Is Not Per-Instance

You cannot use the INIT command to reset a single instance. The console INIT command affects the entire system (not individual instances) by sending a reset signal to buses and devices.

You can enter an INIT command at any console, but the output is displayed at the primary console.

When you enter the INIT command, the console displays the following question:

"Do you really want to reset ALL partitions? (Y/N)"

10.1.5 INIT Command Behavior

The INIT command and power-on will both start secondary consoles if the LP_COUNT console environment variable and the AUTO_ACTION console environment variable are set appropriately on a primary instance.

Secondary instances will then boot depending on the setting of their AUTO_ACTION variable, as shown in Table 10-1.

When the LP_COUNT console environment variable is set to 0 or the AUTO_ACTION console environment variable is set to HALT, the LPINIT is not done by either the power-on or INIT.

Table 10-1 shows the effect of INIT or power cycle when the LP_COUNT console environment variable is set to a nonzero value.

Table 10-1 Effect of INIT or Power Cycle

                      AUTO_ACTION Setting on Primary Console
Effect on             Halt           Restart or Boot
Secondary consoles    Not started    Started
Primary instance      Not booted     Booted ¹
Secondary instance    Not booted     Booted depending on AUTO_ACTION setting on secondary console

¹Not booted with INIT on AlphaServer 8400/8200
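
Table 10-1 can also be read as a small decision function. The following Python sketch is a reading aid, not a description of the firmware. It assumes that LP_COUNT is nonzero (the case the table covers) and that a secondary instance boots when its own AUTO_ACTION is BOOT or RESTART. The function and parameter names are hypothetical.

```python
STARTING_ACTIONS = {"BOOT", "RESTART"}


def init_effect(primary_action, secondary_action, init_on_8400_8200=False):
    """Model Table 10-1 for a nonzero LP_COUNT.

    init_on_8400_8200 applies the table footnote: the primary instance is
    not booted when INIT is used on an AlphaServer 8400/8200.
    """
    primary_starts = primary_action.upper() in STARTING_ACTIONS
    secondary_wants_boot = secondary_action.upper() in STARTING_ACTIONS  # assumption
    return {
        "secondary_consoles_started": primary_starts,
        "primary_booted": primary_starts and not init_on_8400_8200,
        "secondary_booted": primary_starts and secondary_wants_boot,
    }


print(init_effect("halt", "boot"))
print(init_effect("restart", "halt"))
print(init_effect("boot", "boot", init_on_8400_8200=True))
```

The sketch does not model an INIT that is issued at a secondary console.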

If INIT is issued at a secondary console, that secondary console is the one that does not boot. Others (including the primary console) will boot depending on the AUTO_ACTION setting for that instance.

10.1.6 CTRL/P Issues

CTRL/P affects only one instance. If an instance is in an IPL31 loop, CTRL/P might not work.

Avoid using CTRL/P during SYSBOOT and Bugcheck. If you enter CTRL/P during SYSBOOT followed by a BOOT command, the console might respond with the following message:

Inconsistent boot driver state. System is configured with multiple partitions. A complete INIT must be performed before rebooting.

10.1.7 CPUs That Cannot Be Reassigned

When CPUs cannot be reassigned with the Galaxy Configuration Utility (GCU), this usually means that no communications exist between instances.

For CPUs to be reassigned, instances must be in a cluster or DECnet must be set up with proxies. Note that TCP/IP communications between instances are being developed for a future release.

10.1.8 GLXCRASH is the Heartbeat Timeout BUGCHECK

Each SHARING member in an OpenVMS Galaxy ticks a heartbeat cell in shared memory that other instances watch. If an instance's heartbeat stops ticking for the amount of time specified as milliseconds in the GLX_INST_TMO SYSGEN parameter, that instance will be assumed to be "dead" and will be removed as a sharing member.

If you CTRL/P a sharing member for longer than GLX_INST_TMO milliseconds, upon issuing a CONTINUE command from the console, the instance will immediately bugcheck with a GLXCRASH bugcheck. For example:

^P and wait a while, then
P00>>>c

continuing CPU 0

**** OpenVMS (TM) Alpha Operating System X6PI-SSB - BUGCHECK ****

** Bugcheck code = 00000A94: GLXCRASH, BUGCHK requested from another Galaxy instance

This is very similar to the behavior of an OpenVMS Cluster that would result in a CLUEXIT bugcheck if a cluster member was in a CTRL/P state for longer than RECNXINTERVAL seconds.

Sharing members all use the same value of GLX_INST_TMO. The default value is currently 5000 milliseconds. To debug or test, you can increase the timeout value by resetting the GLX_INST_TMO SYSGEN parameter.
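
The timeout rule can be stated as a one-line comparison. The following Python sketch only restates the rule described in this section, using the current default of 5000 milliseconds. The function name is hypothetical, and the sketch does not read the real SYSGEN parameter.

```python
DEFAULT_GLX_INST_TMO_MS = 5000  # current default, per this section


def bugchecks_on_continue(halted_ms, glx_inst_tmo_ms=DEFAULT_GLX_INST_TMO_MS):
    """Return True if a sharing member halted for halted_ms milliseconds
    would be expected to bugcheck with GLXCRASH when it continues."""
    return halted_ms > glx_inst_tmo_ms


print(bugchecks_on_continue(2000))          # halted for 2 seconds
print(bugchecks_on_continue(30000))         # halted for 30 seconds
print(bugchecks_on_continue(30000, 60000))  # same halt, timeout raised for debugging
```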

10.2 Hardware and Firmware Issues Affecting AlphaServer 8400 and 8200 Systems

The following sections describe known OpenVMS Galaxy issues that affect AlphaServer 8400 and 8200 systems.

10.2.1 Allocating Primary CPUs to Instances on AlphaServer 8400/8200

On AlphaServer 8400/8200 systems, only an even-numbered CPU can be the primary CPU in an instance. Therefore, do not define an instance to consist only of odd-numbered CPUs.
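
A CPU mask can be checked against this rule before it is set at the console. The following Python sketch assumes that bit n of the hexadecimal mask stands for CPU n, which matches the AlphaServer 4100 examples in Chapter 8 but is not confirmed here for every platform. Both function names are hypothetical.

```python
def cpus_in_mask(mask_hex):
    """List the CPU numbers selected by a hexadecimal mask string.

    Assumption: bit n of the mask represents CPU n.
    """
    mask = int(mask_hex, 16)
    return [n for n in range(mask.bit_length()) if (mask >> n) & 1]


def has_even_cpu(mask_hex):
    """Return True if the mask includes at least one even-numbered CPU."""
    return any(cpu % 2 == 0 for cpu in cpus_in_mask(mask_hex))


print(cpus_in_mask("A"), has_even_cpu("A"))      # CPUs 1 and 3 only
print(cpus_in_mask("C00"), has_even_cpu("C00"))  # CPUs 10 and 11
```

Under the stated assumption, mask A selects only odd-numbered CPUs and fails the check, while C00 passes because it includes CPU 10.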

10.2.2 Console Terminal Connection (Secondary Instances)

The console terminal must be connected to COM1, which is the connector furthest from the DWLPB motherboard. See Chapter 6 in this document for complete details about installing the KFE72-DA console subsystem.

If the console terminal is connected to COM2, you will be able to enter console commands, but you will not see any output from the console or from OpenVMS.

10.2.3 EISA Ethernet Port is Unsupported

Use of the Twisted-Pair Ethernet port on the standard I/O module of the KFE72-DA is unsupported.

10.2.4 Console MIGRATE Command on AlphaServer 8200

On an AlphaServer 8200 Galaxy system, the CPUs are 8, 9, 10, 11. CPU 11 cannot be reassigned from one instance to another under the following conditions:

Failover is set.
The CPU is stopped.
The system is then shut down or crashes.

The console migrate command generated by OpenVMS was:

MIGRATE -CPU 11 -PARTITION 0

This produced the error "unable to migrate CPU 17". Changing the migrate command to specify -CPU 0B produced the desired effect.

The problem occurs because the MIGRATE command currently expects hexadecimal CPU numbers. It is likely to affect only the AlphaServer 8200, because a 12-CPU, 4GB (total), 2-instance AlphaServer 8400 Galaxy seems to be an unlikely configuration.

This problem will be fixed in a future console version.
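
The mismatch is easy to reproduce with ordinary number conversion: the text "11" read as hexadecimal is 17, and decimal 11 written in hexadecimal is 0B. The following Python sketch shows both conversions and builds the form of the command that worked. The helper name migrate_command is hypothetical, and the partition number is passed through unchanged because this section does not say how the console parses it.

```python
def migrate_command(cpu, partition):
    """Build a MIGRATE command with the CPU number as two hex digits."""
    return f"MIGRATE -CPU {cpu:02X} -PARTITION {partition}"


print(int("11", 16))           # how a hexadecimal parser reads the text "11"
print(migrate_command(11, 0))  # the command form that produced the desired effect
```

The first line prints 17, which is the CPU number in the error message. The second prints MIGRATE -CPU 0B -PARTITION 0.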

10.2.5 DWLPA Cannot Be Used

The KFE72-DA that provides the console connection for instances other than instance zero on an AlphaServer 8200 or 8400 must be installed in a DWLPB PCI bus. If a DWLPA PCI bus is used, a console machine check will occur during power-up before the initial P00>>> prompt is displayed.

10.3 Hardware and Firmware Issues Affecting AlphaServer 4100 Systems

The following sections describe known OpenVMS Galaxy issues that affect AlphaServer 4100 systems.

10.3.1 Allocating Primary CPUs to Instances on AlphaServer 4100

On AlphaServer 4100 systems, the following CPU allocation restrictions apply:

Only CPU 0 can be the primary CPU for instance 0.
Only CPU 1 can be the primary CPU for instance 1.

10.3.2 No LPINIT on AlphaServer 4100 Systems

On AlphaServer 8400 and 8200 systems, the LPINIT or GALAXY console commands start the consoles for second or third instances.

The LPINIT command is not valid on AlphaServer 4100 systems. On AlphaServer 4100 systems, you must use the GALAXY command to start the console for the second instance.

If you enter the LPINIT command on an AlphaServer 4100 system, the following message is displayed:

P00>>> lpinit
lpinit: No such command
P00>>>

This will be corrected in a future Galaxy firmware update.

10.3.3 Do Not Use Gigabit Cards on AlphaServer 4100 Galaxy Systems

The Gigabit Ethernet adapter (DEGPA) is not supported for use in an OpenVMS Galaxy system. A fix will be included in a future version of OpenVMS.

10.3.4 Minimum Revision Power Control Module

A power control module (PCM) 54-24117-01 Rev F03 is needed to support more than two CPUs.

Three CPUs might work sometimes, but using a PCM 54-24117-01 Rev F03 is the safest practice.

10.4 OpenVMS Software Issues Affecting All Platforms

This section lists known OpenVMS software issues that affect all Galaxy platforms.

10.4.1 SSRVEXCEPT Bugcheck

If your OpenVMS Galaxy is configured in an existing OpenVMS Cluster, you must ensure that all the nodes in the cluster recognize new security classes as described in the Release Notes chapter (Chapter 1).

Failure to follow these procedures will cause OpenVMS VAX and Alpha systems running OpenVMS Version 6.2 or Version 7.1 to crash.

For complete documentation about this issue, see the Release Notes (Chapter 1).

10.4.2 GLXSHUTSHMEM Bugcheck

In an OpenVMS Galaxy, no process can have shared memory mapped on an instance when that instance leaves the Galaxy (for example, during a shutdown). If an application runs as a system process (UIC group 1), you must modify SYS$MANAGER:SYSHUTDWN.COM to stop the process, as shown in the following example from the OpenVMS Galaxy CPU Load balancer program:

** SYSHUTDWN.COM EXAMPLE - Paste into SYS$MANAGER:SYSHUTDWN.COM
**
** $!
** $! If the GCU$BALANCER image is running, stop it to release shmem.
** $!
** $ procctx = f$context("process",ctx,"prcnam","GCU$BALANCER","eql")
** $ procid = f$pid(ctx)
** $ if procid .NES. "" then $ stop/id='procid'

For more information about the shutdown warning in the OpenVMS Galaxy CPU Load balancer program, see Appendix A.

If a process still has shared memory mapped when an instance leaves the Galaxy, the instance will crash with a GLXSHUTSHMEM bugcheck.
