SIEM disaster recovery guidelines and best practices
Technical Articles ID: KB90674
Last Modified: 2023-07-14 07:05:18 Etc/GMT
Environment
SIEM Enterprise Log Manager (ELM) 11.x
SIEM Enterprise Security Manager (ESM) 11.x
SIEM Event Receiver (Receiver) 11.x
Summary
When the worst happens, the right path to recovery isn't always clear. This document covers the most common failures, the best way to restore service, and how to preserve existing data.
Common disasters can include the following:
- Loss of ESM because of hardware or file system failure
- Loss of Receiver because of hardware or file system failure
- Loss of ELM because of hardware or file system failure
- Other disk or file system corruption that doesn't require replacing the whole appliance
Problem
The best business practice for disaster recovery is to take regular backups and keep at least one copy off-site, refreshed monthly.
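The monthly off-site copy can be automated with a small script run from cron. The sketch below is illustrative only: monthly_offsite_copy and both directory arguments are hypothetical, and it assumes that backups land as regular files in one local directory and that the off-site location is mounted as a directory. It copies the newest backup, prefixed with the current year and month, only if no copy for that month exists yet.

```shell
#!/bin/sh
# Sketch: keep one off-site copy per month.
# Both directory arguments are hypothetical; point them at wherever your
# backups land and at a mounted off-site location.

monthly_offsite_copy() {
    src_dir="$1"; dst_dir="$2"
    month=$(date +%Y-%m)
    mkdir -p "$dst_dir" || return 1
    # Skip if a copy tagged with the current month already exists.
    for f in "$dst_dir/$month"-*; do
        if [ -e "$f" ]; then
            echo "off-site copy for $month already present"
            return 0
        fi
    done
    # Pick the most recently modified regular file in the source directory.
    newest=""
    for f in "$src_dir"/*; do
        [ -f "$f" ] || continue
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    if [ -z "$newest" ]; then
        echo "no backup files in $src_dir" >&2
        return 1
    fi
    dest="$dst_dir/$month-$(basename "$newest")"
    cp -p "$newest" "$dest" || return 1
    echo "$dest"
}
```

Running it daily is harmless: after the first successful copy in a month, later runs only report that the copy is already present.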
SIEM backup and export solutions:
- ESM database backup
- ELM database backup
- Policy editor export including custom rules
- Alarm export
- Report and dashboard exports
- Watchlist export
- Receiver data source export
Devices with no backup (the ESM retains their device configuration):
- Event Receiver (Receiver)
- Application Data Monitor (ADM)
- Database Event Monitor (DBM)
- Advanced Correlation Engine (ACE)
Solution
Recovery of an ESM without backups
The ESM is one of the most critical devices to back up. All data source, dashboard, and report information is stored on the ESM. If an ESM is lost and a backup isn't available, a partial recovery of data might be possible using a Receiver sync.
- When the ESM is lost, immediately SSH directly to each Receiver that was connected to it and back up /etc/NitroGuard/thirdparty.conf.
NOTE: If the customer data source names in thirdparty.conf contain a dash (-), replace each dash with another character. This prevents the data source names from being truncated after the sync.
- Install and rack the replacement ESM. Make sure it's on the same version as the previous device.
- Set the SSH keys of all other SIEM devices back to factory default; for example, the Receiver, ACE, ADM, DBM, and ELM. For specific instructions, follow the steps outlined in KB74464 - How to reset the default factory key when SSH is not authorized.
- Add the devices back to the ESM using their existing IP address. For example, if the Receiver IP address before the crash is 192.168.100.103, add it to the new ESM with the same IP address. Don't write out any data source settings. It's important that the Receiver keep its original configuration until the sync is done.
- For each Receiver that has been re-added, go to Properties, Receiver Configuration, Sync Device. If you receive an error message saying that the Receiver must have no data sources, make sure that no data sources have been manually added back. Data sources include any ePolicy Orchestrator (ePO) devices associated with a Receiver. For instance, Sync Device fails if the Receiver has no data sources but an ePO device on the ESM is still associated with that Receiver.
- Use the Receiver thirdparty.conf file to pull its data source configuration back to the ESM and automatically re-add the devices. Using the Receiver thirdparty.conf file preserves IPSIDs and Device IDs, which lets you restore event collection quickly and keep access to the existing ELM data. Syncing the configuration from the Receiver back to the ESM takes a few minutes.
- When the sync is complete, edit the Receiver data sources under the Data Source tab. Write out the data source settings and roll policy.
The ESM now begins collecting events from the Receiver and the existing list of data sources is recovered.
- Open any data source on the Receiver and click Logging in the edit view. This action prompts the user to associate an ELM with the Receiver. Answer Yes and allow the action to complete.
- Repeat the previous step for all Receivers on the ESM. This ensures that ELM Archive and Enhanced ELM Search work afterward.
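The thirdparty.conf backup in the first step and the dash check in the NOTE can be scripted on each Receiver. This is a minimal sketch, not a supported tool: the function names and the backup directory are hypothetical, and list_dash_lines only prints lines that contain a dash for manual review, because it makes no assumption about the layout of thirdparty.conf.

```shell
#!/bin/sh
# Sketch: run on each Receiver before it is re-added to the replacement ESM.
# Function names and the backup directory are hypothetical.

# Copy thirdparty.conf to a timestamped file and confirm the copy is identical.
backup_thirdparty_conf() {
    conf="$1"        # e.g. /etc/NitroGuard/thirdparty.conf
    backup_dir="$2"  # hypothetical destination, e.g. /root/dr_backup
    mkdir -p "$backup_dir" || return 1
    dest="$backup_dir/thirdparty.conf.$(date +%Y%m%d%H%M%S)"
    cp -p "$conf" "$dest" || return 1
    cmp -s "$conf" "$dest" || return 1
    echo "$dest"
}

# Print every line that contains a dash, with its line number, so data source
# names can be reviewed by hand. The file format is not parsed; grep returns
# a non-zero status when no line contains a dash.
list_dash_lines() {
    grep -n -- '-' "$1"
}
```

For example, run backup_thirdparty_conf /etc/NitroGuard/thirdparty.conf /root/dr_backup and list_dash_lines /etc/NitroGuard/thirdparty.conf on each Receiver, then copy the backup off the Receiver (for example with scp) before starting the sync.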
Recovery of a Receiver after replacement or crash
The Receiver configuration is stored on the ESM. To replace a Receiver, follow the steps below:
- Rack and install the replacement Receiver, or re-ISO the failed Receiver if an RMA isn't needed.
- Provision the new Receiver with the same IP address as the old Receiver, and make sure it's on the same version as the previous device.
- Key the device from the graphical user interface by going to Receiver Properties, Key Management.
- Under Receiver Properties, Connection tab, click Status. Continue to the next step when the status is pulled back and doesn't generate an SSH error.
- Under the Receiver Properties, Data Source tab, modify any data source to enable the Write button, and then click Write to write out the data source settings and roll the policy.
- Data collection on the Receiver resumes after the settings are written and the policy is rolled.
Recovery of an ELM device without an ELM database backup
The ELM is one of the most critical SIEM devices for compliance, and backups must be created regularly. If the ELM hardware is replaced or needs reimaging, it might be possible to recover its data. Recovery is possible only if the database location and ELM logs are on a CIFS, NFS, SAN, or iSCSI device.
- Rack and install the new ELM using the same IP address as the previous unit. Make sure that the ELM is on the same SIEM version as the original.
- Reinstall any missing SAN or iSCSI volumes under the ELM Properties, Data Storage tab. If NFS or CIFS is used, skip this step.
- Rekey the ELM under Properties, Key Management.
- Go to ELM Properties, Storage Pool.
- In the top window under devices, add the previously used NFS, CIFS, SAN, or iSCSI device again. For NFS and CIFS devices, make sure that you use the same share name and path as before. If the previous share name and path aren't known, use the network path where the mgtdb directory is stored. The goal is to create a storage pool device with access to the ELM's mgtdb directory on the network.
- SSH to the ELM and run df -h to confirm that the network share is mounted.
- Locate the mgtdb directory on the network share from the ELM command line. For example, if the NFS server is 10.10.10.10 and the mount point is /elm_storage/nfs_1, run cd /elm_storage/nfs_1 and ls -al to look for an mgtdb directory. If that fails, find / -name 'mgtdb' lists every location. The goal is to find the original mgtdb location on the network.
- After the original mgtdb location is found, examine the symbolic links /usr/local/elm/mgtdb and /elm_allocations/MGTDB_xxx and make sure they ultimately resolve to the mgtdb directory on the /elm_storage/xxx NFS share. For instance, if the mgtdb was found in /elm_storage/nfs_1/mgtdb, create a symlink in /elm_allocations that points to /elm_storage/nfs_1/mgtdb. Then make /usr/local/elm/mgtdb a symbolic link to that /elm_allocations symlink, so that the chain ends at the NFS mount in /elm_storage/xxx. For example, /usr/local/elm/mgtdb is a symbolic link that points to /elm_allocations/MGTDB_Alloc123, and /elm_allocations/MGTDB_Alloc123 is a symlink that points to /elm_storage/nfs_1/mgtdb. In general:
- /usr/local/elm/mgtdb is a symbolic link that points to /elm_allocations/MGTDB_xxx
- /elm_allocations/MGTDB_xxx is a symbolic link that points to /elm_storage/name_of_NFS_mount/mgtdb
- /elm_storage/name_of_NFS_mount is where the NFS share is mounted on the ELM; its mgtdb subdirectory contains the database
- Run vi /etc/NitroGuard/mgtdbloc.conf and make sure the path in the file matches the final target of the symlink chain set up earlier; for example, /elm_storage/nfs_1/mgtdb.
- To make the changes take effect, perform ELMStop and ELMStart.
- After the ELM database starts, the ELM begins working, but the storage.conf and alloc.conf files must be re-created manually.
Connect to the ELM database, query it, and find the names and locations of the storage pools. For example, nquery -d rec -i --long --noblob opens the database (the ELM database is still named rec). You can then list the storage devices with select * from rg, and find each shid and allocation name by examining tables such as rg2sh or sh.
- SSH to the newly commissioned ELM.
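The share lookup, symlink chain, and mgtdbloc.conf check described in the ELM steps can be sketched as shell functions. This is a minimal sketch under stated assumptions: all function names are hypothetical, find_mgtdb searches a single mount root instead of /, link_mgtdb refuses to overwrite anything that isn't already a symlink, and check_mgtdb assumes mgtdbloc.conf stores the mgtdb path as plain text. It relies on readlink -f from GNU coreutils to resolve the full chain.

```shell
#!/bin/sh
# Sketch of the ELM mgtdb relinking steps. Function names are hypothetical;
# example paths in comments come from the article.

# List candidate mgtdb directories under one mount root (narrower than find /).
find_mgtdb() {
    find "$1" -type d -name mgtdb 2>/dev/null
}

# Build the two-hop chain elm_link -> alloc_link -> share_mgtdb, e.g.
#   /usr/local/elm/mgtdb -> /elm_allocations/MGTDB_xxx -> /elm_storage/nfs_1/mgtdb
link_mgtdb() {
    share_mgtdb="$1"; alloc_link="$2"; elm_link="$3"
    if [ ! -d "$share_mgtdb" ]; then
        echo "not a directory: $share_mgtdb" >&2
        return 1
    fi
    # Refuse to replace a real file or directory; only symlinks are repointed.
    for l in "$alloc_link" "$elm_link"; do
        if [ -e "$l" ] && [ ! -L "$l" ]; then
            echo "$l exists and is not a symlink; leaving it alone" >&2
            return 1
        fi
    done
    # -f replaces an existing link; -n keeps ln from descending into a linked directory.
    ln -sfn "$share_mgtdb" "$alloc_link" || return 1
    ln -sfn "$alloc_link" "$elm_link" || return 1
}

# Print each hop of a symlink chain (at most 10 hops).
show_chain() {
    p="$1"; hops=0
    while [ -L "$p" ] && [ "$hops" -lt 10 ]; do
        t=$(readlink "$p")
        echo "$p -> $t"
        case "$t" in
            /*) p="$t" ;;
            *) p="$(dirname "$p")/$t" ;;
        esac
        hops=$((hops + 1))
    done
}

# Confirm the chain ends in a directory and that mgtdbloc.conf contains that path.
check_mgtdb() {
    elm_link="$1"; loc_conf="$2"
    final=$(readlink -f "$elm_link") || return 1
    if [ ! -d "$final" ]; then
        echo "chain does not end in a directory: $final" >&2
        return 1
    fi
    if ! grep -qF -- "$final" "$loc_conf"; then
        echo "$loc_conf does not contain $final" >&2
        return 1
    fi
    echo "$final"
}
```

On the ELM, a session might run find_mgtdb /elm_storage/nfs_1, then link_mgtdb /elm_storage/nfs_1/mgtdb /elm_allocations/MGTDB_xxx /usr/local/elm/mgtdb, then show_chain /usr/local/elm/mgtdb and check_mgtdb /usr/local/elm/mgtdb /etc/NitroGuard/mgtdbloc.conf before ELMStop and ELMStart. MGTDB_xxx stays a placeholder for the real allocation name. If /usr/local/elm/mgtdb is a real directory on the new unit, the sketch stops and leaves it for you to handle by hand.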