Configuring Si3 Deduplication Store
Overview
SEP sesam v. 5.0.0 Jaglion has introduced a new generation Si3 data store. It offers significantly higher performance for backup, restore and migration, supports direct backup to S3, and improves scaling while saving resources.
- The new Si3 can detect duplicate data fragments, optimizing the recovery process.
- It enables you to back up your data to S3 cloud storage and Azure (≥ Jaglion V2), and restore the items you want directly from there.
- Si3 supports any direct attached disk (except NSS and MooseFS volumes, see Restrictions), provides global deduplication with source-side (Si3S) and target-side (Si3T) deduplication, replication and encryption, and enables single file restore (SFR) and instant recovery.
- When configuring deduplication, consider the factors that affect deduplication performance. These include infrastructure (storage types), network speed, storage disk setup, achievable deduplication ratio, etc. For details, see Deduplication.
- The new immutable storage feature (introduced in Jaglion V2) is also based on the Si3 store (set up on a dedicated Linux server). SiS (SEP Immutable Storage) is based on the File Protection Service (FPS), which scans the file system and sets the immutable bit for all new objects. As a result, all data stored in SiS is marked immutable at the time of storage. Even with full admin access to the SEP sesam backup server, attackers cannot delete, modify, or encrypt data stored on SiS. For details, see SEP Immutable Storage – SiS.
Seeding an Si3 deduplication store is currently not supported (see the section Comparison of Si3 V1 and Si3 below).
How to upgrade from the old Si3 V1 to the new Si3?
SEP sesam does not support a direct upgrade from the old Si3 V1 to Si3. However, to use the new Si3 you can:
- Back up all data again to the newly configured Si3 deduplication store.
- Create a replication job to replicate from the Si3 V1 to the Si3 store. Replication reads all data from the source store on the source RDS and sends it to the target store using the source-side deduplication function. For details, see the section Replicating from Si3 V1 to Si3.
Tip
You can also configure a new Si3 and an old Si3 V1 in parallel on the same host by enabling the key enable_gui_allow_multi_dedup.
Deduplication types
SEP sesam provides target-based (Si3T) and source-based deduplication (Si3S). For details on the deduplication concept and recommendations, see Deduplication.
- Both Si3T and Si3S require a configured Si3 deduplication store.
- In general, only one Si3 V1 or Si3 deduplication store can be configured on a server. There is only one exception to this rule: You can use the enable_gui_allow_multi_dedup key to configure both Si3 deduplication store types on the same backup server or RDS to perform a smooth upgrade from Si3 V1 to Si3.
- A valid licence is required for each Si3 deduplication store.
- You can also configure an Si3 deduplication store from the command line. For details, see Configuring and Administering Si3 Deduplication Store with CLI.
SEP sesam support for S3-compatible cloud and Blob storage
With SEP sesam Si3, you can back up your data directly to S3 cloud storage and to Azure Blob storage (≥ Jaglion V2). Because the S3 API, originally defined by AWS for its Simple Storage Service, has been widely adopted by other vendors, SEP sesam Si3 can also be used with other S3-compatible cloud implementations. Configuring and managing Si3 in an S3-compatible cloud implementation is similar to the example shown in Backup to S3 Cloud Storage and must follow the same process and rules as using Si3 with S3. For the list of supported object storage, see the support matrix.
Warning
In Azure, read access incurs higher costs, so tasks such as housekeeping, consistency checks, and restores are more expensive. Consider this when planning your operations.
Updating Si3 on S3 from 5.0.0.4 to the new version
If you use Si3 on S3 and update from 5.0.0.4 to the new version, the structure of the existing stores changes: the Si3 structure on S3 is automatically recreated under a folder named after the store, and the index is recreated after the renaming. Example:
- The S3 bucket is called seps3, the Si3 deduplication store name is newNG. The S3 structure with version 5.0.0.4 of NG is: seps3/pages; seps3/pages-trash; seps3/objects-trash.
- When updating to the next version of NG, the structure changes to: seps3/newNG/pages; seps3/newNG/pages-trash; seps3/newNG/objects-trash. During this renaming, the Si3 service is not available.
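The renaming in the example above amounts to inserting the store name between the bucket and the top-level folders. The following Python sketch only illustrates that mapping; the function name migrated_path is hypothetical, and paths are written as bucket/folder exactly as in the example, not as raw S3 object keys. SEP sesam performs the actual renaming automatically during the update.

```python
# Illustrative sketch only: models the path change from the 5.0.0.4 layout
# (<bucket>/<folder>/...) to the newer layout (<bucket>/<store>/<folder>/...).
# Folder names come from the example above; migrated_path is hypothetical.
TOP_LEVEL_FOLDERS = ("pages", "pages-trash", "objects-trash")


def migrated_path(bucket: str, store: str, old_path: str) -> str:
    """Return the post-update path for a path in the 5.0.0.4 layout."""
    prefix = f"{bucket}/"
    if not old_path.startswith(prefix):
        raise ValueError(f"{old_path!r} is not inside bucket {bucket!r}")
    rest = old_path[len(prefix):]
    folder = rest.split("/", 1)[0]
    if folder not in TOP_LEVEL_FOLDERS:
        raise ValueError(f"unexpected top-level folder {folder!r}")
    # Insert the store name directly below the bucket.
    return f"{bucket}/{store}/{rest}"


if __name__ == "__main__":
    for folder in TOP_LEVEL_FOLDERS:
        print(migrated_path("seps3", "newNG", f"seps3/{folder}"))
```

Run as a script, it prints the three new paths from the example: seps3/newNG/pages, seps3/newNG/pages-trash and seps3/newNG/objects-trash.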
Prerequisites
- For the minimum hardware requirements that apply to the SEP sesam Si3 deduplication server, see Hardware requirements.
- For details on the required Java version, see Java Compatibility Matrix. Because Si3 is optional, the RPM/DEB packages contain no dependency rule for it, so you have to make sure the required Java version is installed.
- When estimating the maximum size of a deduplication store, make sure there is enough space available for the dedup trash; otherwise the deduplication store will run out of space. Calculate the required disk space based on a representative sample of your full backup, then add extra storage space equal to approximately 50% of the representative full backup.
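The sizing rule of thumb in the last item can be written as a one-line formula. The Python sketch below is a minimal illustration, assuming you already have the disk space calculated from your representative sample; the function name and the example figures are hypothetical.

```python
# Sketch of the sizing rule of thumb: the disk space calculated from a
# representative sample of your full backup, plus roughly 50% of that
# representative full backup as headroom for the dedup trash.
TRASH_HEADROOM_FACTOR = 0.5  # "approximately 50%" per the guideline


def estimate_store_size_gib(calculated_space_gib: float,
                            representative_full_gib: float) -> float:
    """Return the suggested maximum store size in GiB, including trash headroom."""
    if calculated_space_gib < 0 or representative_full_gib < 0:
        raise ValueError("sizes must be non-negative")
    return calculated_space_gib + TRASH_HEADROOM_FACTOR * representative_full_gib


if __name__ == "__main__":
    # Hypothetical figures: 6,000 GiB calculated from the sample,
    # 4,000 GiB representative full backup.
    print(estimate_store_size_gib(6000, 4000))  # 8000.0
```

Treat the result as a starting point only: the achievable deduplication ratio, and therefore the calculated space, varies strongly with your data.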
Required additional amount of RAM
The following table shows the required additional amount of RAM for the Si3-NG data store. The TB value corresponds to the capacity of the Si3-NG data store.
Note
These requirements relate solely to deduplication. In addition, take into account the memory needed by the operating system and other services.
Si3-NG data store capacity (see initial size limit under Restrictions) | Additional RAM |
---|---|
<20 TB | 16 GiB |
20-40 TB | 32 GiB |
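For scripted sizing checks, the RAM table can be expressed as a simple lookup. This Python sketch is illustrative only: it assumes that exactly 20 TB falls into the 20-40 TB row, and the function name is hypothetical. For an authoritative value, use the sm_dedup_interface command described in this section.

```python
# Sketch of the RAM table (additional RAM for deduplication only;
# the OS and other services need memory on top of this).
def additional_ram_gib(capacity_tb: float) -> int:
    """Return the additional RAM in GiB for a given Si3-NG store capacity in TB."""
    if capacity_tb <= 0:
        raise ValueError("capacity must be positive")
    if capacity_tb < 20:
        return 16
    if capacity_tb <= 40:
        return 32
    # The table stops at 40 TB, which is also the initial size limit for
    # new stores created in the GUI (see Restrictions).
    raise ValueError("capacity above 40 TB is not covered by the table; contact SEP sesam support")


if __name__ == "__main__":
    for tb in (10, 20, 40):
        print(tb, "TB ->", additional_ram_gib(tb), "GiB")
```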
To find out how much RAM is needed for a given Si3 capacity, run the following command from the admin command line. Note that the sesam profile must be set before you run the command:
sm_dedup_interface -T dedup2 propose jvmconfig <Si3-CAPACITY>
Required additional amount of CPU cores
The following table shows the number of CPU cores required for an Si3 data store. Note that the TB value is the amount of data backed up (before deduplication), not the store capacity.
Backed up data (before dedup) | CPU cores |
---|---|
10 TB | 4 |
20 TB | 4 |
40 TB | 8 |
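The CPU table can be turned into a lookup in the same way. This Python sketch assumes that each row is an upper bound, so values between rows round up to the next row; that interpretation and the function name are assumptions, not part of the table.

```python
# Sketch of the CPU table. Assumption: each row is an upper bound,
# so values between rows round up to the next row.
CPU_TABLE = ((10, 4), (20, 4), (40, 8))  # (backed-up TB before dedup, CPU cores)


def required_cpu_cores(backed_up_tb: float) -> int:
    """Return the CPU cores for the amount of data backed up (before dedup)."""
    if backed_up_tb < 0:
        raise ValueError("amount of data must be non-negative")
    for limit_tb, cores in CPU_TABLE:
        if backed_up_tb <= limit_tb:
            return cores
    raise ValueError("more than 40 TB before dedup is not covered by the table")


if __name__ == "__main__":
    for tb in (5, 15, 30, 40):
        print(tb, "TB ->", required_cpu_cores(tb), "cores")
```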
Performance tip
Applies to Windows only: SEP AG recommends using the High performance power plan to increase backup performance. Windows sets all computers to the Balanced power plan by default, so you must switch to the High performance power plan manually. Your Windows computer will then use more power, but systems with Si3 will operate at the highest performance level.
- From the Start menu, go to Control Panel -> System and Security -> Power Options and change the setting to High performance.
Restrictions
- Si3 deduplication store is not supported for NSS and MooseFS volumes.
- To avoid problems caused by combining excessively large Si3 deduplication stores with underpowered hardware, the maximum initial Si3/Si3-NG deduplication store size is currently limited to 40 TB. This limitation applies when you create a new Si3 deduplication store in the GUI. If your requirements differ, contact SEP sesam support.
Note
It is recommended to run Si3 deduplication (SEP sesam Server or RDS) on a physical host. Running it on a virtual machine is also possible, but take into account that deduplication consumes a lot of server resources for reading, processing and writing the deduplicated data, as well as for other deduplication tasks such as housekeeping and various checks. These tasks require heavy IO and a large amount of memory, and Si3 performance can be affected by other VMs running on the same host. If you run Si3 on a VM, be aware of these possible bottlenecks.
Configuration procedure
The SEP sesam data store is disk-based storage that allows savesets (backed-up data) to be backed up directly to configured storage locations, including S3 cloud storage and Azure. Note that the configuration procedure for S3 and Azure differs from the one described below. For details, see Backup to S3 Cloud Storage and Backup to Azure Storage.
Enable Si3 setup on the same host
To make the upgrade from Si3 V1 to Si3 smoother, you can configure a new Si3 and an old Si3 V1 on the same backup server or RDS by using the enable_gui_allow_multi_dedup key.
- Open the global settings in the GUI: In the menu bar, click Configuration -> Defaults -> Settings.
- Set the key value of enable_gui_allow_multi_dedup to 1.
Configure Si3
Configuring SEP Si3 target deduplication is easy: you only need to select the Si3 deduplication data store type. Note that the Si3 deduplication store is not supported for NSS and MooseFS volumes. For other limitations, see Restrictions.
Tip
The Si3 store can also be used to back up your data directly to S3 cloud or Azure. In this case, the configuration differs slightly depending on the type of cloud storage. For more information, see Backup to S3 Cloud Storage and Backup to Azure Storage.
- In the Main selection -> Components, click Data Stores to display the data store contents frame.
- From the Data Stores menu, select New Data Store. A New Data Store dialog appears.
- Under Data store properties, enter a meaningful name for the Si3 deduplication store in the Name field. Entering the name also creates the name of the drive group for your Si3 deduplication store in the Create new drive group field.
- From the Store type drop-down list, select SEP Si3 NG Deduplication Store.
- Ensure that the Create drive option is enabled under the Drive parameter properties. The predefined value for the drive is automatically entered in the Drive number field. It is recommended to also activate the option Create second drive. Without this option, SEP sesam can assign only one drive for either reading or writing, and only one job can run on that drive at a time. If you use the additional dedicated drive for restore, you can perform a backup on the first drive and restore your data from the second drive simultaneously. You can also add a third drive for migration.
- The Create new drive group field is already filled in with the name. You can change it by entering a new name.
- The predefined number of channels is already available in the Max. channels drop-down list. The number of available channels depends on your SEP sesam Server package. For details on licensing, see Licensing.
- From the Device server drop-down list, select the device server for your data store.
- In the Path field, enter the location of your data store or use the Browse button to select it. Click OK.
If you use the Browse button, the New Data Store information window appears with predefined recommended values for the size of your Si3 deduplication store. Click OK to confirm the selected location and recommended size values. You can change the size of your Si3 deduplication store later under Size properties (see section Size properties).
After configuring the Si3 deduplication store, first configure the media pools, then set up your backup strategy. Make sure to test your newly created Si3 store by running a test backup on it.
Run a test backup on Si3
- Create a new backup task: In the Main Selection -> Tasks -> By clients, select your RDS client and then click New Backup Task. Configure your backup task and save it. For details, see Creating a Backup Task.
- Test the backup on the newly created Si3 store: From the menu bar, select Activities -> Immediate start -> Backup. In the Immediate start: Backup dialog, select the previously created media pool for Si3 as the target media pool for the backup. Click Start and check if your backup was successful by viewing the status of your backup job in the GUI (Monitoring -> Last Backup State or Job State -> Backups) or in the Web UI – Last backup state.
Now you can create different backup tasks that use deduplication and choose the most efficient backup scenario for each environment. For details on how to select your deduplication method, see Deduplication. For details on how to configure a backup job, see Standard Backup Procedure.
Replicating from Si3 V1 to Si3
As SEP sesam does not support a direct upgrade from the old Si3 V1 to the new Si3, you can create a replication task to replicate from Si3 V1 to the Si3 store. Replication reads all data from the source-side store on the source-side RDS and sends it to the target store using the source-side deduplication function. Once your new Si3 is set up, you should configure regular replication.
Configure a replication task
To configure a replication from Si3 V1 to Si3, proceed as follows.
- Create a replication task: In the Main selection -> Tasks -> Replication Tasks, click New Replication Task. The New Replication Task window is displayed.
- In the Name field, enter a name for the replication task, e.g., Si3-2-Si3NG.
- Enter the following information under Parameters:
  - Media pool
    - Pool: Select the name of the source media pool of the Si3 deduplication store from which the data will be replicated.
    - Drive: Select the drive number of the drive to be used to read the data.
    - Interface: Optionally, specify the network interface of the RDS to be used for data transfer.
  - Destination
    - Pool: Select the name of the target media pool you previously created for the new Si3 and to which the data will be replicated.
    - Drive: Select the drive number of the drive that will be used to write the data.
    - Interface: Optionally, enter the network interface of the RDS to be used for data transfer, e.g., the name of the RDS.
- Leave the Relative backup date From set to -99,999 and To set to 0, so that all savesets up to the current day are included.
- In the based on drop-down list, the Sesam days option is selected by default.
- Click Save to save your replication task.
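The relative backup date defines a window counted back from the current day, and a From value of -99,999 reaches back roughly 270 years, so in practice every existing saveset falls inside it. The Python sketch below illustrates that window on a hypothetical list of savesets. It is a simplification: it treats Sesam days as plain calendar days, and both function names are hypothetical.

```python
from datetime import date, timedelta


# Simplification: Sesam days are treated as plain calendar days here.
def replication_window(today: date, rel_from: int = -99_999,
                       rel_to: int = 0) -> tuple[date, date]:
    """Return the (start, end) dates covered by the relative backup date."""
    return today + timedelta(days=rel_from), today + timedelta(days=rel_to)


def savesets_in_window(savesets: list[tuple[str, date]], today: date,
                       rel_from: int = -99_999, rel_to: int = 0) -> list[str]:
    """Return names of the (hypothetical) savesets whose backup day is in the window."""
    start, end = replication_window(today, rel_from, rel_to)
    return [name for name, backup_day in savesets if start <= backup_day <= end]


if __name__ == "__main__":
    today = date(2025, 6, 1)
    savesets = [("SS-old", date(2019, 3, 4)), ("SS-today", today)]
    print(replication_window(today))
    print(savesets_in_window(savesets, today))
```

With the defaults, both the six-year-old and the current saveset are selected; narrowing From to, say, -7 would select only the savesets of the last week.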
After you have configured a replication task, start replication as follows.
Start replication
Note that the initial replication requires a large amount of CPU, network bandwidth and time to complete successfully.
Start replication manually as follows:
- In the GUI menu, select Activities -> Immediate start -> Replication.
- In the Immediate Start: Replication window, from the Task name drop-down list select the replication task you created earlier, e.g., Si3-2-Si3NG, and click Start.
Si3 data encryption
To configure Si3 data encryption, you have to set an encryption password for deduplication:
In Main selection -> Components, click Data Stores, double-click your Si3 deduplication store, then double-click the first drive of your Si3 deduplication store.
In the Encryption password field, enter the encryption password and repeat it.
For details, see Encrypting Si3 Deduplication Store.
Si3 deduplication store size properties
To change data store size properties, go to Main selection -> Components -> click Data Stores -> select your Si3 deduplication store and double-click it. Then under Size properties specify or modify the following:
- Capacity: Specify the size (in GiB) of the partition for backups.
- High watermark: Specify the value (in GiB) for the high watermark (HWM). The HWM defines the upper value for the used storage space. When this value is reached, the status of the data store changes from OK to Warning, but backups continue to run. Make sure that you provide enough storage space for your backed up data.
- Si3 repair area: Specify the value (in GiB) for the Si3 repair area. The Si3 repair area (subdirectory trash) defines the space for Si3 files that were identified by a garbage collection job and are no longer used. These files are kept in the repair area so that Si3 can be repaired in case of structural problems (which may be caused by a file system error or an operating system crash). The files in the repair area are automatically removed after the specified period of time (SEP sesam default: 4 days) or when the disk usage threshold is reached. Setting the value to 0 disables the Si3 repair function.
Note
The Si3 repair area field for managing the disk space allocated for Si3 files is available only in advanced UI mode (formerly expert GUI mode). To see it, make sure your UI mode is set to advanced. For details, see Selecting UI mode.
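The high watermark and repair area rules described above can be modeled as two small functions. The Python sketch below is a simplified model, not SEP sesam's implementation: it assumes that the "disk usage threshold" is the configured repair area size and that, when that size is exceeded, the oldest files are removed first. The class and function names are hypothetical.

```python
from dataclasses import dataclass


def datastore_status(used_gib: float, high_watermark_gib: float) -> str:
    """OK below the high watermark, Warning once it is reached (backups continue)."""
    return "Warning" if used_gib >= high_watermark_gib else "OK"


@dataclass
class RepairAreaFile:
    name: str          # hypothetical file identifier
    age_days: float    # time since garbage collection moved it to the repair area
    size_gib: float


def files_to_purge(files: list[RepairAreaFile], repair_area_gib: float,
                   retention_days: float = 4.0) -> list[str]:
    """Return the names of files a purge would remove from the repair area."""
    if repair_area_gib == 0:
        # Value 0 disables the repair function: nothing is kept.
        return [f.name for f in files]
    # Files older than the retention period are always removed.
    purge = [f for f in files if f.age_days > retention_days]
    # Remaining files, oldest first.
    kept = sorted((f for f in files if f.age_days <= retention_days),
                  key=lambda f: f.age_days, reverse=True)
    used = sum(f.size_gib for f in kept)
    # Assumption: the disk usage threshold is the configured repair area size.
    while kept and used > repair_area_gib:
        oldest = kept.pop(0)
        purge.append(oldest)
        used -= oldest.size_gib
    return [f.name for f in purge]
```

For example, with files aged 5, 3 and 1 days (10, 20 and 30 GiB) and a 40 GiB repair area, this model removes the 5-day-old file by age and then the 3-day-old file to get back under the threshold.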
The Disk space usage properties are used by SEP sesam to report the following:
- Used: Total used space (in GiB) on the partition.
- Total: Maximum available space (in GiB) on the partition as reported by the operating system.
- Free: Available disk space (in GiB) for SEP sesam.
- Deduplication rate: Deduplication takes place as soon as the backup process has started. SEP sesam analyses blocks of data and determines whether the data is unique or has already been copied to the Si3 data store. Only single instances of unique data are sent to the data store, and each deduplicated file is replaced with a stub file. The deduplication ratio indicates the extent of data reduction achieved by Si3 deduplication, i.e. the ratio between the protected size of data and the physical size of the data actually stored. A ratio of 10:1 means that 10 times more data is protected than the physical capacity needed to store it. The deduplication ratio depends greatly on the deduplication method used (Si3T or Si3S), the type of data, the backup level (the ratio is higher with copy and full backups and with larger amounts of data), etc. For details, see Deduplication.
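The deduplication ratio described above is a plain division of protected size by stored size. The following Python sketch shows the arithmetic and the N:1 notation; the function names and figures are hypothetical and do not correspond to any SEP sesam API.

```python
def dedup_ratio(protected_gib: float, stored_gib: float) -> float:
    """Ratio between protected (logical) data size and physically stored size."""
    if stored_gib <= 0:
        raise ValueError("stored size must be positive")
    return protected_gib / stored_gib


def format_ratio(ratio: float) -> str:
    """Format a ratio in the N:1 notation."""
    return f"{ratio:.1f}:1"


if __name__ == "__main__":
    # Hypothetical figures: 10,000 GiB protected, 1,000 GiB stored.
    print(format_ratio(dedup_ratio(10_000, 1_000)))  # 10.0:1
```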
Monitoring deduplication status
You can view the status of your Si3 deduplication store in the GUI (Si3 deduplication store properties -> Si3 State tab) or in the Web UI - Datastore actions. The data store status overview provides detailed information about consistency, utilization, sanity status, size, disk space usage as well as related media pools, media and drives, dependencies, data size before/after deduplication, etc.
Note
If fsck (file system consistency check) detects irregularities in the Si3 file system, the affected pages and chunks are recorded in recovery.log. The Si3 deduplication store is then marked red in the GUI and Web UI, and the Si3 purge is no longer executed. The purge is stopped to prevent the files in the Si3 repair area from being deleted, as they may be required to repair Si3. Once the errors are fixed and recovery.log is empty, the Si3 data store is no longer marked red and the Si3 purge runs again.
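The rule in the note above (purge suspended and store marked red while recovery.log has entries) can be sketched as a simple check. This Python example is illustrative only: it assumes that a missing log is treated like an empty one, the log content written in the demo is a placeholder rather than the real recovery.log format, and the function names are hypothetical.

```python
import tempfile
from pathlib import Path


def purge_allowed(recovery_log: Path) -> bool:
    """Purge may run only when no irregularities are recorded."""
    # Assumption: a missing log is treated like an empty one.
    return not recovery_log.exists() or recovery_log.stat().st_size == 0


def store_marker(recovery_log: Path) -> str:
    """Return 'red' while recovery.log has entries, otherwise 'ok'."""
    return "ok" if purge_allowed(recovery_log) else "red"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "recovery.log"
        print(store_marker(log))  # ok: no log yet
        log.write_text("placeholder entry\n")  # hypothetical content
        print(store_marker(log))  # red: irregularities recorded, purge suspended
        log.write_text("")
        print(store_marker(log))  # ok: log emptied after the fix
```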
Comparison of Si3 V1 and Si3
SEP sesam v. 5.0.0 Jaglion has introduced a new generation Si3 deduplication store. Compared to Si3 V1, Si3 offers significantly higher performance for backup, restore and migration, supports backup to S3 cloud and Azure, and provides the new immutable storage feature SiS, resulting in better scaling and resource savings.
Function | Si3 V1 | Si3 NG |
---|---|---|
Si3 backup | | |
Si3 deduplication (source-side and target-side) | | |
Si3 replication: local to remote store (Note a) | Si3 V1 to Si3 V1 | Si3 V1 to Si3; Si3 to Si3 |
Si3 replication: to S3 cloud | (provides more powerful features for backing up directly to the cloud, see the next two lines) | |
Backup to S3 Cloud Storage | | |
Backup to Azure Storage | (as of Jaglion V2) | |
SiS (SEP Immutable Storage) | (as of Jaglion V2) | |
Si3 restore | | |
Si3 encryption | (as of Jaglion V2) | |
Seeding Si3 deduplication store (Note b) | | |
Usage of tachometer | | |
Note a: SEP sesam does not support a direct upgrade from Si3 V1 to the new Si3. However, to use the new Si3 you can:
- Back up all data again to the newly configured Si3 deduplication store.
- After configuring a new Si3, create a replication job to replicate from the Si3 V1 to the Si3 store. Replication reads all data from the source store on the source RDS and sends it to the target store using the source-side deduplication function. For details, see Replicating from Si3 V1 to Si3.
- Configure a new Si3 and an old Si3 V1 in parallel on the same host by enabling the key enable_gui_allow_multi_dedup.
Note b: The Initial Seed feature does not work in v. 5.0.0 Jaglion, but you can use it in earlier SEP sesam versions.
See also
Backup to S3 Cloud Storage – Backup to Azure Storage – Encrypting Si3 Deduplication Store – Deduplication – Configuring Source-side Deduplication – Configuring Si3 Replication – Configuring and Administering Si3 Deduplication Store by using CLI – Licensing – SEP Immutable Storage – SiS