Configuring and Administering Si3 Deduplication Store by using CLI

Revision as of 12:09, 11 April 2023 by Jus


This is documentation for SEP sesam version 5.0.0 Jaglion.
This is not the latest version of SEP sesam documentation and, as such, does not provide information on features introduced in the latest release. For more information on SEP sesam releases, see SEP sesam Release Versions. For the latest documentation, check SEP sesam documentation.


Overview


SEP sesam provides target-based (Si3T) and source-based (Si3S) deduplication. For details on the deduplication concept and recommendations, see Deduplication.

Both Si3T and Si3S require a configured Si3 deduplication store. As of SEP sesam v. 5.0.0 Jaglion, a new generation Si3 deduplication store can be used. Compared to the "old" Si3 V1 store type, Si3 offers significantly higher performance for backup, restore and migration, direct backup to S3 cloud, better scaling and resource savings.

Typically, only one Si3 deduplication store can be configured on a server. A direct upgrade from the old Si3 V1 to Si3 is not supported, but you can replicate from Si3 V1 to Si3. For this purpose, you can configure a new Si3 and an old Si3 V1 in parallel on the same host by enabling the key enable_gui_allow_multi_dedup. For details, see Enabling Si3 setup on the same host.

  • A valid licence is required for each Si3 deduplication store.

Deprecated Si3 V1 data store

Note
The old generation Si3 V1 deduplication store is deprecated. This means that the old generation Si3 V1 is no longer being enhanced, but is still supported until further notice. SEP strongly recommends using the new Si3 data store instead, especially if the data is to be stored to S3 Cloud.
  • If you are using an old generation Si3 V1 deduplication store with S3, you will not be able to restore from S3 via the GUI.
  • You can configure a new Si3 and an old Si3 V1 in parallel on the same host and replicate from the Si3 V1 to the Si3 store. For details, see Configuring Si3 Deduplication Store.

Prerequisites

For the minimum hardware requirements that apply to a SEP sesam Si3 deduplication server, see Hardware requirements. These requirements cover deduplication only; plan additional memory for the operating system and other services.

The following prerequisites must also be met to configure a Si3 deduplication store.


  • For the minimum Si3/Si3-NG hardware requirements that apply to the SEP sesam Si3/Si3-NG deduplication server, see Hardware Requirements.
  • For details on the required Java version, see Java Compatibility Matrix. Si3/Si3-NG is not mandatory, so there is no dependency rule for it in the RPM/DEB packages.
  • When estimating the maximum size of a deduplication store, make sure that there is enough space for the dedup trash; otherwise the deduplication store will run out of space. Calculate the required disk space from a representative sample of your full backup, then add approximately 50% of that representative full backup as additional storage space.
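The sizing rule in the last bullet can be written out as a short calculation. The following is a minimal sketch, not an official sizing tool: the function name and the trash_overhead parameter are hypothetical, and the 0.5 default simply encodes the "approximately 50%" figure from the text.

```python
def estimate_si3_store_size_tb(representative_full_backup_tb, trash_overhead=0.5):
    """Estimate the disk space to plan for a Si3 store, in TB.

    representative_full_backup_tb: size of a representative full backup.
    trash_overhead: extra share reserved for dedup trash (assumed 50%).
    """
    if representative_full_backup_tb < 0:
        raise ValueError("backup size must not be negative")
    return representative_full_backup_tb * (1.0 + trash_overhead)


# A representative full backup of 10 TB leads to a planned size of 15 TB.
print(estimate_si3_store_size_tb(10))
```

The result is only a starting point; the initial size limit described under Restrictions still applies.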

Disk attachment and protocols

Si3/Si3-NG supports all types of direct-attached disk storage, such as serial attached SCSI (SAS), Serial ATA (SATA), and Fibre Channel (FC)/LUN.

Performance tip

Applies to Windows only: SEP AG recommends using the High performance power plan to increase the performance of your backup. Note that Windows sets all computers to the Balanced power plan by default and you must manually switch to the High Performance power plan. This way, your Windows computer will use more power, but the systems with Si3 will always operate at the highest performance level.

  • From the Start menu, go to Control Panel -> System and Security -> Power Options and change the setting to High performance.

Restrictions

  • Si3 NG deduplication store is not supported for NSS and MooseFS volumes.
  • To avoid problems resulting from the combination of excessively large Si3 deduplication stores and inefficient hardware, the maximum initial Si3/Si3-NG deduplication store size is limited to 40 TB. If you need to increase this limit, contact SEP support.
  • This limitation applies to the creation of a new Si3/Si3-NG deduplication store in the GUI.
Note
It is recommended to run Si3 deduplication (SEP sesam Server or RDS) on the physical host. It is also possible to run it on a virtual machine. In this case, take into account that deduplication consumes a lot of server resources for reading, processing and writing the deduplicated data, as well as for some other deduplication tasks such as housekeeping and various checks. These tasks require a large amount of IO and a large amount of memory. Si3 performance can be affected by other virtual machines running on the same host. Therefore, if you are running Si3 on a VM, you should be aware of possible bottlenecks and shortcomings.

Required additional amount of RAM and CPU cores

Memory requirements are dependent on the number of concurrent streams and expected workload. The following tables show the recommended minimum additional amount of RAM and CPU cores for a Si3/Si3-NG data store to ensure good performance. The TB value corresponds to the capacity of the Si3/Si3-NG data store.

Note
These requirements relate solely to the need for deduplication. In addition, you should consider the amount of memory for the operating system and other services.
Si3/Si3-NG data store capacity (check initial size limit) | RAM
<20 TB | at least 16 GiB
20-40 TB | at least 32 GiB

The following table shows the number of CPU cores required for a Si3/Si3-NG data store. The TB value is the amount of data backed up (before deduplication).

Backed up data (before dedup) | CPU cores
10 TB | 4
20 TB | 4
40 TB | 8

Note
This is the minimum amount to ensure good performance. Depending on the number of concurrent streams, more cores may be needed.
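
The RAM and CPU tables above can be read as simple lookups. The sketch below is illustrative only: the function names are hypothetical, and treating a value that falls between two listed rows as belonging to the next larger row is an assumption, not something the tables state.

```python
def min_additional_ram_gib(store_capacity_tb):
    """Minimum additional RAM (GiB) for a given Si3 data store capacity."""
    if store_capacity_tb <= 0:
        raise ValueError("capacity must be positive")
    if store_capacity_tb > 40:
        # The tables stop at the 40 TB initial size limit.
        raise ValueError("above the 40 TB initial size limit")
    return 16 if store_capacity_tb < 20 else 32


def min_cpu_cores(backed_up_tb):
    """Minimum CPU cores for a given amount of backed up data (before dedup)."""
    if backed_up_tb <= 0:
        raise ValueError("amount of data must be positive")
    if backed_up_tb <= 20:   # rows for 10 TB and 20 TB
        return 4
    if backed_up_tb <= 40:   # row for 40 TB (assumed to cover 20-40 TB)
        return 8
    raise ValueError("not covered by the table")


print(min_additional_ram_gib(30), min_cpu_cores(30))
```

Both values are minimums for deduplication alone; more cores may be needed with many concurrent streams.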


Using CLI for Si3 data store configuration

SEP sesam provides command utilities for configuring and managing Si3 data stores. The following section provides some examples of commands and syntax.

Note
You must have SEP sesam administrator privileges to run SEP sesam CLI commands and use the command prompt as an administrator. All commands are run from the <SESAM_ROOT>/bin/sesam/ directory. If you want to execute SEP sesam commands globally (and not from the actual run directory), set the SEP sesam profile as described in What happens when I set a profile?.

The index size (max_pages) and Java's RAM requirements are important parameters for the operation of a Si3 data store as both parameters are used during its creation.

The sm_dedup_interface command is used to configure the hardware for a Si3 data store server. More details are available below in the section sm_dedup_interface.

stpd_conf

The Si3 and stpd configuration is stored in an .ini file in the directory gv_rw_ini:stpd_conf.

The file name is derived from the hw_drives.device (DS@ds1_2), as with any other DS device. Some information is duplicated because it is used by both the Si3 server and stpd.

bigsrv1:/var/opt/sesam/var/ini/stpd_conf # cat ds1_2.ini
 [DEDUP]
 Backend=dedup
 Hostname=localhost
 defaultRepoPath="/datastore/ds1/ds1"
 maxPages=481900
 port=11703
 sds_jvm_options="-Xmx1032M -XX:MaxDirectMemorySize=1355M"

 [DISK_STORE]
 Storage_Location=/datastore/ds1/ds1
 Size=1000GB
 backend=dedup
 hostname=localhost
 port=11703
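
To inspect such a file from a script, the settings can be read with a standard .ini parser. The following is a minimal sketch that uses Python's configparser on a copy of the sample file shown above; the function name read_dedup_settings and the returned dictionary keys are hypothetical, and the script only reads the text and changes nothing.

```python
import configparser
import re

# Copy of the sample ds1_2.ini shown above (leading spaces removed).
SAMPLE_INI = """\
[DEDUP]
Backend=dedup
Hostname=localhost
defaultRepoPath="/datastore/ds1/ds1"
maxPages=481900
port=11703
sds_jvm_options="-Xmx1032M -XX:MaxDirectMemorySize=1355M"

[DISK_STORE]
Storage_Location=/datastore/ds1/ds1
Size=1000GB
backend=dedup
hostname=localhost
port=11703
"""


def read_dedup_settings(ini_text):
    """Extract the index size and the JVM memory options from the [DEDUP] section."""
    # Only "=" is used as delimiter because the JVM options contain ":".
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(ini_text)
    dedup = parser["DEDUP"]
    jvm_options = dedup.get("sds_jvm_options", "").strip('"')
    xmx = re.search(r"-Xmx(\d+)M", jvm_options)
    direct = re.search(r"-XX:MaxDirectMemorySize=(\d+)M", jvm_options)
    return {
        "repo_path": dedup.get("defaultRepoPath", "").strip('"'),
        "max_pages": dedup.getint("maxPages"),
        "port": dedup.getint("port"),
        "xmx_mb": int(xmx.group(1)) if xmx else None,
        "max_direct_memory_mb": int(direct.group(1)) if direct else None,
    }


print(read_dedup_settings(SAMPLE_INI))
```

On a real system the text would be read from the file in gv_rw_ini:stpd_conf instead of the embedded sample.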
sm.ini

The RAM parameters for Java can be manually set in the sm.ini file. They override the automatically generated parameters from the drive .ini file. The recommended -Xmx value is ¼ (one quarter) of the available RAM. For example, if 16 GB is available, at least 4 GB (4096 MB) should be configured for the Si3 data store.
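
The one-quarter rule can be checked with a short calculation. This is a sketch only: the function names are hypothetical, and the flag is formatted in the same -Xmx<number>M style that appears in the sample drive .ini file.

```python
def recommended_xmx_mb(available_ram_gb):
    """Return one quarter of the available RAM, expressed in MB."""
    if available_ram_gb <= 0:
        raise ValueError("available RAM must be positive")
    return int(available_ram_gb * 1024 / 4)


def jvm_xmx_flag(available_ram_gb):
    """Format the recommended heap size as a -Xmx option."""
    return f"-Xmx{recommended_xmx_mb(available_ram_gb)}M"


# 16 GB of available RAM gives 4096 MB, matching the example in the text.
print(recommended_xmx_mb(16), jvm_xmx_flag(16))
```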

To obtain the default parameters used by Java on the target system, run the command java -XX:+PrintFlagsFinal and search for MaxHeapSize (-> Xmx) and InitialHeapSize (-> Xms).
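
The output of that command is long, so a small filter helps to find the two values. The sketch below parses text in the layout that HotSpot JVMs typically print for -XX:+PrintFlagsFinal (type, name, "=" or ":=", value); the sample lines and their values are illustrative and not taken from a real SEP sesam host, and the function name is hypothetical. The flag is commonly combined with -version so that the JVM exits after printing.

```python
import re

# Illustrative excerpt; real output contains several hundred flags.
SAMPLE_OUTPUT = """\
   size_t InitialHeapSize                          = 268435456                                 {product} {ergonomic}
   size_t MaxHeapSize                              = 4294967296                                {product} {ergonomic}
"""

# Matches "<type> <name> = <value>" as well as the older ":=" form.
FLAG_LINE = re.compile(r"^\s*\S+\s+(?P<name>\w+)\s+:?=\s+(?P<value>\S+)")


def heap_defaults_mb(flags_output):
    """Return the default Xmx and Xms values in MB found in the flag listing."""
    wanted = {"MaxHeapSize": "Xmx", "InitialHeapSize": "Xms"}
    result = {}
    for line in flags_output.splitlines():
        match = FLAG_LINE.match(line)
        if match and match.group("name") in wanted:
            # The JVM reports these sizes in bytes.
            result[wanted[match.group("name")]] = int(match.group("value")) // (1024 * 1024)
    return result


print(heap_defaults_mb(SAMPLE_OUTPUT))
```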

max_pages

The second parameter (max_pages) is directly related to the Java memory parameter. The RAM must hold the entire index (described by max_pages) in memory. The MaxDirectMemorySize depends directly on max_pages.

The max_pages value is stored in the SEP sesam database in the drive's hw_drives.block_size field and is dynamically increased whenever necessary. The parameter is calculated as hw_drives.block_size × 100 and then copied to the drive configuration .ini file. For example, a hw_drives.block_size value of 4819 would yield maxPages=481900. If you have problems with the index, please contact SEP sesam support.

Advanced CLI Administration

Si3's main maintenance tasks are garbage collection (gc) and file system check (fsck); both run automatically. Garbage collection is started by sm_start during SEP sesam newday. The file system check is carried out repeatedly at regular intervals and automatically with every backup.

The new generation of Si3 deduplication store has two types of file system check (fsck): object check (occk), which checks if the Si3 data part is still readable, and page check (pcck), which checks the physical data on the disk. All processes (gc, occk and pcck) can run simultaneously.

You can check their status or start/stop the tasks manually.

sm_dedup_interface

This is the main utility for configuring and managing data stores. Below is a list of some commands and their usage.

Note
Depending on the deduplication store used, Si3 V1 or Si3, some of the commands may be slightly different. When relevant, both command versions are described.
sm_dedup_interface -d <datastore> <command>
  - purge
  - objectinfo <remote filename>
  - put <input filename> <dest filename>
  - get <remote filename> <dest filename> [<bytes skipped then> [<bytes read at beginning>]]
  - delete <remote filename> [<filename 2>]*
  - getlabel
  - getuuid
  - list
  - fsck [start|stop|autopurge|status|incremental|purge now|dump status into <file>|fsck incr start from <file>]
  - gc <start|stop|status|result>
  - key <set <key> <value>|get <key>|list>
  - log@server <msg>
  - propose serverconfig <repository netto GiB>
  - propose jvmconfig <repository netto Gib> (for Si3 V1 store; slightly different usage for Si3, see Note a)
  - snapshot
  - replicate from [-f] <remote hostname> <remote port> <remote filename>
  - replicate show
  - replicate abort <task id>
Note a

Depending on the deduplication store used, Si3 V1 or Si3, the command to find out how much RAM is needed at what capacity of data store differs slightly. Example:

Si3
Use the command sm_dedup_interface -T dedup2 propose jvmconfig <Si3_capacity>.
Si3 V1
Use the command sm_dedup_interface propose jvmconfig <Si3_capacity>.

The MaxDirectMemorySize value in the output is the required amount of RAM.
Note, however, that SEP sesam calculates the RAM consumption itself and uses these commands in the background. It is usually not necessary to set the values manually, and manual changes are overwritten by the next drive configuration.
The index calculation is also associated with the command. If the index grows and is 95% full, backups can no longer be performed. The RAM must hold the entire index (described by max_pages) in memory. The MaxDirectMemorySize depends directly on max_pages. To solve the problems with the growing index, refer to Si3 Deduplication Troubleshooting.
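
When the utility is called from a script, the usage line sm_dedup_interface -d <datastore> <command> listed above can be assembled as an argument list. The following is a minimal sketch: the helper name dedup_interface_argv is hypothetical, it only builds the command line and does not run anything, and it covers just the -d form shown in the usage line.

```python
import shlex


def dedup_interface_argv(datastore, *command_words, executable="sm_dedup_interface"):
    """Build the argument list for: sm_dedup_interface -d <datastore> <command>."""
    if not command_words:
        raise ValueError("a command such as 'gc', 'status' is required")
    return [executable, "-d", str(datastore), *command_words]


argv = dedup_interface_argv(3, "gc", "status")
# Prints the command as it would be typed in a shell.
print(shlex.join(argv))
```

On a host where SEP sesam is installed, the list could be passed to subprocess.run from the <SESAM_ROOT>/bin/sesam/ directory with administrator privileges, as described in the note at the start of this section.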

Specific options

Most of the parameters are for internal use only.

status

Provides information about used space, stored data, label uuid and running processes (gc or fsck), etc.

The value Overall dedup ratio shows the percentage by which the stored data has been reduced.

gc start
  • Starts the garbage collection.
  • Identifies unreferenced chunks and moves them to the trash.
  • Is started by SEP sesam with sm_start.
gc stop
  • Stops the garbage collection.
  • Can be restarted later.
gc status
Si3 gc status output example
sm_dedup_interface -d 3 gc status
Current gc status:
 State:                       Finished
 Started:                     2022-03-07 08:10:56
 Ended:                       2022-03-07 10:00:15
 Message:                     Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]
STATUS=SUCCESS MSG=Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]
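
For monitoring scripts, the status output above can be turned into structured values. The sketch below parses a copy of that example output; the function name and the result keys are hypothetical, and the field layout is assumed from this single example, so other states (for example a running gc) may look different.

```python
import re
from datetime import datetime

# Copy of the example output shown above.
SAMPLE_GC_STATUS = """\
Current gc status:
 State:                       Finished
 Started:                     2022-03-07 08:10:56
 Ended:                       2022-03-07 10:00:15
 Message:                     Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]
STATUS=SUCCESS MSG=Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_gc_status(text):
    """Extract state, overall status, page counters and run time from gc status output."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s+(State|Started|Ended|Message):\s+(.*)$", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    status = re.search(r"^STATUS=(\w+)", text, re.MULTILINE)
    # Counters appear as name=value pairs inside the square brackets.
    counters = {name: int(value)
                for name, value in re.findall(r"(\w+)=(\d+)", fields.get("Message", ""))}
    duration = None
    if "Started" in fields and "Ended" in fields:
        delta = (datetime.strptime(fields["Ended"], TIME_FORMAT)
                 - datetime.strptime(fields["Started"], TIME_FORMAT))
        duration = int(delta.total_seconds())
    return {
        "state": fields.get("State"),
        "status": status.group(1) if status else None,
        "counters": counters,
        "duration_seconds": duration,
    }


print(parse_gc_status(SAMPLE_GC_STATUS))
```

In this example the five counters add up to the 97124 swept pages, which can serve as a quick consistency check.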
get
  • Reads an object (file, saveset) from the deduplication store.
  • '-' can be used as the destination to write to STDOUT.
put
  • Writes an object (file, saveset) to the deduplication store.
  • '-' can be used as the input to read from STDIN.
fsck
  • Starts a data store check.
  • Must be started manually.
  • If the parameter autopurge is set, all corrupted objects are deleted.
fsck status
Displays the current state or the state of the most recent data store check.

The Si3 deduplication store has two types of fsck: object check (occk), which checks if the Si3 data part is still readable, and page check (pcck), which checks the physical data on the disk. All processes (gc, occk and pcck) can run simultaneously.

purge
  • Deletes all pages marked as obsolete (empty trash) by the last run of garbage collection (gc).
  • Is started by sm_start after a SEP sesam day change.
  • getlabel and getuuid can be replaced with status

Logging

The logging function uses the logback library. For more information, see Logback Project. Note that this information is intended for advanced users only.

Logging info
  • gv_rw_ini:sm_sds.xml (/var/opt/sesam/var/ini/sm_sds.xml)
  • /var/opt/sesam/var/log/sms contains the following log files:
    • sm_dedup_server_info-<drive>.log: INFO level and higher.
    • sm_dedup_server-<drive>.log: DEBUG and higher. This file will become quite large.
    • sm_dedup_gc-<drive>.log: garbage collection log.
    • sm_dedup_fsck-<drive>.log: file system check log.
  • Log files are rotated automatically when they reach 100 MB.

Files and directories

Objects

For every SEP sesam saveset, three objects (files) are stored in the Si3 store:

  • <ssid>.data
  • <ssid>.info
  • <ssid>.info2

The .data and .info files are identical to those of a normal data store. The .info2 file is required for the data to be appended to a Si3 object. All database information that is not available before a backup is completed is written to this file.

Directories

The path <repo root path>/Si3-POOL/Si3-POOL00001/ is a legacy SEP sesam data store path and has nothing to do with the Si3 store. It will be removed in the future.


What is next?

After configuring the Si3/Si3 NG deduplication store, first configure the media pool(s) and then set up your backup strategy.

See also

  • Configuring Si3 NG Deduplication Store
  • Configuring Source-side Deduplication
  • Replication
  • Si3 Deduplication Troubleshooting
  • SEP Tachometer (only for Si3 store)

Copyright © SEP AG 1999-2024. All rights reserved.
Any form of reproduction of the contents or parts of this manual is allowed only with the express written permission from SEP AG. When compiling and designing user documentation SEP AG uses great diligence and attempts to deliver accurate and correct information. However, SEP AG cannot issue a guarantee for the contents of this manual.