Archive:SEP Si3 Deduplication Store
THE CONTENT OF THIS PAGE IS OUTDATED
SEP AG has discontinued support for obsolete SEP sesam versions. Instructions are still available for these SEP sesam products, however, SEP AG accepts no responsibility or liability for any errors or inaccuracies in the instructions or for the incorrect operation of obsolete SEP sesam software. It is strongly recommended that you update your SEP sesam software to the latest version. For the latest version of SEP sesam documentation, see documentation home.
Requirements
This article describes the configuration of the SEP sesam Si3 deduplication store, introduced in SEP sesam version 4.4.2. Note that this is not the latest version of the SEP sesam Si3 documentation and, as such, does not cover the features introduced in 4.4.3, which are described in Configuring an Si3 Deduplication Store.
Operating systems
Only 64-bit platforms are supported. A 32-bit OS cannot use more than about 3.6 GB of RAM without workarounds.
- Windows 2008/2008 R2
- Windows 2012/2012 R2
- SLES 11/12
- RHEL 6/7
- Debian Wheezy
Hardware
These are the minimum hardware requirements for operating a SEP sesam Si3 deduplication server.
- Production environments
  - 16 GB RAM
  - 4 CPU cores for one Si3 data store
  - 1 TB free hard disk space
- Test environments only
  - At least 8 GB RAM
  - 2 CPU cores
  - 1 TB free hard disk space
- Java: for details on the required Java version, see Java Compatibility Matrix. Because Si3 is optional, the RPM/DEB packages contain no dependency rule for Java.
- Additional amount of RAM required for one Si3 data store. The TB value is the capacity of the Si3 data store:
    Capacity    RAM
    10 TB       2544M
    20 TB       4839M
    30 TB       7134M
    40 TB       9429M
    50 TB       11724M
    60 TB       14019M
- Number of CPU cores required for one Si3 data store. The TB value is the amount of backed up data (before deduplication)!
    Backed up data    CPU cores
    10 TB             4
    20 TB             4
    40 TB             8
    80 TB             16
    160 TB            32 or 64
Attention: These figures cover only the demand of the deduplication itself. The memory required by the operating system and other services has to be added.
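As a planning aid, the figures listed above can be encoded as simple lookups. The following Python sketch is not part of SEP sesam: the function names are hypothetical, and the linear interpolation between listed capacities is an assumption based on the constant step between the RAM values. Operating system and other service memory is not included.

```python
from bisect import bisect_left

# Additional RAM (in MB) per Si3 data store capacity (in TB), copied from the list above.
RAM_MB_BY_CAPACITY_TB = [(10, 2544), (20, 4839), (30, 7134),
                         (40, 9429), (50, 11724), (60, 14019)]

# CPU cores per amount of backed up data (in TB, before deduplication).
# The source lists "32 or 64" for 160 TB; the lower value is used here.
CPU_CORES_BY_DATA_TB = [(10, 4), (20, 4), (40, 8), (80, 16), (160, 32)]


def si3_ram_mb(capacity_tb):
    """Return the additional RAM in MB, interpolating linearly between listed rows."""
    points = RAM_MB_BY_CAPACITY_TB
    if not points[0][0] <= capacity_tb <= points[-1][0]:
        raise ValueError("capacity outside the range covered by the list")
    capacities = [tb for tb, _ in points]
    i = bisect_left(capacities, capacity_tb)
    if capacities[i] == capacity_tb:
        return points[i][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return round(y0 + (y1 - y0) * (capacity_tb - x0) / (x1 - x0))


def si3_cpu_cores(backed_up_tb):
    """Return the core count of the first listed row that covers the data volume."""
    for limit_tb, cores in CPU_CORES_BY_DATA_TB:
        if backed_up_tb <= limit_tb:
            return cores
    raise ValueError("data volume outside the range covered by the list")
```

For example, `si3_ram_mb(50)` returns the 11724 MB listed for a 50 TB data store, and `si3_cpu_cores(30)` falls into the 40 TB row.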
Configuration
Note: The article Configuring an Si3 Deduplication Store describes the steps necessary to configure a Si3 Deduplication Store via SEP sesam GUI.
The index size (max_pages) and the main memory required for Java are the two important parameters for operating an Si3-T DataStore. Both are calculated and applied when the Si3-T DataStore is created. The max_pages value is increased dynamically if required.
To plan the hardware for an Si3-T DataStore server, you can use the command sm_dedup_interface. It requires the size of the DataStore partition (the DataStore capacity) as a parameter.
Command:
sm_dedup_interface propose jvmconfig <value in GB>
Sample: DataStore partition with 50 TB (50000 GB)
bigsrv1:~ # sm_dedup_interface propose jvmconfig 50000
JAVA_OPTS=-Xmx1333M -XX:MaxDirectMemorySize=11724M
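When the proposal is used in planning scripts, the two memory values can be extracted from the JAVA_OPTS line. This is a minimal Python sketch with a hypothetical helper name; it assumes the output has exactly the format of the sample above, and it treats the sum of heap and direct memory only as a rough lower bound for the Java process.

```python
import re


def parse_java_opts(line):
    """Extract -Xmx and -XX:MaxDirectMemorySize (both in MB) from a JAVA_OPTS line."""
    xmx = re.search(r"-Xmx(\d+)M", line)
    direct = re.search(r"-XX:MaxDirectMemorySize=(\d+)M", line)
    if not xmx or not direct:
        raise ValueError("unexpected JAVA_OPTS format: %r" % line)
    return {"xmx_mb": int(xmx.group(1)), "max_direct_mb": int(direct.group(1))}


# Sample output for a 50 TB DataStore partition, taken from the text above.
opts = parse_java_opts("JAVA_OPTS=-Xmx1333M -XX:MaxDirectMemorySize=11724M")

# Heap plus direct memory: a rough lower bound for the RAM used by the Si3 Java process.
total_mb = opts["xmx_mb"] + opts["max_direct_mb"]
```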
These Java memory parameters are inserted automatically into the configuration file of the Si3 DataStore drive, which resides in the <SESAM_VAR>/ini/stpd_conf folder.
stpd_conf
The Si3 and stpd configuration is saved in an INI file in the gv_rw_ini:stpd_conf directory. As for every other DS device, the file name is derived from hw_drives.device (DS@ds1_2). At the moment some information is duplicated because it is used by both the Si3 server and stpd.
bigsrv1:/var/opt/sesam/var/ini/stpd_conf # cat ds1_2.ini
[DEDUP]
Backend=dedup
Hostname=localhost
defaultRepoPath="/datastore/ds1/ds1"
maxPages=481900
port=11703
sds_jvm_options="-Xmx1032M -XX:MaxDirectMemorySize=1355M"
[DISK_STORE]
Storage_Location=/datastore/ds1/ds1
Size=1000GB
backend=dedup
hostname=localhost
port=11703
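A drive configuration file like the one shown above can be inspected with any INI reader. The following Python sketch uses the standard configparser module; the helper name and the returned dictionary keys are hypothetical, and the sample text is copied from the listing above rather than read from a live system.

```python
import configparser

# Contents of ds1_2.ini as shown above.
SAMPLE_INI = """\
[DEDUP]
Backend=dedup
Hostname=localhost
defaultRepoPath="/datastore/ds1/ds1"
maxPages=481900
port=11703
sds_jvm_options="-Xmx1032M -XX:MaxDirectMemorySize=1355M"
[DISK_STORE]
Storage_Location=/datastore/ds1/ds1
Size=1000GB
backend=dedup
hostname=localhost
port=11703
"""


def read_si3_drive_config(text):
    """Return the main Si3 settings from the [DEDUP] section of a drive INI file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    dedup = parser["DEDUP"]  # option names are matched case-insensitively
    return {
        "repo_path": dedup["defaultRepoPath"].strip('"'),
        "max_pages": dedup.getint("maxPages"),
        "port": dedup.getint("port"),
        "jvm_options": dedup["sds_jvm_options"].strip('"').split(),
    }


config = read_si3_drive_config(SAMPLE_INI)
```

On a real server the text would be read from the file in the stpd_conf directory instead of the embedded sample.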
You can set the Java memory parameters manually in the sm.ini file. A parameter in sm.ini overrides the automatically generated parameter from the drive INI file. The recommended -Xmx value is 1/4 of the available RAM: if 16 GB RAM are available, at least 4 GB (4096M) should be configured for Si3. To obtain the default parameters used by Java on the target system, use java -XX:+PrintFlagsFinal and search for MaxHeapSize (-> Xmx) and InitialHeapSize (-> Xms).
sm.ini (Linux and SEP sesam V4.4.2 Windows)
[Params]
sds_jvm_options=-Xms1024M -Xmx4096M
sm.ini (SEP sesam V4.4.1 Windows)
[DEDUP]
sds_jvm_options=-Xms1024M -Xmx4096M
sm.ini
[Params]
sds_jvm_options=-Xmx1333M -XX:MaxDirectMemorySize=11724M
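The 1/4 rule for -Xmx described above is simple arithmetic. The following Python sketch is an illustration only: the function names are hypothetical, and the -Xms default of 1024M is taken from the sm.ini examples above, not from a documented recommendation.

```python
def recommended_xmx_mb(total_ram_gb):
    """Return a quarter of the installed RAM in MB (16 GB RAM -> 4096 MB)."""
    return total_ram_gb * 1024 // 4


def sm_ini_params(total_ram_gb, xms_mb=1024):
    """Render a [Params] fragment in the format of the sm.ini examples above."""
    xmx_mb = recommended_xmx_mb(total_ram_gb)
    return "[Params]\nsds_jvm_options=-Xms%dM -Xmx%dM" % (xms_mb, xmx_mb)


fragment = sm_ini_params(16)
```

For a server with 16 GB RAM this reproduces the fragment with -Xms1024M -Xmx4096M.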
The second parameter, max_pages, is directly related to the Java memory parameters. RAM is required to hold the whole index (whose size is described by max_pages) in memory, so MaxDirectMemorySize depends directly on max_pages.
- maxPages: The value is stored in the Sesam database in the field hw_drives.block_size of the drive. The max_pages parameter is calculated as hw_drives.block_size * 100 and then inserted in the drive configuration INI file (see above).
Administration / Tools
The two main maintenance tasks, garbage collection (gc) and file system check (fsck), run automatically. It is possible to check the status of the tasks or to start and stop them manually.
sm_dedup_interface
Usage: sm_dedup_interface -d <command>
Valid commands are:
- status
- purge
- objectinfo <remote filename>
- put <input filename> <dest filename>
- get <remote filename> <dest filename> [<bytes skipped then> [<bytes read at beginning>]]
- delete <remote filename> [<filename 2>]*
- getlabel
- getuuid
- list
- fsck [start|stop|autopurge|status|incremental|purge now|dump status into <file>|fsck incr start from <file>]
- gc <start|stop|status|result>
- key <set <key> <value>|get <key>|list>
- log@server <msg>
- propose serverconfig <repository netto GiB>
- propose jvmconfig <repository netto Gib>
- snapshot
- replicate from [-f] <remote hostname> <remote port> <remote filename>
- replicate show
- replicate abort <task id>
Most of the parameters are only for internal use or for future use.
- gc start: Starts the garbage collection. Identifies unreferenced chunks and moves them into the trash. It is started from Sesam using sm_start.
- gc stop: Stops the garbage collection. It can be restarted later.
- get: Reads an object (file, saveset) from the deduplication store. '-' can be used to specify STDOUT.
- put: Writes an object (file, saveset) into the deduplication store. '-' can be used to specify STDIN.
- status: Shows information about used space, saved data, label and UUID, and whether gc or fsck is currently running.
bigsrv1:/var/opt/sesam/var/ini/stpd_conf # sm_dedup_interface -d ds1_2 status
INFO Successfully initialized i2dedup library version v2.1.0-SNAPSHOT5
Server Status:
Repository information:
Version: 2.1.1
UUID: 3b9ec2ae-34e1-11e3-b88b-001b2146
Label: ds1
Max Pages: 481900
Max Pages recommended: 154100 (-Xmx1010M -XX:MaxDirectMemorySize=603M)
GC process status: not running: GC finished.
Fsck process status: not running: Fsck finished. Interrupted: false. Total Runtime: 1296.68s
Bytes in repository: 259.02 GB
Bytes delete pending: 9.18 GB
Object information:
Objects stored: 258
Data before deduplication: 1541.56 GB
Data after deduplication: 58.94 GB
Overall DeDup ratio: 96.18 %
Key-Values:
No keys stored.
The Overall DeDup ratio indicates by how much the amount of stored data has been reduced.
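The ratio in the sample status output above can be reproduced from the two data figures. The following Python sketch assumes the ratio is computed as 1 - (data after / data before); this formula is an assumption that matches the sample values, not a documented definition, and the helper names are hypothetical.

```python
import re

# Excerpt of the sample status output above.
STATUS_EXCERPT = """\
Data before deduplication: 1541.56 GB
Data after deduplication: 58.94 GB
Overall DeDup ratio: 96.18 %
"""


def parse_gb(status_text, label):
    """Return the GB value that follows the given label in the status output."""
    match = re.search(r"%s:\s*([\d.]+)\s*GB" % re.escape(label), status_text)
    if match is None:
        raise ValueError("label not found: %s" % label)
    return float(match.group(1))


def dedup_ratio_percent(before_gb, after_gb):
    """Share of the original data volume that did not have to be stored, in percent."""
    return (1.0 - after_gb / before_gb) * 100.0


before = parse_gb(STATUS_EXCERPT, "Data before deduplication")
after = parse_gb(STATUS_EXCERPT, "Data after deduplication")
ratio = dedup_ratio_percent(before, after)
```

Rounded to two decimals, `ratio` matches the 96.18 % reported by the tool for this sample.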
- fsck: Starts a datastore check. At the moment it has to be started manually. If the parameter 'autopurge' is set, all corrupted objects are deleted. Attention: Sesam is currently not notified about these deletions.
- fsck status: Shows the current state or the state of the latest datastore check.
si3fix:/var/opt/sesam/var/log/sms # sm_dedup_interface -d Si3_5 fsck status
INFO Successfully initialized i2dedup library version v2.0.0-beta2
Current fsck status:
Message: Logfile check progress: Bytes: 1270925865083/1512422546049 Throughput: 91.25 MiB/s
Running: yes
Started: 2013-05-29 20:57:17
Ended: -
Bytes Checked: 0
Bytes Lost: 0
Objects checked:
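The progress line of the fsck status can be turned into a percentage and a rough remaining-time estimate. This Python sketch assumes the "Bytes: done/total" and "Throughput: ... MiB/s" fields appear as in the sample above and that the throughput stays constant; the helper name is hypothetical.

```python
import re

MIB = 1024 * 1024

# Progress message from the sample fsck status above.
FSCK_MESSAGE = ("Message: Logfile check progress: "
                "Bytes: 1270925865083/1512422546049 Throughput: 91.25 MiB/s")


def fsck_progress(status_text):
    """Return (percent done, estimated remaining seconds) from fsck status text."""
    done, total = map(int, re.search(r"Bytes:\s*(\d+)/(\d+)", status_text).groups())
    rate_mib = float(re.search(r"Throughput:\s*([\d.]+)\s*MiB/s", status_text).group(1))
    percent = 100.0 * done / total
    remaining_s = (total - done) / (rate_mib * MIB)
    return percent, remaining_s


percent, remaining_s = fsck_progress(FSCK_MESSAGE)
```

For the sample, the check is roughly 84 % complete, with somewhat more than 40 minutes remaining at the reported throughput.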
- purge: Deletes all pages marked as obsolete by the last GC run (empties the trash). It is started by sm_start after the Sesam day change.
getlabel and getuuid can be replaced by status.
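For scripted maintenance, the calls described above can be wrapped in a small helper. The following Python sketch only builds argument lists in the form used by the samples in this article (sm_dedup_interface -d <drive> <command>); the function names are hypothetical, and run() works only on a host where SEP sesam is installed and sm_dedup_interface is on the PATH.

```python
import subprocess

# gc actions as listed in the usage text of sm_dedup_interface.
GC_ACTIONS = ("start", "stop", "status", "result")


def dedup_command(drive, *args):
    """Build the argument list for sm_dedup_interface -d <drive> <command>."""
    return ["sm_dedup_interface", "-d", drive, *args]


def gc_command(drive, action):
    """Build a gc call, rejecting actions that the usage text does not list."""
    if action not in GC_ACTIONS:
        raise ValueError("unsupported gc action: %s" % action)
    return dedup_command(drive, "gc", action)


def run(argv):
    """Run the command and return its standard output (requires a SEP sesam host)."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout
```

For example, `run(dedup_command("ds1_2", "status"))` would return the status text for the drive ds1_2 used in the samples above.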
meteorologix:/var/opt/sesam/var/ini # sm_main reload sds
2013-05-28 16:42:06: sm_main[5697] started
2013-05-28 16:42:06: Arguments: sm_main reload sds
2013-05-28 16:42:06: SDS Server: "java" -Xmx1700M -XX:MaxDirectMemorySize=900M -classpath "/opt/sesam/bin/sds/i2dedup-server.jar" \
  -Dlogback.configurationFile=/var/opt/sesam/var/ini/sm_sdslog.xml -Dgv_rw_stpd=/var/opt/sesam/var/log/sms/ -Ddrive_num=31 \
  -Dconfig.inifile="/var/opt/sesam/var/ini/stpd_conf/SI3_31.ini" i2.dedup.streaming.BinaryProtocolServer
Requesting server shut down...
2013-05-28 16:42:07.587 [main] INFO i.d.streaming.BinaryProtocolServer$ - Welcome to SEP DeDup Service. Loading configuration...
2013-05-28 16:42:07.637 [main] DEBUG i2.dedup.streaming.ServerOptions$ - Loaded configuration from ini file: dedup {
backend=dedup
hostname=localhost
defaultRepoPath=/srv/5tb/data/defaultrepo/SI3
maxPages=262143
port=11732
}
Logging
- gv_rw_ini:sm_sds.xml (/var/opt/sesam/var/ini/sm_sds.xml)
- The logback library is used for logging
- For more information, see: http://logback.qos.ch/
- 2 files (4 files since version 4.4.1-22) in /var/opt/sesam/var/log/sms
  - sm_dedup_server_info-<drive>.log: INFO level and higher
  - sm_dedup_server-<drive>.log: DEBUG level and higher; can become quite large
  - sm_dedup_gc-<drive>.log: garbage collection log
  - sm_dedup_fsck-<drive>.log: file system check log
- Log files are rotated automatically when they reach a size of 100 MB
If you have updated from a version lower than 4.4.1-22 and the gc and fsck logs are missing:
- copy /opt/sesam/skel/templates/sm_sds.xml to /var/opt/sesam/var/ini/
GUI
- Data store type: SEP Si3 Deduplication Store
- All values are positive numbers
- Low-water-mark is always 0. It doesn't make sense here
- The configured capacity is used for licensing
Files and directories
Objects
For every Sesam saveset, 3 objects/files are stored in the Si3 store:
- <ssid>.data
- <ssid>.info
- <ssid>.info2
The .data and .info files are the same as for a normal DS. The .info2 file is necessary because data cannot be appended to an Si3 object. All DB information that is not available before the backup has finished is therefore written into this file.
Directories
- <repo root path>/Si3-POOL/Si3-POOL00001/: Legacy Sesam DS path. Has nothing to do with Si3 store and will be removed in a future release
Work Flow
- GC is started by 'sm_start' during Sesam newday
- Purge is started by 'sm_start' during Sesam newday