Archive:SEP Si3 Deduplication Store

THE CONTENT OF THIS PAGE IS OUTDATED
SEP AG has discontinued support for obsolete SEP sesam versions. Instructions are still available for these SEP sesam products, however, SEP AG accepts no responsibility or liability for any errors or inaccuracies in the instructions or for the incorrect operation of obsolete SEP sesam software. It is strongly recommended that you update your SEP sesam software to the latest version. For the latest version of SEP sesam documentation, see documentation home.

Requirements

This article describes configuration of SEP sesam Si3 deduplication store, introduced in SEP sesam version 4.4.2. Note that this is not the latest version of SEP sesam Si3 documentation and, as such, does not provide information on features introduced in 4.4.3 and described in Configuring an Si3 Deduplication Store.

Operating systems

Only 64-bit platforms are supported. A 32-bit operating system cannot address more than about 3.6 GB of RAM without workarounds.

  • Windows 2008/2008 R2
  • Windows 2012/2012 R2
  • SLES 11/12
  • RHEL 6/7
  • Debian Wheezy

Hardware

The following are the minimum hardware requirements for operating a SEP sesam Si3 deduplication server.

Production environments
  • 16 GB RAM
  • 4 CPU cores for one Si3 data store
  • 1 TB free hard disk space
Test environments only
  • At least 8 GB RAM
  • 2 CPU cores
  • 1 TB free hard disk space

Additional requirements
  • Java: for details on the required Java version, see Java Compatibility Matrix. Because Si3 is not mandatory, the RPM/DEB packages contain no dependency rule for Java.
  • Additional RAM required for one Si3 data store. The TB value is the capacity of the Si3 data store:
Capacity           Additional RAM
  10 TB:            2544M
  20 TB:            4839M
  30 TB:            7134M
  40 TB:            9429M
  50 TB:           11724M
  60 TB:           14019M
  • Number of CPU cores required for one Si3 data store. The TB value is the amount of backed-up data (before deduplication)!
Backed-up data     CPU cores
  10 TB:            4
  20 TB:            4
  40 TB:            8
  80 TB:           16
 160 TB:           32 or 64

Attention: These figures cover only the demand of the deduplication itself. The memory required by the operating system and other services must be added on top.
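
The RAM figures in the table above grow by a constant 2295M per 10 TB, so they can be reproduced by a straight line. The following is a minimal sketch of that observation, not an official sizing formula: the function name and the two constants are assumptions derived only from the table rows, and values outside the 10 to 60 TB range are an extrapolation. The output of sm_dedup_interface propose jvmconfig remains the authoritative source.

```python
# Additional RAM per data store capacity, copied from the table above (TB -> MB).
RAM_TABLE_MB = {10: 2544, 20: 4839, 30: 7134, 40: 9429, 50: 11724, 60: 14019}

# Assumption: a straight line fitted to the table rows
# (249 MB base plus 229.5 MB per TB of data store capacity).
BASE_MB = 249
MB_PER_TB = 229.5


def estimate_additional_ram_mb(capacity_tb: float) -> int:
    """Hypothetical helper: estimate the additional RAM (MB) for a capacity in TB."""
    return round(BASE_MB + MB_PER_TB * capacity_tb)


# The fitted line reproduces every row of the table exactly.
for tb, mb in RAM_TABLE_MB.items():
    assert estimate_additional_ram_mb(tb) == mb

print(estimate_additional_ram_mb(50))  # prints 11724
```

Remember that this covers the deduplication demand only; memory for the operating system and other services comes on top.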

Configuration

Attention
- Only one Si3 store can be configured on a Sesam server or Sesam RDS
- Only 2 drives may be configured

Note:

The article Configuring an Si3 Deduplication Store describes the steps necessary to configure an Si3 Deduplication Store via the SEP sesam GUI.

The index size (max_pages) and the main memory required for Java are the two important parameters for Si3-T DataStore operation. Both are calculated and applied when an Si3-T DataStore is created. The max_pages value is increased dynamically if required.
To plan the hardware for an Si3-T DataStore server, use the command sm_dedup_interface. It requires the size of the DataStore partition (DataStore capacity) as a parameter.

command: sm_dedup_interface propose jvmconfig <value in GB>
sample: DataStore Partition with 50 TB (50000 GB)
sm_dedup_interface propose jvmconfig 50000

bigsrv1:~ # sm_dedup_interface propose jvmconfig 50000 
JAVA_OPTS=-Xmx1333M -XX:MaxDirectMemorySize=11724M

These Java memory parameters are inserted automatically into the drive configuration file of the Si3 DataStore drive, which resides in the <SESAM_VAR>/ini/stpd_conf folder.
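
For capacity planning it can help to turn the proposed JAVA_OPTS line into numbers. The sketch below parses the sample output shown above and adds up heap and direct memory. The function name is hypothetical, and the parser assumes that both values are always printed with an M suffix, as in the sample.

```python
import re


def parse_jvm_proposal(line: str) -> dict:
    """Hypothetical helper: extract heap and direct memory (MB) from a JAVA_OPTS line."""
    heap = re.search(r"-Xmx(\d+)M", line)
    direct = re.search(r"-XX:MaxDirectMemorySize=(\d+)M", line)
    if heap is None or direct is None:
        raise ValueError(f"unexpected JAVA_OPTS format: {line!r}")
    heap_mb = int(heap.group(1))
    direct_mb = int(direct.group(1))
    return {"heap_mb": heap_mb, "direct_mb": direct_mb, "total_mb": heap_mb + direct_mb}


# Sample line taken from the transcript above (50 TB DataStore partition).
proposal = parse_jvm_proposal("JAVA_OPTS=-Xmx1333M -XX:MaxDirectMemorySize=11724M")
print(proposal["total_mb"])  # prints 13057
```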

stpd_conf

The Si3 and stpd configuration is saved in an INI file in the gv_rw_ini:stpd_conf directory. The file name is derived from hw_drives.device (DS@ds1_2), as for every other DS device. At the moment some information is duplicated because it is used by both the Si3 server and stpd.

 bigsrv1:/var/opt/sesam/var/ini/stpd_conf # cat ds1_2.ini
 [DEDUP]
 Backend=dedup
 Hostname=localhost
 defaultRepoPath="/datastore/ds1/ds1"
 maxPages=481900
 port=11703
 sds_jvm_options="-Xmx1032M -XX:MaxDirectMemorySize=1355M"
 
 [DISK_STORE]
 Storage_Location=/datastore/ds1/ds1
 Size=1000GB
 backend=dedup
 hostname=localhost
 port=11703
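
Because some settings are stored twice, a quick consistency check between the two sections can be useful. The sketch below reads the sample file content shown above with Python's standard configparser and compares the duplicated values. That configparser accepts every drive INI file written by Sesam is an assumption; the check itself is only an illustration and not part of SEP sesam.

```python
import configparser

# Content of the sample drive INI file shown above.
DRIVE_INI = """\
[DEDUP]
Backend=dedup
Hostname=localhost
defaultRepoPath="/datastore/ds1/ds1"
maxPages=481900
port=11703
sds_jvm_options="-Xmx1032M -XX:MaxDirectMemorySize=1355M"

[DISK_STORE]
Storage_Location=/datastore/ds1/ds1
Size=1000GB
backend=dedup
hostname=localhost
port=11703
"""

parser = configparser.ConfigParser(interpolation=None)
parser.read_string(DRIVE_INI)
dedup = parser["DEDUP"]
store = parser["DISK_STORE"]

# Option names are matched case-insensitively; surrounding quotes are kept by the parser.
repo_path = dedup["defaultRepoPath"].strip('"')
max_pages = dedup.getint("maxPages")

# Settings that appear in both sections should agree.
duplicates_consistent = (
    repo_path == store["Storage_Location"]
    and dedup["port"] == store["port"]
    and dedup["Hostname"] == store["hostname"]
    and dedup["Backend"] == store["backend"]
)
print(max_pages, duplicates_consistent)  # prints 481900 True
```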

You can set the Java memory parameters manually in the sm.ini file. A parameter in sm.ini overrides the automatically generated parameter from the drive INI file. The recommended -Xmx value is 1/4 of the available RAM: if 16 GB RAM are available, then at least 4 GB (4096M) should be configured for the Si3. To obtain the default values used by Java on the target system, use java -XX:+PrintFlagsFinal and search for MaxHeapSize (-> Xmx) and InitialHeapSize (-> Xms).

sm.ini (Linux and SEP sesam V4.4.2 Windows)

 [Params]
 sds_jvm_options=-Xms1024M -Xmx4096M 

sm.ini (SEP sesam V4.4.1 Windows)

 [DEDUP]
 sds_jvm_options=-Xms1024M -Xmx4096M 
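
The quarter-of-RAM recommendation can be written down as a small calculation. This is a minimal sketch; the function names are hypothetical, and the -Xms default of 1024M is simply taken from the sm.ini samples above.

```python
def recommended_xmx_mb(total_ram_gb: int) -> int:
    """Hypothetical helper: one quarter of the available RAM, expressed in MB."""
    return total_ram_gb * 1024 // 4


def sds_jvm_options_line(total_ram_gb: int, xms_mb: int = 1024) -> str:
    """Build an sds_jvm_options line in the style of the sm.ini samples above."""
    return f"sds_jvm_options=-Xms{xms_mb}M -Xmx{recommended_xmx_mb(total_ram_gb)}M"


print(sds_jvm_options_line(16))  # prints sds_jvm_options=-Xms1024M -Xmx4096M
```
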
Note
It is also possible to set max_pages manually, but this is not recommended. Do this only if advised by SEP support. Changing this value for an existing Si3 store causes a complete rebuild of the index, which can take some time!

sm.ini

 [Params]
 sds_jvm_options=-Xmx1333M -XX:MaxDirectMemorySize=11724M


The second parameter, max_pages, is directly related to the Java memory parameters. RAM is required to hold the whole index (whose size is described by max_pages) in memory, so MaxDirectMemorySize depends directly on max_pages.

maxPages
The value is stored in the Sesam database in the field hw_drives.block_size of the drive. The max_pages parameter is calculated as hw_drives.block_size * 100 and then inserted into the drive configuration INI file (see above).
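
As a worked example of this calculation: the block_size value of 4819 used below is not shown anywhere in this article; it is inferred by dividing the maxPages value of the sample drive INI file (481900) by 100. The function name is hypothetical.

```python
def max_pages_from_block_size(block_size: int) -> int:
    """Hypothetical helper: derive maxPages from the hw_drives.block_size field."""
    return block_size * 100


# Assumed block_size (inferred from maxPages=481900 in the sample INI file).
print(max_pages_from_block_size(4819))  # prints 481900
```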

Administration / Tools

The two main maintenance tasks, garbage collection (gc) and file system check (fsck), run automatically. It is possible to check the status of the tasks or to start and stop them manually.

sm_dedup_interface

Usage: sm_dedup_interface -d  <command>
Valid commands are: 
  - status
  - purge
  - objectinfo <remote filename>
  - put <input filename> <dest filename>
  - get <remote filename> <dest filename> [<bytes skipped then> [<bytes read at beginning>]]
  - delete <remote filename> [<filename 2>]*
  - getlabel
  - getuuid
  - list
  - fsck [start|stop|autopurge|status|incremental|purge now|dump status into <file>|fsck incr start from <file>]
  - gc <start|stop|status|result>
  - key <set <key> <value>|get <key>|list>
  - log@server <msg>
  - propose serverconfig <repository netto GiB>
  - propose jvmconfig <repository netto Gib>
  - snapshot
  - replicate from [-f] <remote hostname> <remote port> <remote filename>
  - replicate show
  - replicate abort <task id>

Most of these commands are intended for internal use only or for future use.

gc start
Starts the garbage collection, which identifies unreferenced chunks and moves them into the trash. It is started by Sesam via sm_start.
gc stop
Stops the garbage collection. It can be restarted later.
get
Reads an object (file, saveset) from the deduplication store. '-' can be used as the destination to specify STDOUT.
put
Writes an object (file, saveset) into the deduplication store. '-' can be used as the input to specify STDIN.
status
Shows information about used space, saved data, label, UUID, and whether gc or fsck is currently running.
bigsrv1:/var/opt/sesam/var/ini/stpd_conf # sm_dedup_interface -d ds1_2 status
INFO  Successfully initialized i2dedup library version v2.1.0-SNAPSHOT5
Server Status: 
 Repository information:
  Version:                   2.1.1
  UUID:                      3b9ec2ae-34e1-11e3-b88b-001b2146
  Label:                     ds1
  Max Pages:                 481900
  Max Pages recommended:     154100 (-Xmx1010M -XX:MaxDirectMemorySize=603M)
  GC process status:         not running: GC finished.
  Fsck process status:       not running: Fsck finished. Interrupted: false. Total Runtime: 1296.68s
  Bytes in repository:           259.02 GB
  Bytes delete pending:            9.18 GB

 Object information:
  Objects stored:                   258
  Data before deduplication:    1541.56 GB
  Data after  deduplication:      58.94 GB
  Overall DeDup ratio:            96.18 %

 Key-Values:
No keys stored.

The Overall DeDup ratio indicates by how much the amount of stored data has been reduced.
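
The ratio in the sample output can be reproduced from the two "Data before/after deduplication" figures. The sketch below assumes the ratio is the relative reduction, 1 - after/before. That formula is not stated by the tool, but it matches the 96.18 % shown above. The function name is hypothetical.

```python
def dedup_ratio_percent(before_gb: float, after_gb: float) -> float:
    """Hypothetical helper: relative reduction of stored data, in percent."""
    return round((1.0 - after_gb / before_gb) * 100.0, 2)


# Figures from the status output above.
print(dedup_ratio_percent(1541.56, 58.94))  # prints 96.18
```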


fsck
Starts a datastore check. At the moment it has to be started manually. If the parameter 'autopurge' is set, all corrupted objects are deleted. Attention: Sesam is currently not informed about these deletions.


fsck status
Shows the current state of a running datastore check, or the state of the latest one.
 si3fix:/var/opt/sesam/var/log/sms # sm_dedup_interface -d Si3_5 fsck status
 INFO  Successfully initialized i2dedup library version v2.0.0-beta2
 Current fsck status:
 Message:       Logfile check progress: Bytes: 1270925865083/1512422546049 Throughput: 91.25 MiB/s
 Running:       yes
 Started:       2013-05-29 20:57:17
 Ended:         -
 Bytes Checked: 0
 Bytes Lost:    0
 Objects checked:
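
The Message line of a running check contains enough information to estimate progress. The sketch below parses the sample message above and derives the percentage done and a rough remaining time. It assumes the message always has this format and that the throughput stays constant; both are assumptions, and the function name is hypothetical.

```python
import re

# Message line copied from the fsck status output above.
MESSAGE = "Logfile check progress: Bytes: 1270925865083/1512422546049 Throughput: 91.25 MiB/s"


def fsck_progress(message: str) -> tuple:
    """Hypothetical helper: return (percent done, estimated seconds remaining)."""
    match = re.search(r"Bytes: (\d+)/(\d+) Throughput: ([\d.]+) MiB/s", message)
    if match is None:
        raise ValueError(f"unexpected fsck message format: {message!r}")
    done, total = int(match.group(1)), int(match.group(2))
    bytes_per_second = float(match.group(3)) * 1024 * 1024
    percent = 100.0 * done / total
    remaining = (total - done) / bytes_per_second if bytes_per_second > 0 else float("inf")
    return percent, remaining


percent, seconds_left = fsck_progress(MESSAGE)
print(f"{percent:.2f} % done, about {seconds_left / 60:.0f} minutes remaining")
```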

purge
Deletes all pages marked as obsolete by the last GC run (empties the trash). It is started by sm_start after the Sesam day change.

getlabel and getuuid can be replaced by status, whose output includes both the label and the UUID.

 meteorologix:/var/opt/sesam/var/ini # sm_main reload sds
 2013-05-28 16:42:06: sm_main[5697] started
 2013-05-28 16:42:06: Arguments: sm_main reload sds
 2013-05-28 16:42:06: SDS Server: "java" -Xmx1700M -XX:MaxDirectMemorySize=900M -classpath "/opt/sesam/bin/sds/i2dedup-server.jar" \
 -Dlogback.configurationFile=/va   /opt/sesam/var/ini/sm_sdslog.xml -Dgv_rw_stpd=/var/opt/sesam/var/log/sms/ -Ddrive_num=31 \
 -Dconfig.inifile="/var/opt/sesam/var/ini/stpd_conf/SI3_31.ini"    i2.dedup.streaming.BinaryProtocolServer
  Requesting server shut down...
 2013-05-28 16:42:07.587 [main] INFO  i.d.streaming.BinaryProtocolServer$ - Welcome to SEP DeDup Service. Loading configuration...
 2013-05-28 16:42:07.637 [main] DEBUG i2.dedup.streaming.ServerOptions$ - Loaded configuration from ini file: dedup {
 backend=dedup
 hostname=localhost
 defaultRepoPath=/srv/5tb/data/defaultrepo/SI3
 maxPages=262143
 port=11732
 }

Logging

  • Configuration file: gv_rw_ini:sm_sds.xml (/var/opt/sesam/var/ini/sm_sds.xml)
  • The logback library is used for logging
  • For more information see: http://logback.qos.ch/
  • 2 log files (4 files since version 4.4.1-22) are written to /var/opt/sesam/var/log/sms
    • sm_dedup_server_info-<drive>.log: INFO level and higher
    • sm_dedup_server-<drive>.log: DEBUG level and higher. This file becomes quite large
    • sm_dedup_gc-<drive>.log: garbage collection log
    • sm_dedup_fsck-<drive>.log: file system check log
  • Log files are rotated automatically when they reach a size of 100 MB

If you have updated from a version lower than 4.4.1-22 and the gc and fsck logs are missing:

  • copy /opt/sesam/skel/templates/sm_sds.xml to /var/opt/sesam/var/ini/

GUI

  • Data store type: SEP Si3 Deduplication Store
  • All values are positive numbers
  • The low-water mark is always 0, because it has no meaning for an Si3 store
  • The configured capacity is used for licensing

Files and directories

Objects

For every Sesam saveset 3 objects/files are stored in Si3 store:

  • <ssid>.data
  • <ssid>.info
  • <ssid>.info2

The .data and .info files are the same as for a normal DS. The .info2 file is necessary because data cannot be appended to an Si3 object. All DB information that is not available until the backup has finished is therefore written into this file.

Directories

  • <repo root path>/Si3-POOL/Si3-POOL00001/: Legacy Sesam DS path. Has nothing to do with Si3 store and will be removed in a future release

Work Flow

  • GC is started by 'sm_start' during Sesam newday
  • Purge is started by 'sm_start' during Sesam newday