5 1 0:Administering Si3 Deduplication Store
Overview
The Si3 data store is set up on a dedicated Linux server with SEP sesam installed. SEP sesam provides command utilities for configuring and managing Si3 data stores.
The main maintenance tasks on an Si3 data store are garbage collection (gc) and file system check (fsck). Both tasks run automatically: garbage collection is started by sm_start during SEP sesam newday, and the file system check is performed at regular intervals and with every backup.
The Si3 has two types of file system check (fsck): object check (occk), which checks if the Si3 data part is still readable, and page check (pcck), which checks the physical data on the disk. All processes (gc, occk and pcck) can run simultaneously.
SEP sesam provides the sm_dedup_interface utility for configuring and managing Si3 data stores, and for recovering corrupted Si3 data stores.
Note
You must have SEP sesam administrator privileges to run SEP sesam CLI commands, and you must use the command prompt as an administrator. All commands are run from the <SESAM_ROOT>/bin/sesam/ directory. If you want to execute SEP sesam commands globally (and not from the actual run directory), set the SEP sesam profile as described in What happens when I set a profile?.
Administering Si3 data store
To perform administrative tasks and manage the Si3 data store you can use the sm_dedup_interface utility. Below is a list of some commands and their usage.
The general syntax for the sm_dedup_interface commands is:
sm_dedup_interface -d <datastore> <command>
The following commands are available:
- purge
- objectinfo <remote filename>
- put <input filename> <dest filename>
- get <remote filename> <dest filename> [<bytes skipped then> [<bytes read at beginning>]]
- delete <remote filename> [<filename 2>]*
- getlabel
- getuuid
- list
- fsck [start|stop|autopurge|status|incremental|purge now|dump status into <file>|fsck incr start from <file>]
- gc <start|stop|status|result>
- key <set <key> <value>|get <key>|list>
- log@server <msg>
- propose serverconfig <repository netto GiB>
- propose jvmconfig (see the note below)
- snapshot
- replicate from [-f] <remote hostname> <remote port> <remote filename>
- replicate show
- replicate abort <task id>
To find out how much RAM is required for a given Si3 capacity, use the following command:
sm_dedup_interface -T dedup2 propose jvmconfig <Si3_capacity>
The output of MaxDirectMemorySize is the required RAM value.
Note, however, that SEP sesam calculates the RAM consumption and runs these commands in the background, so it is usually not necessary to set the values manually. Manual changes are overwritten by the next drive configuration.
The index calculation is also associated with this command. If the index grows until it is 95% full, backups can no longer be performed. The RAM must hold the entire index (described by max_pages) in memory, so MaxDirectMemorySize depends directly on max_pages. To solve problems with a growing index, refer to Si3 Deduplication Troubleshooting.
Specific options
Most of the parameters are for internal use only.
- status
- Provides information about used space, stored data, label uuid and running processes (gc or fsck), etc.
Si3 status output example
sm_dedup_interface -d 3 status
Server Status:
Repository information: 2022-03-07 16:01:31
Start time: 2022-02-22 16:32:15
Server: localhost:11704
Path: /srv/single_disk/Si3-b11
Version: Version: Si3
Branch: 4321a7ba7bafbfb7e9a186a3821b0e0bf08d19bc
Build: 4321a7b
Commit: 2022-02-09 15:37:49
Build date: 2022-02-09 15:41:18
UUID: 5e999930-bd3f-11ea-8471-b79d351122df
Label: Si3-b11
PCCK process status: not running: No items found to process: Stop time: 2022-03-07 16:00:48 (Started: 2022-03-07 16:00:48)
OCCK process status: not running: No items found to process: Stop time: 2022-03-07 16:00:47 (Started: 2022-03-07 16:00:47)
GC process status: not running: Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]: Stop time: 2022-03-07 10:00:15 (Started: 2022-03-07 08:10:56)
Bytes in repository: 534.49 GiB
Bytes delete pending: 159.60 GiB
Pages dir size: 534.42 GiB
Object dir size: 0.45 GiB
Trash dirs size: 159.60 GiB
Active tasks: All: 0, Backup: 0, Restore: 0, GC: 0, OCCK: 0, PCCK: 0
Sanity state: OK
JVM arguments: -Xmx3335M, -Dlogback.configurationFile=/var/opt/sesam/var/ini/sm_sdslog2.xml, -Dgv_rw_stpd=/var/opt/sesam/var/log/sms, -Dlogs.dir=/var/opt/sesam/var/log/sms, -Ddrive_num=3, -Dconfig.inifile=/var/opt/sesam/var/ini/stpd_conf/Si3-b11_3.ini
Recommended JVM arguments: -Xmx3312M
Si3-storage: Bytes All: 1999421108224, Use: 736072216576, Free: 1263348891648, Used: 36%
Index information:
Size: 0.34 GiB
Utilization: 57.35% (32890421/57344000)
Reindex: -
Object information:
Objects stored: 36090
Data before deduplication: 10.66 TiB
Overall DeDup ratio: 1 / 20.32
Saved storage space: 95.08 %
S3 information:
State: OFF
Bucket:
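The Overall DeDup ratio value shows the factor by which the stored data has been reduced (1 / 20.32 in the example). The Saved storage space value expresses the same reduction as a percentage.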
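As a rough cross-check of the example output, a ratio of 1 / 20.32 corresponds to (1 - 1/20.32) × 100 ≈ 95.08 % saved space, and the index counters 32890421/57344000 are well below the 95% limit mentioned earlier. The sketch below reproduces both calculations. The function names saved_space_pct and index_check are hypothetical helpers, not part of SEP sesam, and the parsing assumes that the Utilization line has the form shown in the example.

```shell
#!/bin/sh
# Hypothetical helper: convert an "Overall DeDup ratio" of 1 / N
# into the saved storage space in percent.
saved_space_pct() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", (1 - 1 / n) * 100 }'
}

# Hypothetical helper: read status text on stdin and report whether
# the index has reached the 95% limit.
# Assumes a line such as: Utilization: 57.35% (32890421/57344000)
index_check() {
    awk '/Utilization:/ && match($0, /\([0-9]+\/[0-9]+\)/) {
        # Take the "used/max" counters from inside the parentheses.
        split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
        state = (n[1] * 100 >= n[2] * 95) ? "CRITICAL" : "OK"
        print state, n[1] "/" n[2]
    }'
}

saved_space_pct 20.32
echo "Utilization: 57.35% (32890421/57344000)" | index_check
```

In practice the input for index_check would be the output of sm_dedup_interface -d <datastore> status piped into the function.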
- gc start
- Starts the garbage collection.
- Identifies unreferenced chunks and moves them to the trash.
- Is started by SEP sesam with sm_start.
- gc stop
- Stops the garbage collection.
- Can be restarted later.
- gc status
Si3 gc status output example
sm_dedup_interface -d 3 gc status
Current gc status:
State: Finished
Started: 2022-03-07 08:10:56
Ended: 2022-03-07 10:00:15
Message: Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]
STATUS=SUCCESS MSG=Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]
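The gc status output can be evaluated in a script, for example to confirm that the last run succeeded. The following is a minimal sketch that relies only on the STATUS=SUCCESS token and the deleted= counter visible in the example output. The function names gc_succeeded and gc_deleted_pages are hypothetical, and the sample text is copied from the example rather than produced by a live system.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the gc status text on stdin
# contains STATUS=SUCCESS.
gc_succeeded() {
    grep -q 'STATUS=SUCCESS'
}

# Hypothetical helper: print the deleted= page counter from the
# sweep summary (first matching line only).
gc_deleted_pages() {
    sed -n 's/.*deleted=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Sample text taken from the gc status example.
sample='State: Finished
Message: Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]
STATUS=SUCCESS MSG=Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]'

printf '%s\n' "$sample" | gc_succeeded && echo "gc finished successfully"
printf '%s\n' "$sample" | gc_deleted_pages
```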
- get
- Reads an object (file, saveset) from the deduplication store.
- '-' can be used as the destination to write to STDOUT.
- put
- Writes an object (file, saveset) to the deduplication store.
- '-' can be used as the input to read from STDIN.
- fsck
- Starts a data store check.
- Must be started manually.
- If the parameter autopurge is set, all corrupted objects are deleted.
- fsck status
- Displays the current state or the state of the most recent data store check.
The Si3 deduplication store has two types of fsck: object check (occk), which checks if the Si3 data part is still readable, and page check (pcck), which checks the physical data on the disk. All processes (gc, occk and pcck) can run simultaneously.
Si3 fsck status output example
sm_dedup_interface -d 3 fsck status
Current occk status:
Mode: Incremental. Since 2022-03-07 09:18:25
State: Finished
Started: 2022-03-07 16:01:53
Ended: 2022-03-07 16:01:53
Last Full successful: 2022-01-04 10:41:20
Message: No items found to process
Previous error: -
Current pcck status:
Mode: Incremental. Since 2022-03-07 09:58:48
State: Finished
Started: 2022-03-07 16:01:53
Ended: 2022-03-07 16:01:53
Last Full successful: 2022-01-04 11:58:25
Message: No items found to process
Previous error: -
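Because the fsck status output reports occk and pcck separately, a script may want to pull out the Previous error field for each check. The sketch below does this under the assumption that each key of the output appears on its own line and that the sections are introduced by "Current occk status:" and "Current pcck status:", as in the example. The function name fsck_previous_errors is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "<check> <previous error>" for each check
# section found in fsck status text on stdin.
# Assumes one key per line, as in the example output.
fsck_previous_errors() {
    awk '
        /Current occk status:/ { check = "occk" }
        /Current pcck status:/ { check = "pcck" }
        /Previous error:/ {
            # Strip everything up to and including the key.
            sub(/.*Previous error:[ \t]*/, "")
            print check, $0
        }'
}

# Sample text shortened from the fsck status example.
printf '%s\n' \
    'Current occk status:' \
    'State: Finished' \
    'Previous error: -' \
    'Current pcck status:' \
    'State: Finished' \
    'Previous error: -' | fsck_previous_errors
```

In the example output a dash is shown in this field for both checks.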
- purge
- Deletes all pages marked as obsolete (empty trash) by the last run of garbage collection (gc).
- Is started by sm_start after a SEP sesam day change.
- getlabel and getuuid can be replaced with status.
Repairing corrupted Si3 data store
You can repair the Si3 store when pages or objects get corrupted.
- First determine the scope of corruption:
- To get the list of corrupted objects use:
sm_dedup_interface -d <datastore> corruptedobjects
- To get the list of corrupted pages use:
sm_dedup_interface -d <datastore> corruptedpages
- Use the following command to replace a page in the /pages directory with an older version from the /pages-trash directory:
sm_dedup_interface -d <datastore> repair pages
The pages in trash contain all chunks deleted during previous garbage collection (gc). The oldest version of a page takes priority.
- Use the following command to search for and recover the missing chunks in the /pages-trash directory:
sm_dedup_interface -d <datastore> repair start
During the repair process a new page is created, which contains all chunks from the current page (page affected by 'missing chunks' issue) and all chunks found in the trash.
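The repair steps above can be collected into a small dry-run helper that only prints the commands in order, so that they can be reviewed before anything is executed. This is a sketch: the function name si3_repair_plan is hypothetical, and the data store value 3 is simply the one used in the examples on this page.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the triage and repair
# commands for one data store, in the order described above.
si3_repair_plan() {
    ds=$1
    for cmd in corruptedobjects corruptedpages "repair pages" "repair start"; do
        echo "sm_dedup_interface -d $ds $cmd"
    done
}

si3_repair_plan 3
```

Printing instead of executing keeps the sketch safe to run on a machine without SEP sesam; the printed lines can be run one at a time once the scope of corruption is known.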
Cleanup of unrecoverable Si3 store
Warning
Use the commands described in this section only if the corrupted store cannot be recovered.
When corruptions in the Si3 store persist, the initial page version has already been purged from trash or there were fatal errors during backup or restore. In this case broken pages or missing chunks cannot be recovered.
Cleanup can be performed by deleting unrecoverable objects manually or by using the automatic cleanup function.
- Deleting objects
When there are only a few unrecoverable objects, delete each object with the following commands:
sm_dedup_interface -d <datastore> delete corrupted_object_id_1
...
sm_dedup_interface -d <datastore> delete corrupted_object_id_Nth
In case of many corruptions you can delete all corrupted objects using the following command:
sm_dedup_interface -d <datastore> fsck purge
- Garbage collection
When you have deleted all unrecoverable objects, run garbage collection (gc):
sm_dedup_interface -d <datastore> gc start
- Automatic cleanup function
To start the automatic cleanup function, use the following command:
sm_dedup_interface ... fsck purge auto
The automatic cleanup function runs the following sequence of commands: PCCK start -> OCCK start -> Delete all corrupted objects -> GC start.
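For the manual variant (delete each unrecoverable object, then run garbage collection), the command sequence could be generated as follows. This is a dry-run sketch that only prints commands. The function name si3_cleanup_plan and the object names obj_a and obj_b are hypothetical placeholders for the IDs reported as corrupted.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one delete command per
# unrecoverable object, followed by the garbage collection start.
# Usage: si3_cleanup_plan <datastore> <object id>...
si3_cleanup_plan() {
    ds=$1
    shift
    for obj in "$@"; do
        echo "sm_dedup_interface -d $ds delete $obj"
    done
    echo "sm_dedup_interface -d $ds gc start"
}

si3_cleanup_plan 3 obj_a obj_b
```

Since deletion cannot be undone, reviewing the printed list before running it fits the warning at the start of this section.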
Logging
The logging function uses a relatively powerful logback library. For more information, see Logback Project. Note that this information is intended for advanced users only.
- Logging info
- gv_rw_ini:sm_sds.xml (/var/opt/sesam/var/ini/sm_sds.xml)
- /var/opt/sesam/var/log/sms contains the following log files:
- sm_dedup_server_info-<drive>.log: Log level INFO and higher.
- sm_dedup_server-<drive>.log: Log level DEBUG and higher. This file can become quite large.
- sm_dedup_gc-<drive>.log: garbage collection log.
- sm_dedup_fsck-<drive>.log: file system check log.
- Auto rotation if the log file size reaches 100 MB.
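To locate the log files for one drive, the file names listed above can be expanded with a short helper. This is a sketch that assumes <drive> is the drive number (as in -Ddrive_num=3 in the status example) and that the log directory is /var/opt/sesam/var/log/sms unless another directory is passed. The function name si3_log_files is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the expected Si3 log file paths for a drive.
# Usage: si3_log_files <drive> [log directory]
si3_log_files() {
    drive=$1
    dir=${2:-/var/opt/sesam/var/log/sms}
    for name in sm_dedup_server_info sm_dedup_server sm_dedup_gc sm_dedup_fsck; do
        echo "$dir/$name-$drive.log"
    done
}

si3_log_files 3
```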
Files and directories
- Objects
For every SEP sesam saveset, three objects (files) are stored in the Si3 store:
- <ssid>.data
- <ssid>.info
- <ssid>.info2
The .data and .info files are identical to those of a normal data store. The .info2 file is required for the data to be appended to a Si3 object. All database information that is not available before a backup is completed is written to this file.
- Directories
See also
Configuring Si3 Deduplication Store – Configuring Source-side Deduplication – Replication – Si3 Deduplication Troubleshooting – Licensing