Source:Troubleshooting Si3 Deduplication

From SEPsesam

Si3 Deduplication

Unable to establish connection to S3 data store

Problem

The Si3 NG data store may be unable to establish a secure connection to S3 storage and report the following error:

Error: Could not access data store.
Server Status: 2023-03-30 10:17:10: ERROR Not started due to error: S3 is not connected
Server Status: 2023-03-30 10:17:10: ERROR Not started due to error: S3 is not connected

Cause

If the Si3 NG data store connects to a storage provider that uses a self-signed certificate, the certificate is not trusted by default because it was not issued by a trusted certificate authority. As a result, the connection can be denied, and the log files in /var/opt/sesam/var/log/sms may contain a message similar to this:

[...default-dispatcher-6] ERROR S3 - Unexpected error: {}, cause: {}
software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: javax.net.ssl.SSLHandshakeException: General OpenSslEngine problem

Solution

To solve this problem, use the keytool utility to import the public.crt certificate into the server's certificate store. This allows the Si3 server to trust the S3 storage provider's certificate and establish a secure connection.

  1. Obtain the public certificate, for example by exporting it from the browser.
  2. Locate the cacerts file on your server. This file is the certificate keystore of your JVM.
  3. Import the public.crt certificate into the JVM's certificate keystore with the following command. The keystore paths shown are examples; use the location you found in step 2.
On Linux:
keytool -import -trustcacerts -keystore /var/lib/ca-certificates/java-cacerts -storepass changeit -noprompt -alias <storage backend endpoint URL> -file /<path_to_certificate>/public.crt
On Windows:
C:\Program Files\ojdkbuild\java-11-openjdk-11.0.15-1\bin>keytool -import -trustcacerts -keystore "C:\Program Files\ojdkbuild\java-11-openjdk-11.0.15-1\lib\security\cacerts" -storepass changeit -noprompt -alias <storage backend endpoint URL> -file <path_to_certificate>\public.crt
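If exporting the certificate from a browser is not practical, it can usually be fetched on the command line instead. The following is a minimal sketch for Linux, assuming openssl is installed on the server. The helper names endpoint_hostport, fetch_cert and verify_import are hypothetical and not part of SEP sesam. verify_import uses keytool -list to confirm that the alias exists after the import; it assumes the default keystore password changeit used above.

```shell
#!/bin/sh
# Hypothetical helper: reduce an endpoint such as
# https://s3.example.com:9000/bucket to host:port (port 443 if none is given).
# IPv6 literals are not handled.
endpoint_hostport() {
    hp=${1#*://}      # drop the scheme, if present
    hp=${hp%%/*}      # drop any path
    case $hp in
        *:*) printf '%s\n' "$hp" ;;
        *)   printf '%s:443\n' "$hp" ;;
    esac
}

# Fetch the certificate presented by the endpoint and save it in PEM format.
# $1 = endpoint URL, $2 = output file
fetch_cert() {
    hostport=$(endpoint_hostport "$1")
    openssl s_client -connect "$hostport" -servername "${hostport%%:*}" \
        </dev/null 2>/dev/null | openssl x509 -outform PEM > "$2"
}

# List the keystore entry for an alias to confirm that the import worked.
# $1 = keystore path, $2 = alias
verify_import() {
    keytool -list -keystore "$1" -storepass changeit -alias "$2"
}
```

For example, fetch_cert <storage backend endpoint URL> /<path_to_certificate>/public.crt saves the certificate before the import, and verify_import /var/lib/ca-certificates/java-cacerts <storage backend endpoint URL> checks the result afterwards. Inspect the saved certificate before trusting it.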

Issues with S3 or S3-compatible storage

Problem

  • An Si3 NG data store using S3 or S3-compatible storage can experience various issues, depending on the cloud storage provider. These issues can affect backups, migrations, and replications. In addition, the sanity state check of Si3 NG may report errors that have a similar root cause.

Cause

  • Some cloud storage providers (for example, Wasabi) restrict the request rate, that is, the number of HTTP(S) requests allowed per second. Similarly, on local storage with the S3 option enabled, multiple RDSs accessing the same local S3 storage can generate a high number of IOPS (I/O operations per second).

Solution

  • You can adjust the settings on the affected Si3 NG data store:
  1. In the Main selection -> Components, click Data Stores to display the data store contents frame.
  2. Right-click the selected Si3 NG data store and then click Properties.
  3. Double-click a drive to open the Drive Properties dialog, and then enter the following in the Options field:
dedup.s3.timeoutInSeconds=1200,dedup.s3.page.workers=2,dedup.maxAsyncRequests=50
This will increase the timeout period, the number of active page workers, and the request rate.
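If the option string is produced by a script, for example to reuse the same values for several drives, it could be assembled as sketched below. join_options is a hypothetical helper; it only builds the text for the Options field and does not change any SEP sesam setting.

```shell
#!/bin/sh
# Hypothetical helper: join key=value pairs with commas, the format used
# in the Options field. Rejects arguments that are not key=value pairs.
join_options() {
    for opt in "$@"; do
        case $opt in
            ?*=?*) ;;
            *) echo "not a key=value pair: $opt" >&2; return 1 ;;
        esac
    done
    # "$*" joins the arguments with the first character of IFS.
    ( IFS=,; printf '%s\n' "$*" )
}

join_options dedup.s3.timeoutInSeconds=1200 dedup.s3.page.workers=2 dedup.maxAsyncRequests=50
```

The last line prints the same string as shown in step 3, which can then be pasted into the Options field.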

Si3 remains in "shutting down" state

Problem

  • Manually stopping Garbage Collection (GC) fails and consequently Si3 remains in the "shutting down" state.

Solution

  • Restart the Si3 daemon by using sm_main restart sds. For more details on stopping and starting the SEP sesam services, see How to Start and Stop SEP sesam.

Si3 deduplication may not work with NFSv4

Problem

  • Si3 deduplication may not work with Network File System version 4 (NFSv4).

Cause

  • SEP sesam operations, such as backup, restore and migration, may fail due to Java problems with NFSv4.

Solution

  • To avoid this problem, connect your backup devices via NFSv3.
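To check which protocol version a Linux host currently uses, the mount table can be inspected. This is a minimal sketch, assuming a Linux system where NFSv4 mounts appear with the file system type nfs4 in /proc/mounts; list_nfs4_mounts is a hypothetical helper name.

```shell
#!/bin/sh
# Read /proc/mounts-style lines on stdin and print the mount point of
# every NFSv4 mount (third field is "nfs4").
list_nfs4_mounts() {
    awk '$3 == "nfs4" { print $2 }'
}

# Usage on a Linux host:
#   list_nfs4_mounts < /proc/mounts
# A mount that forces NFSv3 could then look like this (placeholders):
#   mount -t nfs -o vers=3 <nfs_server>:/<export> /<mount_point>
```

Any mount point printed by the helper is still using NFSv4 and is a candidate for remounting with vers=3.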

Repairing corrupted Si3 NG data store

You can repair the Si3 NG store when pages or objects get corrupted.

  1. First determine the scope of corruption:
    • To get the list of corrupted objects use:
      sm_dedup_interface -d <datastore> corruptedobjects
    • To get the list of corrupted pages use:
      sm_dedup_interface -d <datastore> corruptedpages
  2. Use the following command to replace the page in the /pages directory with an older version from the /pages-trash directory:
    sm_dedup_interface -d <datastore> repair pages
    The pages in trash contain all chunks deleted during the previous GC. The oldest version of a page takes priority.
  3. Use the following command to search for and recover the missing chunks in the /pages-trash directory:
    sm_dedup_interface -d <datastore> repair start
    During the repair process a new page is created, which contains all chunks from the current page (page affected by 'missing chunks' issue) and all chunks found in the trash.
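The repair steps above can be combined in a small wrapper, sketched below. si3_repair and the SM_DEDUP variable are hypothetical and not part of SEP sesam; the sketch also assumes that sm_dedup_interface returns a non-zero exit code when a repair command fails, which you should verify in your environment.

```shell
#!/bin/sh
# Hypothetical wrapper around the repair steps. SM_DEDUP may point to a
# different binary, or to a stub for a dry run.
# $1 = data store name
si3_repair() {
    ds=$1
    cmd=${SM_DEDUP:-sm_dedup_interface}
    # Step 1: determine the scope of the corruption (output is only shown).
    "$cmd" -d "$ds" corruptedobjects
    "$cmd" -d "$ds" corruptedpages
    # Step 2: replace pages with an older version from /pages-trash.
    "$cmd" -d "$ds" repair pages || return 1
    # Step 3: search for and recover missing chunks in /pages-trash.
    "$cmd" -d "$ds" repair start
}
```

Calling si3_repair <datastore> runs the four commands in the documented order and stops if the page repair reports a failure.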

Cleanup of unrecoverable Si3 NG store

Warning
Use the commands described in this section only if the corrupted store cannot be recovered.

If corruptions in the Si3 NG store persist, either the initial page version has already been purged from trash or fatal errors occurred during backup or restore. In this case, the broken pages or missing chunks cannot be recovered.

Cleanup can be performed by deleting unrecoverable objects manually or by using the automatic cleanup function.

Deleting objects

When there are only a few unrecoverable objects, delete each object with the following commands:

sm_dedup_interface -d <datastore> delete corrupted_object_id_1

...

sm_dedup_interface -d <datastore> delete corrupted_object_id_N

In case of many corruptions you can delete all corrupted objects using the following command:

sm_dedup_interface -d <datastore> fsck purge

Garbage collection

When you have deleted all unrecoverable objects, run garbage collection (gc):

sm_dedup_interface -d <datastore> gc start
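For a handful of objects, the delete commands and the subsequent garbage collection can be scripted as sketched below. si3_delete_objects and the SM_DEDUP variable are hypothetical names, and the sketch assumes a non-zero exit code on failure. The deletion is irreversible, so pass only object IDs that were confirmed as unrecoverable.

```shell
#!/bin/sh
# Hypothetical helper: delete the given objects one by one, then start
# garbage collection. Stops at the first delete that fails.
# $1 = data store name, remaining arguments = object IDs
si3_delete_objects() {
    ds=$1
    shift
    cmd=${SM_DEDUP:-sm_dedup_interface}
    for id in "$@"; do
        "$cmd" -d "$ds" delete "$id" || return 1
    done
    "$cmd" -d "$ds" gc start
}
```

Setting SM_DEDUP to a stub that only prints its arguments gives a dry run that shows the commands before anything is deleted.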

Automatic cleanup function

To start an automatic cleanup function, use the following command:

sm_dedup_interface ... fsck purge auto

The automatic cleanup function runs the following sequence of commands: PCCK start -> OCCK start -> Delete all corrupted objects -> GC start.

Logging

Logging is based on the logback library, which offers extensive configuration options. For more information, see Logback Project. Note that this information is intended for advanced users only.

Logging info
  • gv_rw_ini:sm_sds.xml (/var/opt/sesam/var/ini/sm_sds.xml)
  • /var/opt/sesam/var/log/sms contains the following log files:
    • sm_dedup_server_info-<drive>.log: Log level INFO and higher.
    • sm_dedup_server-<drive>.log: Log level DEBUG and higher. This file can become quite large.
    • sm_dedup_gc-<drive>.log: garbage collection log.
    • sm_dedup_fsck-<drive>.log: file system check log.
  • Log files are rotated automatically when their size reaches 100 MB.
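Because the DEBUG log can become large, it often helps to filter for errors first. The following is a minimal sketch, assuming that log lines may contain ANSI color sequences around the level name; si3_log_errors is a hypothetical helper name.

```shell
#!/bin/sh
# Strip ANSI color sequences from stdin and print only lines that
# contain the word ERROR.
si3_log_errors() {
    esc=$(printf '\033')
    sed "s/${esc}\[[0-9;]*m//g" | grep 'ERROR'
}

# Usage (replace <drive> with the drive number):
#   si3_log_errors < /var/opt/sesam/var/log/sms/sm_dedup_server_info-<drive>.log
```

The helper reads from standard input, so it can also be combined with tail -f to watch a running Si3 server.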

Files and directories

Objects

For every SEP sesam saveset, three objects (files) are stored in the Si3 NG store:

  • <ssid>.data
  • <ssid>.info
  • <ssid>.info2

The .data and .info files are identical to those of a normal data store. The .info2 file is required for the data to be appended to a Si3 object. All database information that is not available before a backup is completed is written to this file.

Directories
Copyright © SEP AG 1999-2024. All rights reserved.
Any form of reproduction of the contents or parts of this manual is allowed only with the express written permission from SEP AG. When compiling and designing user documentation SEP AG uses great diligence and attempts to deliver accurate and correct information. However, SEP AG cannot issue a guarantee for the contents of this manual.