Source Side Deduplication
Overview
SEP sesam Si3 applies deduplication at the block level. Data is divided into blocks, each block is checked against the blocks that are already stored, and duplicates are skipped. Only unique blocks are sent to storage. Because identical data is stored only once, the backup requires less storage space, and because duplicates are not transferred over the network, the network load is reduced as well.
To support efficient data backup in different environments, SEP sesam offers both deduplication methods:
- target-based (Si3T) and
- source-based (Si3S) deduplication
Both methods use a configured Si3 deduplication data store that requires a special license. See Licensing for details.
Deduplication store types
- Deprecated Si3 deduplication store
- As of SEP sesam v. 5.0.0 Jaglion, two Si3 deduplication store types are available. It is strongly recommended to use the newer type SEP Si3 NG deduplication store as the old generation Si3 deduplication store is deprecated. This means that the old generation Si3 is no longer being enhanced, but is still supported until further notice.
- Use the new Si3 NG deduplication store if the data is to be stored to S3 Cloud
- If you are using an old generation Si3 deduplication store with S3, you cannot restore from S3 using the GUI! See Enable Si3 NG setup on the same host to learn how to configure a new Si3 NG and an old Si3 on the same backup server or RDS to make the upgrade from Si3 to Si3 NG smoother.
- Advantages of the new generation Si3 NG data store
- Si3 NG is advantageous over the old Si3 store type as it offers better performance and resource savings. You can back up your data directly to S3 cloud storage and Azure storage and restore the items you want directly from there. It also provides a new immutable storage feature – SiS. For more details, see Configuring Si3 NG Deduplication Store.
Note that the instructions for source-side deduplication are the same for both types of deduplication store. Si3 NG is therefore not explicitly mentioned, but the term Si3 store is used for both types of deduplication store.
What is Si3 source deduplication (Si3S)
Si3 source deduplication means that data is deduplicated before it is sent over the network, making the backup extremely bandwidth efficient. During the backup, SEP sesam calculates the hash values of the data blocks on the client and queries the storage to determine whether the hash value of each block is already stored there. If it is, SEP sesam sends only the hash value; if not, it sends the block itself. As a result, only blocks that are changed or unknown to the target Si3 dedup store are transferred to the backup server.
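The hash-then-query exchange described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not SEP sesam's implementation: the `DedupStore` class, the fixed 4-byte block size, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size for illustration only (assumption)


class DedupStore:
    """Hypothetical stand-in for the target dedup store: block hash -> block."""

    def __init__(self):
        self.blocks = {}

    def has(self, digest):
        return digest in self.blocks

    def put(self, digest, block):
        self.blocks[digest] = block


def backup(data, store):
    """Back up `data` and return (recipe, payload_bytes_sent).

    The recipe is the ordered list of block hashes needed for a restore.
    Block payload is "sent" only when the store does not know the hash yet.
    """
    recipe = []
    bytes_sent = 0
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()  # hash computed on the client
        if not store.has(digest):                   # query the store
            store.put(digest, block)                # unknown block: send payload
            bytes_sent += len(block)
        recipe.append(digest)                       # known block: hash only
    return recipe, bytes_sent


def restore(recipe, store):
    """Reassemble the original data from the stored blocks."""
    return b"".join(store.blocks[digest] for digest in recipe)


store = DedupStore()
first = b"AAAABBBBAAAACCCC"
second = b"AAAABBBBAAAADDDD"  # differs from `first` in the last block only
recipe_1, sent_1 = backup(first, store)
recipe_2, sent_2 = backup(second, store)
```

In this sketch the first backup transfers three of its four blocks (one is a duplicate within the data itself), while the second backup transfers only the single changed block.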
The advantage of Si3S deduplication is that only new or changed data is transferred to the backup server during the backup. This optimizes bandwidth usage and requires less storage capacity. It can be used to minimize the data transferred during backup in situations where bandwidth is limited and SEP sesam RDS cannot be used. See Deduplication for more details on the recommended use of deduplication methods.
Not all data is suitable for deduplication: encrypted files, disk blocks with a non-standard size, etc. cannot be deduplicated. See Data Deduplication Use Cases for more information.
Note
Using source-side deduplication does not necessarily mean that the backup windows will be reduced. This actually depends on your data structure – note that hashing chunks of data is very CPU intensive and such backups might take even longer. You should consider which clients can be overloaded in this way. In general, source-based deduplication can be an excellent solution for environments with a low daily data change rate and low bandwidth between the backup server and the backed up client.
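The trade-off in this note can be made concrete with a rough back-of-the-envelope model. The sketch below is a deliberate simplification and not a SEP sesam formula: it assumes that all data is hashed first and the changed data is transferred afterwards, and every figure (data size, change volume, bandwidth, hashing throughput) is a made-up example value.

```python
def plain_backup_seconds(total_mb, bandwidth_mb_s):
    """Estimated duration when all data is sent over the network."""
    return total_mb / bandwidth_mb_s


def si3s_backup_seconds(total_mb, changed_mb, bandwidth_mb_s, hash_mb_s):
    """Estimated duration with source-side deduplication (simplified model).

    All data is hashed on the client, but only the changed data is sent.
    """
    hashing = total_mb / hash_mb_s
    transfer = changed_mb / bandwidth_mb_s
    return hashing + transfer


TOTAL_MB = 100_000   # hypothetical data set of about 100 GB
CHANGED_MB = 2_000   # hypothetical daily change of about 2 GB
HASH_MB_S = 200      # hypothetical hashing throughput of the client

# Slow link (10 MB/s): deduplicating at the source pays off.
slow_plain = plain_backup_seconds(TOTAL_MB, 10)
slow_si3s = si3s_backup_seconds(TOTAL_MB, CHANGED_MB, 10, HASH_MB_S)

# Fast link (1000 MB/s): hashing dominates and the backup takes longer.
fast_plain = plain_backup_seconds(TOTAL_MB, 1000)
fast_si3s = si3s_backup_seconds(TOTAL_MB, CHANGED_MB, 1000, HASH_MB_S)
```

Under these assumed numbers the slow link benefits clearly from Si3S, while on the fast link the CPU cost of hashing makes the backup slower than a plain transfer.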
Key features
Source-side deduplication is easy to configure and has the following advantages:
- Only new and unique data is backed up directly at the source.
- As less data is sent over the network, bandwidth usage is reduced.
- Less storage capacity is required.
Source-side deduplication can have the following disadvantages:
- The backup client can become overloaded, which lengthens the backup window.
- When used for virtual data centers where resources are shared between virtual machines, it can affect production workloads.
See Data Deduplication Use Cases for more information.
Prerequisites
Make sure that the following conditions are met before using deduplication:
- Check that the required license is installed.
- Si3S is supported on all available Linux (additional RDS required) and Windows operating systems. Si3S is already part of the SEP sesam Windows client package, but is not included in the Linux client package. To use it on Linux, you need to install SEP sesam RDS/Server on the Linux backup client. For details on the supported OS, see SEP sesam OS and Database Support Matrix.
- At least one Si3 deduplication store has to be configured on either a SEP sesam Server or SEP sesam Remote Device Server. For setup details, see Configuring Si3 Deduplication Store.
- Si3S increases the CPU overhead in the production environment because the hashes are calculated there. The minimum requirements for the system that is going to be backed up are:
- Minimum of 2 CPU cores
- 2 GB RAM
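The minimum client requirements listed above can be checked with a small script before enabling Si3S. This is a minimal sketch and not a SEP sesam tool: the function names are hypothetical, 2 GB is interpreted as 2 GiB, and RAM detection relies on `os.sysconf`, which is not available on Windows (in that case `None` is returned and the value has to be supplied manually).

```python
import os

MIN_CPU_CORES = 2
MIN_RAM_BYTES = 2 * 1024**3  # "2 GB RAM" interpreted as 2 GiB (assumption)


def meets_si3s_minimum(cpu_cores, ram_bytes):
    """Return True if the given resources satisfy the stated minimum."""
    return cpu_cores >= MIN_CPU_CORES and ram_bytes >= MIN_RAM_BYTES


def detect_local_resources():
    """Return (cpu_cores, ram_bytes); ram_bytes is None if it cannot be read."""
    cores = os.cpu_count() or 0
    try:
        ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        ram = None  # e.g. on Windows, where os.sysconf does not exist
    return cores, ram


cores, ram = detect_local_resources()
if ram is None:
    print(f"{cores} CPU cores detected; check the RAM size manually")
else:
    print("Minimum met:", meets_si3s_minimum(cores, ram))
```

Meeting the minimum only indicates that Si3S can run on the client; it does not say whether the additional hashing load is acceptable for the production workload.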
Configuring source-side deduplication
Configuring Si3S consists of three main steps:
- Creating the required backup environment with a deduplication store. Check the Si3 Deduplication Hardware Requirements and follow the step-by-step procedure as described in Configuring Si3 NG Deduplication Store in v. ≥ 5.0.0 Jaglion. For the older, deprecated store type, see Configuring Si3 Deduplication Store.
- Once the Si3 deduplication store is created, configure the media pools.
- Set up your backup strategy by following the standard backup procedure: First create a backup task by selecting the data to be backed up, then determine when you want to back up your data and create a backup schedule, and then create a backup event. In this step, you also activate SEP Si3 source-side deduplication (see below).
Tip
You can use the Immediate Start button to enable Si3S and start your backup immediately.
Creating a backup event with enabled Si3S
When you create a backup event, you also activate source-side deduplication.
- From Main Selection -> Scheduling -> Schedules, right-click the schedule for which you want to create a new event, then click New Backup Event.
- Under Sequence control, you can set the Priority of your backup event. For details, see Setting Event Priorities.
- Under Object, select the task or task group to which you want to link this event.
- Under Parameter, specify the Backup level.
- From the Media pool drop-down list, select the target media pool to which the data will be backed up. Note that you have to select the media pool that is combined with an Si3 deduplication store backend.
- Select the SEP Si3 Source Side Deduplication check box.
- Click OK to save the event.
Enabling and starting Si3S instantly
- From the menu bar, select Activities -> Immediate Start -> Backup.
- In the Immediate Start: Backup dialog, select a deduplication media pool as the backup target.
- The check box SEP Si3 Source Side Deduplication is shown: select it and click Start.
Verifying if Si3S is used
You can verify if source-side deduplication is being applied by selecting Job State -> Backups in the Main Selection window. The job state overview provides detailed information on the backup status and shows a ticked check box in the column Source Side Deduplication if source-side deduplication is being applied to a job. The Si3S status overview also provides information on the job status, deduplication rate, Si3S start and stop time, data size and throughput, assigned media pool, etc.
Tip
You can check the details of your backups online as well as start your backups immediately, restart failed backups, restore backups online and more by using Web UI. For details, see SEP sesam Web UI.
Which network port is used for backups?
The client connects to the RDS or backup server using the following destination port: 11701 + the number of the first dedup drive. For example, if the first dedup drive is 9, the client uses port 11710. Make sure the respective port is open in the firewall on the RDS or SEP sesam Server. You may need to manually determine and open the corresponding port. The source port is chosen randomly.
For more information, see also List of ports used by SEP sesam.
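The port rule above is simple arithmetic, and a short script can compute the port and test whether it is reachable through the firewall. This is a sketch only: the helper names and the host name `backup-server.example.com` are hypothetical, and a successful TCP connection shows only that the port is open, not that Si3S is configured correctly.

```python
import socket

SI3S_BASE_PORT = 11701  # base port stated in the text above


def si3s_port(first_dedup_drive):
    """Destination port = 11701 + number of the first dedup drive."""
    if first_dedup_drive < 0:
        raise ValueError("drive number must not be negative")
    return SI3S_BASE_PORT + first_dedup_drive


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


port = si3s_port(9)
print(port)
```

To test the firewall from the client, you could call, for example, `port_is_reachable("backup-server.example.com", si3s_port(9))` with the name of your own RDS or SEP sesam Server.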
See also
Si3 Deduplication Hardware Requirements – Configuring Si3 NG Deduplication Store – Deduplication – Replication – List of Ports Used by SEP sesam – Licensing