How to Sync Data from Elasticsearch to Elasticsearch
Overview

Elasticsearch is a popular search engine that forms part of the modern data stack alongside relational databases, caches, real-time data warehouses, and message-oriented middleware. While writing data to Elasticsearch is relatively straightforward, real-time data synchronization from it is more challenging. This article describes how to migrate and sync data from one Elasticsearch instance to another using BladePipe and its Elasticsearch incremental data capture plugin.

Highlights

Elasticsearch Plugin

Elasticsearch does not explicitly provide a mechanism for real-time change data capture. However, its plugin API IndexingOperationListener can track INDEX and DELETE events. An INDEX event covers INSERT and UPDATE operations, while a DELETE event corresponds to a traditional DELETE operation.

Once incremental data can be captured, the next challenge is making it available to downstream tools. We use a dedicated index, cc_es_trigger_idx, as a container for incremental data, which brings several benefits. In this index, the row_data field holds the document content after an INDEX operation, and the pk field stores the document _id.

Elasticsearch strictly checks the third-party packages that a plugin depends on: if they conflict with or mismatch the versions of Elasticsearch's own dependencies, the plugin cannot be loaded. The plugin must therefore match the exact version of Elasticsearch, down to the minor version. Given the impracticality of releasing numerous pre-compiled packages, and to encourage widespread use, we publish the plugin as open source on GitHub.

Trigger Data Scanning

To consume the incremental data generated by the Elasticsearch plugin, simply scan the cc_es_trigger_idx index in batches, ordered by the scn field. The coding style for data consumption is consistent with that used for SAP Hana as a source.

Procedure
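To make the batch scanning in scn order more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not BladePipe's implementation: fetch_batch stands in for an Elasticsearch search over cc_es_trigger_idx sorted by scn (e.g. using search_after), the in-memory document list simulates the trigger index contents, and the event field name is hypothetical.

```python
# Sketch of batch-scanning a trigger index in scn order. In a real
# consumer, fetch_batch would issue an Elasticsearch search sorted by
# scn, resuming after the last consumed value.

def scan_trigger_index(fetch_batch, batch_size=2):
    """Yield trigger documents in scn order, one batch at a time."""
    last_scn = -1
    while True:
        batch = fetch_batch(after_scn=last_scn, size=batch_size)
        if not batch:
            break  # no more incremental data to consume
        for doc in batch:
            yield doc
        last_scn = batch[-1]["scn"]

# In-memory stand-in for cc_es_trigger_idx (field names row_data and pk
# are from the article; "event" is a hypothetical name for the op type).
trigger_docs = [
    {"scn": 1, "pk": "doc-1", "event": "INDEX", "row_data": '{"name": "a"}'},
    {"scn": 2, "pk": "doc-2", "event": "INDEX", "row_data": '{"name": "b"}'},
    {"scn": 3, "pk": "doc-1", "event": "DELETE", "row_data": None},
]

def fetch_batch(after_scn, size):
    # Mimics a search sorted by scn, resuming after the last consumed scn.
    matches = [d for d in sorted(trigger_docs, key=lambda d: d["scn"])
               if d["scn"] > after_scn]
    return matches[:size]

consumed = list(scan_trigger_index(fetch_batch))
print([d["pk"] for d in consumed])  # ['doc-1', 'doc-2', 'doc-1']
```

Because scn is monotonically increasing, resuming from the last consumed value makes the scan restartable: a consumer that crashes can simply persist last_scn and continue from there.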
Follow the instructions in Preparation for Elasticsearch CDC to install the incremental data capture plugin.

Step 2: Install BladePipe

Follow the instructions in Install Worker (Docker) or Install Worker (Binary) to download and install a BladePipe Worker.

Note
In the Specification settings, make sure that you select a specification of at least 1 GB. Allocating too little memory may result in Out of Memory (OOM) errors during DataJob execution.

Step 4: Create a DataJob

Note
If you need to select specific fields for synchronization, first create the index on the target Elasticsearch instance. This allows you to define the schemas and fields that you want to synchronize.

Note
The DataJob creation process involves several steps. Click Sync Settings > ConsoleJob, find the DataJob creation record, and click Details to view it.
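The note about selecting specific fields can be sketched as follows. The index name and field names below are hypothetical examples, not values from the article; the commented-out indices.create call marks where the official elasticsearch-py client would apply this mapping against a live cluster.

```python
# Hypothetical example: pre-creating a target index so that only the
# mapped fields participate in synchronization. All names here are
# illustrative assumptions.

target_index = "products_synced"
mapping = {
    "mappings": {
        "properties": {
            "name":  {"type": "keyword"},
            "price": {"type": "double"},
            # Fields not declared here are the ones you chose not to sync.
        }
    }
}

# With a live cluster and the elasticsearch-py client, you would run:
# es.indices.create(index=target_index, body=mapping)

print(sorted(mapping["mappings"]["properties"]))  # ['name', 'price']
```

Creating the index up front also lets you control analyzers and shard settings on the target before any data arrives.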
Once the DataJob is created and started, BladePipe will automatically run the corresponding DataTasks.