This Meerschaum plugin implements experimental synchronization strategies which are discussed in my master's thesis. Consult the presentation slides for a summary of the strategies or read below for a brief overview.
You can download the results tarball or browse the figures.
The strategies are implemented in the `methods.py` module and are organized into three classes:
- Simple Syncs
  Minimize run-time and bandwidth at the expense of accuracy.
  - Simple Sync
    Select all rows newer than the latest target datetime value.
  - Simple Backtrack Sync
    Select rows newer than a “walked back” (earlier) version of the latest target datetime value.
  - Simple Slow-ID Sync
    Select rows newer than the oldest of the per-ID latest datetime values, i.e. the latest datetime of the slowest ID.
  - Simple Append Sync
    Generate a Simple Sync query for each ID and append them into a single transaction.
  - Simple Join Sync
    Left join a temporary table of each ID’s latest datetime value to emulate Simple Sync for each ID.
- Iterative Syncs
  Guarantee perfect accuracy by considering the entire datetime axis.
  - Iterative Simple Sync
    For each partition of the datetime axis, compare row-counts and perform Simple Sync when they differ.
  - Daily Row-Count Sync
    Build a table of each day’s row-count and perform Simple Sync on days whose row-counts differ.
  - Binary Search Sync
    For each partition of the datetime axis, compare row-counts and recursively binary search partitions with differing row-counts until sufficiently small intervals are reached, then perform Simple Sync on those small intervals.
  - Iterative CPISync
    For each partition of the datetime axis, compare row-counts and perform CPISync when they differ.
- Corrective Syncs
  Perform Simple Sync daily and an iterative strategy monthly.
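To make the accuracy trade-off of the Simple Syncs concrete, here is a minimal sketch of Simple Sync and Simple Backtrack Sync using Python's standard-library `sqlite3` module. It is not taken from `methods.py`: it assumes a hypothetical `(dt, id, value)` table layout with `(dt, id)` as the primary key, keeps the source and target tables in one in-memory SQLite connection, and models the “walk back” as a fixed `backtrack` interval (a hypothetical parameter).

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema shared by both tables; (dt, id) is the primary key so
# that re-fetched rows can be upserted. Source and target would normally be
# separate databases; one SQLite connection holds both here for brevity.
SCHEMA = "CREATE TABLE {} (dt TEXT, id INTEGER, value REAL, PRIMARY KEY (dt, id))"


def make_demo():
    """Target is missing a late row for ID 2 and a new row for ID 1."""
    conn = sqlite3.connect(":memory:")
    for table in ("source", "target"):
        conn.execute(SCHEMA.format(table))
    synced = [
        ("2021-01-01 00:00:00", 1, 1.0),
        ("2021-01-01 00:00:00", 2, 2.0),
        ("2021-01-01 00:10:00", 1, 3.0),
    ]
    missing = [
        ("2021-01-01 00:05:00", 2, 4.0),  # late arrival, older than the target's max
        ("2021-01-01 00:15:00", 1, 5.0),  # genuinely new row
    ]
    conn.executemany("INSERT INTO source VALUES (?, ?, ?)", synced + missing)
    conn.executemany("INSERT INTO target VALUES (?, ?, ?)", synced)
    return conn


def simple_sync(conn, backtrack=timedelta(0)):
    """Copy source rows newer than the latest target datetime.

    A non-zero `backtrack` walks the threshold back first (Simple Backtrack
    Sync), trading extra bandwidth for a chance to catch late rows.
    """
    latest = conn.execute("SELECT MAX(dt) FROM target").fetchone()[0]
    if latest is None:  # empty target: fetch everything
        query, params = "SELECT dt, id, value FROM source", ()
    else:
        threshold = datetime.fromisoformat(latest) - backtrack
        query = "SELECT dt, id, value FROM source WHERE dt > ?"
        params = (threshold.isoformat(sep=" "),)
    rows = conn.execute(query, params).fetchall()
    # Upsert, because a walked-back threshold re-fetches rows the target has.
    conn.executemany("INSERT OR REPLACE INTO target VALUES (?, ?, ?)", rows)
    return len(rows)


conn = make_demo()
print(simple_sync(conn))  # 1: only the 00:15 row; the late 00:05 row is missed
conn = make_demo()
print(simple_sync(conn, backtrack=timedelta(minutes=10)))  # 3: 00:05, 00:10, 00:15
```

Without backtracking, the late `00:05` row is never fetched because it is older than the target's latest datetime. Walking back ten minutes catches it, at the cost of re-sending the `00:10` row that the target already has.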
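The per-ID variants differ mainly in how many thresholds they compare against. The sketch below, under the same hypothetical `(dt, id, value)` layout and single-connection assumption, shows which rows Simple Slow-ID Sync and Simple Join Sync would select when one ID lags behind another. The function names are illustrative only and are not the plugin's API.

```python
import sqlite3


def make_demo():
    """ID 2 reports more slowly than ID 1; the target lacks one row of each."""
    conn = sqlite3.connect(":memory:")
    for table in ("source", "target"):
        conn.execute(f"CREATE TABLE {table} (dt TEXT, id INTEGER, value REAL)")
    synced = [
        ("2021-01-01 00:00:00", 1, 1.0),
        ("2021-01-01 00:00:00", 2, 2.0),
        ("2021-01-01 00:10:00", 1, 3.0),
    ]
    missing = [
        ("2021-01-01 00:05:00", 2, 4.0),
        ("2021-01-01 00:15:00", 1, 5.0),
    ]
    conn.executemany("INSERT INTO source VALUES (?, ?, ?)", synced + missing)
    conn.executemany("INSERT INTO target VALUES (?, ?, ?)", synced)
    return conn


def slow_id_rows(conn):
    """Slow-ID: a single threshold, the latest datetime of the slowest ID."""
    threshold = conn.execute(
        "SELECT MIN(latest) FROM (SELECT MAX(dt) AS latest FROM target GROUP BY id)"
    ).fetchone()[0]
    if threshold is None:  # empty target: fetch everything
        return conn.execute("SELECT dt, id, value FROM source").fetchall()
    return conn.execute(
        "SELECT dt, id, value FROM source WHERE dt > ?", (threshold,)
    ).fetchall()


def join_rows(conn):
    """Join: compare each source row against its own ID's latest datetime."""
    # The per-ID maxima would be read from the target and loaded into a
    # temporary table on the source; here both live in one connection.
    conn.execute("DROP TABLE IF EXISTS temp.latest")
    conn.execute(
        "CREATE TEMP TABLE latest AS SELECT id, MAX(dt) AS dt FROM target GROUP BY id"
    )
    return conn.execute(
        "SELECT s.dt, s.id, s.value FROM source AS s "
        "LEFT JOIN latest AS l ON s.id = l.id "
        "WHERE l.dt IS NULL OR s.dt > l.dt"  # IDs unseen by the target are fetched in full
    ).fetchall()


conn = make_demo()
print(len(slow_id_rows(conn)), len(join_rows(conn)))
```

With this data the Slow-ID threshold is dragged back to the lagging ID's `00:00`, so it selects three rows, one of which the target already has. The join compares each row against its own ID's latest datetime and selects exactly the two missing rows.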
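Daily Row-Count Sync can be sketched with a `GROUP BY date(dt)` on each side. This is a hedged illustration with the same hypothetical schema, not the plugin's implementation; in particular, it re-fetches and upserts every row of a mismatched day rather than running Simple Sync within that day.

```python
import sqlite3


def make_demo():
    """Target lost the 08:00 row of 2021-01-02 but has the later 20:00 row."""
    conn = sqlite3.connect(":memory:")
    for table in ("source", "target"):
        conn.execute(
            f"CREATE TABLE {table} (dt TEXT, id INTEGER, value REAL, PRIMARY KEY (dt, id))"
        )
    rows = [
        ("2021-01-01 08:00:00", 1, 1.0),
        ("2021-01-01 20:00:00", 1, 2.0),
        ("2021-01-02 08:00:00", 1, 3.0),
        ("2021-01-02 20:00:00", 1, 4.0),
    ]
    conn.executemany("INSERT INTO source VALUES (?, ?, ?)", rows)
    conn.executemany("INSERT INTO target VALUES (?, ?, ?)", rows[:2] + rows[3:])
    return conn


def daily_counts(conn, table):
    """Map each day to its row-count. `table` is a trusted constant here."""
    query = f"SELECT date(dt), COUNT(*) FROM {table} GROUP BY date(dt)"
    return dict(conn.execute(query).fetchall())


def daily_rowcount_sync(conn):
    """Re-sync only the days whose row-counts differ; return those days."""
    source_counts = daily_counts(conn, "source")
    target_counts = daily_counts(conn, "target")
    stale_days = sorted(
        day for day, count in source_counts.items() if target_counts.get(day) != count
    )
    for day in stale_days:
        # Stand-in for the per-day Simple Sync: re-fetch the whole day and upsert it.
        rows = conn.execute(
            "SELECT dt, id, value FROM source WHERE date(dt) = ?", (day,)
        ).fetchall()
        conn.executemany("INSERT OR REPLACE INTO target VALUES (?, ?, ?)", rows)
    return stale_days


conn = make_demo()
print(daily_rowcount_sync(conn))  # ['2021-01-02']
```

The missing row is older than the target's latest datetime, so Simple Sync alone would not find it, but the 2021-01-02 row-counts (2 in the source, 1 in the target) flag that day for re-syncing.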
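The recursion in Binary Search Sync is easier to follow without SQL. This sketch represents each table as a Python set of hypothetical `(datetime, id)` pairs, compares row-counts per fixed-size partition, and halves mismatched intervals until they are no longer than `min_interval`. As in the previous sketch, the final step copies every source row in the small interval as a stand-in for Simple Sync, and the partition and interval sizes are arbitrary assumptions.

```python
from datetime import datetime, timedelta


def count_rows(rows, begin, end):
    """Row-count of a table (a set of (dt, id) pairs) within [begin, end)."""
    return sum(1 for dt, _ in rows if begin <= dt < end)


def binary_search_sync(source, target, begin, end, min_interval):
    """Narrow [begin, end) down to small intervals whose row-counts differ.

    Mutates `target` and returns the small intervals that were re-synced.
    """
    if count_rows(source, begin, end) == count_rows(target, begin, end):
        return []  # equal counts: treat this interval as already in sync
    if end - begin <= min_interval:
        # Stand-in for Simple Sync: copy every source row in the small interval.
        target.update(row for row in source if begin <= row[0] < end)
        return [(begin, end)]
    middle = begin + (end - begin) / 2
    return (
        binary_search_sync(source, target, begin, middle, min_interval)
        + binary_search_sync(source, target, middle, end, min_interval)
    )


def sync_partitions(source, target, begin, end, partition, min_interval):
    """Run the binary search independently on each partition of the axis."""
    resynced = []
    while begin < end:
        stop = min(begin + partition, end)
        resynced += binary_search_sync(source, target, begin, stop, min_interval)
        begin = stop
    return resynced


# One hourly ID whose 13:00 row never reached the target.
start = datetime(2021, 1, 1)
source = {(start + timedelta(hours=h), 1) for h in range(24)}
target = source - {(start + timedelta(hours=13), 1)}
resynced = sync_partitions(
    source, target, start, start + timedelta(days=1),
    partition=timedelta(hours=12), min_interval=timedelta(hours=1),
)
print(resynced)
```

Only the intervals containing 13:00 are split; the search bottoms out at `[12:45, 13:30)` and copies the missing row. Because equal row-counts are treated as “in sync”, a row that was updated in place (leaving the count unchanged) would go unnoticed by this comparison.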
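A Corrective Sync is a schedule rather than a new query. The sketch below generates that schedule over a date range, assuming (hypothetically) that the monthly iterative pass runs on the first day of each month alongside that day's Simple Sync; the source does not specify which day of the month is used.

```python
from datetime import date, timedelta


def corrective_schedule(begin, end):
    """Yield (day, passes) for each day in [begin, end).

    Every day gets a Simple Sync; the iterative pass is assumed to run on
    the first day of each month.
    """
    day = begin
    while day < end:
        passes = ["simple"]
        if day.day == 1:
            passes.append("iterative")
        yield day, passes
        day += timedelta(days=1)


schedule = list(corrective_schedule(date(2021, 1, 1), date(2022, 1, 1)))
simple_runs = sum("simple" in passes for _, passes in schedule)
iterative_runs = sum("iterative" in passes for _, passes in schedule)
print(simple_runs, iterative_runs)  # 365 12
```

Over the default `--begin`/`--end` window, this sketch schedules 365 Simple Syncs and 12 iterative passes.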
Install the plugin with:
`mrsm install plugin syncx`
Or clone the repository into the Meerschaum plugins directory:
`git clone https://github.com/bmeares/syncx ~/.config/meerschaum/plugins/syncx`
Run the included `start.sh` script followed by the numbers of the batches to run:
0. Baseline (Naïve Sync vs Simple Sync)
1. Simple Syncs
2. Iterative Syncs
3. Corrective Syncs

For example, to run all four batches:
`./start.sh 0 1 2 3`
After running the simulations, generate the figures with the `results.py` script. Figures will be placed in `~/syncx_results/`.
`python results.py`
Alternatively, use the `scenarios` action, which generates run-time, bandwidth, and accuracy data in the `~/.config/meerschaum/plugins/syncx/scenarios/` folder.
`mrsm scenarios unknown-backlog --sync-methods simple naive`
The `scenarios` action supports the following flags:
`scenario` (positional arguments)
The scenarios to simulate. Defaults to all scenarios.
Options:
- `single-append-only`
- `multiple-large-n-append-only`
- `single-known-backlog`
- `unknown-backlog`
`--begin`
The datetime at which to start the simulation. Defaults to `'2021-01-01 00:00:00'`.
`--end`
The datetime at which to end the simulation. Defaults to `'2022-01-01 00:00:00'`.
`--iterations`
How many times to run each simulation. Results are averaged over all iterations to reduce noise in the run-time measurements. Defaults to `1`.
`--source`
The `SQLConnector` of the source database. Defaults to `'sql:memory'`.
`--target`
The `SQLConnector` of the target database. Defaults to `'sql:memory'`.
`--debug`
Verbosity toggle. Defaults to `False`.
`--sync-methods`
The strategies to test.
Options:
- `naive`
### Simple Syncs
- `simple`
- `simple-backtrack`
- `simple-slow-id`
- `append`
- `join`
### Iterative Syncs
- `unbounded-simple` / `bounded-simple`
- `unbounded-cpi` / `bounded-cpi`
- `unbounded-binary` / `bounded-binary`
- `unbounded-daily-rowcount` / `bounded-daily-rowcount`
### Corrective Syncs
- `simple-monthly-naive`
- `simple-monthly-iterative-simple` / `simple-monthly-bounded-simple`
- `simple-monthly-cpi` / `simple-monthly-bounded-cpi`
- `simple-monthly-binary` / `simple-monthly-bounded-binary`
- `simple-monthly-daily-rowcount` / `simple-monthly-bounded-daily-rowcount`
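When sweeping several scenarios or strategies, it can help to assemble the `mrsm scenarios` arguments programmatically. The helper below is hypothetical (it is not part of Meerschaum or this plugin) and uses only the flags listed above. It places positional scenarios first, as in the `mrsm scenarios unknown-backlog` example, and passes each datetime as a single argument so that no shell quoting is needed.

```python
import shlex


def scenarios_command(scenarios=(), sync_methods=(), begin=None, end=None,
                      iterations=None, source=None, target=None, debug=False):
    """Build an argv list for `mrsm scenarios` (hypothetical helper).

    Unset options are omitted so the plugin's documented defaults apply.
    """
    argv = ["mrsm", "scenarios", *scenarios]  # positional scenarios come first
    if sync_methods:
        argv += ["--sync-methods", *sync_methods]
    options = [
        ("--begin", begin),
        ("--end", end),
        ("--iterations", iterations),
        ("--source", source),
        ("--target", target),
    ]
    for flag, value in options:
        if value is not None:
            argv += [flag, str(value)]
    if debug:
        argv.append("--debug")  # toggle flag: takes no value
    return argv


argv = scenarios_command(
    ["single-known-backlog"],
    ["bounded-binary", "simple-monthly-bounded-binary"],
    begin="2021-06-01 00:00:00",
    iterations=3,
)
print(shlex.join(argv))
```

To actually run a configuration, pass the list to `subprocess.run(argv, check=True)`; `shlex.join` is used here only to show the equivalent shell command, with the datetime quoted.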