neosync sync
Overview
Learn how to sync data to a local destination with the neosync sync CLI command.
The neosync sync command is used to sync data from a Neosync connection to a local destination.
Supported sources are currently postgres and mysql connections, and AWS S3 sync jobs.
Supported destinations are currently postgres and mysql.
Usage
neosync sync
Options
The following options can be passed using the neosync sync command:
General Options
--api-key - Neosync API Key. Takes precedence over $NEOSYNC_API_KEY.
--config - Path to yaml config. Defaults to neosync.yaml in current directory.
--connection-id - Neosync connection id for sync data source. Takes precedence over config.
--job-id - Neosync job id for sync data source. For [AWS S3, GCP Cloud Storage] jobs only. Takes precedence over config.
--job-run-id - Neosync job run id for sync data source. For [AWS S3, GCP Cloud Storage] jobs only. Takes precedence over config.
--output - Sets output type (auto, plain, tty). Defaults to auto.
--debug - Sets the log level to debug and prints much more information. Works best with --output plain.
SQL Destination Options
--destination-connection-url - Local destination connection url to sync data to. Takes precedence over config.
--destination-driver - Destination connection driver (postgres, mysql). Takes precedence over config.
--truncate-before-insert - Truncates the table before inserting data. This will not work with Foreign Keys.
--truncate-cascade - Truncate cascades to all tables. Only supported for postgres.
--init-schema - Creates the table schema and its constraints.
--on-conflict-do-nothing - If there is a conflict when inserting data into the SQL database, do not insert.
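As an illustration of how the general and SQL destination options combine, here is a minimal sketch of a sync from a Neosync connection into a local postgres database. The connection id and the database URL are placeholder values, and the sketch assumes that --init-schema and --truncate-cascade are boolean switches that take no value. The script prints the command and only runs it when the neosync binary is on the PATH.

```shell
#!/bin/sh
set -eu

# Placeholder values (assumptions): replace with your own connection id and database URL.
CONNECTION_ID="d9dc020d-746b-48c1-9319-a165a25ac32e"
DESTINATION_URL="postgres://postgres:postgres@localhost:5432/postgres?sslmode=disable"

# Build the argument list using only flags documented above.
set -- sync \
  --connection-id "$CONNECTION_ID" \
  --destination-driver postgres \
  --destination-connection-url "$DESTINATION_URL" \
  --init-schema \
  --truncate-cascade

# Show the command for review, then run it only if the CLI is installed.
echo "neosync $*"
if command -v neosync >/dev/null 2>&1; then
  neosync "$@"
else
  echo "neosync CLI not found on PATH; command not executed" >&2
fi
```

--truncate-cascade is used here because the destination is postgres; for a mysql destination it would be left out.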
SQL Connection Pool Options
--destination-idle-duration - Maximum amount of time a connection may be idle (e.g. '5m').
--destination-idle-limit - Maximum number of idle connections.
--destination-open-duration - Maximum amount of time a connection may be open (e.g. '30s').
--destination-open-limit - Maximum number of open connections.
Batch Processing Options
--destination-max-in-flight - Maximum allowed batched rows to sync. If not provided, uses the server default of 64.
--destination-batch-count - Batch size of rows that will be sent to the destination.
--destination-batch-period - Period of time after which a batch of rows is sent (e.g. '5s').
AWS DynamoDB Destination Options
--aws-access-key-id - AWS Access Key ID for DynamoDB.
--aws-secret-access-key - AWS Secret Access Key for DynamoDB.
--aws-session-token - AWS Session Token for DynamoDB.
--aws-role-arn - AWS Role ARN for DynamoDB.
--aws-role-external-id - AWS Role External ID for DynamoDB.
--aws-profile - AWS Profile for DynamoDB.
--aws-endpoint - Custom endpoint for DynamoDB.
--aws-region - AWS Region for DynamoDB.
Yaml Config File
To persist settings, use a yaml config file. It can be provided like so:
neosync sync --config ./path/to/config.yaml
NB: Flags will take precedence over values provided in the config.
source:
  connection-id: d9dc020d-746b-48c1-9319-a165a25ac32e
  connection-opts:
    # only used if source is AWS S3
    job-run-id: 43a4aac5-c4a8-4e4f-8554-03b7c5fffc04-2024-06-14T17:43:24Z
destination:
  # SQL destination configuration
  connection-url: user:pass@tcp(127.0.0.1:3306)/database
  driver: mysql
  truncate-before-insert: false
  truncate-cascade: false
  init-schema: false
  on-conflict:
    do-nothing: false
    do-update:
      enabled: false
  connection-opts:
    open-limit: 25 # remove to unset and use system default (CLI falls back to default of 25 if not provided)
    idle-limit: 2 # remove to unset and use system default
    idle-duration: 30s # remove to unset and use system default
    open-duration: 5m # remove to unset and use system default
  max-in-flight: 10
  batch:
    count: 100
    period: 5s
# AWS DynamoDB destination configuration
aws-dynamodb-destination:
  aws-cred-config:
    region: us-west-2
    access-key-id: your-access-key
    secret-access-key: your-secret-key
    session-token: your-session-token
    role-arn: your-role-arn
    role-external-id: your-external-id
    endpoint: http://localhost:8000
    profile: default
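Because flags take precedence over values in the config file, a single run can override one setting without editing the file. The sketch below keeps the config path from the example above, overrides only the source connection id, and turns on debug logging with plain output. The replacement connection id is a placeholder, not a real value.

```shell
#!/bin/sh
set -eu

# Placeholder (assumption): a different source connection id for this one run.
OVERRIDE_CONNECTION_ID="00000000-0000-0000-0000-000000000000"

# --connection-id overrides source.connection-id from the config file;
# every other setting is still read from the file.
set -- sync \
  --config ./path/to/config.yaml \
  --connection-id "$OVERRIDE_CONNECTION_ID" \
  --debug \
  --output plain

# Show the command for review, then run it only if the CLI is installed.
echo "neosync $*"
if command -v neosync >/dev/null 2>&1; then
  neosync "$@"
else
  echo "neosync CLI not found on PATH; command not executed" >&2
fi
```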
Circular Dependencies
Support for Circular Dependencies: The CLI sync feature in Neosync is capable of managing both self-referencing circular dependencies and those involving multiple tables. In scenarios where the source data is not from a SQL database (like AWS S3) but the destination is a SQL database, Neosync utilizes the foreign key constraints of the destination SQL database to effectively insert data. This approach ensures data integrity and respects the relational structure of the SQL database.
Nullable Columns: For circular dependencies to work, at least one table involved in the dependency must have a column that is nullable.
Foreign Key Dependencies and Table Constraints: While a CLI sync does not modify table constraints, it synchronizes data based on foreign key dependencies.
Data Insertion and Updating Process: A sync job first performs an initial data insertion. It then updates the columns involved in the circular dependency.
Syncing from AWS S3
To synchronize data from a Neosync job with AWS S3 as the destination, you must provide either a job ID or job run ID. Using a job ID will sync data from the most recent job run.
During this process, the table constraints from the destination-connection-url database are used to determine the correct order for syncing the data. This ensures that the data is synchronized in a way that respects the relational structure and integrity of the local database.
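A sync from AWS S3 might look like the following sketch, which pulls the most recent run of a job into a local mysql database. It assumes that the id of the AWS S3 connection is still passed with --connection-id, as the source section of the yaml config suggests. Both ids are placeholders, and the destination URL is the one from the config example. To pin a specific run, --job-id would be swapped for --job-run-id.

```shell
#!/bin/sh
set -eu

# Placeholder values (assumptions): replace with your own ids and database URL.
S3_CONNECTION_ID="d9dc020d-746b-48c1-9319-a165a25ac32e"
JOB_ID="00000000-0000-0000-0000-000000000000"
DESTINATION_URL="user:pass@tcp(127.0.0.1:3306)/database"

# With --job-id, data from the most recent job run is synced.
set -- sync \
  --connection-id "$S3_CONNECTION_ID" \
  --job-id "$JOB_ID" \
  --destination-driver mysql \
  --destination-connection-url "$DESTINATION_URL"

# Show the command for review, then run it only if the CLI is installed.
echo "neosync $*"
if command -v neosync >/dev/null 2>&1; then
  neosync "$@"
else
  echo "neosync CLI not found on PATH; command not executed" >&2
fi
```

The destination tables and their foreign key constraints need to exist before the run, since those constraints determine the sync order.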