1. Create an ext-fs connection
Amorphic can import data into datasets through JDBC, S3, or ext-fs connections. To ingest files from an external file system, create a connection of type ext-fs with the following settings:
- Connection Type: External File System (ext-fs)
- Connection Name: A name for the connection.
- Description: A description of the connection.
- Authorized Users: Select the users who are authorized to perform data ingestion using the connection.
- Host OS: The operating system (Windows or Linux) of the source machine from which data is ingested into S3.
After the connection is created, Amorphic displays a command and a link for downloading the csclone binary. Download the file using either the command or the link.
2. Create a Dataset
Once the connection exists, create a dataset that uses the ext-fs connection. After the dataset is created, Amorphic provides the command for running the ingestion process, in this form:
<csclone-file> [FLAGS] [OPTIONS] --s3-bucket-name <s3-bucket-name> --source <source>
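The command template above can be wrapped in a small shell function that checks its inputs before launching the binary, so a missing file or a mistyped path fails immediately with a clear message. This is a minimal sketch: the function name `run_csclone` is hypothetical, it assumes `<source>` is a local path on the source machine, and it passes only the two options shown in the template (`--s3-bucket-name`, `--source`) plus any extra arguments you append. The command generated for your dataset may contain additional flags, so copy those from Amorphic rather than guessing them.

```shell
# Hypothetical wrapper around the dataset's ingestion command.
# Usage: run_csclone <csclone-file> <s3-bucket-name> <source> [extra flags...]
# Extra arguments are placed before the two documented options, matching
# the order of the template; only --s3-bucket-name and --source are assumed.
run_csclone() {
    if [ $# -lt 3 ]; then
        echo "usage: run_csclone <csclone-file> <s3-bucket-name> <source> [flags...]" >&2
        return 2
    fi
    local bin="$1" bucket="$2" src="$3"
    shift 3

    # Fail early instead of letting the binary start with bad input.
    if [ ! -f "$bin" ] || [ ! -x "$bin" ]; then
        echo "error: $bin is missing or not executable (run chmod +x first)" >&2
        return 1
    fi
    if [ -z "$bucket" ]; then
        echo "error: S3 bucket name is empty" >&2
        return 1
    fi
    if [ ! -e "$src" ]; then
        echo "error: source path $src does not exist" >&2
        return 1
    fi

    # Same argument order as the template: binary, extra flags, then options.
    "$bin" "$@" --s3-bucket-name "$bucket" --source "$src"
}
```

The function uses `local`, which bash and dash both support; with a strictly POSIX shell, rename the variables instead.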
3. Start the data ingestion
Copy the csclone file downloaded in step 1 to the source machine. Use the following commands to copy the file securely:
chmod 400 <csclone-file> (makes the file read-only for its owner, protecting it against accidental overwriting)
scp <localmachine/path_to_the_file> <username>@<server_ip>:/<path_to_remote_directory>
Note: If your source machine is an AWS EC2 instance, use the following commands instead, passing the instance's private key to scp with -i:
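chmod 400 <csclone-file> (makes the file read-only for its owner, protecting it against accidental overwriting)
chmod 400 <ec2_private_key>.pem (ssh and scp refuse to use a private key that other users can read)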
scp -i <ec2_private_key>.pem <localmachine/path_to_the_file> <user>@<server_ip>:<path_to_remote_directory>
On the source machine, make the file executable with the following command:
chmod +x <csclone-file>
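The local copy steps can be combined into one helper that covers both the plain scp case and the EC2 case. This is a sketch, not part of Amorphic: the function name `copy_csclone` and the `DRY_RUN` variable are hypothetical. It applies `chmod 400` to the local file, also locks down the key file when one is given (ssh and scp ignore private keys that other users can read), and prints the scp command instead of running it when `DRY_RUN=1`. The `chmod +x` step still has to be run on the source machine afterwards.

```shell
# Hypothetical helper for copying the csclone binary to the source machine.
# Usage: copy_csclone <csclone-file> <user>@<server_ip>:<path> [private-key.pem]
# Set DRY_RUN=1 to print the scp command instead of executing it.
copy_csclone() {
    if [ $# -lt 2 ]; then
        echo "usage: copy_csclone <csclone-file> <user>@<server_ip>:<path> [key.pem]" >&2
        return 2
    fi
    local file="$1" dest="$2" key="${3:-}"

    if [ ! -f "$file" ]; then
        echo "error: $file not found" >&2
        return 1
    fi

    # Read-only for the owner, as in the documented step.
    chmod 400 "$file" || return 1

    if [ -n "$key" ]; then
        # EC2 case: restrict the key, then pass it to scp with -i.
        chmod 400 "$key" || return 1
        set -- -i "$key" "$file" "$dest"
    else
        set -- "$file" "$dest"
    fi

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo scp "$@"
    else
        scp "$@"
    fi
}
```

For example, `copy_csclone ./csclone ec2-user@<server_ip>:<path_to_remote_directory> <ec2_private_key>.pem` copies the file with the key; omit the last argument for a non-EC2 host.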
On the source machine, run the ingestion command provided by the dataset. This starts a process that continuously monitors the source path for files and uploads them to S3.
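Because the ingestion process keeps running to monitor the source path, it may be terminated with SIGHUP when the SSH session that started it closes, depending on how the shell and SSH server are configured. A common workaround is to start it under the standard `nohup` utility with output sent to a log file. The sketch below assumes a Linux source machine; the function name `start_ingestion` and the `.pid` file convention are hypothetical, and a systemd service or a tmux/screen session would be equally valid choices.

```shell
# Hypothetical helper: run a long-lived command in the background,
# immune to hangups, logging to a file and recording its PID.
# Usage: start_ingestion <log-file> <command> [args...]
start_ingestion() {
    if [ $# -lt 2 ]; then
        echo "usage: start_ingestion <log-file> <command> [args...]" >&2
        return 2
    fi
    local log="$1"
    shift

    # nohup ignores SIGHUP; redirecting stdin/stdout/stderr explicitly
    # keeps nohup from creating nohup.out or reading the terminal.
    nohup "$@" </dev/null >>"$log" 2>&1 &

    # $! is the PID of the background job just started.
    echo "$!" >"$log.pid"
    echo "started PID $! (log: $log)"
}
```

For example, `start_ingestion ingest.log <csclone-file> [FLAGS] [OPTIONS] --s3-bucket-name <s3-bucket-name> --source <source>` runs the dataset's command in the background; `kill "$(cat ingest.log.pid)"` stops it.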
Once the files are ingested, the dataset is updated to include them.