Import files to a dataset

You just created a dataset. You now need to upload some files into it.

Get access and secret keys#

note

For now, access and secret keys are restricted to a single dataset. We plan to allow the creation of project and organization-based keys in the near future.

Datasets are, under the hood, S3-compatible buckets, so you can use any tool from the S3 ecosystem. To read, write, or delete files in a dataset, you need the dataset's access and secret keys.

Quick guide#

Step-by-step#

  1. Log in or sign up to your csquare account.
  2. Select the project and go to the dataset page.
  3. Copy your access and secret keys.

Using the csquare explorer#

Quick guide#

Step-by-step#

  1. Select the project you want to work with.
  2. Go to the Datasets page using the sidebar.
  3. Click on the ID of the dataset you want to work with to access the file explorer.
  4. Drag-and-drop the files to upload in the explorer.

You just updated your dataset. Well done!

Using the s3cmd command line interface#

Installation#

You can also use the s3cmd CLI to upload files to your datasets. Install s3cmd from the [official s3cmd website](https://s3tools.org/s3cmd), then check the installation by running:

CLI
s3cmd --help

which should output:

Usage: s3cmd [options] COMMAND [parameters]
S3cmd is a tool for managing objects in Amazon S3 storage. It allows for
making and removing "buckets" and uploading, downloading and removing
"objects" from these buckets.
Options:
-h, --help show this help message and exit
--configure Invoke interactive (re)configuration tool. Optionally
use as '--configure s3://some-bucket' to test access
to a specific bucket instead of attempting to list
them all.
-c FILE, --config=FILE
Config file name. Defaults to $HOME/.s3cfg
--dump-config Dump current configuration after parsing config files
and command line options and exit.
--access_key=ACCESS_KEY
AWS Access Key
--secret_key=SECRET_KEY
AWS Secret Key
[...]

Configuring s3cmd#

Once you have a working CLI, you can configure s3cmd by running:

s3cmd --configure

and answer the interactive questions:

Access key
Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.
Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: [YOUR ACCESS KEY]
Secret key
Secret Key: [YOUR SECRET KEY]
Default region
Default Region [US]: ch-dk-2
S3 endpoint
Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]: storage.csquare.run
Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: storage.csquare.run

For the remaining questions, accept the default values by pressing Enter.

Keys validation
Test access with supplied credentials? [Y/n]
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)

Do not forget to save your configuration.

Save your configuration
Save settings? [y/N] y
Configuration saved to '/home/[you]/.s3cfg'
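
The answers given to the interactive tool end up in a plain configuration file, so the same setup can be sketched without the prompts. The helper below is hypothetical (`write_s3cfg` is not part of s3cmd or csquare); it writes only the settings shown in the session above (`access_key`, `secret_key`, `bucket_location`, `host_base`, `host_bucket`) and assumes s3cmd's defaults are acceptable for everything else.

```shell
# Hypothetical helper: write a minimal s3cmd config file non-interactively.
# The region and endpoint values mirror the interactive answers shown above.
write_s3cfg() {
    cfg_path="$1"; access_key="$2"; secret_key="$3"
    if [ -z "$cfg_path" ] || [ -z "$access_key" ] || [ -z "$secret_key" ]; then
        echo "usage: write_s3cfg PATH ACCESS_KEY SECRET_KEY" >&2
        return 1
    fi
    # umask 077 makes a newly created file readable by its owner only,
    # which matters because the file stores the secret key.
    ( umask 077
      cat > "$cfg_path" <<EOF
[default]
access_key = $access_key
secret_key = $secret_key
bucket_location = ch-dk-2
host_base = storage.csquare.run
host_bucket = storage.csquare.run
EOF
    )
}

# Example: write to a scratch path, then point s3cmd at it with -c.
# write_s3cfg ./csquare.s3cfg "[YOUR ACCESS KEY]" "[YOUR SECRET KEY]"
# s3cmd -c ./csquare.s3cfg ls
```

Writing to a separate path and passing it with `-c` keeps an existing `$HOME/.s3cfg` untouched.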

You can now test the connection by listing the associated buckets:

user@computer:~$ s3cmd ls
2020-11-22 21:32 s3://mnist

All the available commands are listed by running s3cmd --help or in the s3cmd documentation.

Basic usage#

Copy a directory to a dataset
s3cmd put --recursive [path] s3://[dataset]
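
As a sketch of how the upload might be wrapped, the function below checks that the source is a directory, uploads it, and then lists the dataset to confirm the files arrived. `upload_dir` and the `S3CMD_BIN` override are hypothetical names introduced for illustration; the function assumes s3cmd is already configured as described above.

```shell
# S3CMD_BIN is a hypothetical override: set S3CMD_BIN=echo to preview
# the commands instead of running them.
S3CMD_BIN="${S3CMD_BIN:-s3cmd}"

# Hypothetical helper: upload a local directory to a dataset, then list it.
upload_dir() {
    src="$1"; dataset="$2"
    if [ ! -d "$src" ]; then
        echo "upload_dir: '$src' is not a directory" >&2
        return 1
    fi
    if [ -z "$dataset" ]; then
        echo "usage: upload_dir DIRECTORY DATASET" >&2
        return 1
    fi
    # Upload the directory tree; list the dataset only if the upload succeeded.
    "$S3CMD_BIN" put --recursive "$src" "s3://$dataset/" &&
    "$S3CMD_BIN" ls --recursive "s3://$dataset/"
}

# Example: upload_dir ./images mnist
```

With s3cmd, a trailing slash on the source path generally decides whether the directory name itself becomes part of the object keys, so it is worth checking the listing after the first upload.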

Advanced usage#

If you do not want to save your s3cmd configuration, or want to use s3cmd without running --configure first, you can pass the following options on the command line.

Usage without configuration
s3cmd --access_key=[access-key] --secret_key=[secret-key] --host=storage.csquare.run --host-bucket=storage.csquare.run [command] [args]
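
One possible way to avoid retyping these options is a small wrapper that reads the keys from environment variables. This is only a sketch: `csquare_s3cmd`, `CSQUARE_ACCESS_KEY`, `CSQUARE_SECRET_KEY`, and `S3CMD_BIN` are hypothetical names, not something csquare or s3cmd defines.

```shell
# S3CMD_BIN is a hypothetical override: set S3CMD_BIN=echo to preview the command.
S3CMD_BIN="${S3CMD_BIN:-s3cmd}"

# Hypothetical wrapper: run s3cmd against csquare storage without a saved config.
# The keys are read from CSQUARE_ACCESS_KEY and CSQUARE_SECRET_KEY (assumed names).
csquare_s3cmd() {
    if [ -z "${CSQUARE_ACCESS_KEY:-}" ] || [ -z "${CSQUARE_SECRET_KEY:-}" ]; then
        echo "csquare_s3cmd: set CSQUARE_ACCESS_KEY and CSQUARE_SECRET_KEY first" >&2
        return 1
    fi
    # Forward the remaining arguments ([command] [args]) unchanged.
    "$S3CMD_BIN" \
        --access_key="$CSQUARE_ACCESS_KEY" \
        --secret_key="$CSQUARE_SECRET_KEY" \
        --host=storage.csquare.run \
        --host-bucket=storage.csquare.run \
        "$@"
}

# Example: csquare_s3cmd ls
```

The keys are still passed as command-line arguments, so they remain visible in the process list while the command runs.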

Using the s5cmd command line interface#

s5cmd is more powerful than s3cmd: it uploads files in parallel and generally performs much better than most other S3 CLI clients. However, s5cmd is a bit more complex to use than s3cmd because it depends on the AWS CLI configuration files.

Installation#

  1. Install the AWS CLI version 2 (no AWS account is required), then check the AWS CLI installation via the following command:
Check AWS CLI installation
aws --version
  2. Install s5cmd: follow the installation instructions, then ensure that s5cmd is accessible with:
Check s5cmd installation
s5cmd --help

Configuring s5cmd#

Configure your dataset access and secret keys. In this example, we are using the dataset my-dataset with its access key C2-2zJBcUJTG0GIHbvpsm9xe and secret key 9sfHEWO11HCj8yvBdVevdd.

Set up your access and secret keys
aws configure --profile my-dataset
> AWS Access Key ID [None]: C2-2zJBcUJTG0GIHbvpsm9xe
> AWS Secret Access Key [None]: 9sfHEWO11HCj8yvBdVevdd
> Default region name [None]:
> Default output format [None]:
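
The same profile can also be created without the prompts, using the AWS CLI's `aws configure set` subcommand. The sketch below wraps it in a hypothetical helper (`configure_profile` and `AWS_BIN` are illustrative names); like the interactive session above, it leaves the region and output format unset.

```shell
# AWS_BIN is a hypothetical override: set AWS_BIN=echo to preview the commands.
AWS_BIN="${AWS_BIN:-aws}"

# Hypothetical helper: store a dataset's keys in a named AWS CLI profile.
configure_profile() {
    profile="$1"; access_key="$2"; secret_key="$3"
    if [ -z "$profile" ] || [ -z "$access_key" ] || [ -z "$secret_key" ]; then
        echo "usage: configure_profile PROFILE ACCESS_KEY SECRET_KEY" >&2
        return 1
    fi
    # Write the access key first; write the secret key only if that succeeded.
    "$AWS_BIN" configure set aws_access_key_id "$access_key" --profile "$profile" &&
    "$AWS_BIN" configure set aws_secret_access_key "$secret_key" --profile "$profile"
}

# Example: configure_profile my-dataset "[access-key]" "[secret-key]"
```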

Usage#

Run s5cmd with the AWS_PROFILE environment variable set and the --endpoint-url=https://storage.csquare.app flag:

s5cmd usage
export AWS_PROFILE="my-dataset"
s5cmd --endpoint-url=https://storage.csquare.app cp ./ s3://my-dataset
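
If you work with several datasets, a wrapper that sets the profile per call may be more convenient than exporting AWS_PROFILE for the whole shell session. This is a minimal sketch: `csquare_s5cmd` and `S5CMD_BIN` are hypothetical names, and the endpoint is the one used in the example above.

```shell
# S5CMD_BIN is a hypothetical override: set S5CMD_BIN=echo to preview the command.
S5CMD_BIN="${S5CMD_BIN:-s5cmd}"

# Hypothetical wrapper: run s5cmd with a given AWS profile and the csquare endpoint.
csquare_s5cmd() {
    if [ "$#" -lt 2 ]; then
        echo "usage: csquare_s5cmd PROFILE COMMAND [ARGS...]" >&2
        return 1
    fi
    profile="$1"; shift
    # AWS_PROFILE is set for this single invocation only.
    AWS_PROFILE="$profile" "$S5CMD_BIN" \
        --endpoint-url=https://storage.csquare.app "$@"
}

# Examples:
# csquare_s5cmd my-dataset cp ./ s3://my-dataset
# csquare_s5cmd my-dataset ls s3://my-dataset
```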

Refer to the s5cmd documentation for more information.