This example uses the parquet format to create parquet files in s3://bucket_name/path/to/files, with each table placed in its own directory.
The (top level) spec section is described in the Destination Spec Reference.
It is also possible to use {{YEAR}}, {{MONTH}}, {{DAY}} and {{HOUR}} in the path to create a directory structure based on the current time. For example:
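As a rough sketch of a time-partitioned layout, the snippet below writes a destination spec whose path uses the time placeholders. It is wrapped in a shell heredoc only so that it can be pasted into a terminal. The file name s3.yml, the vX.Y.Z version placeholder, the region value, and the {{TABLE}} and {{UUID}} placeholders are assumptions that do not appear in the text above, so check them against the Destination Spec Reference for your plugin version.

```shell
# Write a hypothetical destination spec to a scratch directory.
# The quoted 'EOF' keeps the shell from touching the {{...}} placeholders.
spec_dir=$(mktemp -d)
cat > "$spec_dir/s3.yml" <<'EOF'
kind: destination
spec:
  name: "s3"
  path: "cloudquery/s3"
  version: "vX.Y.Z" # placeholder: use the plugin version you actually run
  spec:
    bucket: "bucket_name"
    region: "us-east-1" # assumed value: replace with your bucket's region
    # One directory per table, then one per year/month/day/hour of the sync.
    path: "path/to/files/{{TABLE}}/{{YEAR}}/{{MONTH}}/{{DAY}}/{{HOUR}}/{{UUID}}.parquet"
    format: "parquet"
EOF
echo "wrote $spec_dir/s3.yml"
```

Copy the generated file into your CloudQuery configuration directory once the values match your setup.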
Other supported formats are json and csv.
The plugin needs to be authenticated with your account(s) in order to sync information from your cloud setup.
The plugin requires only PutObject permissions (we will never make any changes to your cloud setup), so, following the principle of least privilege, it's recommended to grant it only PutObject permissions.
There are multiple ways to authenticate with AWS, and the plugin respects the AWS credential provider chain. This means that CloudQuery tries the following sources, in order of priority, when attempting to authenticate:
- The AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN environment variables.
- The credentials and config files in ~/.aws (the credentials file takes priority).
- You can also use aws sso to authenticate CloudQuery; you can read more about it here.
- IAM roles for AWS compute resources (including EC2 instances, Fargate and ECS containers).
You can read more about AWS authentication here and here.
Environment Variables
CloudQuery can use the credentials from the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables (AWS_SESSION_TOKEN can be optional for some accounts). For information on obtaining credentials, see the AWS guide.
To export the environment variables (on Linux/macOS; similar on Windows):
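A minimal sketch of the exports, with placeholder values that you would replace with the credentials issued for your account:

```shell
# Placeholder values: substitute your real credentials.
export AWS_ACCESS_KEY_ID='<YOUR_ACCESS_KEY_ID>'
export AWS_SECRET_ACCESS_KEY='<YOUR_SECRET_ACCESS_KEY>'
# Only needed when the credentials are temporary (for example, issued by STS).
export AWS_SESSION_TOKEN='<YOUR_SESSION_TOKEN>'
```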
Shared Configuration files
The plugin can use credentials from your credentials and config files in the .aws directory in your home folder.
The contents of these files are practically interchangeable, but CloudQuery will prioritize credentials in the credentials file.
For information about obtaining credentials, see the AWS guide.
Here are example contents for a credentials file:
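The sketch below writes a credentials file with a single default profile. It deliberately targets a scratch directory rather than ~/.aws so that an existing file is not overwritten; the key values are placeholders.

```shell
# Scratch directory standing in for ~/.aws (the real file is ~/.aws/credentials).
aws_dir=$(mktemp -d)
cat > "$aws_dir/credentials" <<'EOF'
[default]
aws_access_key_id = <YOUR_ACCESS_KEY_ID>
aws_secret_access_key = <YOUR_SECRET_ACCESS_KEY>
EOF
# Keep the secrets readable by the owner only.
chmod 600 "$aws_dir/credentials"
```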
You can also specify credentials for a different profile, and instruct CloudQuery to use the credentials from this profile instead of the default one.
For example:
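A sketch of a credentials file that carries a second, named profile next to the default one. The profile name myprofile is hypothetical, the key values are placeholders, and the file is again written to a scratch directory instead of ~/.aws.

```shell
# Scratch directory standing in for ~/.aws.
profile_dir=$(mktemp -d)
cat > "$profile_dir/credentials" <<'EOF'
[default]
aws_access_key_id = <YOUR_ACCESS_KEY_ID>
aws_secret_access_key = <YOUR_SECRET_ACCESS_KEY>

[myprofile]
aws_access_key_id = <OTHER_ACCESS_KEY_ID>
aws_secret_access_key = <OTHER_SECRET_ACCESS_KEY>
EOF
```

Note that in the credentials file the section header is the bare profile name, whereas in the config file a named profile is written as [profile myprofile].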
Then, export the AWS_PROFILE environment variable (on Linux/macOS; similar on Windows):
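For instance, assuming a profile with the hypothetical name myprofile exists in your credentials or config file:

```shell
# Select the named profile for every AWS SDK call made from this shell.
export AWS_PROFILE=myprofile
```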
IAM Roles for AWS Compute Resources
The plugin can use IAM roles for AWS compute resources (including EC2 instances, Fargate and ECS containers).
If you configured your AWS compute resources with IAM, the plugin will use these roles automatically.
For more information on configuring IAM, see the AWS docs here and here.
User Credentials with MFA
To use IAM user credentials with MFA, run the STS get-session-token command with the IAM user's long-term security credentials (access key and secret access key). For more information, see here.
Then export the temporary credentials to your environment variables.
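The two steps can be combined in a small helper, sketched below under the assumption that the AWS CLI is installed and already configured with the user's long-term credentials. The function name export_mfa_session is hypothetical; the aws sts get-session-token flags are the standard CLI ones.

```shell
# Usage: export_mfa_session <mfa-device-arn> <current-mfa-code>
export_mfa_session() {
  mfa_serial=$1
  mfa_code=$2
  # --query with --output text prints the three values on one tab-separated line.
  session=$(aws sts get-session-token \
    --serial-number "$mfa_serial" \
    --token-code "$mfa_code" \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
    --output text) || return 1
  # Split that line into positional parameters on whitespace.
  set -- $session
  [ "$#" -eq 3 ] || return 1
  export AWS_ACCESS_KEY_ID="$1"
  export AWS_SECRET_ACCESS_KEY="$2"
  export AWS_SESSION_TOKEN="$3"
}
```

A call would look like export_mfa_session arn:aws:iam::123456789012:mfa/example-user 123456, where both the ARN and the code are made-up values. The temporary credentials then stay in the environment of the current shell until they expire.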
Using a Custom S3 Endpoint
If you are using a custom S3 endpoint, you can specify it using the endpoint spec option. If you're using authentication, the region option in the spec determines the signing region used.
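A sketch of a destination spec that sets both options. The endpoint URL, the region value, the vX.Y.Z version placeholder, the {{TABLE}} and {{UUID}} path placeholders, and the file name are all assumptions made for illustration; only the endpoint and region option names come from the text above.

```shell
# Write a hypothetical spec for an S3-compatible endpoint to a scratch directory.
endpoint_dir=$(mktemp -d)
cat > "$endpoint_dir/s3.yml" <<'EOF'
kind: destination
spec:
  name: "s3"
  path: "cloudquery/s3"
  version: "vX.Y.Z" # placeholder: use the plugin version you actually run
  spec:
    bucket: "bucket_name"
    path: "path/to/files/{{TABLE}}/{{UUID}}.parquet"
    format: "parquet"
    endpoint: "https://s3.example.internal:9000" # hypothetical custom endpoint
    region: "us-east-1" # used as the signing region for authenticated requests
EOF
```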
To configure CloudQuery to extract from GitHub, create a .yml file in your CloudQuery configuration directory.
The following configuration will extract information from the cloudquery/cloudquery repository:
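A minimal sketch of such a source configuration, written through a shell heredoc so it can be pasted into a terminal. The file name github.yml, the vX.Y.Z version placeholder, the tables selection, and the destination name postgresql are assumptions; adjust them to your setup.

```shell
# Write a hypothetical GitHub source spec to a scratch directory.
gh_dir=$(mktemp -d)
cat > "$gh_dir/github.yml" <<'EOF'
kind: source
spec:
  name: "github"
  path: "cloudquery/github"
  version: "vX.Y.Z" # placeholder: use the plugin version you actually run
  tables: ["*"]
  destinations: ["postgresql"] # hypothetical destination name
  spec:
    access_token: "<YOUR_PERSONAL_ACCESS_TOKEN>"
    repos: ["cloudquery/cloudquery"]
EOF
```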
You must specify either orgs or repos in the configuration. If a repository is specified in both orgs and repos, it will be extracted only once, and other repositories from that organization will be ignored.
It is recommended that you use environment variable expansion for the access token in production. For example, if the access token is stored in an environment variable called GITHUB_ACCESS_TOKEN:
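A sketch of the same idea end to end: the token lives only in the environment, and the configuration file refers to it by name. The file name, the vX.Y.Z version placeholder, the tables selection, and the destination name are assumptions for illustration.

```shell
# Placeholder value: in production this would come from your secret store.
export GITHUB_ACCESS_TOKEN='<YOUR_PERSONAL_ACCESS_TOKEN>'

token_dir=$(mktemp -d)
# The quoted 'EOF' stops the shell from expanding ${GITHUB_ACCESS_TOKEN},
# so the reference is written literally and resolved by CloudQuery when it
# reads the file.
cat > "$token_dir/github.yml" <<'EOF'
kind: source
spec:
  name: "github"
  path: "cloudquery/github"
  version: "vX.Y.Z" # placeholder: use the plugin version you actually run
  tables: ["*"]
  destinations: ["postgresql"] # hypothetical destination name
  spec:
    access_token: "${GITHUB_ACCESS_TOKEN}"
    repos: ["cloudquery/cloudquery"]
EOF
```

Because the file contains only the variable reference, it can be committed to version control without exposing the token.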
The GitHub source plugin supports two authentication methods: Personal Access Token and App authentication. Which one you use is up to you and the security requirements of your organization.
CloudQuery requires only read permissions (we will never make any changes to your GitHub account or organizations),
so, following the principle of least privilege, it's recommended to grant it read-only permissions to all the resources you wish to sync.
Personal Access Token
Follow this guide on how to create a personal access token for CloudQuery.
App authentication
For App authentication, you need to create a GitHub App and install it on your organization. Follow this guide and install the App into your organization(s). Give it all the permissions you need (read-only is recommended).
Every organization will have a unique installation ID. You can find it by going to the organization's settings page, and clicking on the "Installed GitHub Apps" tab. The installation ID is the number in the URL of the page.