Once you’ve set up a Meltano project and run some pipelines on your local machine, it’ll be time to repeat this trick in production!
This page will help you figure out how to get your Meltano project, Meltano itself, and your plugins onto a production environment, and how to manage configuration, logging, and orchestration once they are there.
Additionally, you may want to run Meltano UI and configure it for production.
If you’re containerizing your Meltano project, you can skip steps 1 through 3 and refer primarily to the “Containerized Meltano project” subsections on this page. We also provide a Helm Chart for deploying a containerized instance of the Meltano UI to Kubernetes. More on that in the Kubernetes section.
Meltano currently does not provide project hosting as a paid offering. We recommend users look at Singerly, Astronomer.io, or Google Cloud Composer as options for hosting and running your Meltano project.
Since a Meltano project is just a directory on your filesystem containing text-based files, you can treat it like any other software development project and benefit from DataOps best practices such as version control, code review, and continuous integration and deployment (CI/CD).
As such, getting your Meltano project onto the production environment starts with getting it off of your local machine, and onto a (self-)hosted Git repository platform like GitLab or GitHub.
By default, your Meltano project comes with a .gitignore
file to ensure that
environment-specific and potentially sensitive configuration stored inside the
.meltano
directory and .env
file is not leaked accidentally. All other files
are recommended to be checked into the repository and shared between all users
and environments that may use the project.
Once your Meltano project is in version control, getting it to your production environment can take various shapes.
In general, we recommend setting up a CI/CD pipeline that runs automatically whenever new changes are pushed to your repository’s default branch, and that either pushes the project files directly to the production environment or triggers a mechanism to pull the latest changes from the repository.
A simpler (if temporary) approach is to manually connect to the production environment and pull the repository: right now while you’re setting this up, and again whenever changes are made.
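The manual approach can be sketched as follows; the repository URL and project path below are hypothetical placeholders:

```shell
# On the production machine (path and URL are example placeholders)
git clone https://gitlab.com/your-group/your-meltano-project.git /srv/meltano-project

# Later, to deploy new changes:
cd /srv/meltano-project
git pull origin main
```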
If you’re containerizing your Meltano project, your project-specific Docker image will already contain all of your project files.
Just like on your local machine, the most straightforward way to install Meltano
onto a production environment is to
use pip
to install the meltano
package from PyPI.
If you add meltano
(or meltano==<version>
) to your project’s requirements.txt
file, you can choose to automatically run pip install -r requirements.txt
on your
production environment whenever your Meltano project is updated to ensure you’re always
on the latest (or requested) version.
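For example, assuming your project pins Meltano in a `requirements.txt` file (the version number below is illustrative), a deploy step could run:

```shell
# requirements.txt might contain a pinned version, e.g.:
#   meltano==2.0.0
pip install -r requirements.txt
```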
If you’re containerizing your Meltano project,
your project-specific Docker image will already contain a Meltano installation
since it’s built from the meltano/meltano
base image.
Whenever you add a new plugin to a Meltano project, it will be
installed into your project’s .meltano
directory automatically.
However, since this directory is included in your project’s .gitignore
file
by default, you’ll need to explicitly run meltano install
before any other meltano
commands whenever you clone or pull an existing Meltano project from version control,
to install (or update) all plugins specified in your meltano.yml
project file.
Thus, it is strongly recommended that you automatically run meltano install
on your
production environment whenever your Meltano project is updated to ensure you’re always
using the correct versions of plugins.
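Putting the last few sections together, a deploy step on the production environment could look something like this sketch (paths and branch name are assumptions):

```shell
# Sketch of a deploy step run after each update to the project
cd /srv/meltano-project           # hypothetical project location
git pull origin main              # get the latest project files
pip install -r requirements.txt   # ensure the pinned Meltano version is installed
meltano install                   # install or update all plugins from meltano.yml
```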
If you’re containerizing your Meltano project,
your project-specific Docker image will already contain all of your project’s plugins
since meltano install
is a step in its build process.
Meltano stores various types of metadata in a project-specific
system database, that takes
the shape of a SQLite database stored inside the project at .meltano/meltano.db
by default. Like all files stored in the .meltano
directory
(which you’ll remember is included in your project’s .gitignore
file by default), the system database is
also environment-specific.
While SQLite is great for local development and testing since it requires no external database to be set up, it has various limitations that make it inappropriate for use in production. For example, since it’s a single file, it only supports one concurrent connection.
Thus, it is strongly recommended that you use a PostgreSQL system database in
production instead. You can configure Meltano to use it using the
database_uri
setting.
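For example, assuming a PostgreSQL database reachable at the hypothetical host and credentials below, either of the following would point Meltano at it:

```shell
# Store the setting in the project (connection details are placeholders):
meltano config meltano set database_uri postgresql://meltano:password@db.example.com:5432/meltano

# Or set it per-environment via the corresponding environment variable,
# which keeps the credentials out of meltano.yml:
export MELTANO_DATABASE_URI=postgresql://meltano:password@db.example.com:5432/meltano
```

The environment variable approach is usually preferable in production, since the URI contains a password that should not be checked into version control.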
If you’re containerizing your Meltano project,
you will definitely want to use an external system database, since changes to
.meltano/meltano.db
would not be persisted outside the container.
Meltano stores all output generated by meltano elt
in .meltano/logs/elt/{state_id}/{run_id}/elt.log
,
where state_id
refers to the value of the provided --state-id
flag or the name of a scheduled pipeline, and run_id
is an autogenerated UUID.
You can use Meltano UI locally or in production to view the most recent logs of your project’s scheduled pipelines right from your browser.
If you’d like to store these logs elsewhere, you can symlink the .meltano/logs
or .meltano/logs/elt
directory to a location of your choice.
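The symlink approach can be demonstrated as follows; this sketch uses scratch directories, but in a real project you’d run the equivalent commands from the project root against your actual destination path:

```shell
# Demonstration in a scratch directory ("project" and "big-disk" stand in
# for your project root and a location with more disk space)
mkdir -p project/.meltano/logs big-disk/elt-logs

# Replace the elt log directory with a symlink to the destination
rm -rf project/.meltano/logs/elt
ln -s "$(pwd)/big-disk/elt-logs" project/.meltano/logs/elt

# New pipeline logs written under .meltano/logs/elt now land in big-disk/elt-logs
ls -l project/.meltano/logs/
```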
If you’re containerizing your Meltano project,
these logs will not be persisted outside the container running your pipelines
unless you exfiltrate them by mounting a volume
inside the container at /project/.meltano/logs/elt
.
You will want to mount this same volume (or directory) into the container that runs Meltano UI if you’d like to use it to view the pipelines’ most recent logs.
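Assuming a hypothetical image name and host directory, the volume mounts could look like:

```shell
# Persist pipeline logs outside the container (image name and host path are placeholders)
docker run -v /srv/meltano/elt-logs:/project/.meltano/logs/elt my-meltano-project elt tap-gitlab target-postgres

# Mount the same host directory into the UI container so those logs are visible there
docker run -v /srv/meltano/elt-logs:/project/.meltano/logs/elt -p 5000:5000 my-meltano-project ui
```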
All of your Meltano project’s configuration that is not environment-specific
or sensitive should be stored in its meltano.yml
project file and checked into version
control.
Configuration that is environment-specific or sensitive is most appropriately managed using environment variables. Meltano Environments can be used to better manage configuration between different deployment environments. How these can be best administered will depend on your deployment strategy and destination.
If you’d like to store sensitive configuration in a secrets store, you can
consider using the chamber
CLI, which
lets you store secrets in the
AWS Systems Manager Parameter Store
that can then be exported as environment variables when
executing an arbitrary command
like meltano
.
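For example, assuming your secrets are stored under a hypothetical chamber service named `meltano`:

```shell
# Export secrets from the AWS SSM Parameter Store as environment variables,
# then run the command after "--" with those variables set
chamber exec meltano -- meltano elt tap-gitlab target-postgres --state-id=gitlab-to-postgres
```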
If you’re containerizing your Meltano project, you will want to manage sensitive configuration using the mechanism provided by your container runner, e.g. Docker Secrets or Kubernetes Secrets.
meltano elt
If all of the above has been set up correctly, you should now be able to run
a pipeline using meltano elt
,
just like you did locally. Congratulations!
You can run the command using any mechanism capable of running executables,
whether that’s cron
, Airflow’s BashOperator
,
or any of dozens of other orchestration tools.
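A minimal cron entry, assuming `meltano` is on the cron user’s PATH and the project lives at a hypothetical path, might look like:

```shell
# crontab fragment: run the pipeline every 4 hours (all names and paths are placeholders)
# m h    dom mon dow  command
0 */4 *   *   *  cd /srv/meltano-project && meltano elt tap-gitlab target-postgres --state-id=gitlab-to-postgres >> /var/log/meltano-elt.log 2>&1
```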
If you’ve added Airflow to your Meltano project as an orchestrator,
you can have it automatically run your project’s scheduled pipelines
by starting its scheduler
using meltano invoke airflow scheduler
.
Similarly, you can start its web interface
using meltano invoke airflow webserver
.
However, do take into account Airflow’s own Deployment in Production Best Practices. Specifically, you will want to configure Airflow to:
use the LocalExecutor
instead of the SequentialExecutor
default by setting the core.executor
setting
(or AIRFLOW__CORE__EXECUTOR
environment variable) to LocalExecutor
:
meltano config airflow set core.executor LocalExecutor
export AIRFLOW__CORE__EXECUTOR=LocalExecutor
use a PostgreSQL metadata database
instead of the SQLite default (sounds familiar?) by setting the core.sql_alchemy_conn
setting
(or AIRFLOW__CORE__SQL_ALCHEMY_CONN
environment variable) to a postgresql://
URI:
meltano config airflow set core.sql_alchemy_conn postgresql://<username>:<password>@<host>:<port>/<database>
export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql://<username>:<password>@<host>:<port>/<database>
For this to work, the psycopg2
package will
also need to be installed alongside apache-airflow
,
which you can realize by adding psycopg2
to airflow
’s pip_url
in your meltano.yml
project file (e.g. pip_url: psycopg2 apache-airflow
)
and running meltano install orchestrator airflow
.
If you’re containerizing your Meltano project,
the built image’s entrypoint
will be the meltano
command,
meaning that you can provide meltano
subcommands and arguments like elt ...
and invoke airflow ...
directly to
docker run <image-name> ...
as trailing arguments.
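For example, with a hypothetical image name of my-meltano-project:

```shell
# Run a pipeline; "elt ..." is passed straight through to the meltano entrypoint
docker run my-meltano-project elt tap-gitlab target-postgres --state-id=gitlab-to-postgres

# Start the Airflow scheduler the same way
docker run my-meltano-project invoke airflow scheduler
```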
Now that your pipelines are running, you may want to also spin up Meltano UI, which lets you quickly check the status and most recent logs of your project’s scheduled pipelines right from your browser.
You can start Meltano UI using meltano ui
just like you would locally,
but there are a couple of settings you’ll want to consider changing in production:
By default, Meltano UI will bind to host 0.0.0.0
and port 5000
.
This can be changed using the ui.bind_host
and ui.bind_port
settings, and their respective environment variables and CLI options.
If you’d like to require users to sign in before they can access the Meltano UI, enable the ui.authentication
setting.
As described behind that link, this will also require you to set the ui.secret_key
and ui.password_salt
settings, as well as ui.server_name
or ui.session_cookie_domain
.
Users can be added using meltano user add
and will be stored in the configured system database.
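For example (the username, password, and role below are placeholders; check `meltano user add --help` for the exact options available in your version):

```shell
meltano user add admin_user 'a-strong-password' --role admin
```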
If you will be running Meltano UI behind a front-end (reverse) proxy that will be responsible for SSL termination (HTTPS),
it is recommended that you enable the ui.session_cookie_secure
setting so that session cookies used for authentication are only sent along with secure requests.
You may also need to change the ui.forwarded_allow_ips
setting to get
Meltano UI to realize it should use the https
URL scheme rather than http
in the URLs it builds.
If your reverse proxy uses a health check to determine if Meltano UI is ready to accept traffic, you can use the /api/v1/health
route, which will always respond with a 200 status code.
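For example, a load balancer or monitoring check could simply be (host and port are placeholders matching the default bind settings):

```shell
# Exits non-zero unless the UI responds with a 2xx status
curl -f http://localhost:5000/api/v1/health
```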
Meltano UI can be used to make changes to your project, like adding plugins and scheduling pipelines, which is very useful locally but may be undesirable in production if you’d prefer for all changes to go through version control instead.
To disallow all modifications to project files through the UI, enable the project_readonly
setting.
If you’re containerizing your Meltano project,
the project_readonly
setting will be
enabled by default
using the MELTANO_PROJECT_READONLY
environment variable,
since any changes to your meltano.yml
project file would not be persisted outside the container.
Hosting a containerized instance of the Meltano UI on Kubernetes is made easy using the provided Meltano Helm Chart. Try it out via the Helm CLI:
# add the meltano-ui helm repository
helm repo add meltano https://meltano.gitlab.io/infra/helm-meltano/meltano-ui
# view available Chart versions
helm search repo meltano-ui
# deploy 🚀
helm install meltano-ui/meltano-ui --generate-name