Fetching packages from authenticated HTTP URIs with Gentoo Portage

Gentoo Linux's package manager Portage is designed to build packages from source code. Packages are installed from ebuilds, which contain the installation instructions for a given piece of software. Ebuilds specify the location of one or more source archives, which are fetched automatically by the package manager. Several URI schemes are supported: http and https, ftp and sftp, rsync, and ssh.[1] This works perfectly well for open-source software, where the source tarball is publicly available. However, if the sources are not public, things get more complicated. In this post, I present different ways to fetch tarballs from URIs that require authentication.

Fetch restriction

The most appropriate way to deal with this issue is to prevent the package manager from fetching the package altogether by adding a fetch restriction. If the user tries to install a fetch-restricted package, the package manager will not try to download it, but will simply check whether the corresponding archive is present in /usr/portage/distfiles. If it is not, the package manager executes the pkg_nofetch ebuild function, which should print a message describing the steps needed to acquire the archive. The following is an excerpt from the ebuild for the Oracle JDK in the official Portage tree, which uses RESTRICT="fetch" and gives an actionable description in pkg_nofetch:

RESTRICT="fetch […]"
…
pkg_nofetch() {
  einfo "Please download ${ARCH_FILES[${ARCH}]} and move it to"
  einfo "your distfiles directory:"
  einfo
  einfo "  https://www.oracle.com/technetwork/java/javase/downloads/jdk11-downloads-5066655.html"
  einfo
  einfo "If the above mentioned URL does not point to the correct version anymore,"
  einfo "please download the file from Oracle's Java download archive:"
  einfo
  einfo "  https://www.oracle.com/technetwork/java/javase/downloads/java-archive-javase11-5116896.html"
  einfo
}
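To make the behavior described above more concrete, here is a simplified model of the check in POSIX shell. It is a sketch, not Portage's actual implementation: the helper name check_restricted_distfile is hypothetical, and the stand-in pkg_nofetch takes arguments and uses echo so that it runs outside of Portage.

```shell
#!/bin/sh
# Simplified model of how a fetch-restricted package is handled. This is not
# Portage's real code; check_restricted_distfile is a hypothetical helper.
check_restricted_distfile() {
    local distdir="$1" file="$2"
    # No download is attempted: the archive either exists already or it does not.
    if [ -f "${distdir}/${file}" ]; then
        echo "found ${file} in ${distdir}"
        return 0
    fi
    # Archive missing: hand over to the ebuild's pkg_nofetch function.
    pkg_nofetch "${file}" "${distdir}"
    return 1
}

# Stand-in for an ebuild's pkg_nofetch. Real ebuilds take no arguments and
# use einfo; plain echo keeps this sketch runnable outside of Portage.
pkg_nofetch() {
    echo "Please download $1 and move it to your distfiles directory:"
    echo "  $2"
}
```

Calling check_restricted_distfile /usr/portage/distfiles tarball.tar.xz would print the download instructions and return 1 until the file has been placed there manually.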

This approach requires a separate, manual fetching step. It was not a good fit for my particular use case, so I moved on to more hacky solutions.

Encoding user and password in the URI

It is possible to pass the credentials in the URI like so:

SRC_URI="https://user:password@example.com/tarball.tar.xz"

Can you do this? Yes. Should you do this? No. First of all, this prevents you from distributing your ebuild, because it now contains sensitive information. Further, if you use this approach in multiple ebuilds and need to change the credentials, every one of those ebuilds has to be changed. Moreover, passwords may leak into logs. Portage uses wget to fetch the sources via HTTP, and according to the Wget manual, passing the password in the URL reveals it to anyone who runs ps during the download.
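If credentials have already ended up in ebuilds this way, it is worth checking a repository before publishing it. The following is a rough sketch, assuming GNU grep and credentials written literally in the user:password@host form; the function name find_embedded_credentials is made up for this example, and the regular expression will not catch every variant (for instance, credentials assembled from variables).

```shell
#!/bin/sh
# Hypothetical audit helper: list ebuilds and eclasses below a repository
# directory that contain a "scheme://user:password@host" style URI.
find_embedded_credentials() {
    local repo="$1"
    # -r: recurse, -l: print matching file names only, -E: extended regex.
    # The user part may not contain ':' and neither part may contain '/',
    # so a host with a port (example.com:8080/...) is not reported.
    grep -rlE --include='*.ebuild' --include='*.eclass' \
        '[a-z]+://[^/@:[:space:]"]+:[^/@[:space:]"]+@' "$repo"
}
```

The function prints one file name per line and, like grep, returns a non-zero status when nothing is found.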

For all these reasons, we need to come up with something better.

Wget configuration

Since we know that Portage uses wget, we can look at the ways wget itself can be configured. For instance, we could specify a user and password in the global wgetrc file. This way, the password will not be visible in the process list and will not leak into any logs. The credentials are also stored in a central place rather than scattered over different ebuilds.
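As a sketch, the hypothetical helper below appends the wgetrc commands http_user and http_password to a given file. The target file is an assumption: on Gentoo the global wgetrc is usually /etc/wgetrc. The optional group argument is there because wget normally runs as the portage user during emerge and needs read access to the file.

```shell
#!/bin/sh
# Hypothetical helper: add HTTP credentials to a wgetrc file. http_user and
# http_password are the wgetrc equivalents of --http-user and --http-password.
write_wgetrc_credentials() {
    local rcfile="$1" user="$2" password="$3" group="${4:-}"
    # Restrict access before the password is written to the file.
    touch "$rcfile" && chmod 0640 "$rcfile" || return 1
    # wget runs as the portage user during emerge (FEATURES=userfetch), so
    # that user needs read access, e.g. via the file's group.
    if [ -n "$group" ]; then
        chgrp "$group" "$rcfile" || return 1
    fi
    printf 'http_user = %s\nhttp_password = %s\n' "$user" "$password" >> "$rcfile"
}
```

For example, write_wgetrc_credentials /etc/wgetrc user user-secret portage, run as root, would add the credentials to the global file and let the portage group read it.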

Sounds pretty good, right? Alas, there is a catch. When credentials are provided, wget (in versions newer than 1.10.2) first tries to connect without sending any authentication. Only if the server denies access and responds with an authentication challenge does wget retry the connection with the credentials. While this is very reasonable, the credentials in wgetrc are not tied to a particular host, so a malicious server could issue an authentication challenge just to get its hands on them.

Enter .netrc. When wget receives an authentication challenge and no credentials were provided, it looks for the file ~/.netrc. The netrc file specifies credentials per host name, so we cannot be tricked into giving away credentials to a server they don't belong to. Unlike wgetrc, it can even store credentials for multiple servers:

machine example.com login user password user-secret

Note that this works for our current user, whereas Portage fetches sources as the portage user by default. This means that we have to create ~portage/.netrc for wget to find the credentials during emerge. On many systems, though, the home directory of the portage user is mounted as a tmpfs[2] and will be erased upon restart. If this is the case on your system, this solution might not be for you. However, you still have the option of letting the root user fetch the packages by setting FEATURES="-userfetch", either globally or per package. This allows you to put the credentials into ~root/.netrc, where they are stored persistently. If you cannot or do not want to do this, move on to my next suggestion.
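Creating the file by hand is simple, but the permissions and the tmpfs caveat are easy to get wrong. Below is a minimal sketch in POSIX shell, assuming GNU coreutils (stat -f -c %T prints the file system type); the name add_netrc_entry and its optional owner argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append a .netrc entry for one host to a home directory,
# e.g. ~portage (default userfetch setup) or ~root (FEATURES="-userfetch").
add_netrc_entry() {
    local home="$1" machine="$2" login="$3" password="$4" owner="${5:-}"
    local netrc="${home}/.netrc"
    # Warn if the home directory is on a tmpfs: the file would vanish on reboot.
    if [ "$(stat -f -c %T "$home" 2>/dev/null)" = tmpfs ]; then
        echo "warning: ${home} is on tmpfs, ${netrc} will not survive a reboot" >&2
    fi
    # Create the file with owner-only permissions before writing the secret.
    touch "$netrc" && chmod 0600 "$netrc" || return 1
    # When run as root on behalf of another user, hand the file over to that user.
    if [ -n "$owner" ]; then
        chown "$owner" "$netrc" || return 1
    fi
    printf 'machine %s login %s password %s\n' "$machine" "$login" "$password" >> "$netrc"
}
```

As root, add_netrc_entry ~portage example.com user user-secret portage:portage would create the entry for the portage user, while add_netrc_entry ~root example.com user user-secret suits the FEATURES="-userfetch" setup.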

Custom fetch command

Let's have a closer look at how Portage uses wget. Portage allows the user to configure the command that is executed to fetch an archive. The command is defined in the FETCHCOMMAND variable, whose default is set in /usr/share/portage/config/make.globals and looks like this:[3]

FETCHCOMMAND="wget -t 3 -T 60 --passive-ftp -O \"\${DISTDIR}/\${FILE}\" \"\${URI}\""
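The backslashes keep ${DISTDIR}, ${FILE} and ${URI} from being expanded when make.globals is read; Portage fills them in for every file it downloads, after splitting the command into arguments. The following POSIX shell function is a rough model of that step, not Portage's own code (which is written in Python); the name expand_fetchcommand is hypothetical. It prints one argument per line, which shows why the quotes around "${DISTDIR}/${FILE}" matter.

```shell
#!/bin/sh
# Rough model (not Portage's code) of how a FETCHCOMMAND template becomes an
# argument list: it is split like a shell command line, and the placeholders
# are filled in for the current download.
expand_fetchcommand() {
    local template="$1" DISTDIR="$2" FILE="$3" URI="$4"
    # The template comes from trusted configuration. eval performs word
    # splitting and quote removal; the values of DISTDIR, FILE and URI are
    # substituted by parameter expansion and are not parsed again.
    eval "set -- ${template}"
    printf '%s\n' "$@"
}

# The make.globals default as Portage sees it (backslashes already removed).
FETCHCOMMAND='wget -t 3 -T 60 --passive-ftp -O "${DISTDIR}/${FILE}" "${URI}"'
```

For instance, expand_fetchcommand "$FETCHCOMMAND" /usr/portage/distfiles "my file.tar.xz" https://example.com/tarball.tar.xz prints nine lines, and the output path containing a space stays a single argument.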

Archives with an HTTP or HTTPS URI are retrieved using this command. If we add the --user and --password flags, we can retrieve archives from URIs that require HTTP authentication. Because malicious servers might try to trick us into handing over our credentials, we should not set FETCHCOMMAND in make.conf, where it affects all packages. Instead, let's be more specific and use Portage's per-package environment configuration. Custom environments are defined in files in /etc/portage/env/. For example, let's create an environment file named mypypi for authenticating to a custom Python Package Index:

FETCHCOMMAND="wget -t 3 -T 60 --passive-ftp --user pypi-user --password pypi-user-secret -O \"\${DISTDIR}/\${FILE}\" \"\${URI}\""

We can then create a file in /etc/portage/package.env to specify which packages should use the mypypi environment:

dev-python/myapplication mypypi

Assuming that all ebuilds using our custom PyPI live in an ebuild repository called "my-python-repo", we can future-proof the solution with wildcards:

*/*::my-python-repo mypypi

This will cause Portage to use our custom fetch command for all ebuilds in the repository my-python-repo.
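The configuration from this section can be scripted. The following is a sketch in POSIX shell; the function name setup_mypypi_env is hypothetical, and it assumes that emerge runs as root (so root-only permissions on the environment file are enough) and that the credentials contain no whitespace or quote characters. Keep in mind that the password still ends up on wget's command line, so the ps caveat mentioned for credentials in the URI applies here, too.

```shell
#!/bin/sh
# Hypothetical helper: create the mypypi environment and its package.env
# entry below a Portage configuration directory (normally /etc/portage).
# Assumes the credentials contain no whitespace or quote characters.
setup_mypypi_env() {
    local confdir="$1" user="$2" password="$3" repo="$4"
    local envfile="${confdir}/env/mypypi"
    mkdir -p "${confdir}/env" "${confdir}/package.env" || return 1
    # emerge reads this file as root, so root-only permissions suffice.
    touch "$envfile" && chmod 0600 "$envfile" || return 1
    # The default FETCHCOMMAND plus --user/--password; \${...} stays escaped
    # so that Portage, not the shell, fills in DISTDIR, FILE and URI.
    printf 'FETCHCOMMAND="wget -t 3 -T 60 --passive-ftp --user %s --password %s -O \\"\\${DISTDIR}/\\${FILE}\\" \\"\\${URI}\\""\n' \
        "$user" "$password" > "$envfile" || return 1
    # Apply the environment to every package from the given repository.
    printf '*/*::%s mypypi\n' "$repo" > "${confdir}/package.env/mypypi"
}
```

Running setup_mypypi_env /etc/portage pypi-user pypi-user-secret my-python-repo as root would produce the env/mypypi and package.env/mypypi files described in this section.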

Other methods

This article only presents options for fetching from authenticated HTTP and HTTPS URIs, but there are other ways to fetch from authenticated URIs in general. For example, rsync URIs use the SSH identity of the portage user.

Some eclasses ignore SRC_URI entirely and implement their own fetch logic. For example, git-r3 uses EGIT_REPO_URI. You could write a custom eclass that overrides src_unpack with its own fetch logic.[4]

Other package managers, such as Paludis, have implemented the idea of fetchers to address this issue. There has even been a discussion in the Gentoo bug tracker about extensible URI protocols in Portage, which pushes in a similar direction, but it has unfortunately stalled.


  1. There is also support for several VCS-specific URIs, such as git, but they are implemented in shell scripts that are not part of the package manager.

  2. To check whether the home directory of the portage user is a tmpfs, run df -h ~portage

  3. In fact, there are different fetch commands for the different URI schemes, but we will only look at HTTP here.

  4. Note that overriding src_unpack like the git-r3 eclass does will prevent you from doing offline installs.