Remote Repository Filtering

Remote Repository Filtering enables the resolver to filter artifacts by repository based on various (extensible) criteria.

Why?

Remote Repository Filtering (RRF) is a long-requested feature of Maven. It is useful when a build uses several remote repositories. In such cases, Maven “searches” the ordered list of remote repositories (as given by the effective POM), resolving artifacts in a loop with a “first found wins” strategy. This has several implications:

  • The build is slower: when an artifact is in the Nth repository, Maven first queries the previous N-1 repositories, each of which results in a failed lookup, before finally finding the artifact.
  • The build “leaks” artifact requests: repositories are asked for artifacts they do not have, and their operators still see those requests in the access logs.
  • To “simplify” things, users tend to use Maven Repository Manager (MRM) “group” (or “virtual”) repositories. These cause data loss on the Maven project side (the project loses artifact origin information) and tend to end badly: the “super-uber groups” grow uncontrollably as new members are added over time, or the number of groups itself grows uncontrollably, and projects lose track of which remote repositories are needed to (re)build them. Such projects become un-buildable without the MRM, and end up bound to an MRM and/or an environment that is usually outside the project's control.

What It Is

You can instruct Maven which repository can contain which artifacts. Instead of searching for artifacts in remote repositories in an “ordered loop”, Maven can skip the repositories that cannot contain the artifact and query only those that can.

With RRF, the Maven build does not slow down when new remote repositories are added, and does not leak build information unnecessarily.
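The difference between the two strategies can be sketched in a few lines of Python. This is an illustration only, not Resolver code: the resolve function, the repository contents and the may_contain callback are hypothetical stand-ins for the real resolution loop and filter.

```python
def resolve(artifact, repositories, may_contain=None):
    """Return (repo_id_that_served_it, list_of_repo_ids_asked).

    repositories is an ordered list of (repo_id, set_of_artifacts).
    may_contain(repo_id, artifact) is an optional filter (hypothetical);
    when it returns False the repository is not asked at all.
    """
    asked = []
    for repo_id, contents in repositories:
        if may_contain is not None and not may_contain(repo_id, artifact):
            continue  # filtered out: no request, nothing leaked
        asked.append(repo_id)
        if artifact in contents:
            return repo_id, asked  # "first found wins"
    return None, asked


repos = [
    ("central", {"org.apache.maven:maven-core"}),
    ("corp", {"com.corp:internal-lib"}),
]

# Without filtering, "central" is asked for an artifact it does not have.
unfiltered = resolve("com.corp:internal-lib", repos)

# With a filter that sends com.corp artifacts only to "corp".
def only_corp_for_corp(repo_id, artifact):
    return artifact.startswith("com.corp:") == (repo_id == "corp")

filtered = resolve("com.corp:internal-lib", repos, only_corp_for_corp)
```

In the unfiltered call both repositories are asked; in the filtered call only "corp" is.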

What It Is Not

When it comes to dependencies, don't forget maven-enforcer-plugin rules. RRF is NOT an alternative to these enforcer rules. It is a tool to make your build faster and more private without losing build information.

Maven Central Is Special

The Maven Central repository is special in this respect, as Maven will always try to download artifacts from it: your build's plugins, plugin dependencies, extensions, etc. will most often come from there. While you can filter Maven Central, this is usually a bad idea (filtering, as in “limiting what can come from it”). On the other hand, Maven Central itself offers help to prevent request leakage to it (see the “prefixes” filter).

So, most often, limiting “what can be fetched” from Maven Central is a bad idea. It can be done, but only with great caution, as otherwise you put your build at risk. RRF does not distinguish the “context” of an artifact; it merely filters by (artifact, remoteRepository) pair. By limiting Maven Central you can easily get into a state where your build breaks because a plugin depends on a filtered artifact.

RRF

The RRF feature offers a filter source service provider interface for 3rd party implementors, but it also provides two out-of-the-box implementations for filtering: the “prefixes” and “groupId” filters.

Both implementations operate with several files (one per remote repository), and they use the term “filter basedir”. By default, the filter basedir is resolved from the local repository root and resolves to the ${localRepo}/.remoteRepositoryFilters directory. It is referred to in this document with the ${filterBasedir} placeholder.

To set the filter basedir, use: -Daether.remoteRepositoryFilter.${filterName}.basedir=somePath. If “somePath” is a relative path, it is resolved from the local repository root. If it is an absolute path, it is used as is.
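The resolution rule described above can be sketched as follows. This is a minimal illustration, not the Resolver implementation: the function name resolve_filter_basedir and the example paths are hypothetical, and PurePosixPath is used so the example behaves the same on any platform.

```python
from pathlib import PurePosixPath


def resolve_filter_basedir(local_repo, configured=None):
    """Resolve the filter basedir the way the text describes it.

    local_repo: local repository root.
    configured: value of the ...basedir property, or None if unset.
    """
    root = PurePosixPath(local_repo)
    if configured is None:
        # Default location under the local repository root.
        return root / ".remoteRepositoryFilters"
    path = PurePosixPath(configured)
    # Absolute paths are used as is; relative ones hang off the local repo.
    return path if path.is_absolute() else root / path


default_dir = resolve_filter_basedir("/home/me/.m2/repository")
relative_dir = resolve_filter_basedir("/home/me/.m2/repository", "my-filters")
absolute_dir = resolve_filter_basedir("/home/me/.m2/repository", "/etc/maven/rrf")
```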

Since Maven 3.9.x you can use an expression like ${session.rootDirectory}/.mvn/rrf/ to store filter data along with sources. session.rootDirectory will become an absolute path pointing to the root directory of the project (where usually the .mvn directory is).

When no input files are present, both implementations behave as if disabled for the given repository. Moreover, the “enabled” settings suffixed with “.repoId” can be used to selectively enable or disable filtering for a single repository (for example -Daether.remoteRepositoryFilter.prefixes.myrepo=false).

Unlike in Resolver 1.x, filtering is enabled by default, and prefixes are dynamically discovered and, if found, used. For the groupId filter, user intervention is still needed to provide the input files. Hence, without these, only prefix filtering kicks in automatically.

The Prefixes Filter

The filter named “prefixes” relies on a file containing a list of “repository prefixes” available from a given repository. A prefix is essentially the “starts with” part of an artifact path as translated by the repository layout. The effect is that a download from the given remote repository is attempted only when the artifact path, as translated by the layout, starts with one of the prefixes in the file published by that repository.

Prefixes are usually published by the remote repositories themselves, so this is filtering the other way around: the remote repository is advising us “do not even bother to come to me with a path that has no matching prefix listed in this file”. On the other hand, a listed prefix is no guarantee that a matching artifact is really present. For example, the presence of the /com/foo prefix does NOT imply that the com.foo:baz:1.0 artifact is present; it merely says “I do have something that starts with /com/foo” (for example com.foo.baz:lib:1.0). The depth of the published prefixes is set by the publisher, and is usually a value between 2 and 4. It all boils down to a balance between “best coverage” and “acceptable file size”: a prefixes file containing the full relative path of every deployed artifact would give 100% coverage, but at the cost of a huge file for huge repositories like Maven Central.
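The matching idea can be sketched as below. This is an illustration under stated assumptions, not the Resolver code: layout_path is a simplified stand-in for the repository layout (Maven 2 style, main jar only), may_contain is a hypothetical name, and matching is done on whole path segments, which is an assumption about how “starts with” is meant.

```python
def layout_path(group_id, artifact_id, version, extension="jar"):
    """Simplified Maven 2 layout path for an artifact, with leading slash."""
    return "/{}/{}/{}/{}-{}.{}".format(
        group_id.replace(".", "/"), artifact_id, version,
        artifact_id, version, extension)


def may_contain(path, prefixes):
    """True if the path starts with any prefix on a path-segment boundary.

    A True result only means the repository MAY have the artifact.
    """
    return any(path == p or path.startswith(p.rstrip("/") + "/")
               for p in prefixes)


prefixes = ["/com/foo", "/org/apache"]

baz = layout_path("com.foo", "baz", "1.0")      # may or may not exist
lib = layout_path("com.foo.baz", "lib", "1.0")  # shares the same prefix
other = layout_path("net.example", "tool", "2.0")
```

Both com.foo:baz:1.0 and com.foo.baz:lib:1.0 pass the /com/foo prefix, which is exactly why a prefix match is a hint and not a guarantee; the net.example artifact would not be requested from this repository at all.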

These files are (automatically) published by Maven Central (MC) and by MRMs, so using the published files is the simplest option: they are automatically discovered and cached (just like any artifact from the given remote repository).

Manual authoring of these files, while possible, is not recommended. It is best to keep them up to date by downloading the published files from the remote repositories. In ideal circumstances no user intervention is needed: the remote repository publishes the prefix file and discovery finds it.

Many MRMs and Maven Central itself publish these files.

User-provided prefixes files are expected in the following location by default: ${filterBasedir}/prefixes-${remoteRepository.id}.txt.

Important: Valid prefix files start with the following “magic” on their very first line: ## repository-prefixes/2.0. If the first line of the file is not this string, the prefix file is discarded.
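A reader for such a file might look like the sketch below. Only the first-line check comes from the text above; skipping blank lines and further lines starting with # is an assumption, and the function name parse_prefixes is hypothetical.

```python
MAGIC = "## repository-prefixes/2.0"


def parse_prefixes(text):
    """Return the list of prefixes, or None if the file must be discarded."""
    lines = text.splitlines()
    if not lines or lines[0] != MAGIC:
        return None  # missing "magic" first line: discard the whole file
    prefixes = []
    for line in lines[1:]:
        line = line.strip()
        # Assumption: blank lines and "#" comment lines carry no prefix.
        if line and not line.startswith("#"):
            prefixes.append(line)
    return prefixes


valid = parse_prefixes("## repository-prefixes/2.0\n# a comment\n/com/foo\n/org/apache\n")
invalid = parse_prefixes("/com/foo\n/org/apache\n")
```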

To disable the prefixes filter, use the following setting: -Daether.remoteRepositoryFilter.prefixes=false. To disable it for a single repository only, append .repoId to the key.

The prefixes filter will “abstain” from filtering for the given remote repository if no prefix file was discovered and no user input was provided for it.

The GroupId Filter

The implementation named “groupId” filters based on the allowed groupIds of artifacts. In essence, its input is a list of “groupIds allowed from the given remote repository”. The file contains one artifact groupId per line, along with optional modifiers.

The groupId files are expected in the following location by default: ${filterBasedir}/groupId-${remoteRepository.id}.txt.

To disable groupId filtering, use the following setting: -Daether.remoteRepositoryFilter.groupId=false. To disable it for a single repository only, append .repoId to the key.

The groupId filter will “abstain” from filtering for the given remote repository, if there is no input provided for it.

The groupId filter can also “record” the groupIds it encounters, which can be used as a starting point: after the recording is done, one can edit, remove or add entries as needed. When the groupId filter is set to “record”, it does NOT filter; instead it collects all the encountered groupIds per remote repository and saves them into the properly placed file(s).

To enable groupId filter recording, use the following setting: -Daether.remoteRepositoryFilter.groupId.record=true.

To truncate the recorded file(s) instead of merging recorded entries with the existing file, use the following setting: -Daether.remoteRepositoryFilter.groupId.truncateOnSave=true. If enabled, the saved file will contain ONLY the groupIds that were recorded in the current session; otherwise the recorded groupIds are merged with the ones already present in the file, and the result is saved.
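The difference between merging and truncating can be sketched as a small function. This is an illustration only: the name save_recorded is hypothetical, and the sorted, de-duplicated output is an assumption about the saved file, not something the text above specifies.

```python
def save_recorded(existing, recorded, truncate_on_save=False):
    """Return the groupIds that would end up in the saved file.

    existing: groupIds already present in the file.
    recorded: groupIds encountered in the current session.
    """
    if truncate_on_save:
        result = set(recorded)                   # only this session's entries
    else:
        result = set(existing) | set(recorded)   # merge with what was there
    return sorted(result)


merged = save_recorded(["org.apache.maven", "com.corp"], ["com.corp", "org.slf4j"])
truncated = save_recorded(["org.apache.maven", "com.corp"], ["com.corp", "org.slf4j"],
                          truncate_on_save=True)
```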

Format of file:

  • Lines beginning with # (hash) and blank lines are ignored
  • modifier ! (must be the first character) is negation (disallow; by default an entry means “allow”)
  • modifier = (must be the first character, or the second if the negation modifier is present) is a limiter (equals; by default an entry means “this G and below”)
  • a proper Maven groupId, like org.apache.maven

Example file:

# My file                   (1)
                            (2)
org.apache.maven            (3)
!=org.apache.maven.foo      (4)
!org.apache.maven.indexer   (5)
=org.apache.bar             (6)

Lines 1 and 2 are ignored. Line 3 means “allow org.apache.maven G and below”. Line 4 means “disallow org.apache.maven.foo only” (so org.apache.maven.foo.bar is still allowed due to line 3). Line 5 means “disallow org.apache.maven.indexer and below”, and finally line 6 means “allow org.apache.bar ONLY” (so org.apache.bar.foo is NOT allowed).

A single special “root” entry, * (asterisk), can be used to set the “default acceptance” to ACCEPTED (without it, the default is REJECTED). Adding !* to the file sets the “default acceptance” to REJECTED explicitly; since this is already the default it changes nothing, but it may serve documentation purposes. Be aware: when a line with a single asterisk * is present, the whole logic of the groupId filter is inverted. There is then no need to add “allowed entries” (everything is allowed by default), but one can add “disallowed entries” such as !com.foo.

Conflicting rules: the rule parser is intentionally trivial, so in case of conflicting rules the “last wins” strategy is applied. Ideally, users should keep the files sorted, or otherwise maintain them in a way that makes conflicts easy to spot.
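The file format and the example above can be tried out with the sketch below. It is not the Resolver implementation: parse_rules and is_accepted are hypothetical names, and it assumes that the most specific matching entry decides, with “last wins” applied to repeated entries for the same groupId. The sample file is the example from above without the callout numbers.

```python
def parse_rules(text):
    """Parse a groupId file into ({groupId: (allow, exact)}, default_accept)."""
    rules = {}
    default_accept = False  # without "*" everything not listed is rejected
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        allow = not line.startswith("!")
        if not allow:
            line = line[1:]
        exact = line.startswith("=")
        if exact:
            line = line[1:]
        if line == "*":
            default_accept = allow
        else:
            rules[line] = (allow, exact)  # repeated groupId: last wins
    return rules, default_accept


def is_accepted(group_id, rules, default_accept):
    """Assumed evaluation: walk from the most specific groupId upwards."""
    parts = group_id.split(".")
    for depth in range(len(parts), 0, -1):
        candidate = ".".join(parts[:depth])
        if candidate not in rules:
            continue
        allow, exact = rules[candidate]
        if exact and candidate != group_id:
            continue  # "=" entries do not apply below their groupId
        return allow
    return default_accept


EXAMPLE = """# My file

org.apache.maven
!=org.apache.maven.foo
!org.apache.maven.indexer
=org.apache.bar
"""

rules, default_accept = parse_rules(EXAMPLE)
```

With these rules, org.apache.maven.foo is rejected while org.apache.maven.foo.bar is accepted, and org.apache.bar is accepted while org.apache.bar.foo is not, matching the explanation of the example file.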

Operation

The RRF filters are enabled by default, but for them to actually operate you have to make sure that:

  • the prefix file can be discovered (if for any reason it cannot, you may provide alternative input for it)
  • the groupId file is provided.

As said above, enabling a filter does not by itself make it active (participating in filtering): if no input is available for a given remote repository, the filter withdraws from “voting” for that repository (it abstains and does not participate in filtering). The same effect can be achieved by selectively enabling or disabling a filter per repository, by appending .repoId to the property key.
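The “voting” idea can be sketched as follows. The abstain behaviour comes from the text above; the combination rule (every filter that does vote must accept) is an assumption made for this illustration, and the names vote and combined are hypothetical.

```python
from typing import Iterable, Optional


def vote(enabled: bool, has_input: bool, matches: bool) -> Optional[bool]:
    """Return None to abstain, otherwise the filter's accept/reject vote."""
    if not enabled or not has_input:
        return None  # disabled or no input for this repository: abstain
    return matches


def combined(votes: Iterable[Optional[bool]]) -> bool:
    """Assumed rule: abstentions are ignored, all remaining votes must accept."""
    return all(v for v in votes if v is not None)


# Prefixes filter has input and matches; groupId filter has no input file.
central = combined([vote(True, True, True), vote(True, False, False)])

# Both filters have input, but the groupId filter rejects.
some_remote = combined([vote(True, True, True), vote(True, True, False)])
```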

The most common configuration with multiple remote repositories is the following setup: use both filters, and let the Maven Central prefixes be discovered (the same goes for any other remote repository that offers prefixes). Optionally provide groupId files for the non-Central remote repositories, if needed. This results in the following filter activity:

Remote Repository | Prefixes Filter    | GroupId Filter
Maven Central     | active             | inactive
Some Remote       | active or inactive | active

This leads to the following “constraints”:

  • “Maven Central” is asked only for those artifacts it claims it may have (prefixes)
  • “Some Remote” is asked only for allowed groupIds. If it also publishes prefixes, even better: you will not ask it for things it certainly does not have.