Advanced Configuration of the HttpClient HTTP Wagon

Notice on Maven Versioning and Availability

Maven 2.2.0

Starting in Maven 2.2.0, the HttpClient wagon was the implementation in use. The remainder of this document deals specifically with the differences between the HttpClient- and Sun-based HTTP wagons.

Maven 2.2.1

Due to several critical issues introduced by the HttpClient-based HTTP wagon, Maven 2.2.1 reverted back to the Sun implementation (a.k.a. 'lightweight') of the HTTP wagon as the default for HTTP/HTTPS transfers. The issues with the HttpClient-based wagon were mainly related to checksums, transfer timeouts, and NTLM proxies, and served as the primary cause for the release of 2.2.1 in the first place.

However, starting in Maven 2.2.1 you have a choice: you can use the default wagon implementation for a given protocol, or you can select an alternative wagon provider you'd like to use on a per-protocol basis. For more information, see the Guide to Wagon Providers [3].

Maven 3.0.4

With 3.0.4, the default wagon http(s) is now the HttpClient based on Apache Http Client 4.1.2. There is now a http connection pooling to prevent reopening http(s) to remote server for each requests. This pool feature is configurable with some parameters [4].

This new defaut wagon comes with some default configuration:

  • http(s) connection pool: default to 20.
  • readTimeout: default to 1800000ms (~30 minutes) (see section Read time out below)
  • default Preemptive Authentication only with PUT (GET doesn't use anymore default Preemptive Authentication)

Introduction

Using the HttpClient-based HTTP wagon, you have a lot more control over the configuration used to access HTTP-based Maven repositories. For starters, you have fine-grained control over what HTTP headers are used when resolving artifacts. In addition, you can also configure a wide range of parameters to control the behavior of HttpClient itself. Best of all, you have the ability to control these headers and parameters for all requests, or individual request types (Maven issues GET, HEAD, and PUT requests for different parts of the artifact-management subsystem).

The Basics

Without any special configuration, Maven's HTTP wagon will use some default HTTP headers and client parameters when managing artifacts. The default headers are:

Cache-control: no-cache
Cache-store: no-store
Pragma: no-cache
Expires: 0
Accept-Encoding: gzip

In addition, PUT requests made with the HTTP wagon will use the following HttpClient parameter:

http.protocol.expect-continue=true

From the HttpClient documentation[2], this parameter provides the following functionality:

Activates 'Expect: 100-Continue' handshake for the entity enclosing methods. 
The 'Expect: 100-Continue' handshake allows a client that is sending a request 
message with a request body to determine if the origin server is willing to 
accept the request (based on the request headers) before the client sends the 
request body.

The use of the 'Expect: 100-continue' handshake can result in noticeable performance 
improvement for entity enclosing requests (such as POST and PUT) that require 
the target server's authentication.

'Expect: 100-continue' handshake should be used with caution, as it may cause 
problems with HTTP servers and proxies that do not support HTTP/1.1 protocol.

Without this setting, PUT requests that require authentication will transfer their entire payload to the server before that server issues an authentication challenge. In order to complete the PUT request, the client must then re-send the payload with the proper credentials specified in the HTTP headers. This results in twice the bandwidth usage, and twice the time to transfer each artifact.

Another option to avoid this double transfer is what's known as preemptive authentication, which involves sending the authentication headers along with the original PUT request. However, there are a few potential issues with this approach. For one thing, in the event you have an unused <server> entry that specifies an invalid username/password combination, some servers may respond with a 401 Unauthorized even if the server doesn't actually require any authentication for the request. In addition, blindly sending authentication credentials with every request regardless of whether the server has made a challenge can result in a security hole, since the server may not make provisions to secure credentials for paths that don't require authentication.

We'll discuss preemptive authentication in another example, below.

Configuring GET, HEAD, PUT, or All of the Above

In all of the examples below, it's important to understand that we can configure the HTTP settings for all requests made to a given server, or for only one method. To configure all methods for a server, you'd use the following section of the settings.xml file:

<settings>
  [...]
  <servers>
    <server>
      <id>the-server</id>
      <configuration>
        <httpConfiguration>
          <all>
            [ Your configuration here. ]
          </all>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>

On the other hand, if you can live with the default configuration for most requests - say, HEAD and GET requests, which are used to check for the existence of a file and retrieve a file respectively - maybe you only need to configure the PUT method:

<settings>
  [...]
  <servers>
    <server>
      <id>the-server</id>
      <configuration>
        <httpConfiguration>
          <put>
            [ Your configuration here. ]
          </put>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>

For clarity, the other two sections are <get> for GET requests, and <head> for HEAD requests. I know that's going to be hard to remember...

Taking Control of Your HTTP Headers

As you may have noticed above, the default HTTP headers do have the potential to cause problems. For instance, some websites set the encoding for downloading GZipped files as gzip, in spite of the fact that the HTTP request itself isn't being sent using GZip compression. If the client is using the Accept-Encoding: gzip header, this can result in the client itself decompressing the GZipped file during the transfer and writing the decompressed file to the local disk with the original filename. This can be misleading to say the least, and can use up an inordinate amount of disk space on the local computer.

To turn off this default behavior, we'll simply disable the default headers. Then, we'll need to respecify the other headers that we are still interested in, like this:

<settings>
  [...]
  <servers>
    <server>
      <id>openssl</id>
      <configuration>
        <httpConfiguration>
          <put>
            <useDefaultHeaders>false</useDefaultHeaders>
            <headers>
              <header>
                <name>Cache-control</name>
                <value>no-cache</value>
              </header>
              <header>
                <name>Cache-store</name>
                <value>no-store</value>
              </header>
              <header>
                <name>Pragma</name>
                <value>no-cache</value>
              </header>
              <header>
                <name>Expires</name>
                <value>0</value>
              </header>
              <header>
                <name>Accept-Encoding</name>
                <value>*</value>
              </header>
            </headers>
          </put>
        </httpConfiguration>
      </configuration>
    </server>
    [...]
  </servers>
  [...]
</settings>

Fine-Tuning HttpClient Parameters

Going beyond the power of HTTP request parameters, HttpClient provides a host of other configuration options. In most cases, you won't need to customize these. But in case you do, Maven provides access to specify your own fine-grained configuration for HttpClient. Again, you can specify these parameter customizations per-method (HEAD, GET, or PUT), or for all methods of interacting with a given server. For a complete list of supported parameters, see the link[2] in Resources section below.

Non-String Parameter Values

Many of the configuration parameters for HttpClient have simple string values; however, there are important exceptions to this. In some cases, you may need to specify boolean, integer, or long values. In others, you may even need to specify a collection of string values. You can specify these using a simple formatting syntax, as follows:

  1. booleans: %b,<value>
  2. integer: %i,<value>
  3. long: %l,<value> (yes, that's an 'L', not a '1')
  4. double: %d,<value>
  5. collection of strings: %c,<value1>,<value2>,<value3>,..., which could also be specified as:
    %c,
    <value1>,
    <value2>,
    <value3>,
    ...

As you may have noticed, this syntax is similar to the format-and-data strategy used by functions like sprintf() in many languages. The syntax has been chosen with this similarity in mind, to make it a little more intuitive to use.

Example: Using Preemptive Authentication

Using the above syntax, we can configure preemptive authentication for PUT requests using the boolean HttpClient parameter http.authentication.preemptive, like this:

<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <put>
            <params>
              <param>
                <name>http.authentication.preemptive</name>
                <value>%b,true</value>
              </param>
            </params>
          </put>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>

Ignoring Cookies

Like the example above, telling the HttpClient to ignore cookies for all methods of request is a simple matter of configuring the http.protocol.cookie-policy parameter (it uses a regular string value, so no special syntax is required):

<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <all>
            <params>
              <param>
                <name>http.protocol.cookie-policy</name>
                <value>ignore</value>
              </param>
            </params>
          </all>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>

The configuration above can be useful in cases where the repository is using cookies - like the session cookies that are often mistakenly turned on or left on in appservers - alongside HTTP redirection. In these cases, it becomes far more likely that the cookie issued by the appserver will use a Path that is inconsistent with the one used by the client to access the server. If you have this problem, and know that you don't need to use this session cookie, you can ignore cookies from this server with the above configuration.

Support for General-Wagon Configuration Standards

It should be noted that configuration options previously available in the HttpClient-driven HTTP wagon are still supported in addition to this new, fine-grained approach. These include the configuration of HTTP headers and connection timeouts. Let's examine each of these briefly:

HTTP Headers

In all HTTP Wagon implementations, you can add your own HTTP headers like this:

<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <httpHeaders>
          <httpHeader>
            <name>Foo</name>
            <value>Bar</value>
          </httpHeader>
        </httpHeaders>
      </configuration>
    </server>
  </servers>
</settings>

It's important to understand that the above approach doesn't allow you to turn off all of the default HTTP headers; nor does it allow you to specify headers on a per-method basis. However, this configuration remains available in both the lightweight and httpclient-based Wagon implementations.

Connection Timeouts

All wagon implementations that extend the AbstractWagon class, including those for SCP, HTTP, FTP, and more, allow the configuration of a connection timeout, to allow the user to tell Maven how long to wait before giving up on a connection that has not responded. This option is preserved in the HttpClient-based wagon, but this wagon also provides a fine-grained alternative configuration that can allow you to specify timeouts per-method for a given server. The old configuration option - which is still supported - looks like this:

<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <timeout>6000</timeout> <!-- milliseconds -->
      </configuration>
    </server>
  </servers>
</settings>

...while the new configuration option looks more like this:

<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <put>
            <connectionTimeout>10000</connectionTimeout> <!-- milliseconds -->
          </put>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>

If all you need is a per-server timeout configuration, you still have the option to use the old <timeout> parameter. If you need to separate timeout preferences according to HTTP method, you can use one more like that specified directly above.

Read time out

With Wagon 2.0 and Apache Maven 3.0.4, a default timeout of 30 minutes comes by default. If you want to change this value, you can add the following setup in your settings:

<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <put>
            <readTimeout>120000</readTimeout> <!-- milliseconds -->
          </put>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>