Sunday, April 19, 2015

Bad URL day

When you build a URL from string parameters, it is crucial to encode those parameters properly (and to decode them on the other side). Imagine that your REST service is consumed by a client like this:
Response response = myServiceRestClient.getWebTarget().path("/queries/something").path(stringParameter)
        .queryParam("a", a)
        .queryParam("b", b)
        .request().get();
This is the standard JAX-RS client (javax.ws.rs.client) way of doing things, but the path method may not behave the way you expect. Let me introduce the wrongdoer:
/**
 * Create a new {@code WebTarget} instance by appending path to the URI of
 * the current target instance.
 * <p>
 * When constructing the final path, a '/' separator will be inserted between
 * the existing path and the supplied path if necessary. Existing '/' characters
 * are preserved thus a single value can represent multiple URI path segments.
 * </p>
 * <p>
 * A snapshot of the present configuration of the current (parent) target
 * instance is taken and is inherited by the newly constructed (child) target
 * instance.
 * </p>
 *
 * @param path the path, may contain URI template parameters.
 * @return a new target instance.
 * @throws NullPointerException if path is {@code null}.
 */
public WebTarget path(String path);
Do you know how wrong your URL will be when you pass "a/b" as the path string? Of course, a / can end up in a URL in many ways. What goes wrong this time? The crucial sentence is "single value can represent multiple URI path segments". The "/" inside the parameter is preserved, so "a/b" becomes two path segments instead of one and the request no longer matches the resource it was meant for (I found a lot of 404s in the logs, which is why I called this Bad URL day).
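To make the problem visible without a running service, here is a minimal sketch that uses only java.net.URI. It does not call JAX-RS at all; it simply mimics appending the raw parameter to the path, as the Javadoc above describes, and counts the resulting segments. The class name SlashInPathDemo and the localhost address are made up for this illustration.

```java
import java.net.URI;

class SlashInPathDemo {

    // Appends the raw parameter to a base path, the way a naive client would,
    // and returns the number of non-empty path segments in the result.
    static int segmentCount(String parameter) {
        URI uri = URI.create("http://localhost/queries/something/" + parameter);
        int count = 0;
        for (String segment : uri.getRawPath().split("/")) {
            if (!segment.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(segmentCount("ab"));    // 3: queries, something, ab
        System.out.println(segmentCount("a/b"));   // 4: the parameter was split in two
        System.out.println(segmentCount("a%2Fb")); // 3: the encoded slash stays in one segment
    }
}
```

The server expects three segments, so the unencoded "a/b" variant points at a different (most likely nonexistent) resource.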

How to repair this?
Of course, it is very easy to replace / with something else in the URL and then turn it back into / on the other side.

A. To Guava or not to Guava?
In Guava we can find UrlEscapers. They are quite nice, but do we need Guava in every project? What about other languages? In a microservices environment, every node would have to know the Guava way of encoding in order to communicate with the other parties.

B. Java API
The Java API has two quite useful classes: URLEncoder and URLDecoder. They let us do what we want. Here are a few tests:
@Unroll
def "should return space for encoded space [#character]"() {
    when:
    String result = RequestUtils.decodeParameterFromUrl(character)
    then:
    result == ' '
    where:
    character << [' ', '%20', '+']
}
This means that a space (' ') can be written in three different ways: ' ', '%20' and '+'.
Of course, we should take a look at the method definition:
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class RequestUtils {

    public static String decodeParameterFromUrl(String parameterFromUrl) {
        try {
            return URLDecoder.decode(parameterFromUrl, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("JVM should implement encoding from StandardCharsets set");
        }
    }
}
It is quite hard to reach 100% test coverage of the code above, because UnsupportedEncodingException is thrown only on a JVM that does not implement StandardCharsets.UTF_8, and the charsets from StandardCharsets have to be implemented by every JVM. According to the Javadoc:
package java.nio.charset;
/**
 * Constant definitions for the standard {@link Charset Charsets}. These
 * charsets are guaranteed to be available on every implementation of the Java
 * platform.
 *
 * @see <a href="Charset#standard">Standard Charsets</a>
 * @since 1.7
 */
public final class StandardCharsets {
/..../
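Given that guarantee, one way around the untestable catch block is to avoid the checked exception altogether. The sketch below assumes Java 10 or newer, where URLDecoder.decode has an overload that takes a Charset instead of a charset name and does not declare UnsupportedEncodingException. The class name RequestUtilsWithCharset is hypothetical and exists only to keep this variant apart from RequestUtils.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class RequestUtilsWithCharset {

    // Requires Java 10+: the Charset overload throws no checked exception,
    // so there is no unreachable catch block left to cover with tests.
    static String decodeParameterFromUrl(String parameterFromUrl) {
        return URLDecoder.decode(parameterFromUrl, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeParameterFromUrl("a%2Fb")); // a/b
        System.out.println(decodeParameterFromUrl("a+b"));   // a b
    }
}
```

On older JVMs this overload does not exist, so the try/catch version remains the only option there.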
What about our "/":
@Unroll
def "should allow me to pass / encoded to [#character]"() {
    when:
    String result = RequestUtils.decodeParameterFromUrl(character)
    then:
    result == '/'
    where:
    character << ['/', '%2F', '%2f']
}
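The tests above cover only the decoding side. The client has to do the mirror operation before it hands the parameter to path. Below is a possible counterpart sketched with URLEncoder. The class name UrlParameterCodec and the method encodeParameterForUrl are hypothetical and not part of the original RequestUtils; the decode method repeats the logic of RequestUtils.decodeParameterFromUrl only so that the example is self-contained.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class UrlParameterCodec {

    // Hypothetical client-side counterpart of decodeParameterFromUrl.
    static String encodeParameterForUrl(String parameter) {
        try {
            return URLEncoder.encode(parameter, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("UTF-8 should be available on every JVM", e);
        }
    }

    // Same logic as RequestUtils.decodeParameterFromUrl, repeated here
    // to keep the sketch runnable on its own.
    static String decodeParameterFromUrl(String parameterFromUrl) {
        try {
            return URLDecoder.decode(parameterFromUrl, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("UTF-8 should be available on every JVM", e);
        }
    }

    public static void main(String[] args) {
        String encoded = encodeParameterForUrl("a/b c+d");
        System.out.println(encoded);                         // a%2Fb+c%2Bd
        System.out.println(decodeParameterFromUrl(encoded)); // a/b c+d
    }
}
```

Note that URLEncoder writes a space as '+' and a literal plus as '%2B'. A client that skips encoding and sends a raw '+' will therefore see it decoded as a space on the other side.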
And what about our famous ĄĘ characters?
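A small sketch of how such a check could look in plain Java, assuming UTF-8 on both sides (Ą is the bytes C4 84 and Ę is C4 98 in UTF-8). The class name PolishCharactersDemo is made up for this example, and the characters are written as Unicode escapes so the source file does not depend on the compiler's encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class PolishCharactersDemo {

    static final String POLISH = "\u0104\u0118"; // the two characters ĄĘ

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should be available on every JVM", e);
        }
    }

    static String decode(String value) {
        try {
            return URLDecoder.decode(value, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should be available on every JVM", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode(POLISH));                              // %C4%84%C4%98
        System.out.println(decode("%C4%84%C4%98").equals(POLISH));       // true
        System.out.println(decode("%c4%84%c4%98").equals(POLISH));       // true, lowercase hex works too
    }
}
```

If the two sides disagree on the charset (for example ISO-8859-2 on one end), the same percent sequences decode to different characters, so it is worth fixing the charset explicitly as shown.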


Do you think I forgot about something? Maybe there is another way of testing the part with UnsupportedEncodingException? Please leave suggestions in the comments below. I will try every hint and describe my attempt on this blog.
