---
title: "httptest: A Test Environment for HTTP Requests in R"
description: "This vignette covers the core features of the httptest package, focusing on how to mock HTTP responses, how to make assertions about requests, and how to record real requests for future use as mocks."
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{httptest: A Test Environment for HTTP Requests in R}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, results='hide', echo=FALSE, message=FALSE}
options(width=120)
```

Testing code and packages that communicate with remote servers can be painful. Dealing with authentication, bootstrapping server state, cleaning up objects that may get created during the test run, network flakiness, and other complications can make testing seem too costly to bother with. But it doesn't need to be that hard. The `httptest` package lets you test R code that constructs API requests and handles their responses, all without requiring access to the remote service during the test run. This makes tests easy to write and fast to run.

`httptest` sits on top of the [testthat](https://testthat.r-lib.org/) package and provides test **contexts** that mock the network connection. These let you supply mock API responses for some requests and allow you to assert that HTTP requests were, or were not, made using custom **expectation** functions. The package further includes tools for recording the responses of real requests and storing them as fixtures that you can later load in a test run. Taken together, `httptest` lets you test that your code is making the intended requests and that it handles the expected responses correctly without depending on a connection to a remote API.

This vignette covers some of the core features of the `httptest` package, focusing on how to mock HTTP responses, how to make other assertions about requests, and how to record real requests for future use as mocks.
Note that `httptest` requires the `testthat` package, and it follows the testing conventions and interfaces defined there, extending them with some additional wrappers and expectations. If you're not familiar with `testthat`, see the ["Testing" chapter](https://r-pkgs.org/tests.html) of Hadley Wickham's _R Packages_ book. Furthermore, `httptest` is designed for use with packages that rely on the [httr](https://httr.r-lib.org/) requests library: it is a bridge between `httr` and `testthat`.

# The `with_mock_api` context

The package includes three contexts, which are "with"-style functions that you wrap around other code you want to execute. The most generally useful of the three is `with_mock_api()`. In this context, HTTP requests are intercepted and mapped to local file paths. If the file exists, it is loaded and returned as the response; if it does not, an error with a message containing the request information is raised, and we can write tests that look for that error.

These two different modes allow us to make assertions about two different kinds of logic: (1) given some inputs, does my code make the correct HTTP request(s); and (2) does my code correctly handle the types of responses that the server can return?

## Example

To illustrate the power of `with_mock_api`, let's try to add tests to an R package that wraps a prominent web API. The `twitteR` package is quite popular ([13k downloads per month](https://cranlogs.r-pkg.org/badges/twitteR), at the time of writing), but like most R packages, it doesn't have a test suite. Indeed, it is hard to test the querying of an API that requires OAuth authentication, and thus a user account to use in testing. That's tricky to set up to run locally, and even more challenging to run on a continuous-integration service like Travis-CI. But `httptest` can help.

First, we need to add `httptest` to our package test suite.
Once you've configured your package to use `testthat` (`usethis::use_testthat()` is one way), run `httptest::use_httptest()`. This adds `httptest` to your package `DESCRIPTION` and makes sure it is loaded in your test setup code.

Now we can start with our tests. From some experimenting, it's clear that the package checks to see if you have an OAuth token set. We're not actually going to hit the real Twitter API, so we don't need a valid token; we just need a token to exist so the R code that looks for a token finds one:

```r
use_oauth_token("foo")
```

Let's write a test. Add "test-user.R" to your test directory and add a test for getting a user record via the `getUser` function. According to the source code, this function should hit the "show user" Twitter API, documented [here](https://dev.twitter.com/rest/reference/get/users/show). But we're going to prevent that request from actually happening by wrapping our test in the `with_mock_api` context.

```r
context("Get a user")

with_mock_api({
    test_that("We can get a user object", {
        user <- getUser("twitterdev")
    })
})
```

When we run the test, it fails with

```
Get a user: Error: GET https://api.twitter.com/1.1/users/show.json?screen_name=twitterdev (api.twitter.com/1.1/users/show.json-84627b.json)
```

The error message reveals a few things about how `with_mock_api` works. First, it tells us what the request method and URL were, and if there had been a request body, it would have been part of the error message as well. Second, the final part of the error message is a file name. That's the mock file that the test context was looking for and didn't find. If the file had existed, it would have been loaded and the code would have continued executing *as if the server had returned it*.

Requests are translated to mock file paths according to several rules that incorporate the request method, URL, query parameters, and body.
Query parameters and request bodies are incorporated into the file path by hashing, hence the `84627b` in the `getUser` example. If a request method other than GET is used, it will be appended to the end of the file name. Mock file paths also have an extension appended because in an HTTP API, a "directory" itself is a resource. The extension allows distinguishing directories and files in the file system. That is, a mocked `GET("http://example.com/api/")` may read an "example.com/api.json" file, while `GET("http://example.com/api/object1/")` reads "example.com/api/object1.json", and `POST("api/object1/?a=1")` would map to "api/object1-b64371-POST.json".

The file extension also gives information on content type. Files with `.json`, `.html`, `.xml`, `.txt`, `.csv`, and `.tsv` are loaded directly by `with_mock_api`, and relevant response metadata (`Content-Type` header, status code 200, etc.) are inferred. If your API doesn't return one of these types, or if you want to simulate requests with other behavior (201 Location response, or 400 Bad Request, for example), you can store full `response` objects in .R files that `with_mock_api` will `source` to load. The response to any request can be stored as a .R mock, but the `.json` and other media types offer a simplified, more readable alternative.

Back to the `getUser` example. The error message tells us that the request it is making (`GET https://api.twitter.com/1.1/users/show.json?screen_name=twitterdev`) is what we should expect based on the API documentation, so that's good. Now let's provide a mock response. The API documentation page has an example JSON response, which looks like

```json
{
  "id": 2244994945,
  "id_str": "2244994945",
  "name": "TwitterDev",
  "screen_name": "TwitterDev",
  "location": "Internet",
  "profile_location": null,
  "description": "Developer and Platform Relations @Twitter. We are developer advocates. We can't answer all your questions, but we listen to all of them!",
  "url": "https://t.co/66w26cua1O",
  ...
}
```

Let's copy that example response to the fixture file path that the message indicated, `api.twitter.com/1.1/users/show.json-84627b.json`.

When we run the tests again, there's no more error. Great! This means that `with_mock_api` loaded our mock when it reached the GET request, and the rest of the code continued executing. `getUser` returns a "user" object, so let's now assert some things about it and test some of its methods:

```r
test_that("We can get a user object", {
    user <- getUser("twitterdev")
    expect_is(user, "user")
    expect_identical(user$name, "TwitterDev")
    expect_output(print(user), "TwitterDev")
})
```

We can do the same for the `lookupUsers` function. It should hit the `users/lookup.json` endpoint and the function should return a list of `user` objects:

```r
test_that("lookupUsers retrieves many", {
    result <- lookupUsers(c("twitterapi", "twitter"))
    expect_is(result, "list")
    expect_true(all(vapply(result, inherits, logical(1), what = "user")))
})
```

Drop the example response from the [API documentation](https://dev.twitter.com/rest/reference/get/users/lookup) in the right location, and that passes as well.

We just went from zero tests to 25 percent line coverage in a few minutes, using 16 lines of code. We've tested a lot of the code that prepares the requests of the user API, and we've tested much of the code that handles the server's response, the "user" objects that get created in R, and their methods.
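
To see the path mapping and mock loading in isolation, here is a minimal sketch that does not involve `twitteR` at all. It assumes `httr` and `httptest` are installed. The `example.com` URL, the temporary fixture directory, and the JSON content are made up for illustration; the file location follows the `example.com/api/object1.json` rule described earlier.

```r
library(httr)
library(httptest)

# Hypothetical fixture: build a directory layout that mirrors the URL
mock_root <- tempfile("mocks")
fixture_dir <- file.path(mock_root, "example.com", "api")
dir.create(fixture_dir, recursive = TRUE)
writeLines('{"name": "object1"}', file.path(fixture_dir, "object1.json"))

# Tell httptest to look for mocks under the temporary directory
.mockPaths(mock_root)

with_mock_api({
    # No network traffic: the request is mapped to example.com/api/object1.json
    resp <- GET("http://example.com/api/object1/")
    obj <- content(resp)
})

# Restore the default mock path
.mockPaths(NULL)
```

If the fixture file were missing, the same `GET` call would instead raise the kind of error shown at the start of this example, naming the file it looked for.
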
Our resulting test directory, containing both our test files and our API fixtures, looks like this:

```
tests
├── testthat
│   ├── api.twitter.com
│   │   └── 1.1
│   │       └── users
│   │           ├── lookup.json-342984.json
│   │           └── show.json-84627b.json
│   ├── setup.R
│   └── test-user.R
└── testthat.R
```

The full test code looks like this:

```r
context("Get a user")

use_oauth_token("foo") # Pulled here from setup.R for clarity

with_mock_api({
    test_that("We can get a user object", {
        user <- getUser("twitterdev")
        expect_is(user, "user")
        expect_identical(user$name, "TwitterDev")
        expect_identical(name(user), "TwitterDev")
        expect_output(print(user), "TwitterDev")
    })

    test_that("lookupUsers retrieves many", {
        result <- lookupUsers(c("twitterapi", "twitter"))
        expect_is(result, "list")
        expect_true(all(vapply(result, inherits, logical(1), what = "user")))
    })
})
```

Note that none of the test code inside the `with_mock_api` block looks any different from how you'd write it if you were testing against a live server using just `testthat`. The goal is to make your tests just as natural to write as if you were using your package normally. The `with_mock_api` context handles all of the HTTP mocking seamlessly.

# Recording mocks with `capture_requests`

Using API documentation to build a library of fixtures is one way to set up testing using `with_mock_api`. Alternatively, you can collect real HTTP responses to use as test fixtures. `capture_requests()` is a context that records the responses from requests you make and stores them as mock files. This enables you to perform a series of requests against a live server once and then build your test suite using those mocks, running your tests in `with_mock_api`.

In an interactive session, it may be easier to use the functions `start_capturing` and `stop_capturing` rather than the context manager.
You can set up your R session, call `start_capturing()`, and then run whatever commands or function calls make HTTP requests, and the responses will be recorded. We could do something like

```r
start_capturing()
searchTwitter("#rstats")
stop_capturing()
```

and as a result, we'd see a file created with a path/name of `api.twitter.com/1.1/search/tweets.json-ca54df.json`. The file will contain JSON with the statuses matching the search query, per the [docs](https://dev.twitter.com/rest/reference/get/search/tweets).

Both the `capture_requests` context and the `start_capturing` function follow the path setting of `.mockPaths()`, which lets you specify a location other than the current working directory to which to write the response files. They also have a "simplify" argument that, when `TRUE` (the default), records simplified `.json`, `.csv`, `.xml`, et al. files where appropriate (200 OK response with a supported `Content-Type`) and .R full "response" objects otherwise.

While recording responses to use later in tests can be very convenient, we don't always want to use captured responses, or at least not blindly. Real responses may contain information you want to sanitize or redact, like usernames, emails, or tokens. And real responses may be too big or messy to want to deal with. You may want to pare back a large set of results that a query returns down to four or five results and still have enough variation to test with.

## Automatically recording/playing back: the `with_mock_dir()` context

The `with_mock_dir()` context helps you create mock files when they don't exist yet and makes it easy to re-record them. For example:

```r
with_mock_dir("httpbin-get", {
    a <- GET("https://httpbin.org/get")
    print(a)
})
```

The first time you run the code, it will create the folder `tests/testthat/httpbin-get` and create mock files under it. On subsequent runs, it will _use_ the mock files in `tests/testthat/httpbin-get`. To re-record, simply delete the folder.

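
Whichever way the mocks get recorded, the earlier advice about sanitizing real responses can be applied by post-processing the recorded files. The following is a minimal sketch using only base R, not a feature of `httptest`: the fixture content, the field names `email` and `token`, and the helper `redact_mock()` are all hypothetical.

```r
# Hypothetical recorded fixture, written to a temporary file for illustration
mock_file <- tempfile(fileext = ".json")
writeLines(c(
    '{',
    '  "screen_name": "TwitterDev",',
    '  "email": "someone@example.com",',
    '  "token": "abc123secret"',
    '}'
), mock_file)

# Hypothetical helper: blank out the values of the named JSON string fields
redact_mock <- function(path, fields = c("email", "token")) {
    txt <- readLines(path)
    for (field in fields) {
        # Capture the key and opening quote, drop the value, keep the closing quote
        pattern <- sprintf('("%s": *")[^"]*(")', field)
        txt <- gsub(pattern, "\\1REDACTED\\2", txt)
    }
    writeLines(txt, path)
    invisible(path)
}

redact_mock(mock_file)
cleaned <- readLines(mock_file)
```

A line-based substitution like this only suits simple, pretty-printed string fields; anything nested or multi-line would call for parsing the JSON instead.
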
# Mocks are text files

`httptest` stores these API mocks as plain-text files, which has several nice features, particularly relative to storing serialized (binary) R objects. You can more easily confirm that your mocks look correct, and you can more easily maintain them without having to re-record them. When you do edit them, text files are more easily handled by version-control systems like Git and Mercurial. Plain-text files can also have comments, so you can make notes as to why a certain fixture exists, what a particular value means, and so on, which will help the users of your package, and your future self!

By having mocks in human-readable text files, you can also more easily extend your code. APIs are living things that evolve over time, and your code that communicates with an API needs to be able to change with them. If the API adds an additional attribute to an object, no big deal: just touch up the mocks. In addition, you can future-proof your code against that kind of API change by tweaking a fixture file. In [this example](https://github.com/Crunch-io/rcrunch/blob/49cf2526c0c54d05550b6401e0b97a0beeaa1640/inst/app.crunch.io/api/datasets/1.json#L34) from the `crunch` package, an extra, nonsense attribute was added to the JSON just to ensure that the code doesn't break if there are new, unknown features added to the API response. That way, if the API grows new features, people who are using your package don't get errors if they haven't upgraded to the latest release that recognizes the new feature.

If you're responsible for the API as well as the R client code that communicates with it, the plain-text mocks can be a valuable source of documentation. Indeed, the file-system tree view of the mock files gives a visual representation of your API.
For example, in the [crunch](https://crunch.io/r/crunch/) package, the mocks show an API of catalogs that contain entities that may contain other subdocuments:

```
app.crunch.io/
├── api
│   ├── accounts
│   │   ├── account1
│   │   │   └── users.json
│   │   └── account1.json
│   ├── datasets
│   │   ├── 1
│   │   │   ├── export.json
│   │   │   ├── filters
│   │   │   │   └── filter1.json
│   │   │   ├── filters.json
│   │   │   ├── permissions.json
│   │   │   ├── summary-73a614.json
│   │   │   ├── variables
│   │   │   │   ├── birthyr
│   │   │   │   │   ├── summary-73a614.json
│   │   │   │   │   └── values-3d4982.json
│   │   │   │   ├── birthyr.json
│   │   │   │   ├── gender
│   │   │   │   │   ├── summary.json
│   │   │   │   │   └── values-51980f.json
│   │   │   │   ├── gender.json
│   │   │   │   ├── starttime
│   │   │   │   │   └── values-3d4982.json
│   │   │   │   ├── starttime.json
│   │   │   │   ├── textVar
│   │   │   │   │   └── values-641ef3.json
│   │   │   │   ├── textVar.json
│   │   │   │   └── weights.json
│   │   │   ├── variables-d118fa.json
│   │   │   └── variables.json
│   │   ├── 1.json
│   │   └── search-c89aba.json
│   └── users.json
└── api.json
```

# Testing that requests aren't made

Mocking API responses isn't the only thing you might want to do in order to test your code. Sometimes, the request that matters is the one you don't make. `httptest` provides several tools to test requests without concern for the responses, as well as the ability to ensure that requests aren't made when they shouldn't be.

`without_internet` is a context that simulates the situation when any network request will fail, as in when you are without an internet connection. Any HTTP request will raise an error with a well-defined shape, the same as the one `with_mock_api` raises when no mock file is found. The error message has three elements: the request method (e.g. "GET"), the request URL, and the request body, if present.
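
That shape can be inspected directly. A minimal sketch, assuming `httr` and `httptest` are installed; the `example.com` URL is a placeholder and is never contacted:

```r
library(httr)
library(httptest)

without_internet({
    # The request is blocked, so GET() raises an error instead of returning
    msg <- tryCatch(
        GET("http://example.com/api/"),
        error = function(e) conditionMessage(e)
    )
})

# Per the description above, the message holds the method followed by the URL
print(msg)
```
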
The verb-expectation functions, such as `expect_GET` and `expect_POST`, look for this shape.

Here's an example of how `without_internet` can be used to assert that code that should not make network requests in fact does not. This is a simplified version of a test from the [httpcache](https://enpiar.com/r/httpcache/) package, a library that implements a query cache for HTTP requests in R. The point of the query cache is that only the first time you make a certain GET request should it hit the remote API; subsequent requests should read from the cache and not make a request.

The test first makes a request (artificially, using `with_fake_http`, the third test context the package provides) to prime the cache.

```r
with_fake_http({
    test_that("Cache gets set on GET", {
        expect_length(cacheKeys(), 0)
        expect_GET(a <- GET("https://app.crunch.io/api/"),
            "https://app.crunch.io/api/")
        expect_length(cacheKeys(), 1)
        expect_identical(a, getCache("https://app.crunch.io/api/"))
    })
})
```

Then, using `without_internet`, the test checks two things: first, that doing the same GET succeeds because it reads from cache; and second, that if you bypass the query cache, you get an error because you tried to make a network request.

```r
without_internet({
    test_that("When the cache is set, can read from it even with no connection", {
        expect_identical(GET("https://app.crunch.io/api/")$url,
            "https://app.crunch.io/api/")
    })
    test_that("But uncached() prevents reading from the cache", {
        expect_error(uncached(GET("https://app.crunch.io/api/")),
            "GET https://app.crunch.io/api/")
    })
})
```

This tells us that our cache is working as expected: we can get results from cache and we don't make a (potentially expensive) network request more than once.

## Assert the shape of request payloads

Sometimes it is clearer what you're testing if you focus only on the requests. One case is when the response itself isn't that interesting or doesn't tell you that the request did the correct thing.
For example, if you're testing a POST request that alters the state of something on the server and returns 204 No Content status on success, nothing in the response itself (which would be stored in the mock file) tells you that the request you made was shaped correctly: the response has no content. A more transparent, readable test would just assert that the POST request was made to the right URL and had the expected request body.

Both `without_internet` and `with_mock_api` allow you to make assertions about requests (method, URL, and payload) that should be made. The various `expect_VERB` expectation functions facilitate this testing.

In this example from the `crunch` package, inside the `with_mock_api` context, there is a catalog resource containing three entities, each of which has an "archived" attribute. The `is.archived` method returns the value of that attribute:

```r
test_that("is.archived", {
    expect_identical(is.archived(catalog), c(FALSE, TRUE, FALSE))
})
```

When we update the attributes of the catalog's entities, we send a PATCH request, and we only want to send values that are changing. That way, we don't unintentionally collide with any other concurrent actions happening on the server, and we can send smaller messages, which should be faster. In this example, if we were to set "archived" to `TRUE` for the second and third elements of the catalog, we'd only want to send a PATCH request that referenced the third element because the second one is already `TRUE`.

```r
test_that("'archive' sets the archived attribute and only PATCHes changes", {
    expect_PATCH(archive(catalog[2:3]),
        'https://app.crunch.io/api/datasets/',
        '{"https://app.crunch.io/api/datasets/3/":{"archived":true}}')
})
```

The resulting state of the system is the same whether the smaller PATCH request is sent or whether the overly verbose one is sent.
If you've written logic in your R code to ensure that the smaller PATCH is sent, testing the shape of the request being made is the clearest way to demonstrate and assert that the desired behavior is happening.

Another case where you might care more about the shape of the request body than about the resulting response is when there are multiple paths in your R code that should lead to the same request being made. If you can assert that all of those variations result in the same request, then when it comes to testing the response and how your code handles it, you can do that once and not have to repeat it for all of the input variations. This is particularly useful in conjunction with integration tests that run against a live server because it means you can have the same test coverage with fewer integration tests.

An example of this from the `crunch` package is in testing a `join` function, which has a similar syntax to the base R `merge` function. `merge` takes "by.x" and "by.y" arguments, which point to the variables in the "x" and "y" data.frames on which to match the rows when merging. It has a shortcut for the case where the variables have the same names in both data.frames, in which case you can just specify a "by" argument.
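
For reference, the base R behavior being mirrored can be sketched with two small, made-up data frames (`left` and `right` are illustrative and unrelated to the `crunch` datasets). When the key has the same name on both sides, `by` is shorthand for giving `by.x` and `by.y` separately:

```r
# Two illustrative data frames that share a key column named "id"
left <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
right <- data.frame(id = c(2, 3, 4), y = c(TRUE, FALSE, TRUE))

# Spell out the key for each side...
explicit <- merge(left, right, by.x = "id", by.y = "id")
# ...or use the shortcut when the names match
shortcut <- merge(left, right, by = "id")

# Both calls describe the same join, so the results can be compared directly
same_result <- identical(explicit, shortcut)
```
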
To test that all of those combinations of specifying join keys result in the same request, the test defines a payload string to reuse

```r
testPayload <- paste0(
    '{"https://app.crunch.io/api/datasets/1/joins/95c0b45fe0af492594863f818cb913d2/":',
    '{"left_key":"https://app.crunch.io/api/datasets/1/variables/birthyr/",',
    '"right_key":"https://app.crunch.io/api/datasets/3/variables/birthyr/"}}'
)
```

and then asserts that three different ways of calling `join` result in the same `PATCH` request being made

```r
test_that("Can specify 'by' variables several ways", {
    expect_PATCH(join(ds1, ds2, by.x = ds1$birthyr, ds2$birthyr),
        'https://app.crunch.io/api/datasets/1/joins/',
        testPayload)
    expect_PATCH(join(ds1, ds2, by.x = "birthyr", by.y = "birthyr"),
        'https://app.crunch.io/api/datasets/1/joins/',
        testPayload)
    expect_PATCH(join(ds1, ds2, by = "birthyr"),
        'https://app.crunch.io/api/datasets/1/joins/',
        testPayload)
})
```

Subsequent integration tests that assert that the dataset is correctly modified on the server by `join` then only test with one of those ways of specifying the "by" variables. The R code that constructs the request is fully covered by these assertions.

# Just test it

The goal of `httptest` is to remove a big obstacle to testing code that communicates with HTTP services: the HTTP service itself. If [`httr` makes HTTP easy](https://github.com/r-lib/httr/blob/master/R/httr.r#L1) and [`testthat` makes testing fun](https://github.com/r-lib/testthat/blob/5ed0bb15fb923eebca8eff529dd50fbf94fd717f/R/test-that.R#L177), `httptest` makes testing your code that uses HTTP a simple pleasure.