Functions for processing the response header of a libcurl request

These two functions collect the contents of an HTTP response header via the headerfunction option of a curl handle and then process that text into the name: value pairs and the initial line of the response, which gives the status of the request. basicHeaderGatherer is a simple special case of basicTextGatherer whose built-in post-processing step is done by parseHTTPHeader.
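The post-processing step can be pictured with a small base-R sketch that turns raw header lines into a named character vector shaped like the example output further down: the header fields, plus status and statusMessage. The name parse_header_sketch is hypothetical. This is not RCurl's parseHTTPHeader, and it ignores details such as multiple header blocks.

```r
# Hypothetical sketch, NOT RCurl's parseHTTPHeader: it only illustrates
# turning raw header lines into a status plus name-value pairs.
parse_header_sketch <- function(lines) {
  # Drop trailing CR/LF characters, then discard blank lines.
  lines <- sub("[\r\n]+$", "", lines)
  lines <- lines[nzchar(lines)]

  # The first line is the status line, e.g. "HTTP/1.1 200 OK".
  status_parts <- strsplit(lines[1], " ", fixed = TRUE)[[1]]
  status <- status_parts[2]
  message <- paste(status_parts[-(1:2)], collapse = " ")

  # Remaining lines are "Name: value"; split on the first colon only,
  # so values such as dates that contain ":" stay intact.
  fields <- lines[-1]
  pos <- regexpr(":", fields, fixed = TRUE)
  values <- trimws(substring(fields, pos + 1))
  names(values) <- substring(fields, 1, pos - 1)

  c(values, status = status, statusMessage = message)
}

raw <- c("HTTP/1.1 200 OK\r\n",
         "Content-Type: text/html; charset=UTF-8\r\n",
         "Content-Length: 4284\r\n",
         "\r\n")
h <- parse_header_sketch(raw)
h[["status"]]        # "200"
h[["Content-Type"]]  # "text/html; charset=UTF-8"
```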

basicHeaderGatherer(txt = character(), max = NA)
parseHTTPHeader(lines, multi = TRUE)

Arguments

txt: any initial text that we want included with the header. This is passed to basicTextGatherer. Generally it should not be specified unless there is a good reason.
max: This is passed directly to basicTextGatherer.
lines: the text as a character vector from the response header that parseHTTPHeader will convert to a status and name-value pairs.
multi: a logical value controlling whether we check for multiple HTTP headers in the lines of text. Multiple headers occur when an interim 100 Continue response is concatenated with the actual response. When this is TRUE, we look for the lines that start an HTTP header, e.g. HTTP/1.1 200 OK, and use the content from the last of these.
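The idea behind multi = TRUE can be sketched in base R: find every line that starts with "HTTP" and keep only the block that begins at the last one, so an interim 100 Continue header is skipped. The helper last_header_block is hypothetical and is not part of RCurl; it only mirrors the behaviour described above.

```r
# Hypothetical helper, NOT part of RCurl: mimics the multi = TRUE idea by
# keeping only the lines from the last status line onwards.
last_header_block <- function(lines) {
  starts <- grep("^HTTP", lines)
  if (length(starts) == 0) return(lines)  # no status line found; leave unchanged
  lines[max(starts):length(lines)]
}

raw <- c("HTTP/1.1 100 Continue\r\n",
         "\r\n",
         "HTTP/1.1 200 OK\r\n",
         "Content-Type: text/plain\r\n",
         "\r\n")
last_header_block(raw)  # the three lines starting at "HTTP/1.1 200 OK\r\n"
```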

Value

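The return value of basicHeaderGatherer is the same as that of basicTextGatherer, i.e. a list with update, value and reset function elements. Its value element invokes parseHTTPHeader on the text read while the libcurl request is processed and returns the result: a named character vector of the header fields, plus status and statusMessage elements taken from the status line (see the example below).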
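The update/value/reset structure can be pictured as a closure that accumulates text. The sketch below is a hypothetical, simplified stand-in (header_gatherer_sketch is our name, not RCurl's): RCurl's basicTextGatherer also takes a max argument, and basicHeaderGatherer's value delegates to parseHTTPHeader rather than the inline parsing used here. Returning the byte count from update follows libcurl's rule that a header callback reports the bytes it handled; whether RCurl requires this of R-level callbacks is an assumption.

```r
# Hypothetical sketch, NOT RCurl's basicTextGatherer: shows only the
# closure pattern behind the update/value/reset elements.
header_gatherer_sketch <- function(txt = character()) {
  initial <- txt
  list(
    # Called once per chunk of header text; appends it and returns the
    # number of bytes handled (assumed to mirror libcurl's callback contract).
    update = function(str) {
      txt <<- c(txt, str)
      nchar(str, type = "bytes")
    },
    # Post-process the collected lines into name-value pairs on demand.
    # The status line has no colon, so it is skipped here.
    value = function() {
      fields <- sub("\r?\n$", "", txt)
      fields <- grep(":", fields, value = TRUE, fixed = TRUE)
      if (length(fields) == 0) return(character())
      pos <- regexpr(":", fields, fixed = TRUE)
      vals <- trimws(substring(fields, pos + 1))
      names(vals) <- substring(fields, 1, pos - 1)
      vals
    },
    # Discard everything gathered since creation.
    reset = function() {
      txt <<- initial
      invisible(NULL)
    }
  )
}

g <- header_gatherer_sketch()
invisible(g$update("HTTP/1.1 200 OK\r\n"))
invisible(g$update("Server: Apache\r\n"))
g$value()  # named vector: Server = "Apache"
g$reset()
```

Keeping the text in the closure's environment is what lets a single update function be handed to libcurl while value and reset see the same state.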

References

Curl homepage https://curl.se/

Author

Duncan Temple Lang

Examples

  library(RCurl)
  if(url.exists("https://www.omegahat.net/RCurl/index.html")) withAutoprint({
     h = basicHeaderGatherer()
     getURI("https://www.omegahat.net/RCurl/index.html",
            headerfunction = h$update)
     names(h$value())
     h$value()
  })
#> > h = basicHeaderGatherer()
#> > getURI("https://www.omegahat.net/RCurl/index.html", headerfunction = h$update)
#> [1] "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML//EN\">\n<html> <head>\n<link rel=\"stylesheet\" href=\"http://www.omegahat.org/OmegaTech.css\">\n<title>RCurl</title>\n</head>\n\n<body>\n<h1>The RCurl Package</h1>\n<p align=right><a href=\"RCurl_1.96-0.tar.gz\">RCurl_1.96-0.tar.gz</a> (20 June 2014)</p>\n<p align=right><a href=\"philosophy.html\">Manual</a></p>\n\nThe RCurl package is an R-interface to the <a\nhref=\"http://curl.haxx.se\">libcurl</a> library that provides HTTP facilities. This\nallows us to download files from Web servers, post forms, use HTTPS\n(the secure HTTP), use persistent connections, upload files, use\nbinary content, handle redirects, password authentication,  etc.\n\n<p>\n The primary top-level entry points are\n<ul>\n  <li> <a href=\"installed/RCurl/html/getURL.html\">getURL()</a>\n  <li> <a href=\"installed/RCurl/html/getURL.html\">getURLContent()</a>      \n  <li> <a href=\"installed/RCurl/html/postForm.html\">getForm()</a>\n  <li> <a href=\"installed/RCurl/html/postForm.html\">postForm()</a>\n</ul>\nHowever, access to the C-level routines is also available\nvia the R code, and one can specify options to all of the\nlibcurl operations to control how they are performed.\nDocumentation about the options and commands\ncan be found at the <a href=\"http://curl.haxx.se\">libcurl web site</a>\n\n\n<p> R functions can be specified to collect text from both the\nresponse and its headers. This can be used to customize the processing\nof the requests and feed the results to higher-level processing\n(e.g. 
HTML parsing via the htmlTreeParse function in the <a\nhref=\"http://www.omegahat.org/RSXML\">XML package</a>).\n\n\n<p> This package will be used to implement the low-level communication\nin the <a href=\"http://www.omegahat.org/SSOAP\">SSOAP</a> package\nand other high-level packages that utilize HTTP to exchange\nrequests and data.\n\n<h2>Documentation</h2>\n<dl>\n  <dt>\n  <li> <a href=\"RCurlJSS.pdf\">Paper</a> \n       outlining the package with some advanced examples.\n  <dd>\n\n  <dt>\n  <li> <a href=\"philosophy.html\">Guide</a>\n  <dd>\n  <dt>\n  <li> <a href=\"Changes.html\">Changes across releases</a>\n  <dd>\n\n  <dt>\n  <li> Examples of using asynchronous, multiple concurrent requests.\n  <dd>\n    <ul>\n      <li><a href=\"concurrent.html\" >Concurrent downloads</a></li>\n      <li><a href=\"nestedHTML.html\" >Nested HTML requests</a></li>\n      <li><a href=\"xmlParse.html\" >XML parsing with nested requests</a></li>      \n   </ul>\n\n<dt>\n<li> <a href=\"FAQ.html\">FAQ</a>\n<dd>\n\n</dl>\n\n\n<h2>Other Approaches</h2>\n\n<dl>\n  <dt> <a href=\"http://cran.r-project.org/src/contrib/Descriptions/httpRequest.html\">httpRequest</a>\n  <dd> The httpRequest is a package on CRAN that implements a small\n      part of HTTP directly in R using sockets.\n\n  <dt> httpClient\n  <dd> I have developed the <b>httpClient</b> package using\n      R code and connections that supports additional\n      aspects of R and HTTP, such as cookies, character escaping, and also\n      SSL for HTTPS. I haven't released the code (favoring the\n      approach of building on existing C code) but can make it available if anyone\n      is interested.\n    \n</dl>\n\nWhile having code in R makes it easier to understand, explore and\nmodify, it is probably better to use existing specialized libraries\nlike libcurl rather than doing this ourself.  
We gain speed and a\nlarge development community that cares about getting things right and\ntesting them.\nWe will explore the use of <a href=\"http://www.w3.org/Library\">libwww</a>\n\n<h2>Issues</h2>\nUsing the opaque data structures of the libcurl infrastructure\nmeans that we cannot easily access the file descriptors used\nin the communication. This  makes it somewhat more difficult\nto integrate these streams into  an R even loop\n(e.g. <a href=\"http://www.omegahat.org/REventLoop\">REventLoop</a>).\nWe can potentially turn them into regular connections\n(if the internal API is made \"public\").\n\n\n<h2>License</h2>\nThis is distributed under the <a href=\"http://www.omegahat.org/BSDLicense.html\">BSD license</a>\nin the same spirit as libcurl itself.\n\n<hr>\n<address><a href=\"http://www.stat.ucdavis.edu/~duncan\">Duncan Temple Lang</a>\n<a href=mailto:duncan@wald.ucdavis.edu>&lt;duncan@wald.ucdavis.edu&gt;</a></address>\n<!-- hhmts start -->\nLast modified: Mon May 25 11:35:38 PDT 2009\n<!-- hhmts end -->\n\n</body> </html>\n"
#> > names(h$value())
#> [1] "Date"           "Server"         "Last-Modified"  "ETag"          
#> [5] "Accept-Ranges"  "Content-Length" "Content-Type"   "status"        
#> [9] "statusMessage" 
#> > h$value()
#>                                                         Date 
#>                              "Wed, 29 Oct 2025 18:26:26 GMT" 
#>                                                       Server 
#> "Apache/2.4.37 (Rocky Linux) OpenSSL/1.1.1k mod_fcgid/2.3.9" 
#>                                                Last-Modified 
#>                              "Sat, 21 Jun 2014 03:27:39 GMT" 
#>                                                         ETag 
#>                                     "\"10bc-4fc50312600c0\"" 
#>                                                Accept-Ranges 
#>                                                      "bytes" 
#>                                               Content-Length 
#>                                                       "4284" 
#>                                                 Content-Type 
#>                                   "text/html; charset=UTF-8" 
#>                                                       status 
#>                                                        "200" 
#>                                                statusMessage 
#>                                                         "OK"