These two functions collect the contents of the header of
an HTTP response via the headerfunction option of a curl handle
and then process that text into both the name: value pairs
and the initial line of the response, which provides the
status of the request.
basicHeaderGatherer is a simple special case of
basicTextGatherer with the built-in post-processing
step done by parseHTTPHeader.
basicHeaderGatherer(txt = character(), max = NA)
parseHTTPHeader(lines, multi = TRUE)

txt: any initial text that we want included with the header.
This is passed to basicTextGatherer. Generally it
should not be specified unless there is a good reason.
max: this is passed directly to
basicTextGatherer.
lines: the text, as a character vector, from the response header
that parseHTTPHeader will convert to a status and name-value
pairs.
multi: a logical value controlling whether we check for
multiple HTTP headers in the lines of text. This happens
when a Continue response is concatenated with the actual response.
When this is TRUE, we look for the lines
that start an HTTP header, e.g. HTTP/1.1 200 OK,
and we use the content from the last of these.
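The effect of multi can be sketched without a network request by passing parseHTTPHeader a hand-written header. The header text below is hypothetical and made up for illustration; it assumes only that the RCurl package is installed.

```r
library(RCurl)

# Hypothetical header text: an interim "100 Continue" block followed by
# the header of the actual response. Each element is one header line.
lines <- c("HTTP/1.1 100 Continue",
           "",
           "HTTP/1.1 200 OK",
           "Content-Type: text/html",
           "Content-Length: 4284",
           "")

# With multi = TRUE (the default), parsing starts at the last line that
# begins an HTTP header, so the Continue block is ignored.
hdr <- parseHTTPHeader(lines, multi = TRUE)
print(hdr)
```

The result should carry the fields of the final response together with status and statusMessage elements taken from its first line.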
The return value of basicHeaderGatherer has the same form as that of
basicTextGatherer, i.e. a list with
update, value and reset function elements.
The value element invokes parseHTTPHeader
on the text read while libcurl processes the request
and returns the result.
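As a minimal offline sketch of how the three elements fit together, the update function can be called by hand with header lines, standing in for what libcurl would do during a request. The header lines here are hypothetical, and calling update directly is only for illustration; in normal use it is passed as the headerfunction option.

```r
library(RCurl)

h <- basicHeaderGatherer()

# Hypothetical header lines, each with its CRLF terminator, fed one at a
# time in place of libcurl invoking the headerfunction callback.
fake_header <- c("HTTP/1.1 200 OK\r\n",
                 "Server: Apache\r\n",
                 "Content-Type: text/html\r\n",
                 "\r\n")
for (ln in fake_header)
  h$update(ln)

# value() runs parseHTTPHeader on everything collected so far.
parsed <- h$value()
print(names(parsed))

# reset() discards the collected text so the gatherer can be reused
# for another request.
h$reset()
```

Keeping the parsing inside value means the header is only processed when it is asked for, after all of the lines have been collected.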
Curl homepage https://curl.se/
if(url.exists("https://www.omegahat.net/RCurl/index.html")) withAutoprint({
  h = basicHeaderGatherer()
  getURI("https://www.omegahat.net/RCurl/index.html",
         headerfunction = h$update)
  names(h$value())
  h$value()
})
#> > h = basicHeaderGatherer()
#> > getURI("https://www.omegahat.net/RCurl/index.html", headerfunction = h$update)
#> [1] "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML//EN\">\n<html> <head>\n<link rel=\"stylesheet\" href=\"http://www.omegahat.org/OmegaTech.css\">\n<title>RCurl</title>\n</head>\n\n<body>\n<h1>The RCurl Package</h1>\n<p align=right><a href=\"RCurl_1.96-0.tar.gz\">RCurl_1.96-0.tar.gz</a> (20 June 2014)</p>\n<p align=right><a href=\"philosophy.html\">Manual</a></p>\n\nThe RCurl package is an R-interface to the <a\nhref=\"http://curl.haxx.se\">libcurl</a> library that provides HTTP facilities. This\nallows us to download files from Web servers, post forms, use HTTPS\n(the secure HTTP), use persistent connections, upload files, use\nbinary content, handle redirects, password authentication, etc.\n\n<p>\n The primary top-level entry points are\n<ul>\n <li> <a href=\"installed/RCurl/html/getURL.html\">getURL()</a>\n <li> <a href=\"installed/RCurl/html/getURL.html\">getURLContent()</a> \n <li> <a href=\"installed/RCurl/html/postForm.html\">getForm()</a>\n <li> <a href=\"installed/RCurl/html/postForm.html\">postForm()</a>\n</ul>\nHowever, access to the C-level routines is also available\nvia the R code, and one can specify options to all of the\nlibcurl operations to control how they are performed.\nDocumentation about the options and commands\ncan be found at the <a href=\"http://curl.haxx.se\">libcurl web site</a>\n\n\n<p> R functions can be specified to collect text from both the\nresponse and its headers. This can be used to customize the processing\nof the requests and feed the results to higher-level processing\n(e.g. 
HTML parsing via the htmlTreeParse function in the <a\nhref=\"http://www.omegahat.org/RSXML\">XML package</a>).\n\n\n<p> This package will be used to implement the low-level communication\nin the <a href=\"http://www.omegahat.org/SSOAP\">SSOAP</a> package\nand other high-level packages that utilize HTTP to exchange\nrequests and data.\n\n<h2>Documentation</h2>\n<dl>\n <dt>\n <li> <a href=\"RCurlJSS.pdf\">Paper</a> \n outlining the package with some advanced examples.\n <dd>\n\n <dt>\n <li> <a href=\"philosophy.html\">Guide</a>\n <dd>\n <dt>\n <li> <a href=\"Changes.html\">Changes across releases</a>\n <dd>\n\n <dt>\n <li> Examples of using asynchronous, multiple concurrent requests.\n <dd>\n <ul>\n <li><a href=\"concurrent.html\" >Concurrent downloads</a></li>\n <li><a href=\"nestedHTML.html\" >Nested HTML requests</a></li>\n <li><a href=\"xmlParse.html\" >XML parsing with nested requests</a></li> \n </ul>\n\n<dt>\n<li> <a href=\"FAQ.html\">FAQ</a>\n<dd>\n\n</dl>\n\n\n<h2>Other Approaches</h2>\n\n<dl>\n <dt> <a href=\"http://cran.r-project.org/src/contrib/Descriptions/httpRequest.html\">httpRequest</a>\n <dd> The httpRequest is a package on CRAN that implements a small\n part of HTTP directly in R using sockets.\n\n <dt> httpClient\n <dd> I have developed the <b>httpClient</b> package using\n R code and connections that supports additional\n aspects of R and HTTP, such as cookies, character escaping, and also\n SSL for HTTPS. I haven't released the code (favoring the\n approach of building on existing C code) but can make it available if anyone\n is interested.\n \n</dl>\n\nWhile having code in R makes it easier to understand, explore and\nmodify, it is probably better to use existing specialized libraries\nlike libcurl rather than doing this ourself. 
We gain speed and a\nlarge development community that cares about getting things right and\ntesting them.\nWe will explore the use of <a href=\"http://www.w3.org/Library\">libwww</a>\n\n<h2>Issues</h2>\nUsing the opaque data structures of the libcurl infrastructure\nmeans that we cannot easily access the file descriptors used\nin the communication. This makes it somewhat more difficult\nto integrate these streams into an R even loop\n(e.g. <a href=\"http://www.omegahat.org/REventLoop\">REventLoop</a>).\nWe can potentially turn them into regular connections\n(if the internal API is made \"public\").\n\n\n<h2>License</h2>\nThis is distributed under the <a href=\"http://www.omegahat.org/BSDLicense.html\">BSD license</a>\nin the same spirit as libcurl itself.\n\n<hr>\n<address><a href=\"http://www.stat.ucdavis.edu/~duncan\">Duncan Temple Lang</a>\n<a href=mailto:duncan@wald.ucdavis.edu><duncan@wald.ucdavis.edu></a></address>\n<!-- hhmts start -->\nLast modified: Mon May 25 11:35:38 PDT 2009\n<!-- hhmts end -->\n\n</body> </html>\n"
#> > names(h$value())
#> [1] "Date" "Server" "Last-Modified" "ETag"
#> [5] "Accept-Ranges" "Content-Length" "Content-Type" "status"
#> [9] "statusMessage"
#> > h$value()
#> Date
#> "Wed, 29 Oct 2025 18:26:26 GMT"
#> Server
#> "Apache/2.4.37 (Rocky Linux) OpenSSL/1.1.1k mod_fcgid/2.3.9"
#> Last-Modified
#> "Sat, 21 Jun 2014 03:27:39 GMT"
#> ETag
#> "\"10bc-4fc50312600c0\""
#> Accept-Ranges
#> "bytes"
#> Content-Length
#> "4284"
#> Content-Type
#> "text/html; charset=UTF-8"
#> status
#> "200"
#> statusMessage
#> "OK"