# The Art Of Scripting HTTP Requests Using Curl

## Background

This document assumes that you are familiar with HTML and general networking.

The increasing number of applications moving to the web has made "HTTP Scripting" more frequently requested and wanted. To be able to automatically extract information from the web, to fake users, to post or upload data to web servers are all important tasks today.

Curl is a command line tool for doing all sorts of URL manipulations and transfers, but this particular document will focus on how to use it when doing HTTP requests for fun and profit. I will assume that you know how to invoke `curl --help` or `curl --manual` to get basic information about it.

Curl is not written to do everything for you. It makes the requests, it gets the data, it sends data and it retrieves the information. You probably need to glue everything together using some kind of script language or repeated manual invocations.

## The HTTP Protocol

HTTP is the protocol used to fetch data from web servers. It is a simple protocol that is built upon TCP/IP. The protocol also allows information to get sent to the server from the client using a few different methods, as will be shown here.

HTTP is plain ASCII text lines being sent by the client to a server to request a particular action, and then the server replies with a few text lines before the actual requested content is sent to the client.

The client, curl, sends an HTTP request. The request contains a method (like GET, POST, HEAD etc), a number of request headers and sometimes a request body. The HTTP server responds with a status line (indicating if things went well), response headers and most often also a response body. The "body" part is the plain data you requested, like the actual HTML or the image etc.
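To make the request structure described above a little more concrete, here is a minimal sketch that only prints the text of a simple GET request; it does not contact any server. The function name `build_request`, the host `www.example.com` and the `Accept` header value are assumptions chosen for illustration, and the exact headers curl sends differ between versions and options.

```shell
#!/bin/sh
# Sketch: build the text of a minimal HTTP/1.1 GET request without sending it.
# build_request is a hypothetical helper; host and path are placeholder values.
build_request() {
  host=$1
  path=$2
  # Request line: method, path and protocol version.
  printf 'GET %s HTTP/1.1\r\n' "$path"
  # Request headers, one per line.
  printf 'Host: %s\r\n' "$host"
  printf 'Accept: */*\r\n'
  # An empty line ends the headers; a plain GET normally has no body.
  printf '\r\n'
}

build_request www.example.com /
```

Each line ends with a carriage return and line feed, and the empty line is what separates the headers from an optional body in HTTP/1.1.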
## See the Protocol

Using curl's option [`--verbose`](https://curl.se/docs/manpage.html#-v) (`-v` as a short option) will display what kind of commands curl sends to the server, as well as a few other informational texts.

`--verbose` is the single most useful option when it comes to debugging or even understanding the curl<->server interaction.

Sometimes even `--verbose` is not enough. Then [`--trace`](https://curl.se/docs/manpage.html#--trace) and [`--trace-ascii`](https://curl.se/docs/manpage.html#--trace-ascii) offer even more details as they show **everything** curl sends and receives. Use it like this:

    curl --trace-ascii debugdump.txt http://www.example.com/

## See the Timing

Many times you may wonder what exactly is taking all the time, or you just want to know the number of milliseconds between two points in a transfer. For those, and other similar situations, the [`--trace-time`](https://curl.se/docs/manpage.html#--trace-time) option is what you need. It will prepend the time to each trace output line:

    curl --trace-ascii d.txt --trace-time http://example.com/

## See the Response

By default curl sends the response to stdout. You need to redirect it somewhere to avoid that, most often that is done with `-o` or `-O`.

# URL

## Spec

The Uniform Resource Locator format is how you specify the address of a particular resource on the Internet. You know these, you have seen URLs like https://curl.se or https://yourbank.com a million times.

RFC 3986 is the canonical spec. And yeah, the formal name is not URL, it is URI.

## Host

The host name is usually resolved using DNS or your /etc/hosts file to an IP address and that is what curl will communicate with. Alternatively you specify the IP address directly in the URL instead of a name.
For development and other testing situations, you can point to a different IP address for a host name than what would otherwise be used, by using curl's [`--resolve`](https://curl.se/docs/manpage.html#--resolve) option:

    curl --resolve www.example.org:80:127.0.0.1 http://www.example.org/

## Port number

Each protocol curl supports operates on a default port number, be it over TCP or in some cases UDP. Normally you do not have to take that into consideration, but at times you run test servers on other ports or similar. Then you can specify the port number in the URL with a colon and a number immediately following the host name. Like when doing HTTP to port 1234:

    curl http://www.example.org:1234/

The port number you specify in the URL is the number that the server uses to offer its services. Sometimes you may use a proxy, and then you may need to specify that proxy's port number separately from what curl needs to connect to the server. Like when using an HTTP proxy on port 4321:

    curl --proxy http://proxy.example.org:4321 http://remote.example.org/

## User name and password

Some services are set up to require HTTP authentication and then you need to provide name and password which is then transferred to the remote site in various ways depending on the exact authentication protocol used.

You can opt to either insert the user and password in the URL or you can provide them separately:

    curl http://user:password@example.org/

or

    curl -u user:password http://example.org/

You need to pay attention that this kind of HTTP authentication is not what is usually done and requested by user-oriented websites these days. They tend to use forms and cookies instead.

## Path part

The path part is just sent off to the server to request that it sends back the associated response. The path is what is to the right side of the slash that follows the host name and possibly port number.

# Fetch a page

## GET

The simplest and most common request/operation made using HTTP is to GET a URL.
The URL could itself refer to a web page, an image or a file. The client issues a GET request to the server and receives the document it asked for. If you issue the command line

    curl https://curl.se

you get a web page returned in your terminal window: the entire HTML document that the URL holds.

All HTTP replies contain a set of response headers that are normally hidden. Use curl's [`--include`](https://curl.se/docs/manpage.html#-i) (`-i`) option to display them as well as the rest of the document.

## HEAD

You can ask the remote server for ONLY the headers by using the [`--head`](https://curl.se/docs/manpage.html#-I) (`-I`) option which will make curl issue a HEAD request. In some special cases servers deny the HEAD method while others still work, which is a particular kind of annoyance.

The HEAD method is defined and made so that the server returns the headers exactly the way it would do for a GET, but without a body. It means that you may see a `Content-Length:` in the response headers, but there must not be an actual body in the HEAD response.

## Multiple URLs in a single command line

A single curl command line may involve one or many URLs. The most common case is probably to just use one, but you can specify any number of URLs. Yes any. No limits. You will then get requests repeated over and over for all the given URLs.

Example, send two GETs:

    curl http://url1.example.com http://url2.example.com

If you use [`--data`](https://curl.se/docs/manpage.html#-d) to POST to the URL, using multiple URLs means that you send that same POST to all the given URLs.

Example, send two POSTs:

    curl --data name=curl http://url1.example.com http://url2.example.com

## Multiple HTTP methods in a single command line

Sometimes you need to operate on several URLs in a single command line and do different HTTP methods on each. For this, you will enjoy the [`--next`](https://curl.se/docs/manpage.html#-:) option. It is basically a separator that separates a bunch of options from the next.
All the URLs before `--next` will get the same method and will get all the POST data merged into one.

When curl reaches the `--next` on the command line, it will sort of reset the method and the POST data and allow a new set.

Perhaps this is best shown with a few examples. To send first a HEAD and then a GET:

    curl -I http://example.com --next http://example.com

To first send a POST and then a GET:

    curl -d score=10 http://example.com/post.cgi --next http://example.com/results.html

# HTML forms

## Forms explained

Forms are the general way a website can present an HTML page with fields for the user to enter data in, and then press some kind of 'OK' or 'Submit' button to get that data sent to the server. The server then typically uses the posted data to decide how to act. Like using the entered words to search in a database, or to add the info in a bug tracking system, display the entered address on a map or using the info as a login-prompt verifying that the user is allowed to see what they are about to see.

Of course there has to be some kind of program on the server end to receive the data you send. You cannot just invent something out of the air.

## GET

A GET-form uses the method GET, as specified in HTML like:

```html