How the web works: HTTP and CGI explained

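This article is a basic guide to how the web works, covering the HTTP protocol, client requests and server responses. It describes how documents travel from the server to the browser and explains how servers work and which scripting technologies they use. It also discusses key concepts such as caching, proxy caches, server-side scripting, content negotiation, authentication, proxies and HTML extensions.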



Introduction

About this tutorial

This is an attempt to give a basic understanding of how the web works and was written because I saw so many articles on various news groups that plainly showed that people needed to learn this. Not knowing of any one place to find this info, I decided to collect it into an article. Hope you find it useful!

It covers the HTTP protocol, which is used to transmit and receive web pages, as well as some server workings and scripting technologies. It is assumed that you already know how to make web pages and preferably some HTML as well. It is also assumed that you have some basic knowledge of URLs. (A URL is the address of a document, which is what you need to be able to get hold of the document.)

I'm not entirely happy with this, so feedback would be very welcome. If you're not really a technical person and this tutorial leaves you puzzled or does not answer all your questions I'd very much like to hear about it. Corrections and opinions are also welcome.

Some background

When you browse the web the situation is basically this: you sit at your computer and want to see a document somewhere on the web, to which you have the URL.

Since the document you want to read is somewhere else in the world and probably very far away from you, some more details are needed to make it available to you. The first detail is your browser. You start it up and type the URL into it (or at least you tell the browser somehow where you want to go, perhaps by clicking on a link).

However, the picture is still not complete, as the browser can't read the document directly from the disk where it's stored if that disk is on another continent. So for you to be able to read the document, the computer that contains the document must run a web server. A web server is just a computer program that listens for requests from browsers and then executes them.

So what happens next is that the browser contacts the server and requests that the server deliver the document to it. The server then gives a response which contains the document and the browser happily displays this to the user. The server also tells the browser what kind of document this is (HTML file, PDF file, ZIP file etc) and the browser then shows the document with the program it was configured to use for this kind of document.

The browser will display HTML documents directly, and if there are references to images, Java applets, sound clips etc in it and the browser has been set up to display these it will request these also from the servers on which they reside. (Usually the same server as the document, but not always.) It's worth noting that these will be separate requests, and add additional load to the server and network. When the user follows another link the whole sequence starts anew.

These requests and responses are issued in a special language called HTTP, which is short for HyperText Transfer Protocol. What this article basically does is describe how this works. Other common protocols that work in similar ways are FTP and Gopher, but there are also protocols that work in completely different ways. None of these are covered here, sorry. (There is a link to some more details about FTP in the references.)

It's worth noting that HTTP only defines what the browser and web server say to each other, not how they communicate. The actual work of moving bits and bytes back and forth across the network is done by TCP and IP, which are also used by FTP and Gopher (as well as most other internet protocols).

When you continue, note that any software program that does the same as a web browser (ie: retrieves documents from servers) is called a client in network terminology and a user agent in web terminology. Also note that the server is properly the server program, and not the computer on which that program runs. (The computer is sometimes called the server machine.)

What happens when I follow a link?

Step 1: Parsing the URL

The first thing the browser has to do is to look at the URL of the new document to find out how to get hold of it. Most URLs have this basic form: "protocol://server/request-URI". The protocol part describes how to tell the server which document you want and how to retrieve it. The server part tells the browser which server to contact, and the request-URI is the name used by the web server to identify the document. (I use the term request-URI since it's the one used by the HTTP standard, and I can't think of anything else that is general enough to not be misleading.)
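To make the three parts concrete, here is a small sketch in Python that splits a URL the way described above. It leans on the standard urllib.parse module; the function name split_url is just something made up for this illustration.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into (protocol, server, request_uri)."""
    parts = urlsplit(url)
    # An empty path means the root document, which HTTP writes as "/".
    request_uri = parts.path or "/"
    # Anything after a "?" belongs to the request-URI as well.
    if parts.query:
        request_uri += "?" + parts.query
    return parts.scheme, parts.netloc, request_uri


print(split_url("http://www.w3.org/Talks/General.html"))
```

The browser would use the first part to pick the protocol, the second to find the server, and send only the third part to that server.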

Step 2: Sending the request

Usually, the protocol is "http". To retrieve a document via HTTP the browser transmits the following request to the server: "GET /request-URI HTTP/version", where version tells the server which HTTP version is used. (Usually, the browser includes some more information as well. The details are covered later.)

One important point here is that this request string is all the server ever sees. So the server doesn't care if the request came from a browser, a link checker, a validator, a search engine robot or if you typed it in manually. It just performs the request and returns the result.

Step 3: The server response

When the server receives the HTTP request it locates the appropriate document and returns it. However, an HTTP response is required to have a particular form. It must look like this:

HTTP/[VER] [CODE] [TEXT]
Field1: Value1
Field2: Value2

...Document content here...

The first line shows the HTTP version used, followed by a three-digit number (the HTTP status code) and a reason phrase meant for humans. Usually the code is 200 (which basically means that all is well) and the phrase "OK". The first line is followed by some lines called the header, which contains information about the document. The header ends with a blank line, followed by the document content. This is a typical header:

HTTP/1.0 200 OK
Server: Netscape-Communications/1.1
Date: Tuesday, 25-Nov-97 01:22:04 GMT
Last-modified: Thursday, 20-Nov-97 10:44:53 GMT
Content-length: 6372
Content-type: text/html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
...followed by document content...

We see from the first line that the request was successful. The second line is optional and tells us that the server runs the Netscape Communications web server, version 1.1. We then get what the server thinks is the current date and when the document was modified last, followed by the size of the document in bytes and the most important field: "Content-type".

The content-type field is used by the browser to tell which format the document it receives is in. HTML is identified with "text/html", ordinary text with "text/plain", a GIF is "image/gif" and so on. The advantage of this is that the URL can have any ending and the browser will still get it right.
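As a rough illustration of how a client might take such a response apart, the sketch below splits it into the status line, the header fields and the document content. The function name parse_response is invented for this example, and the sample response is a shortened version of the one shown above. It assumes lines end in CR LF, as the HTTP standard prescribes.

```python
def parse_response(raw):
    """Split a raw HTTP response into version, code, reason, headers and body."""
    # The header ends at the first blank line; the rest is the document.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    status_parts = lines[0].split(" ", 2)
    version, code = status_parts[0], int(status_parts[1])
    reason = status_parts[2] if len(status_parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, code, reason, headers, body


raw = (
    "HTTP/1.0 200 OK\r\n"
    "Server: Netscape-Communications/1.1\r\n"
    "Content-length: 6372\r\n"
    "Content-type: text/html\r\n"
    "\r\n"
    "<HTML>...</HTML>"
)
version, code, reason, headers, body = parse_response(raw)
print(code, reason, headers["content-type"])
```

A browser would look at the content-type value to decide how to display the body.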

An important concept here is that to the browser, the server works as a black box. Ie: the browser requests a specific document and the document is either returned or an error message is returned. How the server produces the document remains unknown to the browser. This means that the server can read it from a file, run a program that generates it, compile it by parsing some kind of command file or (very unlikely, but in principle possible) have it dictated by the server administrator via speech recognition software. This gives the server administrator great freedom to experiment with different kinds of services as the users don't care (or even know) how pages are produced.

What the server does

When the server is set up it is usually configured to use a directory somewhere on disk as its root directory, and to use a default file name (say "index.html") for each directory. This means that if you ask the server for the file "/" (as in "http://www.domain.tld/") you'll get the file index.html in the server root directory. Usually, asking for "/foo/bar.html" will give you the bar.html file from the foo directory directly beneath the server root.

Usually, that is. The server can be set up to map "/foo/" into some other directory elsewhere on disk or even to use server-side programs to answer all requests that ask for that directory. The server does not even have to map requests onto a directory structure at all, but can use some other scheme.
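The simple default mapping could look something like the sketch below. It is only an illustration: the root directory "/var/www" and the function name map_to_file are assumptions made for the example, and real servers do considerably more checking.

```python
import posixpath


def map_to_file(request_uri, root="/var/www", default_file="index.html"):
    """Map a request-URI onto a file below the (assumed) server root."""
    # Ignore anything after a "?"; it is not part of the file name.
    path = request_uri.split("?", 1)[0]
    is_directory = path.endswith("/")
    # Collapse "." and ".." so the result cannot climb above the root.
    clean = posixpath.normpath("/" + path.lstrip("/"))
    relative = clean.lstrip("/")
    if is_directory:
        # A request for a directory gets that directory's default file.
        relative = posixpath.join(relative, default_file)
    return posixpath.join(root, relative)


print(map_to_file("/"))
print(map_to_file("/foo/bar.html"))
```

A server that maps "/foo/" elsewhere, or hands it to a program, would simply replace this function with some other scheme.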

HTTP versions

So far there are three versions of HTTP. The first one was HTTP/0.9, which was truly primitive and never really specified in any standard. This was corrected by HTTP/1.0, which was issued as a standard in RFC 1945. (See references). HTTP/1.0 is the version of HTTP that is in common use today (usually with some 1.1 extensions), while HTTP/0.9 is rarely, if ever, used by browsers. (Some simpler HTTP clients still use it since they don't need the later extensions.)

RFC 2068 describes HTTP/1.1, which extends and improves HTTP/1.0 in a number of areas. Very few browsers support it (MSIE 4.0 is the only one known to the author), but servers are beginning to do so.

The major differences are some extensions in HTTP/1.1 for authoring documents online via HTTP and a feature that lets clients request that the connection be kept open after a request so that it does not have to be reestablished for the next request. This can save some waiting and server load if several requests have to be issued quickly.

This document describes HTTP/1.0, except some sections that cover the HTTP/1.1 extensions. Those will be explicitly labeled.

The request sent by the client

The shape of a request

Basically, all requests look like this:

[METH] [REQUEST-URI] HTTP/[VER]
[fieldname1]: [field-value1]
[fieldname2]: [field-value2]

[request body, if any]

The METH (for request method) gives the request method used, of which there are several, and which all do different things. The above example used GET, but below some more are explained. The REQUEST-URI is the identifier of the document on the server, such as "/index.html" or whatever. VER is the HTTP version, like in the response. The header fields are also the same as in the server response.

The request body is only used for requests that transfer data to the server, such as POST and PUT. (Described below.)
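The general shape above can be put together with a few lines of Python. This is only a sketch: build_request is a name invented here, the "example-client/0.1" user agent string is made up, and lines are joined with CR LF as the HTTP standard prescribes.

```python
def build_request(method, request_uri, headers=None, body="", version="1.0"):
    """Assemble an HTTP request as a single string."""
    lines = ["{} {} HTTP/{}".format(method, request_uri, version)]
    for name, value in (headers or {}).items():
        lines.append("{}: {}".format(name, value))
    # A blank line ends the header; the body (if any) follows it.
    return "\r\n".join(lines) + "\r\n\r\n" + body


request = build_request(
    "GET",
    "/index.html",
    {"User-Agent": "example-client/0.1", "Accept": "*/*"},  # made-up values
)
print(request)
```

For methods without a body, such as GET, the request simply ends with the blank line.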

GETting a document

There are several request types, with the most common one being GET. A GET request basically means "send me this document" and looks like this: "GET document_path HTTP/version" (as described above). For the URL "http://www.yahoo.com/" the document_path would be "/", and for "http://www.w3.org/Talks/General.html" it is "/Talks/General.html".

However, this first line is not the only thing a user agent (UA) usually sends, although it's the only thing that's really necessary. The UA can include a number of header fields in the request to give the server more information. These fields have the form "fieldname: value" and are all put on separate lines after the first request line.

Some of the header fields that can be used with GET are:

User-Agent
This is a string identifying the user agent. An English version of Netscape 4.03 running under Windows NT would send "Mozilla/4.03 [en] (WinNT; I ;Nav)". (Mozilla is the old name for Netscape. See the references for more details.)
Referer
The referer field (yes, it's misspelled in the standard) tells the server where the user came from, which is very useful for logging and keeping track of who links to one's pages.
If-Modified-Since
If a browser already has a version of the document in its cache it can include this field and set it to the time it retrieved that version. The server can then check if the document has been modified since the browser last downloaded it and send it again if necessary. The whole point is of course that if the document hasn't changed, then the server can just say so and save some waiting and network traffic.
From
This header field is a spammer's dream come true: it is supposed to contain the email address of whoever controls the user agent. Very few, if any, browsers use it, partly because of the threat from spammers. However, web robots should use it, so that webmasters can contact the people responsible for the robot should it misbehave.
Authorization
This field can hold username and password if the document in question requires authorization to be accessed.

To put all these pieces together: this is a typical GET request, as issued by my browser (Opera):

GET / HTTP/1.0
User-Agent: Mozilla/3.0 (compatible; Opera/3.0; Windows 95/NT4)
Accept: */*
Host: birk105.studby.uio.no:81

HEAD: checking documents

One may sometimes want to see the headers returned by the server for a particular document, without actually downloading the document. This is exactly what the HEAD request method provides. HEAD looks and works exactly like GET, with the only difference being that the server returns just the headers and not the document content.

This is very useful for programs like link checkers, people who want to see the response headers (to see what server is used or to verify that they are correct) and many other kinds of uses.

Playing web browser

You can actually play web browser yourself and write HTTP requests directly to web servers. This can be done by telnetting to port 80, writing the request and hitting enter twice, like this:

larsga - tyrfing>telnet www.w3.org 80
Trying 18.23.0.23...
Connected to www.w3.org.
Escape character is '^]'.
HEAD / HTTP/1.0

HTTP/1.1 200 OK
Date: Tue, 17 Feb 1998 22:24:53 GMT
Server: Apache/1.2.5
Last-Modified: Wed, 11 Feb 1998 18:22:22 GMT
ETag: "2c3136-23c1-34e1ec5e"
Content-Length: 9153
Accept-Ranges: bytes
Connection: close
Content-Type: text/html; charset=ISO-8859-1

Connection closed by foreign host.
larsga - tyrfing>

However, this works best under Unix as the Windows telnet clients I've used are not very suitable for this (and hard to set up so that it works). Instead you can use HTTPTest, a CGI script I've linked to in the references.

The response returned by the server

Outline

What the server returns consists of a line with the status code, a list of header fields, a blank line and then the requested document, if it is returned at all. Sort of like this:

HTTP/1.0 code text
Field1: Value1
Field2: Value2

...Document content here...

The status codes

The status codes are all three-digit numbers that are grouped by the first digit into 5 groups. The reason phrases given with the status codes below are just suggestions. Servers can return any reason phrase they wish.

1xx: Informational

No 1xx status codes are defined, and they are reserved for experimental purposes only.

2xx: Successful

Means that the request was processed successfully.

200 OK
Means that the server did whatever the client wanted it to, and all is well.
Others
The rest of the 2xx status codes are mainly meant for script processing and are not often used.

3xx: Redirection

Means that the resource is somewhere else and that the client should try again at a new address.

301 Moved permanently
The resource the client requested is somewhere else, and the client should go there to get it. Any links or other references to this resource should be updated.
302 Moved temporarily
This means the same as the 301 response, but links should not be updated, since the resource may be moved again in the future.
304 Not modified
This response can be returned if the client used the If-Modified-Since header field and the resource has not been modified since the given time. Simply means that the cached version should be displayed for the user.

4xx: Client error

Means that the client screwed up somehow, usually by asking for something it should not have asked for.

400 Bad request
The request sent by the client didn't have the correct syntax.
401 Unauthorized
Means that the client is not allowed to access the resource. This may change if the client retries with an authorization header.
403 Forbidden
The client is not allowed to access the resource and authorization will not help.
404 Not found
Seen this one before? :) It means that the server has not heard of the resource and has no further clues as to what the client should do about it. In other words: dead link.

5xx: Server error

This means that the server screwed up or that it couldn't do as the client requested.

500 Internal server error
Something went wrong inside the server.
501 Not implemented
The request method is not supported by the server.
503 Service unavailable
This sometimes happens if the server is too heavily loaded and cannot service the request. Usually, the solution is for the client to wait a while and try again.
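Since the first digit alone tells a client what kind of response it got, a program can sort out the five groups with very little code. A minimal sketch, with a function name chosen just for this example:

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code):
    """Return the group a three-digit status code belongs to."""
    # Integer division by 100 leaves only the first digit.
    return STATUS_CLASSES.get(code // 100, "Unknown")


for code in (200, 304, 404, 503):
    print(code, status_class(code))
```

This is handy for clients that meet a code they don't know: they can still fall back on the general meaning of the group.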

The response header fields

These are the header fields a server can return in response to a request.

Location
This tells the user agent where the resource it requested can be found. The value is just the URL of the new resource.
Server
This tells the user agent which web server is used. Nearly all web servers return this header, although some leave it out.
Content-length
This gives the size of the resource, in bytes.
Content-type
This describes the file format of the resource.
Content-encoding
This means that the resource has been coded in some way and must be decoded before use.
Expires
This field can be set for data that are updated at a known time (for instance if they are generated by a script). It is used to prevent browsers from caching the resource beyond the given date.
Last-modified
This tells the browser when the resource was last modified. Can be useful for mirroring, update notification etc.
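To show how Last-modified and the If-Modified-Since request field fit together, here is a sketch of the decision a server might make between a 200 and a 304 response. The function name conditional_status is made up for the example, and the dates are in the format used by the www.w3.org response shown earlier. Real servers handle more date formats and edge cases than this.

```python
from email.utils import parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Return 304 if the resource is unchanged since the given date, else 200."""
    if if_modified_since is None:
        # The client has no cached copy, so send the document.
        return 200
    try:
        modified = parsedate_to_datetime(last_modified)
        since = parsedate_to_datetime(if_modified_since)
        unchanged = modified <= since
    except (TypeError, ValueError):
        # If a date cannot be understood, play safe and send the document.
        return 200
    return 304 if unchanged else 200


print(conditional_status("Wed, 11 Feb 1998 18:22:22 GMT",
                         "Tue, 17 Feb 1998 22:24:53 GMT"))
```

With a 304 the server sends no document content at all, which is where the saving comes from.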

Caching: agents between the server and client

The browser cache

You may have noticed that when you go back to a page you've looked at not too long before the page loads much quicker. That's because the browser stored a local copy of it when it was first downloaded. These local copies are kept in what's called a cache. Usually one sets a maximum size for the cache and a maximum caching time for documents.

This means that when a new page is visited it is stored in the cache, and if the cache is full (near the maximum size limit) some document that the browser considers unlikely to be visited again soon is deleted to make room. Also, if you go to a page that is stored in the cache the browser may find that you've set 7 days as the maximum storage time and 8 days have now passed since the last visit, so the page needs to be reloaded.

Exactly how caches work differs between browsers, but this is the basic idea, and it's a good one because it saves both time for the user and network traffic. There are also some HTTP details involved, but they will be covered later.

Proxy caches

Browser caches are a nice feature, but when many users browse from the same site one usually ends up storing the same document in many different caches and refreshing it over and over for different users. Clearly, this isn't optimal.

The solution is to let the users share a cache, and this is exactly what proxy caches are all about. Browsers still have their local caches, but HTTP requests for documents not in the browser cache are not sent to the server any more, instead they are sent to the proxy cache. If the proxy has the document in its cache it will just return the document (like the browser cache would), and if it doesn't it will submit the request on behalf of the browser, store the result and relay it to the browser.

So the proxy is really a common cache for a number of users and can reduce network traffic rather dramatically. It can also skew log-based statistics badly. :)
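The idea of a shared cache can be sketched in a few lines. This toy version is only an illustration: the class name ProxyCache and the stand-in fake_origin function are invented here, no real network traffic is involved, and a real proxy would also expire documents and respect the HTTP caching headers.

```python
class ProxyCache:
    """A toy shared cache that sits between browsers and the origin server."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # called only when the cache misses
        self._store = {}

    def get(self, url):
        if url not in self._store:
            # Not cached yet: ask the origin server and keep the result.
            self._store[url] = self._fetch(url)
        return self._store[url]


origin_hits = []


def fake_origin(url):
    """Stand-in for the real web server; records every request it sees."""
    origin_hits.append(url)
    return "<HTML>content of {}</HTML>".format(url)


proxy = ProxyCache(fake_origin)
proxy.get("http://www.domain.tld/")  # first user: fetched from the origin
proxy.get("http://www.domain.tld/")  # second user: served from the cache
print(len(origin_hits))
```

Note that the origin server only ever sees the first request, which is exactly why proxies skew log-based statistics.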

A more advanced solution than a single proxy cache is a hierarchy of proxy caches. Imagine a large ISP that has one proxy cache for each part of the country and sets up each of the regional proxies to use a national proxy cache instead of going directly to the source web servers. This solution can reduce network traffic even further. More detail on this is linked to in the references.

Server-side programming

What is it and why do it?

Server-side scripts or programs are simply programs that are run on the web server in response to requests from the client. These scripts produce normal HTML (and sometimes HTTP headers as well) as output which is then fed back to the client as if the client had requested an ordinary page. In fact, there is no way for the client software to tell whether scripting has been used or not.

Technologies such as JavaScript, VBScript and Java applets all run in the client and so are not examples of this. There is a major difference between server-side and client-side scripting as the client and server are usually different computers. So if all the data the program needs are located on the server it may make sense to use server-side scripting instead of client-side. (There is also the problem that the client may not have a browser that supports the scripting technology or it may be turned off.) If the program and user need to interact often client-side scripting is probably best, to reduce the number of requests sent to the server.

So: in general, if the program needs a lot of data and infrequent interactions with the server, server-side scripting is probably best. Applications that use less data and more interaction are best put in the client. Also, applications that gather data over time need to be on the server where the data file is kept.

An example of the first would be search engines like Altavista. Obviously it's not feasible to download all the documents Altavista has collected to search in them locally. An example of the last would be a simple board game. No data are needed and having to send a new request to the server for each move you make quickly gets tedious.

There is one kind of use that has been left out here: what do you do when you want a program to work on data with a lot of interaction with the user? There is no good solution right now, but there is one on the horizon called XML. (See the references for more info.)

How it works

The details of how server-side scripting works vary widely with the technique used (and there are loads of them). However, some things remain the same. The web server receives a request just like any other, but notes that this URL does not map to a flat file, but instead somehow to a scripting area.

The server then starts the script, feeding it all the information contained in the request headers and URL. The script then runs and produces as its output the HTML and HTTP headers to be returned to the client, which the server takes care of.

CGI

CGI (Common Gateway Interface) is a way for web servers and server-side programs to interact. CGI is completely independent of programming language, operating system and web server. Currently it is the most common server-side programming technique and it's also supported by almost every web server in existence. Moreover, all servers implement it in (nearly) the same way, so that you can make a CGI script for one server and then distribute it to be run on any web server.

Like I wrote above, the server needs a way to know which URLs map to scripts and which URLs just map to ordinary HTML files. For CGI this is usually done by creating CGI directories on the server. This is done in the server setup and tells the server that all files in a particular top-level directory are CGI scripts (located somewhere on the disk) to be executed when requested. (The default directory is usually /cgi-bin/, so one can tell that URLs like this: http://www.varsity.edu/cgi-bin/search point to a CGI script. Note that the directory can be called anything.) Some servers can also be set up to not use CGI directories and instead require that all CGI programs have file names ending in .cgi.

CGI programs are just ordinary executable programs (or interpreted programs written in, say, Perl or Python, as long as the server knows how to start the program), so you can use just about any programming language you want. Before the CGI program is started the web server sets a number of environment variables that contain the information the web server received in the request. Examples of this are the IP address of the client, the headers of the request etc. Also, if the URL requested contained a ?, everything after the ? is put in an environment variable by itself.

This means that extra information about the request can be put into the URL in the link. One way this is often used is by multi-user hit counters to tell which user was hit this time. Thus, the user can insert an image on his/her page and have the SRC attribute be a link to the CGI script like this: SRC="http://stats.vendor.com/cgi-bin/counter.pl?username". Then the script can tell which user was hit and increment and display the correct count. (Ie: that of Peter and not Paul.)

The way the CGI returns its output (HTTP headers and HTML document) to the server is exceedingly simple: it writes it to standard out. In other words, in a Perl or Python script you just use the print statement. In C you use printf or some equivalent (C++ uses cout <<) while Java would use System.out.println.
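Putting the last few paragraphs together, a minimal CGI script in Python might look like the sketch below. QUERY_STRING and REMOTE_ADDR are the standard CGI environment variables for the part after the ? and the client's IP address. The function name cgi_output and the page it produces are just made up for this illustration.

```python
import os
import sys
from html import escape


def cgi_output(environ):
    """Build the headers and HTML page a CGI script would write to standard out."""
    query = environ.get("QUERY_STRING", "")        # everything after the "?"
    client = environ.get("REMOTE_ADDR", "unknown")  # IP address of the client
    return (
        "Content-type: text/html\n"
        "\n"  # a blank line separates the headers from the document
        "<HTML><BODY>\n"
        "<P>Query string: {}</P>\n"
        "<P>Client address: {}</P>\n"
        "</BODY></HTML>\n"
    ).format(escape(query), escape(client))


if __name__ == "__main__":
    # When run by the web server, the request details are in the environment.
    sys.stdout.write(cgi_output(os.environ))
```

Keeping the page-building in a function of its own makes the script easy to try out without a web server, by passing in an ordinary dictionary.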

More information on CGI is available in the references.

Other techniques

CGI is certainly not the only way to make server-side programs and has been much criticized for inefficiency. This last claim has some weight since the CGI program has to be loaded into memory and reexecuted from scratch each time it is requested.

A much faster alternative is programming to the server API itself. Ie: making a program that essentially becomes a part of the server process and uses an API exposed by the server. The problem with this technique is that the API is of course server-dependent and that if you use C/C++ (which is common) programming errors can crash the whole server.

The main advantage of server API programming is that it is much faster since when the request arrives the program is already loaded into memory together with whatever data it needs.

Some servers allow scripting in crash-proof languages. One example is AOLServer, which uses tcl. There are also modules available for servers like Apache, which let you do your server API programming in Perl or Python, which effectively removes the risk of programming errors crashing the server.

There are also lots and lots of proprietary (and non-proprietary) scripting languages and techniques for various web servers. Some of the best-known are ASP, MetaHTML and PHP3.

Submitting forms

The most common way for users to send data to server-side programs is through ordinary HTML forms. The user fills in the form and hits the submit button, upon which the data are submitted to the server. If the form author specified that the data should be submitted via a GET request the form data are encoded into the URL, using the ? syntax I described above. The encoding used is in fact very simple. If the form consists of the fields name and email and the user fills them out as Joe and joe@hotmail.com, the resulting URL looks like this: http://www.domain.tld/cgi-bin/script?name=Joe&email=joe@hotmail.com

If the data contain characters that are not allowed in URLs, these characters are URL-encoded. This basically means that the character (say ~) is replaced with a % followed by its two-digit hexadecimal ASCII code (say %7E). The details are available in RFC 1738 about URLs, which is linked to in the references.
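The encoding is simple enough to try out in Python. The sketch below shows the rule for a single character (percent_encode is a name made up here) and then uses urlencode from the standard library to encode a whole form. Note that the library is stricter than the example URL above and encodes the @ sign as well.

```python
from urllib.parse import urlencode


def percent_encode(char):
    """Replace one ASCII character with % and its two-digit hexadecimal code."""
    return "%{:02X}".format(ord(char))


print(percent_encode("~"))
# Encode a whole form as name=value pairs joined by "&".
print(urlencode({"name": "Joe", "email": "joe@hotmail.com"}))
```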

POST: Pushing data to the server

GET is not the only way to submit data from a form, however. One can also use POST, in which case the request contains both headers and a body. (This is just like the response from the server.) The body is then the form data encoded just like they would be on the URL if one had used GET.

Primarily, POST should be used when the request causes a permanent change of state on the server (such as adding to a data list) and GET when this is not the case (like when doing a search).

If the data can be long (more than 256 characters) it is a bit risky to use GET as the URL can end up being snipped in transit. Some OSes don't allow environment variables to be longer than 256 characters, so the environment variable that holds the ?-part of the request may be silently truncated. This problem is avoided with POST as the data are then not pushed through an environment variable.
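For comparison with GET, this is roughly what a POST request carrying the same kind of form data looks like when assembled in Python. It is a sketch only: build_post is an invented name, and the script path is the one from the example URL earlier. The content type shown is the one browsers use for ordinary form submissions.

```python
from urllib.parse import urlencode


def build_post(request_uri, form_data, version="1.0"):
    """Assemble a POST request with the form data in the request body."""
    body = urlencode(form_data)
    lines = [
        "POST {} HTTP/{}".format(request_uri, version),
        "Content-type: application/x-www-form-urlencoded",
        # The server needs the length to know where the body ends.
        "Content-length: {}".format(len(body.encode("ascii"))),
        "",  # blank line between headers and body
        body,
    ]
    return "\r\n".join(lines)


print(build_post("/cgi-bin/script", {"name": "Joe", "email": "joe@hotmail.com"}))
```

Since the data travel in the body, the URL stays short no matter how much the user typed into the form.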

Some scripts that handle POST requests cause problems by using a 302 status code and Location header to redirect browsers to a confirmation page. However, according to the standard, this does not mean that the browser should retrieve the referenced page, but rather that it should resubmit the data to the new destination. See the references for details on this.

More details

Content negotiation

Imagine that you've just heard of the fancy new PNG image format, and want to use this for your images instead of the older GIF. However, GIF is supported by all browsers with some self-respect, while PNG is only supported by the most modern ones. So if you replace all your GIFs with PNGs, only a limited number of users will be able to actually see your images.

To avoid this one could set up a link to an imaginary location (let's call it /foo/imgbar). Then, when browsers request this image after seeing the location in an IMG element they will add an Accept header field to their request. (Most browsers do this with all requests.)

The accept header would then be set to "image/*, image/png", which would then mean: I prefer PNG images, but if you can't deliver that, then just send me any image you've got.

That would be in a standards-based world. We do, however, live in a decidedly non-standard world. Most browsers simply send the header "Accept: */*" which doesn't really tell the server anything at all. This may change in the future, but for now this is a nice, but useless feature.
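In that standards-based world, the server side of the negotiation could be sketched like this. It is a simplification made for illustration: choose_variant is an invented name, more specific types are simply preferred over wildcards, and the quality values that the full Accept syntax allows are ignored.

```python
from fnmatch import fnmatchcase


def choose_variant(accept_header, available):
    """Pick a media type from `available` that the Accept header allows."""
    patterns = [
        part.split(";")[0].strip()  # drop any parameters after ";"
        for part in accept_header.split(",")
        if part.strip()
    ]
    # Specific types (no wildcard) are tried before "image/*" and "*/*".
    patterns.sort(key=lambda pattern: pattern.count("*"))
    for pattern in patterns:
        for media_type in available:
            if fnmatchcase(media_type, pattern):
                return media_type
    return None  # nothing acceptable is available


print(choose_variant("image/*, image/png", ["image/gif", "image/png"]))
print(choose_variant("*/*", ["image/gif", "image/png"]))
```

The second call shows the problem with "Accept: */*": the server learns nothing and just has to pick whatever it lists first.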

Cookies

An inconvenient side of the HTTP protocol is that each request essentially stands on its own and is completely unrelated to any other request. This is inconvenient for scripting, because one may want to know what a user has done before this last request was issued. As long as plain HTTP is used there is really no way to know this. (There are some tricks, but they are ugly and expensive.)

To illustrate the problem: imagine a server that offers a lottery. People have to view ads to participate in the lottery, and those who offer the lottery don't want people to be able to just reload and reload until they win something. This can be done by not allowing subsequent visits from a single IP address (ie: computer) within a certain time interval. However, this causes problems as one doesn't really know if it's the same user.

People who dial up via modems to connect to the internet are usually given a new IP address from a pool of available addresses each time, which means that the same IP address may be given to two different users within an hour if the first one disconnects and another user is given the same IP later. (Also: on larger UNIX installations many people are usually using the same computer simultaneously from different terminals.)

The solution proposed by Netscape is to use magic strings called cookies. (The specification says they are called this "for no compelling reason", but the name has a long history.) The server returns a "Set-cookie" header that gives a cookie name, expiry time and some more info. When the user returns to the same URL (or some other URL, this can be specified by the server) the browser returns the cookie if it hasn't expired.

This way, our imaginary lottery could set a cookie when the user first tries the lottery and set it to expire when the user can return. The lottery script could then check if the cookie is delivered with the request and if so just tell the user to try again later. This would work just fine if browsers didn't allow users to turn off cookies...
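The lottery check could be sketched with Python's standard http.cookies module as shown below. The cookie name "lottery", its value and the two function names are all made up for the example. The expiry is given as a number of seconds from now, which the module turns into a date.

```python
from http.cookies import SimpleCookie


def set_cookie_header(wait_seconds):
    """Build the header the lottery sends after a user has played."""
    cookie = SimpleCookie()
    cookie["lottery"] = "played"
    # An integer here means "this many seconds from now".
    cookie["lottery"]["expires"] = wait_seconds
    return cookie.output(header="Set-cookie:")


def may_play(cookie_header):
    """Return True unless the browser sent back the lottery cookie."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    return "lottery" not in cookie


print(set_cookie_header(3600))
print(may_play("lottery=played"))
```

Once the cookie has expired the browser stops sending it, so may_play returns True again without the server having to remember anything.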

Cookies can also be used to track the path of a user through a web site or to give pages a personalized look by remembering what the user has done before. Rather useful, but there are some privacy issues involved here.

Server logs

Most servers (if not all) create logs of their usage. This means that every time the server gets a request it will add a line to its log, which gives information about the request. Below is an excerpt from an actual log:

rip.axis.se - - [04/Jan/1998:21:24:46 +0100] "HEAD /ftp/pub/software/ HTTP/1.0" 200 6312 - "Mozilla/4.04 [en] (WinNT; I)"
tide14.microsoft.com - - [04/Jan/1998:21:30:32 +0100] "GET /robots.txt HTTP/1.0" 304 158 - "Mozilla/4.0 (compatible; MSIE 4.0; MSIECrawler; Windows 95)"
microsnot.HIP.Berkeley.EDU - - [04/Jan/1998:22:28:21 +0100] "GET /cgi-bin/wwwbrowser.pl HTTP/1.0" 200 1445 "http://www.ifi.uio.no/~larsga/download/stats/" "Mozilla/4.03 [en] (Win95; U)"
isdn69.ppp.uib.no - - [05/Jan/1998:00:13:53 +0100] "GET /download/RFCsearch.html HTTP/1.0" 200 2399 "http://www.kvarteret.uib.no/~pas/" "Mozilla/4.04 [en] (Win95; I)"
isdn69.ppp.uib.no - - [05/Jan/1998:00:13:53 +0100] "GET /standard.css HTTP/1.0" 200 1064 - "Mozilla/4.04 [en] (Win95; I)"

This log is in the extended common log format, which is supported by most web servers. The first hit is from Netscape 4.04, the second from some robot version of MSIE 4.0, the third from Netscape 4.03, and the fourth and fifth are again from Netscape 4.04. (Note that the MSIECrawler got a 304 response, which means that it used the If-Modified-Since header.)

A server log can be useful when debugging applications and scripts or the server setup. It can also be run through a log analyzer, which can create various kinds of usage reports. One should however be aware that these reports are not 100% accurate due to the use of caches.
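
The first thing a log analyzer has to do is split each line into its fields. Here is a minimal sketch of that step, assuming lines shaped like the excerpt above. The regular expression and the field names are my own and are not taken from any particular tool.

```python
import re

# One pattern for a line in the extended common log format shown above.
LOG_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'(?P<referrer>"[^"]*"|-) (?P<agent>"[^"]*"|-)'
)

def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Referrer and user-agent are quoted when present, "-" when empty.
    fields["referrer"] = fields["referrer"].strip('"')
    fields["agent"] = fields["agent"].strip('"')
    return fields

sample = ('isdn69.ppp.uib.no - - [05/Jan/1998:00:13:53 +0100] '
          '"GET /standard.css HTTP/1.0" 200 1064 - '
          '"Mozilla/4.04 [en] (Win95; I)"')
entry = parse_log_line(sample)
print(entry["host"], entry["status"], entry["request"])
```

A usage report is then mostly a matter of counting these dictionaries by host, status or user-agent.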

A sample HTTP client

Just to illustrate, in case some are interested, here is a simple HTTP client written as a Python function that takes a host name and path as parameters and issues a GET request, printing the returned results. (This can be made even simpler by using the Python URL library, but that would make the example useless.)


# Simple Python 3 function that issues an HTTP request

import socket

def http_req(server, path):

    # Creating a socket to connect and read from
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # Finding server address (assuming port 80)
    adr = (socket.gethostbyname(server), 80)

    # Connecting to server
    s.connect(adr)

    # Sending request (lines end in CRLF, a blank line ends the headers)
    s.sendall(("GET " + path + " HTTP/1.0\r\n\r\n").encode("ascii"))

    # Printing response until the server closes the connection
    resp = s.recv(1024)
    while resp != b"":
        print(resp.decode("latin-1"), end="")
        resp = s.recv(1024)

    s.close()

Here is the server log entry that resulted from this call: http_req("birk105.studby.uio.no","/")

birk105.studby.uio.no - - [26/Jan/1998:12:01:51 +0100] "GET / HTTP/1.0" 200 2272 - -

The "- -"s at the end are the referrer and user-agent fields, which are empty because the request did not contain this information.

Authentication

A server can be set up to disallow access to certain URLs unless the user can confirm his/her identity. This is usually done with a user-name/password combination which is specified in the setup, but there are other methods as well.

When such a page is requested the server generally returns a "401 Unauthorized" status code as mentioned above. The browser will then usually prompt the user for a user name and password, which the user supplies (if it is known!). The browser then tries again, this time adding an "Authorization" header that carries the user name and password. (In the common Basic scheme these are merely base64-encoded, not encrypted.)

If this is accepted by the server the resource is returned just like in an ordinary request. If not, the server again responds with a 401 status code.
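
To make the second request a little more concrete, here is a sketch of how the "Authorization" header is put together, assuming the Basic scheme (other schemes work differently). The function names are mine, and the user name and password are just sample values.

```python
import base64

def basic_auth_header(user, password):
    """Build the Authorization header a browser sends under the Basic scheme."""
    credentials = (user + ":" + password).encode("utf-8")
    return "Authorization: Basic " + base64.b64encode(credentials).decode("ascii")

def decode_basic_auth(header_value):
    """What the server does with the header value: recover user and password."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        return None
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password

print(basic_auth_header("Aladdin", "open sesame"))
print(decode_basic_auth("Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="))
```

Since the decoding step needs no secret at all, anyone who can see the request can read the password.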

Server-side HTML extensions

Some servers, such as the Roxen and MetaHTML servers, allow users to embed non-HTML commands within their HTML files. These commands are then processed when the file is requested and generate the HTML that is sent to the client. This can be used to tailor the page contents and/or to access a database to deliver contents from it.

What these languages have in common is usually that they are tied to a single server and that they produce a combination of ordinary HTTP headers and HTML contents as their output.

This is important because it means that the client does not notice what is going on at all, and so can use any browser.

Authoring/maintenance: HTTP extensions for this

HTTP/1.0 only defined the GET, HEAD and POST methods and for ordinary browsing this is enough. However, one may want to use HTTP to edit and maintain files directly on the server, instead of having to go through an FTP server as is common today. HTTP/1.1 adds a number of new methods for this.

These are:

PUT
PUT uploads a new resource (file) to the server under the given URL. Exactly what happens on the server is not defined by the HTTP/1.1 spec, but authoring programs like Netscape Composer use PUT to upload a file to the web server and store it there. PUT requests should not be cached.
DELETE
Well, it's obvious, isn't it? The DELETE method requests the server to delete the resource identified by the URL. On UNIX systems this can fail if the server is not allowed to delete the file by the file system.
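
On the wire these requests look much like a GET, except that PUT carries the file as the request body. The sketch below only builds the raw request bytes and does not send them anywhere. The host www.foo.com and the path /notes.html are placeholders, and the helper function is my own.

```python
def build_request(method, path, host, body=b""):
    """Assemble a raw HTTP/1.1 request as bytes (nothing is sent)."""
    lines = [
        method + " " + path + " HTTP/1.1",
        "Host: " + host,
        "Connection: close",
    ]
    if body:
        # The server needs to know how many bytes of body follow the headers.
        lines.append("Content-Length: " + str(len(body)))
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

put = build_request("PUT", "/notes.html", "www.foo.com", b"<P>Hello</P>")
delete = build_request("DELETE", "/notes.html", "www.foo.com")

print(put.decode("ascii"))
print(delete.decode("ascii"))
```

Whether the server actually stores or deletes anything is up to its configuration, and in practice such requests are normally protected by authentication.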

META HTTP-EQUIV

Not all web servers are made in such a way that it is easy to set, say, the Expires header field or some other header field you might want to set for one particular file. There is a way around that problem. In HTML there is an element called META that can be used to set HTTP header fields. It was really supposed to be parsed by the web server and inserted into the response headers, but very few servers actually do this, so browsers have started implementing it instead. It's not supported in all browsers, but it can still be of some help.

The usage is basically this: put it in the HEAD-element of the HTML file and make it look like this:


<META HTTP-EQUIV="header field name" CONTENT="field value">
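
To show what a server (or browser) that honours this element has to do, here is a small sketch that pulls the header fields out of a page using the html.parser module from the Python standard library. The class name and the sample page are made up for the example.

```python
from html.parser import HTMLParser

class HttpEquivCollector(HTMLParser):
    """Collect header fields from <META HTTP-EQUIV=... CONTENT=...> elements."""

    def __init__(self):
        super().__init__()
        self.headers = {}

    def handle_starttag(self, tag, attrs):
        # The parser hands us tag and attribute names in lower case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("http-equiv")
        if name:
            self.headers[name] = attributes.get("content", "")

page = """<HTML><HEAD>
<META HTTP-EQUIV="Expires" CONTENT="Tue, 12 Oct 1999 22:56:00 GMT">
<TITLE>Example</TITLE>
</HEAD><BODY>Hello</BODY></HTML>"""

collector = HttpEquivCollector()
collector.feed(page)
print(collector.headers)
```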

The Host header field

Many web hotels let a single physical machine serve what to the user looks like several different servers. For instance, http://www.foo.com/, http://www.bar.com/ and http://www.baz.com/ could all very well be served from the same web server. Still, the user would see different pages by going to each of the different URLs. To enable this, an extension to the HTTP protocol was needed to let the web server know which of these different web servers the user wanted to access.

The solution is the Host header field. When the user requests http://www.bar.com/ the web server will receive a header set to "Host: www.bar.com" and thus know which top page to deliver. It will also usually create separate logs for the different virtual servers.
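
What the server does with the Host field can be sketched like this. The table of document directories is invented for the example, and a real server does much more than this when it parses a request.

```python
# Hypothetical mapping from virtual server name to its document directory.
VIRTUAL_HOSTS = {
    "www.foo.com": "/sites/foo",
    "www.bar.com": "/sites/bar",
    "www.baz.com": "/sites/baz",
}

def document_root(raw_request):
    """Pick the document directory based on the Host header of a request."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    for line in head.split("\r\n")[1:]:          # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip().lower().split(":")[0]   # drop any port number
            return VIRTUAL_HOSTS.get(host)
    return None                                  # no Host header at all

request = "GET / HTTP/1.1\r\nHost: www.bar.com\r\n\r\n"
print(document_root(request))
```

A request without a Host header (such as one from an old HTTP/1.0 client) gives the server nothing to go on, so it has to fall back on some default site.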

Answers to some common questions

Can I prevent people from seeing my HTML source?

No. Once you let people issue an HTTP request that retrieves your document they can do whatever they want with it. No HTML tag can stop this because one could just use a Python program like the one I included above and let it save the HTML to file without even looking at it. Nor is any such tag recognized by browsers.

Some people try obfuscating the HTML by putting all the markup on a single line, but this is easily fixed by running the HTML through a pretty-printer.

In short: give it up. There's no way to protect the information in your HTML and there's no reason to try unless you put vital information in hidden form controls, and you shouldn't do that anyway.

Can I prevent people from stealing my images?

The answer is the same for this as it is for HTML: you can't do it. You can watermark them and prove that they belong to you and enter comments that say this in clear text, but it still doesn't keep people from issuing an HTTP request and saving the image.

Files in format X are not displayed correctly, why not?

This is a rather common problem for some formats: you put up a file on your web server in a new format you've never used before and when you try to download it it comes up in your browser as plain text (where it looks like complete gobbledygook) instead of being saved, played or shown or whatever.

The problem is probably that the server does not know what kind of file this is and signals it as Content-type: text/plain, which the browser happily displays as if it really were text. The solution is to configure the server so that it knows what kind of file this is and signals it correctly. The references link to the list of registered MIME types.
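
The server's side of this can be illustrated with the mimetypes module from the Python standard library, which maps file name extensions to content types in much the same way a web server does. The extension .xyzfmt and the type application/x-xyzfmt are invented for the example, and text/plain as the fallback is an assumption that mirrors the behaviour described above.

```python
import mimetypes

def content_type_for(filename, default="text/plain"):
    """Guess the Content-type from the extension, falling back to a default."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default

print(content_type_for("index.html"))

# An unknown format falls back to text/plain, which is the problem...
print(content_type_for("song.xyzfmt"))

# ...until the mapping is configured (hypothetical type and extension).
mimetypes.add_type("application/x-xyzfmt", ".xyzfmt")
print(content_type_for("song.xyzfmt"))
```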

One thing to note here is that MSIE 3.0 (4.0?) does not honour the content-type given by the server, but instead tries to guess which format the file is in and treat it accordingly. This is a blatant violation of the HTTP specification on the part of Microsoft, but it's not the only standard violation they've committed and there's nothing to do about it except to be aware of it.

How can I pass a parameter to a web page?

If the web page is a plain HTML file: forget it. HTML files are just displayed and nothing more is ever done with them, so the concept of a parameter just doesn't apply.

What you can do is to use server-side scripting the way it is described elsewhere in this document.
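
With a script, the parameters usually travel in the part of the URL after the question mark. As a small sketch, this is how a script could pick them apart using the Python standard library. The URL, the script name and the parameter names are all made up.

```python
from urllib.parse import urlsplit, parse_qs

def script_parameters(url):
    """Return the parameters in the query part of a URL as a dictionary."""
    query = urlsplit(url).query
    # parse_qs gives a list per name; keep the first value of each.
    return {name: values[0] for name, values in parse_qs(query).items()}

url = "http://www.foo.com/cgi-bin/search.pl?word=cookie&max=10"
print(script_parameters(url))
```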

How can I prevent browsers from caching my page/script?

This is done by setting the correct HTTP headers. If the Expires header is set to a date/time in the past the output from the request will not be cached. (Note that HTTP requires browsers to keep the result in the browser history, so that going back to a non-cached page does not cause a new request.)

The expiration time can be set with a server-script or possibly by configuring the server correctly.
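
For a script, producing such a header is a one-liner once the date is formatted the way HTTP wants it. A sketch in Python, where the choice of one hour in the past is arbitrary:

```python
import time
from email.utils import formatdate

def expired_header(now=None):
    """Return an Expires header field dated one hour before 'now'."""
    if now is None:
        now = time.time()
    # usegmt=True gives the "GMT" suffix that HTTP dates use.
    return "Expires: " + formatdate(now - 3600, usegmt=True)

print(expired_header())
```

The script would print this line along with its other headers, before the blank line that separates the headers from the page itself.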

Should I include a slash (/) at the end of my URLs?

If the URL points to a directory and you want the server to list the contents or the index.html file: yes. (Remember: the URL may point to a file without an extension or the server may not map it to directory/file structure at all.) If you follow a URL like this one: http://www.garshol.priv.no/download the server will notice that someone requested a file that doesn't exist, but there is a directory with the same name. The server will then give a 301 response redirecting the client to the URL http://www.garshol.priv.no/download/ which the client will then try and succeed. It is worth noting here that these two are actually different URLs, which is why the server cannot return the contents of the first URL directly. (In fact, one could well argue that it shouldn't offer the redirect at all.)

All this is invisible to the user, but the user will have to wait a little longer and the server will have to work harder. So the best thing is to include the slash and avoid extra network traffic and server load.
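
That the two really are different URLs shows up as soon as a relative link is resolved against them, which is easy to try with the urllib.parse module from the Python standard library. The file name RFCsearch.html is only used as a sample relative link.

```python
from urllib.parse import urljoin

without_slash = "http://www.garshol.priv.no/download"
with_slash = "http://www.garshol.priv.no/download/"

# The same relative link ends up in two different places.
print(urljoin(without_slash, "RFCsearch.html"))
print(urljoin(with_slash, "RFCsearch.html"))
```

Without the slash, "download" is treated as a file name and is replaced by the relative link. With the slash it is a directory and the link is resolved inside it. This is why the server cannot simply serve the directory contents under the first URL.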

Appendices

Explanations of some technical terms

API
An Application Programming Interface is an interface exposed by a program, part of an operating system or programming language to other programs, so that the programs that use the API can exploit the features of the program that exposes the API. One example of this would be the AWT windowing library of Java that exposes an API that can be used to write programs with graphical user interfaces. APIs are only used by programs, they are not user interfaces.
TCP/IP
The IP protocol (IP is short for Internet Protocol) is the backbone of the internet, the foundation on which all else is built. To be a part of the internet a computer must support IP, which is what is used for all data transfer on the internet. TCP is another protocol (Transmission Control Protocol) that extends IP with features useful for most higher-level protocols such as HTTP. (Lots of other protocols also use TCP: FTP, Gopher, SMTP, POP, IMAP, NNTP etc.) Some protocols use UDP instead of TCP.

Acknowledgements

In closing I'd like to thank the following people who have helped me with this tutorial:

  • Jelks Cabaniss, who provided early feedback on form and contents as well as encouragement.
  • Alan J. Flavell, for detailed and highly useful criticism.
  • Jukka Korpela, for help with the markup in this document and extensive criticism on form and content, as well as a couple of useful links.
  • Christian Nybø, for suggesting that I mention telnetting to servers and making HTTPTest a little more prominent.
  • Harald Joerg, who found a number of typos, and suggested some improvements in the organization.
  • Bjørn Borud, for some hints on organization as well as a typo fix.
  • Ingrid Melve for some useful links.
  • Darren Moore for suggesting that I define some technical terms.
  • Felipe Wersen for pointing out a typo.

References

Important specifications and official pages
Proxy caches
Various

Last update 12.Oct.99 22:56, by Lars M. Garshol.
请翻译下面内容为中文, ====================================== INSTALLING SUBVERSION A Quick Guide ====================================== $LastChangedDate$ Contents: I. INTRODUCTION A. Audience B. Dependency Overview C. Dependencies in Detail D. Documentation II. INSTALLATION A. Building from a Tarball B. Building the Latest Source under Unix C. Building under Unix in Different Directories D. Installing from a Zip or Installer File under Windows E. Building the Latest Source under Windows F. Building using CMake III. BUILDING A SUBVERSION SERVER A. Setting Up Apache Httpd B. Making and Installing the Subversion Apache Server Module C. Configuring Apache Httpd for Subversion D. Running and Testing E. Alternative: 'svnserve' and ra_svn IV. PROGRAMMING LANGUAGE BINDINGS (PYTHON, PERL, RUBY, JAVA) I. INTRODUCTION ============ A. Audience This document is written for people who intend to build Subversion from source code. Normally, the only people who do this are Subversion developers and package maintainers. If neither of these labels fits you, we recommend you find an appropriate binary package of Subversion and install that. While the Subversion project doesn't officially release binary packages, a number of volunteers have made such packages available for different operating systems. Most Linux and BSD distributions already have Subversion packages ready to go via standard packaging channels, and other volunteers have built 'installers' for both Windows and OS X. Visit this page for package links: https://subversion.apache.org/packages.html For those of you who still wish to build from source, Subversion follows the Unix convention of "./configure && make", but it has a number of dependencies. B. Dependency Overview You'll need the following build tools to compile Subversion: * autoconf 2.59 or later (Unix only) * libtool 1.4 or later (Unix only) * a reasonable C compiler (gcc, Visual Studio, etc.) 
Subversion also depends on the following third-party libraries: * libapr and libapr-util (REQUIRED for client and server) The Apache Portable Runtime (APR) library provides an abstraction of operating-system level services such as file and network I/O, memory management, and so on. It also provides convenience routines for things like hashtables, checksums, and argument processing. While it was originally developed for the Apache HTTP server, APR is a standalone library used by Subversion and other products. It is a critical dependency for all of Subversion; it's the layer that allows Subversion clients and servers to run on different operating systems. * SQLite (REQUIRED for client and server) Subversion uses SQLite to manage some internal databases. * libz (REQUIRED for client and server) Subversion uses zlib for compressing binary differences. These diff streams are used everywhere -- over the network, in the repository, and in the client's working copy. * utf8proc (REQUIRED for client and server) Subversion uses utf8proc for UTF-8 support, including Unicode normalization. * Apache Serf (OPTIONAL for client) The Apache Serf library allows the Subversion client to send HTTP requests. This is necessary if you want your client to access a repository served by the Apache HTTP server. There is an alternate 'svnserve' server as well, though, and clients automatically know how to speak the svnserve protocol. Thus it's not strictly necessary for your client to be able to speak HTTP... though we still recommend that your client be built to speak both HTTP and svnserve protocols. * OpenSSL (OPTIONAL for client and server) OpenSSL enables your client to access SSL-encrypted https:// URLs (using Apache Serf) in addition to unencrypted http:// URLs. To use SSL with Subversion's WebDAV server, Apache needs to be compiled with OpenSSL as well. 
* Netwide Assembler (OPTIONAL for client and server) The Netwide Assembler (NASM) is used to build the (optional) assembler modules of OpenSSL. As of OpenSSL 1.1.0 NASM is the only supported assembler. * Berkeley DB (DEPRECATED and OPTIONAL for client and server) When you create a repository, you have the option of specifying a storage 'back-end' implementation. Currently, there are two options. The newer and recommended one, known as FSFS, does not require Berkeley DB. FSFS stores data in a flat filesystem. The older implementation, known as BDB, has been deprecated and is not recommended for new repositories, but is still available. BDB stores data in a Berkeley DB database. This back-end will only be available if the BDB libraries are discovered at compile time. * libsasl (OPTIONAL for client and server) If the Cyrus SASL library is detected at compile time, then the svn client (and svnserve server) will be able to utilize SASL to do various forms of authentication when speaking the svnserve protocol. * Python, Perl, Java, Ruby (OPTIONAL) Subversion is mostly a collection of C libraries with well-defined APIs, with a small collection of programs that use the APIs. If you want to build Subversion API bindings for other languages, you need to have those languages available at build time. * py3c (OPTIONAL, but REQUIRED for Python bindings) The Python 3 Compatibility Layer for C Extensions is required to build the Python language bindings. * KDE Framework 5, libsecret, GNOME Keyring (OPTIONAL for client) Subversion contains optional support for storing passwords in KWallet via KDE Framework 5 libraries (preferred) or kdelibs4, and GNOME Keyring via libsecret (preferred) or GNOME APIs. * libmagic (OPTIONAL) If the libmagic library is detected at compile time, it will be used to determine mime-types of binary files which are added to version control. Note that mime-types configured via auto-props or the mime-types-file option take precedence. C. 
Dependencies in Detail Subversion depends on a number of third party tools and libraries. Some of them are only required to run a Subversion server; others are necessary just for a Subversion client. This section explains what other tools and libraries will be required so that Subversion can be built with the set of features you want. On Unix systems, the './configure' script will tell you if you are missing the correct version of any of the required libraries or tools, so if you are in a real hurry to get building, you can skip straight to section II. If you want to gather the pieces you will need before starting out, however, you should read the following. If you're just installing a Subversion client, the Subversion team has created a script that downloads the minimal prerequisite libraries (Apache Portable Runtime, Sqlite, and Zlib). The script, 'get-deps.sh', is available in the same directory as this file. When run, it will place 'apr', 'apr-util', 'serf', 'zlib', and 'sqlite-amalgamation' directories directly into your unpacked Subversion distribution. With the exception of sqlite-amalgamation, they will still need to be configured, built and installed explicitly, and Subversion's own configure script may need to be told where to find them, if they were not installed in standard system locations. Note: there are optional dependencies (such as OpenSSL, swig, and httpd) which get-deps.sh does not download. Note: Because previous builds of Subversion may have installed older versions of these libraries, you may want to run some of the cleanup commands described in section II.B before installing the following. 1. Apache Portable Runtime 1.4 or newer (REQUIRED) Whenever you want to build any part of Subversion, you need the Apache Portable Runtime (APR) and the APR Utility (APR-util) libraries. 
If you do not have a pre-installed APR and APR-util, you will need to get these yourself: https://apr.apache.org/download.cgi On Unix systems, if you already have the APR libraries compiled and do not wish to regenerate them from source code, then Subversion needs to be able to find them. There are a couple of options to "./configure" that tell it where to look for the APR and APR-util libraries. By default it will try to locate the libraries using apr-config and apu-config scripts. These scripts provide all the relevant information for the APR and APR-util installations. If you want to specify the location of the APR library, you can use the "--with-apr=" option of "./configure". It should be able to find the apr-config script in the standard location under that directory (e.g. ${prefix}/bin). Similarly, you can specify the location of APR-util using the "--with-apr-util=" option to "./configure". It will look for the apu-config script relative to that directory. For example, if you want to use the APR libraries you built with the Apache httpd server, you could run: $ ./configure --with-apr=/usr/local/apache2 \ --with-apr-util=/usr/local/apache2 ... Notes on Windows platforms: * Do not use APR version 1.7.3 as that release contains a bug that makes it impossible for Subversion to use it properly. This issue only affects APR builds on Windows. This issue was fixed in APR version 1.7.4. See: https://lists.apache.org/thread/xd5t922jvb9423ph4j84rsp5fxks1k0z * If you check out APR and APR-util sources from their Subversion repository, be sure to use a native Windows SVN client (as opposed to Cygwin's version) so that the .dsp files get carriage-returns at the ends of their lines. Otherwise Visual Studio will complain that it doesn't recognize the .dsp files. 
Notes on Unix platforms: * If you check out APR and APR-util sources from their Subversion repository, you need to run the 'buildconf' script in each library's directory to regenerate the configure scripts and other files required for compiling the libraries. Afterwards, configure, build, and install both libraries before running Subversion's configure script. For example: $ cd apr $ ./buildconf $ ./configure <options...> $ make $ make install $ cd .. $ cd apr-util $ ./buildconf $ ./configure <options...> $ make $ make install $ cd .. 2. SQLite (REQUIRED) Subversion requires SQLite version 3.24.0 or above. You can meet this dependency several ways: * Use an SQLite amalgamation file. * Specify an SQLite installation to use. * Let Subversion find an installed SQLite. To use an SQLite-provided amalgamation, just drop sqlite3.c into Subversion's sqlite-amalgamation/ directory, or point to it with the --with-sqlite configure option. This file also ships with the Subversion dependencies distribution, or you can download it from SQLite: https://www.sqlite.org/download.html 3. Zlib (REQUIRED) Subversion's binary-differencing engine depends on zlib for compression. Most Unix systems have libz pre-installed, but if you need it, you can get it from http://www.zlib.net/ 4. utf8proc (REQUIRED) Subversion uses utf8proc for UTF-8 support. Configure will attempt to locate utf8proc by default using pkg-config and known paths. If it is installed in a non-standard location, then use: --with-utf8proc=/path/to/libutf8proc Alternatively, a copy of utf8proc comes bundled with the Subversion sources. If configure should use the bundled copy, use: --with-utf8proc=internal 5. autoconf 2.59 or newer (Unix only) This is required only if you plan to build from the latest source (see section II.B). Generally only developers would be doing this. 6. libtool 1.4 or newer (Unix only) This is required only if you plan to build from the latest source (see section II.B). 
Note: Some systems (Solaris, for example) require libtool 1.4.3 or newer. The autogen.sh script knows about that. 7. Apache Serf library 1.3.4 or newer (OPTIONAL) If you want your client to be able to speak to an Apache server (via a http:// or https:// URL), you must link against Apache Serf. Though optional, we strongly recommend this. In order to use ra_serf, you must install serf, and run Subversion's ./configure with the argument --with-serf. If serf is installed in a non-standard place, you should use --with-serf=/path/to/serf/install instead. Apache Serf can be obtained via your system's package distribution system or directly from https://serf.apache.org/. For more information on Apache Serf and Subversion's ra_serf, see the file subversion/libsvn_ra_serf/README. 8. OpenSSL (OPTIONAL) ### needs some updates. I think Apache Serf automagically handles ### finding OpenSSL, but we may need more docco here. and w.r.t ### zlib. The Apache Serf library has support for SSL encryption by relying on the OpenSSL library. a. Using OpenSSL on the client through Apache Serf On Unix systems, to build Apache Serf with OpenSSL, you need OpenSSL installed on your system, and you must add "--with-ssl" as a "./configure" parameter. If your OpenSSL installation is hard for Apache Serf to find, you may need to use "--with-libs=/path/to/lib" in addition. In particular, on Red Hat (but not Fedora Core) it is necessary to specify "--with-libs=/usr/kerberos" for OpenSSL to be found. You can also specify a path to the zlib library using "--with-libs". Under Windows, you can specify the paths to these libraries by passing the options --with-zlib and --with-openssl to gen-make.py. b. Using OpenSSL on the Apache server You can also add support for these features to an Apache httpd server to be used for Subversion using the same support libraries. The Subversion build system will not provide them, however. 
You add them by specifying parameters to the "./configure" script of the Apache Server instead. For getting SSL on your server, you would add the "--enable-ssl" or "--with-ssl=/path/to/lib" option to Apache's "./configure" script. Apache enables zlib support by default, but you can specify a nonstandard location for the library with the "--with-z=/path/to/dir" option. Consult the Apache documentation for more details, and for other modules you may wish to install to enhance your Subversion server. If you don't already have it, you can get a copy of OpenSSL, including instructions for building and packaging on both Unix systems and Windows, at: https://www.openssl.org/ 9. Berkeley DB 4.X (DEPRECATED and OPTIONAL) You need the Berkeley DB libraries only if you are building a Subversion server that supports the older BDB repository storage back-end, or a Subversion client that can access local BDB repositories via the file:// URI scheme. The BDB back-end has been deprecated and is not recommended for new repositories. BDB may be removed in Subversion 2.0. We recommend the newer FSFS back-end for all new repositories. FSFS does not require the Berkeley DB libraries. If in doubt, the 'svnadmin info' command, added in Subversion 1.9, can identify whether an existing repository uses BDB or FSFS. The current recommended version of Berkeley DB is 4.4.20 or newer, which brings auto-recovery functionality to the Berkeley DB database environment. If you must use an older version of Berkeley DB, we *strongly* recommend using 4.3 or 4.2 over the 4.1 or 4.0 versions. Not only are these significantly faster and more stable, but they also enable Subversion repositories to automatically clean up database journal files to save disk space. You'll need Berkeley DB installed on your system. 
You can get it from: http://www.oracle.com/technetwork/database/database-technologies/berkeleydb/overview/index.html If you have Berkeley DB installed in a place not searched by default for includes and libraries, add something like this: --with-berkeley-db=db.h:/usr/local/include/db4.7:/usr/local/lib/db4.7:db-4.7 to your `configure' switches, and the build process will use the Berkeley DB header and library in the named directories. You may need to use a different path, of course. Note that in order for the detection to succeed, the dynamic linker must be able to find the libraries at configure time. 10. Cyrus SASL library (OPTIONAL) If the Simple Authentication and Security Layer (SASL) library is detected on your system, then the Subversion client and svnserve server can utilize its abilities for various forms of authentication. To learn more about SASL or to get the source code, visit: http://freshmeat.net/projects/cyrussasl/ 11. Apache Web Server 2.2.X or newer (OPTIONAL) (https://httpd.apache.org/download.cgi) The Apache httpd server is one of two methods to make your Subversion repository available over a network - the other is a custom server program called svnserve, which requires no extra software packages. Building Subversion, the Apache server, and the modules that Apache needs to communicate with Subversion are complicated enough that there is a whole section at the end of this document that describes how it is done: See section III for details. 12. Python 3.x or newer (https://www.python.org/) (OPTIONAL) Subversion does not require Python for its basic operation. However, Python is required for building and testing Subversion and for using Subversion's SWIG Python bindings or hook scripts coded in Python. The majority of Subversion's test suite is written in Python, as is part of Subversion's build system. In more detail, Python is required to do any of the following: * Use the SWIG Python bindings. * Use the ctypes Python bindings. 
* Use hook scripts coded in Python. * Build Subversion from a tarball on Unix-like systems and run Subversion's test suite as described in section II.B. * Build Subversion on Windows as described in section II.E. * Build Subversion from a working copy checked out from Subversion's own repository (whether or not running the test suite). * Build the SWIG Python bindings. * Build the ctypes Python bindings. * Testing as described in section III.D. The Python bindings are used by: * Third-party programs (e.g., ViewVC) * Scripts distributed with Subversion itself in the tools/ subdirectory. * Any in-house scripts you may have. Python is NOT required to do any of the following: * Use the core command-line binaries (svn, svnadmin, svnsync, etc.) * Use Subversion's C libraries. * Use any of Subversion's other language bindings. * Build Subversion from a tarball on Unix-like systems without running Subversion's test suite Although this section calls for Python 3.x, Subversion still technically works with Python 2.7. However, Support for Python 2.7 is being phased out. As of 1 January 2020, Python 2.7 has reached end of life. All users are strongly encouraged to move to Python 3. Note: If you are using a Subversion distribution tarball and want to build the Python bindings for Python 2, you should rebuild the build environment in non-release mode by running 'sh autogen.sh' before running the ./configure script; see section II.B for more about autogen.sh. 13. Perl 5.8 or newer (Windows only) (OPTIONAL) To build Subversion under any of the MS Windows platforms, you will also need Perl 5.8 or newer to run apr-util's w32locatedb.pl script. 14. pkg-config (Unix only, OPTIONAL) Subversion uses pkg-config to find appropriate options used at build time. 15. D-Bus (Unix only, OPTIONAL) D-Bus is a message bus system. D-Bus is required for support for KWallet and GNOME Keyring. pkg-config is needed to find D-Bus headers and library. 16. 
Qt 5 or Qt 4 (Unix only, OPTIONAL) Qt is a cross-platform application framework. QtCore, QtDBus and QtGui modules are required for support for KWallet. pkg-config is needed to find Qt headers and libraries. 17. KDE 5 Framework libraries or KDELibs 4 (Unix only, OPTIONAL) Subversion contains optional support for storing passwords in KWallet. Subversion will look for KF5Wallet, KF5CoreAddons, KF5I18n APIs by default, and needs kf5-config to find them. The KDELibs 4 api is also supported. KDELibs contains core KDE libraries. Subversion uses libkdecore and libkdeui libraries when support for KWallet is enabled. kde4-config is used to get some necessary options. pkg-config, D-Bus and Qt 4 are also required. If you want to build support for KWallet, then pass the '--with-kwallet' option to `configure`. If KDE is installed in a non-standard prefix, then use: --with-kwallet=/path/to/KDE/prefix 18. GLib 2 (Unix only, OPTIONAL) GLib is a general-purpose utility library. GLib is required for support for GNOME Keyring. pkg-config is needed to find GLib headers and library. 19. GNOME Keyring (Unix only, OPTIONAL) Subversion contains optional support for storing passwords in GNOME Keyring. pkg-config is needed to find GNOME Keyring headers and library. D-Bus and GLib are also required. If you want to build support for GNOME Keyring, then pass the '--with-gnome-keyring' option to `configure`. 20. Ctypesgen (OPTIONAL) Ctypesgen is Python wrapper generator for ctypes. It is used to generate a part of Subversion Ctypes Python bindings (CSVN). If you want to build CSVN, then pass the '--with-ctypesgen' option to `configure`. If ctypesgen.py is installed in a non-standard place, then use: --with-ctypesgen=/path/to/ctypesgen.py For more information on CSVN, see subversion/bindings/ctypes-python/README. 21. libmagic (OPTIONAL) Subversion's configure script attempts to find libmagic automatically. 
If it is installed in a non-standard location, then use:

    --with-libmagic=/path/to/libmagic/prefix

The files include/magic.h and lib/libmagic.so.1.0 (or similar) are expected beneath this prefix directory. If they cannot be found Subversion will be compiled without support for libmagic.

If libmagic is installed but support for it should not be compiled in, then use:

    --with-libmagic=no

If configure should fail when libmagic is not present, but only the default locations should be searched, then use:

    --with-libmagic

22. LZ4 (OPTIONAL)

Subversion uses LZ4 compression library version r129 or above. Configure will attempt to locate the system library by default using pkg-config and known paths. If it is installed in a non-standard location, then use:

    --with-lz4=/path/to/liblz4

If configure should use the version bundled with the sources, use:

    --with-lz4=internal

23. py3c (OPTIONAL)

Subversion uses the Python 3 Compatibility Layer for C Extensions (py3c) library when building the Python language bindings. As py3c is a header-only library, it is needed only to build the bindings, not to use them.

Configure will attempt to locate py3c by default using pkg-config and known paths. If it is installed in a non-standard location, then use:

    --with-py3c=/path/to/py3c/prefix

The library can be downloaded from GitHub:

    https://github.com/encukou/py3c

On Unix systems, you can also use the provided get-deps.sh script to download py3c and several other dependencies; see the top of section I.C for more about get-deps.sh.

D. Documentation

The primary documentation for Subversion is the free book "Version Control with Subversion", a.k.a. "The Subversion Book", obtainable from https://svnbook.red-bean.com/.

Various additional documentation exists in the doc/ subdirectory of the Subversion source. See the file doc/README for more information.

II.
INSTALLATION
============

Subversion supports three different build systems:

  - Autoconf/make, for Unix builds
  - Visual Studio vcproj, for Windows builds
  - CMake, for both Unix and Windows

The first two have been in use since 2001. Sections A-E below describe the classic build system.

The CMake build system was created in 2024 and is still under development. It will be included in Subversion 1.15 and is expected to be the default build system starting with Subversion 1.16. Section F below describes the CMake build system.

A. Building from a Tarball
------------------------------

1. Building from a Tarball

Download the most recent distribution tarball from:

    https://subversion.apache.org/download/

Unpack it, and use the standard GNU procedure to compile:

    $ ./configure
    $ make
    # make install

You can also run the full test suite by running 'make check'. Even in successful runs, some tests will report XFAIL; that is normal. Failed runs are indicated by FAIL or XPASS results, or a non-zero exit code from "make check".

B. Building the Latest Source under Unix
-------------------------------------

These instructions assume you have already installed Subversion and checked out a working copy of Subversion's own code -- either the latest /trunk code, or some branch or tag. You also need to have already installed whatever prerequisites that version of Subversion requires (if you haven't, the ./configure step should complain).

You can discard the directory created by the tarball; you're about to build the latest, greatest Subversion client. This is the procedure Subversion developers use.

First off, if you have any Subversion libraries lying around from previous 'make installs', clean them up first!

    # rm -f /usr/local/lib/libsvn*
    # rm -f /usr/local/lib/libapr*
    # rm -f /usr/local/lib/libserf*

Start the process by running "autogen.sh":

    $ sh ./autogen.sh

This script will make sure you have all the necessary components available to build Subversion.
If any are missing, you will be told where to get them from. (See the 'Dependency Overview' in section I.)

Note: if the command "autoconf" on your machine does not run autoconf 2.59 or later, but you do have a new enough autoconf available, then you can specify the correct one with the AUTOCONF variable. (The AUTOHEADER variable is similar.) This may be required on Debian GNU/Linux, where "autoconf" is actually a Perl script that attempts to guess which version is required -- because of the interaction between Subversion's and APR's configuration systems, the Perl script may get it wrong. So for example, you might need to do:

    $ AUTOCONF=autoconf2.59 sh ./autogen.sh

Once you've prepared the working copy by running autogen.sh, just follow the usual configuration and build procedure:

    $ ./configure
    $ make
    # make install

(Optionally, you might want to pass --enable-maintainer-mode to the ./configure script. This enables debugging symbols in your binaries (among other things) and most Subversion developers use it.)

Since the resulting binary depends on shared libraries, the destination library directory must be identified in your operating system's library search path. That is in either /etc/ld.so.conf or $LD_LIBRARY_PATH for Linux systems and in /etc/rc.conf for FreeBSD, followed by a run of the 'ldconfig' program. Check your system documentation for details. By identifying the destination directory, Subversion will be able to dynamically load repository access plugins. If you try to do a checkout and see an error like:

    subversion/libsvn_ra/ra_loader.c:209: (apr_err=170000)
    svn: Unrecognized URL scheme 'https://svn.apache.org/repos/asf/subversion/trunk'

It probably means that the dynamic loader/linker can't find all of the libsvn_* libraries.

C. Building under Unix in Different Directories
--------------------------------------------

It is possible to configure and build Subversion on Unix in a directory other than the working copy.
For example:

    $ svn co https://svn.apache.org/repos/asf/subversion/trunk svn
    $ cd svn
    $ # get SQLite amalgamation if required
    $ chmod +x autogen.sh
    $ ./autogen.sh
    $ mkdir ../obj
    $ cd ../obj
    $ ../svn/configure [...with options as appropriate...]
    $ make

puts the Subversion working copy in the directory svn and builds it in a separate, parallel directory obj.

Why would you want to do this? Well, there are a number of reasons...

  * You may prefer to avoid "polluting" the working copy with files generated during the build.
  * You may want to put the build directory and the working copy on different physical disks to improve performance.
  * You may want to separate source and object code and only back up the source.
  * You may want to remote mount the working copy on multiple machines, and build for different machines from the same working copy.
  * You may want to build multiple configurations from the same working copy.

The last reason above is possibly the most useful. For instance you can have separate debug and optimized builds each using the same working copy. Or you may want a client-only build and a client-server build. Using multiple build directories you can rebuild any or all configurations after an edit without the need to either clean and reconfigure, or identify and copy changes into another working copy.

D. Installing from a Zip or Installer File under Windows
-----------------------------------------------------

Of all the ways of getting a Subversion client, this is the easiest. Download a Zip or self-extracting installer via:

    https://subversion.apache.org/packages.html#windows

For a Zip file, extract the DLLs and EXEs to a directory of your choice. Included in the download are, among other tools, the SVN client, the SVNADMIN administration tool and the SVNLOOK reporting tool.

You may want to add the bin directory in the Subversion folder to your PATH environment variable so as to not have to use the full path when running Subversion commands.
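As a quick way to confirm that the PATH change took effect, a small script along the following lines can report which of the tools named above are still not reachable. This is only a sketch: the helper name missing_tools is hypothetical and not part of Subversion, and it relies solely on Python's standard shutil.which lookup.

```python
import shutil


def missing_tools(names, path=None):
    """Return the tool names that cannot be found on the search path.

    'path' is an optional PATH-style string; when it is None the
    current PATH environment variable is searched.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    # Tool names taken from the text above; extend the list as needed.
    not_found = missing_tools(["svn", "svnadmin", "svnlook"])
    if not_found:
        print("Not on PATH:", ", ".join(not_found))
    else:
        print("All Subversion tools were found on PATH.")
```

An empty result means every listed executable was found, so the full path is no longer needed when running Subversion commands.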
To test the installation, open a DOS box (run either "cmd" or "command" from the Start menu's "Run..." menu option), change to the directory you installed the executables into, and run:

    C:\test>svn co https://svn.apache.org/repos/asf/subversion/trunk svn

This will get the latest Subversion sources and put them into the "svn" subdirectory.

If using a self-extracting .exe file, just run it instead of unzipping it, to install Subversion.

E. Building the Latest Source under Windows
----------------------------------------

E.1 Prerequisites

  * Microsoft Visual Studio. Any recent (2005+) version containing the Visual C++ component will work (e.g. Professional, Express, Community Edition). Make sure you enable C++ support during setup.
  * Python 2.7 or higher, downloaded from https://www.python.org/ which is used to generate the project files.
  * Perl 5.8 or higher from https://www.perl.org/get.html
  * Awk is needed to compile Apache. Source code is available in tools\dev\awk; run the buildwin.bat program to compile.
  * Apache apr, apr-util, and optionally apr-iconv libraries, version 1.4 or later (1.2 for apr-iconv). If you are building from a Subversion checkout and have not downloaded Apache 2, then get these 3 libraries from https://www.apache.org/dist/apr/.
  * SQLite 3.24.0 or higher from https://www.sqlite.org/download.html (3.39.4 or higher recommended)
  * ZLib 1.2 or higher is required and can be obtained from http://www.zlib.net/
  * Either a Subversion client binary from https://subversion.apache.org/packages.html to do the initial checkout of the Subversion source or the zip file source distribution.

Additional Options

  * [Optional] Apache Httpd 2 source, downloaded from https://httpd.apache.org/download.cgi, these instructions assume version 2.0.58. This is only needed for building the Subversion server Apache modules. ### FIXME Apache 2.2 or greater required.
  * [Optional] Berkeley DB for backend support of the server components is available from http://www.oracle.com/technetwork/database/database-technologies/berkeleydb/downloads/index-082944.html (Version 4.4.20 or in specific cases some higher version recommended) For more information see Section I.C.9.
  * [Optional] OpenSSL can be obtained from https://www.openssl.org/source/
  * [Optional] NASM can be obtained from http://www.nasm.us/
  * [Optional] A modified version of GNU libintl, called svn-win32-libintl.zip, can be used for displaying localized messages. Available at: http://subversion.tigris.org/servlets/ProjectDocumentList?folderID=2627
  * [Optional] GNU gettext for generating message catalog (.mo) files from message translations. You can get the latest binaries from http://gnuwin32.sourceforge.net/. You'll need the binaries (gettext-0.14.1-bin.zip) and dependencies (gettext-0.14.1-dep.zip).

E.2 Notes

The Apache Serf library supports secure connections with OpenSSL and on-the-wire compression with zlib. If you want to use the secure connections feature, you should pass the option "--with-openssl" to the gen-make.py script. See Section I.C.7 for more details.

E.3 Preparation

This section describes how to unpack the files to make a build tree.

  * Make a directory SVN and cd into it.
  * Either check out Subversion:

        svn co https://svn.apache.org/repos/asf/subversion/trunk src-trunk

    or unpack the zip file distribution and rename the directory to src-trunk.
  * Install Visual Studio Environment. You either have to tell the installer to register environment variables or run VCVARS32.BAT before building anything. If you are using a newer Visual Studio, use the 'Visual Studio 20xx Command Prompt' on the Start menu.
  * Install Python and add it to your path
  * Install Perl (it should add itself to the path)
    ### Subversion doesn't need perl. Only some dependencies need it (OpenSSL and some apr scripts)
  * Copy AWK (awk95.exe) to awk.exe (e.g.
    SVN\awk\awk.exe) and add the directory containing it (e.g. SVN\awk) to the path.
    ### Subversion doesn't need awk. Only some dependencies need it (some apr scripts)
  * [Optional] Install NASM and add it to your path
    ### Subversion doesn't need NASM. Only some dependencies need it optionally (OpenSSL)
  * [Optional] If you checked out Subversion from the repository and want to build Subversion with http/https access support then install the Apache Serf sources into SVN\src-trunk\serf.
  * [Optional] If you want BDB backend support, extract the Berkeley DB files into SVN\src-trunk\db4-win32. It's a good idea to add SVN\src-trunk\db4-win32\bin to your PATH, so that Subversion can find the Berkeley DB DLLs.

    [NOTE: This binary package of Berkeley DB is provided for convenience only. Please don't address questions about Berkeley DB that aren't directly related to using Subversion to the project mailing list.]

    If you build Berkeley DB from the source, you will have to copy the file db-x.x.x\build_win32\db.h to SVN\src-trunk\db4-win32\include, and all the import libraries to SVN\src-trunk\db4-win32\lib. Again, the DLLs should be somewhere in your path.
    ### Just use --with-serf instead of the hardcoded path
  * [Optional] If you want to build the server modules, extract Apache source into SVN\httpd-2.x.x.
  * If you are building from a checkout of Subversion, and you are NOT building Apache, then you will need the APR libraries. Depending on how you got your version of APR, either:
      - Extract the APR, APR-util and APR-iconv source distributions into SVN\apr, SVN\apr-util, and SVN\apr-iconv respectively.
    Or:
      - Extract the apr, apr-util and apr-iconv directories from the srclib folder in the Apache httpd source into SVN\apr, SVN\apr-util, and SVN\apr-iconv respectively.
    ### Just use --with-apr, etc. instead of the hardcoded paths
  * Extract the ZLib sources into SVN\zlib if you are not using the zlib included in the dependencies zip file.
    ### Just use --with-zlib instead of the hardcoded path
  * [Optional] If you want secure connection (https) client support, extract OpenSSL into SVN\openssl
    ### And pass the path to both serf and gen-make.py
  * [Optional] If you want localized message support, extract svn-win32-libintl.zip into SVN\svn-win32-libintl and extract gettext-x.x.x-bin.zip and gettext-x.x.x-dep.zip into SVN\gettext-x.x.x-bin. Add SVN\gettext-x.x.x-bin\bin to your path.
  * Download the SQLite amalgamation from https://www.sqlite.org/download.html and extract it into SVN\sqlite-amalgamation. See I.C.12 for alternatives to using the amalgamation package.

E.4 Building the Binaries

To build the binaries, follow these instructions.

Start in the SVN directory you created.

Set up the environment (commands should be one line even if wrapped here).

    C:>set VER=trunk
    C:>set DIR=trunk
    C:>set BUILD_ROOT=C:\SVN
    C:>set PYTHONDIR=C:\Python27
    C:>set AWKDIR=C:\SVN\Awk
    C:>set ASMDIR=C:\SVN\asm
    C:>set SDKINC="C:\Program Files\Microsoft SDK\include"
    C:>set SDKLIB="C:\Program Files\Microsoft SDK\lib"
    C:>set GETTEXTBIN=C:\SVN\gettext-0.14.1-bin\bin
    C:>PATH=%PATH%;%BUILD_ROOT%\src-%DIR%\db4-win32;%ASMDIR%;
           %PYTHONDIR%;%AWKDIR%;%GETTEXTBIN%
    C:>set INCLUDE=%SDKINC%;%INCLUDE%
    C:>set LIB=%SDKLIB%;%LIB%

OpenSSL < 1.1.0

    C:>cd openssl
    C:>perl Configure VC-WIN32
    [*] C:>call ms\do_masm
    C:>nmake -f ms\ntdll.mak
    C:>cd out32dll
    C:>call ..\ms\test
    C:>cd ..\..

*Note: Use "call ms\do_nasm" if you have nasm instead of MASM, or "call ms\do_ms" if you don't have an assembler. Also if you are using OpenSSL >= 1.0.0 masm is no longer supported. You will have to use do_nasm or do_ms in this case.

OpenSSL >= 1.1.0

    C:>cd openssl
    C:>perl Configure VC-WIN32
    C:>nmake
    C:>nmake test
    C:>cd ..

Apache 2

This step is only required for building the server dso modules.

### FIXME Apache 2.2 or greater required. Old build instructions for VC6.
    C:>set APACHEDIR=C:\Program Files\Apache Group\Apache2
    C:>msdev httpd-2.0.58\apache.dsw /MAKE "BuildBin - Win32 Release"

APR

If you downloaded APR / APR-UTIL / APR_ICONV by source, you will have to build these libraries first. Building these libraries on Windows is straightforward and in most cases as simple as issuing these two commands:

    C:>nmake -f Makefile.win
    C:>nmake -f Makefile.win install

Please refer to the build instructions provided by the library source for actual build instructions.

ZLib

If you downloaded the zlib source, you will have to build ZLib first. Building ZLib using Visual Studio should be quite simple. Just open the appropriate solution and build the project zlibstat using the IDE.

Please refer to the build instructions provided by the library source for actual build instructions.

Note that you should make sure to define ZLIB_WINAPI in the ZLib config header and move the lib-file into the zlib root-directory.

Please note that you MUST NOT build ZLib with the included assembler optimized code. It is known to be buggy, see for example the discussion https://svn.haxx.se/dev/archive-2013-10/0109.shtml. This means that you must not define ASMV or ASMINF. Note that the VS projects in contrib\visualstudio define these in the Debug configuration.

Apache Serf

### Section about Apache Serf might be required/useful to add.
### scons is required too and Apache Serf needs to be configured prior to
### be able to build Subversion using:
### scons APR=[PATH_TO_APR] APU=[PATH_TO_APU] OPENSSL=[PATH_TO_OPENSSL]
### ZLIB=[PATH_TO_ZLIB] PREFIX=[PATH_TO_SERF_DEST]
### scons check
### scons install

Subversion

Things to note:

  * If you don't want to build mod_dav_svn, omit the --with-httpd option. The zip file source distribution contains apr, apr-util and apr-iconv in the default build location.
    If you have downloaded the apr files yourself you will have to tell the generator where to find the APR libraries; the options are --with-apr, --with-apr-util and --with-apr-iconv.
  * If you would like a debug build, substitute Debug for Release in the msbuild command.
  * There have been rumors that Subversion on Win32 can be built using the latest cygwin; you probably don't want the zip file source distribution though. YMMV.
  * You will also have to distribute the C runtime dll with the binaries. Also, since Apache/APR do not provide .vcproj files, you will need to convert the Apache/APR .dsp files to .vcproj files with Visual Studio before building -- just open the Apache .dsw file and answer 'Yes To All' when the conversion dialog pops up, or you can open the individual .dsp files and convert them one at a time. The Apache/APR projects required by Subversion are: apr-util\libaprutil.dsp, apr\libapr.dsp, apr-iconv\libapriconv.dsp, apr-util\xml\expat\lib\xml.dsp, apr-iconv\ccs\libapriconv_ccs_modules.dsp, and apr-iconv\ces\libapriconv_ces_modules.dsp.
  * If the server dso modules are being built and tested, Apache must not be running or the copy of the dso modules will fail.

    C:>cd src-%DIR%

If Apache 2 has been built and the server modules are required then gen-make.py will already have been run. If the source is from the zip file, Apache 2 has not been built so gen-make.py must be run:

    C:>python gen-make.py --vsnet-version=20xx --with-berkeley-db=db4-win32
           --with-openssl=..\openssl --with-zlib=..\zlib
           --with-libintl=..\svn-win32-libintl

Then build subversion:

    C:>msbuild subversion_vcnet.sln /t:__MORE__ /p:Configuration=Release
    C:>cd ..

The binaries have now been built.

E.5 Packaging the binaries

You now need to copy the binaries ready to make the release zip file. You also need to do this to run the tests as the new binaries need to be in your path. You can use the build/win32/make_dist.py script in the Subversion source directory to do that.
[TBD: Describe how to do this. Note dependencies on zip, jar, doxygen.]

E.6 Testing the Binaries

[TBD: It's been a long, long while since it was necessary to move binaries around for testing. win-tests.py does that automagically. Fix this section accordingly, and probably reorder, putting the packaging at the end.]

The build process creates the binary test programs but it does not copy the client tests into the release test area.

    C:>cd src-%DIR%
    C:>mkdir Release\subversion\tests\cmdline
    C:>xcopy /S /Y subversion\tests\cmdline Release\subversion\tests\cmdline

If the server dso modules have been built then copy the dso files and dlls into the Apache modules directory.

    C:>copy Release\subversion\mod_dav_svn\mod_dav_svn.so "%APACHEDIR%"\modules
    C:>copy Release\subversion\mod_authz_svn\mod_authz_svn.so "%APACHEDIR%"\modules
    C:>copy svn-win32-%VER%\bin\intl.dll "%APACHEDIR%\bin"
    C:>copy svn-win32-%VER%\bin\iconv.dll "%APACHEDIR%\bin"
    C:>copy svn-win32-%VER%\bin\libdb42.dll "%APACHEDIR%\bin"
    C:>cd ..

Put the svn-win32-trunk\bin directory at the start of your path so you run the newly built binaries and not another version you might have installed.

Then run the client tests:

    C:>PATH=%BUILD_ROOT%\svn-win32-%VER%\bin;%PATH%
    C:>cd src-%DIR%
    C:>python win-tests.py -c -r -v

If the server dso modules were built, configure Apache to use the mod_dav_svn and mod_authz_svn modules by making sure these lines appear uncommented in httpd.conf:

    LoadModule dav_module modules/mod_dav.so
    LoadModule dav_fs_module modules/mod_dav_fs.so
    LoadModule dav_svn_module modules/mod_dav_svn.so
    LoadModule authz_svn_module modules/mod_authz_svn.so

And further down the file add location directives to point to the test repositories.
Change the paths to the SVN directory you created (paths should be on one line even if wrapped here):

    <Location /svn-test-work/repositories>
      DAV svn
      SVNParentPath C:/SVN/src-trunk/Release/subversion/tests/cmdline/
        svn-test-work/repositories
    </Location>

    <Location /svn-test-work/local_tmp/repos>
      DAV svn
      SVNPath c:/SVN/src-trunk/Release/subversion/tests/cmdline/
        svn-test-work/local_tmp/repos
    </Location>

Then restart Apache and run the tests:

    C:>python win-tests.py -c -r -v -u http://localhost
    C:>cd ..

F. Building using CMake
--------------------

Get the sources, either a release tarball or by checking out the official repository. The CMake build system currently only exists in /trunk and it will be included in the 1.15 release.

The process for building on Unix and Windows is the same.

    $ python gen-make.py -t cmake
    $ cmake -B out [build options]
    $ cmake --build out

"out" in the commands above is the build directory used by CMake.

Build options can be added, for example:

    $ cmake -B out -DCMAKE_INSTALL_PREFIX=/usr/local/subversion -DSVN_ENABLE_RA_SERF=ON

Build options can be listed using:

    $ cmake -LH

Windows tricks:

  - Modern versions of Microsoft Visual Studio provide support for CMake projects out of the box, including IntelliSense, an integrated options editor, a test explorer, and more. To use it for Subversion, open the source directory with Visual Studio, and the configuration should start automatically. To edit the cache (options), right-click the CMakeLists.txt file and click `CMake Settings for Subversion` to open the editor. After the required settings are configured, hit `F7` to build. For more info, check the article below:

        https://learn.microsoft.com/en-us/cpp/build/cmake-projects-in-visual-studio

  - There is a useful tool for bootstrapping the dependencies, vcpkg. It provides ports for most of Subversion's dependencies, which can then be installed via a single command.
    To start using it, download the registry from GitHub, bootstrap vcpkg, and install the dependencies:

        $ git clone https://github.com/microsoft/vcpkg
        $ cd vcpkg && .\bootstrap-vcpkg.bat -disableMetrics
        $ .\vcpkg install apr apr-util expat zlib sqlite3 [any other dependency]

    After this is done, vcpkg can be integrated into CMake by passing the vcpkg toolchain to the CMAKE_TOOLCHAIN_FILE option. To do this with Visual Studio, open the CMake cache editor as explained in the previous step, and put the following into the `CMake toolchain file` field, where VCPKG_ROOT is the path to the vcpkg registry:

        <VCPKG_ROOT>/scripts/buildsystems/vcpkg.cmake

III. BUILDING A SUBVERSION SERVER
============================

Subversion has two servers you can choose from: svnserve and Apache. svnserve is a small, lightweight server program that is automatically compiled when you build Subversion's source. Apache is a more heavyweight HTTP server, but tends to have more features.

This section primarily focuses on how to build Apache and the accompanying mod_dav_svn server module for it. If you plan to use svnserve instead, jump right to section E for a quick explanation.

A. Setting Up Apache Httpd
-----------------------

1. Obtaining and Installing Apache Httpd 2

Subversion tries to compile against the latest released version of Apache httpd 2.2+. The easiest thing for you to do is download a source tarball of the latest release and unpack that.

If you have questions about the Apache httpd 2.2 build, please consult the httpd install documentation:

    https://httpd.apache.org/docs-2.2/install.html

At the top of the httpd tree:

    $ ./buildconf
    $ ./configure --enable-dav --enable-so --enable-maintainer-mode

The first arg says to build mod_dav. The second arg says to enable shared module support which is needed for a typical compile of mod_dav_svn (see below). The third arg says to include debugging information.
If you built Subversion with --enable-maintainer-mode, then you should do the same for Apache; there can be problems if one was compiled with debugging and the other without.

Note: if you have multiple db versions installed on your system, Apache might link to a different one than Subversion, causing failures when accessing the repository through Apache. To prevent this from happening, you have to tell Apache which db version to use and where to find db. Add --with-dbm=db4 and --with-berkeley-db=/usr/local/BerkeleyDB.4.2 to the configure line. Make sure this is the same db as the one Subversion uses. This note assumes you have installed Berkeley DB 4.2.52 at its default locations. For more info about the db requirement, see section I.C.9.

You may also want to include other modules in your build. Add --enable-ssl to turn on SSL support, and --enable-deflate to turn on compression support, for example. Consult the Apache documentation for more details.

All instructions below assume you configured Apache to install in its default location, /usr/local/apache2/; substitute appropriately if you chose some other location.

Compile and install apache:

    $ make && make install

B. Making and Installing the Subversion Apache Server Module
---------------------------------------------------------

Go back into your subversion working copy and run ./autogen.sh if you need to. Then, assuming Apache httpd 2.2 is installed in the standard location, run:

    $ ./configure

Note: do *not* configure subversion with "--disable-shared"! mod_dav_svn *must* be built as a shared library, and it will look for other libsvn_*.so libraries on your system.

If you see a warning message that the build of mod_dav_svn is being skipped, this may be because you have Apache httpd 2.x installed in a non-standard location.
You can use the "--with-apxs=" option to locate the apxs script:

    $ ./configure --with-apxs=/usr/local/apache2/bin/apxs

Note: it *is* possible to build mod_dav_svn as a static library and link it directly into Apache. Possible, but painful. Stick with the shared library for now; if you can't, then ask.

    $ rm /usr/local/lib/libsvn*

If you have old subversion libraries sitting on your system, libtool will link them instead of the `fresh' ones in your tree. Remove them before building subversion.

    $ make clean && make && make install

After the make install, the Subversion shared libraries are in /usr/local/lib/. mod_dav_svn.so should be installed in /usr/local/libexec/ (or elsewhere, such as /usr/local/apache2/modules/, if you passed --with-apache-libexecdir to configure).

Section II.E explains how to build the server on Windows.

C. Configuring Apache Httpd for Subversion
---------------------------------------

The following section is an abbreviated version of the information in the Subversion Book (https://svnbook.red-bean.com). Please read chapter 6 for more details.

The following assumes you have already created a repository. For documentation on how to do that, see README.

The following also assumes that you have modified /usr/local/apache2/conf/httpd.conf to reflect your setup. At a minimum you should look at the User, Group and ServerName directives. Full details on setting up apache can be found at: https://httpd.apache.org/docs-2.2/

First, your httpd.conf needs to load the mod_dav_svn module. If you pass --enable-mod-activation to Subversion's configure, 'make install' target should automatically add this line for you. In any case, if Apache HTTPD gives you an error like "Unknown DAV provider: svn", then you may want to verify that this line exists in your httpd.conf:

    LoadModule dav_svn_module modules/mod_dav_svn.so

NOTE: if you built mod_dav as a dynamic module as well, make sure the above line appears after the one that loads mod_dav.so.
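The ordering rule in the NOTE above can be checked mechanically. The following is a minimal sketch, not a Subversion or Apache tool: the function name load_order_ok is hypothetical, and the parsing assumes plain, uncommented LoadModule lines written with that exact capitalization, with no Include files or conditional blocks.

```python
import re


def load_order_ok(conf_text):
    """Check the LoadModule order described in the NOTE above.

    Returns False when dav_svn_module is not loaded at all, or when
    dav_module is loaded dynamically but only after dav_svn_module.
    """
    loaded = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # commented-out directive, ignored
        match = re.match(r"LoadModule\s+(\S+)\s+\S+", stripped)
        if match:
            loaded.append(match.group(1))
    if "dav_svn_module" not in loaded:
        return False
    if "dav_module" not in loaded:
        # Assumption: mod_dav was compiled into httpd statically.
        return True
    return loaded.index("dav_module") < loaded.index("dav_svn_module")


if __name__ == "__main__":
    sample = (
        "LoadModule dav_module modules/mod_dav.so\n"
        "LoadModule dav_svn_module modules/mod_dav_svn.so\n"
    )
    print(load_order_ok(sample))
```

In practice the text passed in would be the contents of /usr/local/apache2/conf/httpd.conf; a False result points at the same cause as the "Unknown DAV provider: svn" error mentioned above.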
Next, add this to the *bottom* of your httpd.conf:

    <Location /svn/repos>
        DAV svn
        SVNPath /absolute/path/to/repository
    </Location>

This will give anyone unrestricted access to the repository. If you want limited access, read or write, you add these lines to the Location block:

    AuthType Basic
    AuthName "Subversion repository"
    AuthUserFile /my/svn/user/passwd/file

And:

a) For a read/write restricted repository:

    Require valid-user

b) For a write restricted repository:

    <LimitExcept GET PROPFIND OPTIONS REPORT>
        Require valid-user
    </LimitExcept>

c) For separate restricted read and write access:

    AuthGroupFile /my/svn/group/file

    <LimitExcept GET PROPFIND OPTIONS REPORT>
        Require group svn_committers
    </LimitExcept>

    <Limit GET PROPFIND OPTIONS REPORT>
        Require group svn_committers
        Require group svn_readers
    </Limit>

### FIXME Tutorials section refers to old 2.0 docs

These are only a few simple examples. For a complete tutorial on Apache access control, please consider taking a look at the tutorials found under "Security" on the following page: https://httpd.apache.org/docs-2.0/misc/tutorials.html

In order for 'svn cp' to work (which is actually implemented as a DAV COPY command), mod_dav needs to be able to determine the hostname of the server. A standard way of doing this is to use Apache's ServerName directive to set the server's hostname. Edit your /usr/local/apache2/conf/httpd.conf to include:

    ServerName svn.myserver.org

If you are using virtual hosting through Apache's NameVirtualHost directive, you may need to use the ServerAlias directive to specify additional names that your server is known by.
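To make the effect of the three access-control variants (a), (b) and (c) above easier to see, here is a rough model of which requests each one lets through. It is an illustrative sketch only: the function who_may and the mode letters are hypothetical, and it assumes that several Require lines in one block are alternatives (any one match is enough). It does not parse httpd.conf or talk to Apache.

```python
# Methods listed in the Limit/LimitExcept examples above (read operations).
READ_METHODS = {"GET", "PROPFIND", "OPTIONS", "REPORT"}


def who_may(method, mode):
    """Return who may issue 'method' under example (a), (b) or (c).

    The result is "anyone", "valid-user", or a tuple of group names,
    any one of which is sufficient.
    """
    is_read = method.upper() in READ_METHODS
    if mode == "a":
        # Read/write restricted: every request needs a valid login.
        return "valid-user"
    if mode == "b":
        # Write restricted: only non-read methods need a login.
        return "anyone" if is_read else "valid-user"
    if mode == "c":
        # Separate groups for reading and for writing.
        if is_read:
            return ("svn_committers", "svn_readers")
        return ("svn_committers",)
    raise ValueError("mode must be 'a', 'b' or 'c'")


if __name__ == "__main__":
    for mode in "abc":
        print(mode, who_may("GET", mode), who_may("MERGE", mode))
```

MERGE is used here as an example of a write-side method that falls outside the four read methods; any other method not in the list is treated the same way.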
If you have configured mod_deflate to be in the server, you can enable compression support for your repository by adding the following line to your Location block:

    SetOutputFilter DEFLATE

NOTE: If you are unfamiliar with an Apache directive, or not exactly sure about what it does, don't hesitate to look it up in the documentation: https://httpd.apache.org/docs-2.2/mod/directives.html.

NOTE: Make sure that the user 'nobody' (or whatever UID the httpd process runs as) has permission to read and write the Berkeley DB files! This is a very common problem.

D. Running and Testing
-------------------

Fire up apache 2:

    $ /usr/local/apache2/bin/apachectl stop
    $ /usr/local/apache2/bin/apachectl start

Check /usr/local/apache2/logs/error_log to make sure it started up okay.

Try doing a network checkout from the repository:

    $ svn co http://localhost/svn/repos wc

The most common reason this might fail is permission problems reading the repository db files. If the checkout fails, make sure that the httpd process has permission to read and write to the repository. You can see all of mod_dav_svn's complaints in the Apache error logfile, /usr/local/apache2/logs/error_log.

To run the regression test suite for networked Subversion, see the instructions in subversion/tests/cmdline/README. For advice about tracing problems, see "Debugging the server" in https://subversion.apache.org/docs/community-guide/.

E. Alternative: 'svnserve' and ra_svn
-----------------------------------

An alternative network layer is libsvn_ra_svn (on the client side) and the 'svnserve' process on the server. This is a simple network layer that speaks a custom protocol over plain TCP (documented in libsvn_ra_svn/protocol):

    $ svnserve -d     # becomes a background daemon
    $ svn checkout svn://localhost/usr/local/svn/repository

You can use the "-r" option to svnserve to set a logical root for repositories, and the "-R" option to restrict connections to read-only access.
("Read-only" is a logical term here; svnserve still needs write access to the database in this mode, but will not allow commits or revprop changes.)

'svnserve' has built-in CRAM-MD5 authentication (so you can use non-system accounts), and can also be tunneled over SSH (so you can use existing system accounts). It's also capable of using Cyrus SASL if libsasl2 is detected at ./configure time.

Please read chapter 6 in the Subversion Book (https://svnbook.red-bean.com) for details on these features.

IV. PROGRAMMING LANGUAGE BINDINGS (PYTHON, PERL, RUBY, JAVA)
========================================================

For Python, Perl and Ruby bindings, see the file

    ./subversion/bindings/swig/INSTALL

For Java bindings, see the file

    ./subversion/bindings/javahl/README