Controlling Access to Your Documents
If you are a CSE web content provider, you may find that the default access controls for your content aren't what you need. For example, home pages are by default served only to hosts in .cs.washington.edu; most people override that default to publish more widely. To override the defaults, you need to create an access control file. This document explains how, and its forms will even build the file for you (though you have to copy it into place yourself).
CSE web servers support a small variety of schemes for restricting web access to your documents; this document describes them.
[Last modified 09/30/08 at 07:04AM PDT.]
System-wide access policy is controlled by system configuration files read by the server at startup (or when a running server is signalled to re-read its configuration). Since it is impractical for an administrator to edit those files to control access to resources maintained by other content providers, there are also mechanisms to permit you, as a web page author, to control certain aspects of access to your documents.
The server we currently run on www.cs.washington.edu (Apache/2.2.8 (Fedora) DAV/2 mod_pubcookie/3.3.3 mod_ssl/2.2.8 OpenSSL/0.9.8g) permits users to control read access to their documents either on a per-directory-tree basis (that is, you may specify who may read all the documents in a directory and its descendants) or on a per-file basis.
Our server is configured to allow the default system-wide read policy to be overridden by rules in an access control file called .htaccess. In order to allow the policy established in such a file to apply to subdirectories, the server searches for such files at each level in a URL.
For example, when the URL http://www.cs.washington.edu/homes/turing/ is served, the server looks for .htaccess files in the server root /, in /homes/, and in /homes/turing/, opening and processing each such file each and every time that URL is requested. That can impact the performance of the server, so this user-level access control is a poor choice for files that are frequently requested.
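The server-side settings behind this behavior are not shown on this page. As a hedged sketch, an Apache 2.2 configuration that enables per-directory .htaccess files typically looks something like the following. The directory path is an assumption based on the server root mentioned later on this page, and this is not CSE's actual configuration.

```apache
# Hypothetical httpd.conf excerpt; NOT the actual CSE configuration.
# AccessFileName names the per-directory file the server looks for
# (.htaccess is also Apache's built-in default).
AccessFileName .htaccess

<Directory "/cse/www">
    # AuthConfig lets .htaccess files use AuthType, AuthName,
    # AuthUserFile, AuthGroupFile, Require, Satisfy, and so on.
    # Limit lets them use Order, Allow and Deny.
    AllowOverride AuthConfig Limit
</Directory>
```

Setting `AllowOverride None` for a directory makes the server skip the .htaccess search there entirely. Administrators sometimes use that setting for heavily requested trees to avoid the per-request cost described above.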
Note that the server runs without any special privilege. So, as with any content that is served, the .htaccess file must be world-readable. The same requirement applies to any auxiliary authorization control files, such as password and group files.
Besides .htaccess files, there are these types of auth control files:
There are three types of auth that users may specify in an access control file, and there are directives that allow you to combine them:
In addition, we offer support for a more powerful variant of CSENetID known as pubcookie (AKA "UWNetID") authentication. This type of authentication is supported by UW Computing and Communications. It is used only with resources accessed via HTTPS (secure HTTP) URLs, and it uses your UW Kerberos credentials instead of your CSE Kerberos credentials. For more information, see Pubcookie.
There are a variety of methods used to access web resources. Reading a file, the typical case, uses the GET method. Sending data to a script may use POST, GET, or (rarely) PUT methods. Each of these access methods may be separately controlled (but the authorization tool below generates a single control scheme for all the methods you specify).
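Consider, for example, a form page that anyone may read but that only CSE users may submit to. As a sketch, it could use Apache's `<Limit>` section so that the login requirement applies only to the listed methods. The AuthName text is an arbitrary placeholder; AuthType CSENetID is the local auth type used elsewhere on this page.

```apache
# Sketch: require a CSENetID login only for requests that submit data.
# GET (and HEAD) are not listed, so the Require directive below does
# not apply to them and ordinary reads are left unrestricted.
AuthType CSENetID
AuthName "Form submissions"
<Limit POST PUT>
    Require valid-user
</Limit>
```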
Auth can get pretty expensive, in part because the authentication control files are opened and parsed for each request. Consider a page in a directory with a simple access control file. This hypothetical page links to ten images that live in the same directory. Typically, only the text in the HTML file is sensitive, but all the files in the directory are protected, and each file is fetched with a separate request. That means that the .htaccess file is opened and parsed eleven times each time the page is read.
In the case of CSENetID auth, it's even more costly, because each request results in a substantial amount of computation as the cookie is validated.
Remember, too, that auth control applies to an entire directory tree. If there are subdirectories, each and every request for resources in the tree causes I/O and computation to occur.
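In the ten-image example above, one way to trim the cost is to wrap the auth directives in a `<Files>` section so that only the sensitive page requires a login. This is a sketch that assumes only the HTML page needs protection; the file name page.html is hypothetical. The server still opens and parses .htaccess for every request in the directory. The image requests should, however, skip the CSENetID cookie validation, because no Require directive applies to them.

```apache
# Sketch: only page.html (hypothetical name) requires a CSENetID login;
# the images in the same directory are served without authentication.
<Files "page.html">
    AuthType CSENetID
    Require valid-user
</Files>
```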
There are three main approaches to cheapening the use of auth:
This mechanism uses a list of hostname specifications to be denied access and a list of hostname specifications to be allowed access to decide whether to permit access to a particular host. There are two flavors: "first list those to be denied access, then explicitly allow access to others," and "first list those to be allowed access, then those to be denied access." These two approaches use the following incantations, respectively:
order deny,allow
order allow,deny

So, for example, to allow all hosts from the cs.washington.edu domain access to index.html, but no others, one specifies
<Files index.html>
order deny,allow
deny from all
allow from cs.washington.edu
</Files>
To specifically exclude hosts from hacker.org from accessing .html files, specify
<FilesMatch "\.html$">
order allow,deny
allow from all
deny from hacker.org
</FilesMatch>
N.B.: A very common error is to put a space after the comma in the order directive. That simple error will generate a server error and have the effect of denying access to all comers.
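The difference is a single character. In the fragment below, the incorrect form is commented out so the block can be copied safely. The Order directive takes exactly one argument, so the server rejects the spaced version.

```apache
# Correct: no whitespace anywhere inside the argument.
order deny,allow

# Incorrect (commented out): with the space, the server sees two
# arguments, "deny," and "allow", refuses the file, and requests
# for the directory fail with a server error.
# order deny, allow
```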
With our server, this hostname/IP-based access control is implemented by a server module called mod_authz_host (named mod_access in Apache releases before 2.2). See the mod_authz_host documentation for details.
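The examples above match hosts by name. Apache 2.2 also accepts IP addresses in Allow and Deny, either as a partial address or as a network in CIDR form. The addresses below come from reserved documentation ranges and are placeholders, not CSE networks.

```apache
# Sketch: admit only two hypothetical networks, matched by address.
order deny,allow
deny from all
# Partial address: matches every client whose address begins 192.0.2.
allow from 192.0.2
# CIDR form of a network specification.
allow from 198.51.100.0/24
```

Matching by address also spares the server the double reverse DNS lookup that Apache performs to check a hostname specification. This is one more small saving for frequently requested content.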
This form will allow the creation of two commonly-needed types of hostname-based authorization files:
Fill in the form below, press Submit, then copy the temporary file we create for you in /cse/www/tmp/ to a file called .htaccess in the directory you wish to protect.
Password-based auth comes in these flavors here at CSE:
In the (very common) case where a single username is sufficient, or when you want to allow access to any user listed in the password file, only the latter (password) file is necessary. These files are conventionally called .htgroup and .htpasswd, respectively. The group file is generated with a text editor, while the password file is generated with a tool called htpasswd.
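When a group file is needed, the .htaccess file names it with AuthGroupFile and requires membership in a group. The sketch below follows the standard Apache 2.2 directives. The group name, the user rover, and the .htgroup path are hypothetical, and the .htpasswd path reuses the one from the examples further down this page. The comments show the plain-text group file format and a typical htpasswd invocation.

```apache
# Sketch of a group-based .htaccess; group and user names are hypothetical.
#
# .htgroup is plain text, one group per line, members separated by spaces:
#   doggyteam: measles rover
# .htpasswd is built with htpasswd; -c creates the file, so omit it
# when adding further users:
#   htpasswd -c /cse/www/doggy/.htpasswd measles
#   htpasswd /cse/www/doggy/.htpasswd rover
AuthType Basic
AuthName "Doggy project"
AuthUserFile /cse/www/doggy/.htpasswd
AuthGroupFile /cse/www/doggy/.htgroup
Require group doggyteam
```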
It is considered a security hazard to place these files in any directory published by the web server, because a malicious user could analyze them. For example, an attacker could try to break a password by searching a dictionary for words that encrypt to one of the values in the password file. Nevertheless, this is how we typically manage the files at CSE, because it simplifies administration. Therefore, if the web resources to be controlled by basic auth are particularly sensitive, please contact the webmaster to arrange a more secure strategy, such as storing your authentication files in a directory not exported by the web server.
This tool creates auth control files for the following commonly-needed types of auth:
To use this tool to generate auth files for password-based auth, fill in the form below and press Submit. We'll create temporary files for you to copy to your content directory.
To combine hostname-based auth with password-based auth (in other words, to use both the Require and Allow directives), our web server software provides the Satisfy directive. Satisfy any permits access to users who meet any one of the authorization requirements associated with the content, for example either browsing from a local host or supplying the requested username/password. Satisfy all requires that all the authorization requirements be met, for example both browsing from a local host and supplying the requested username/password.
Below are examples of two .htaccess files that use the Satisfy any construct to control access to the URL http://www.cs.washington.edu/doggy/. In these examples, users browsing from outside the .cs.washington.edu domain will be prompted for credentials, while those browsing from a local host will be allowed access without being shaken down for proof of their identities. (We don't have a tool that supports hybrid auth yet.)
CSENetID Auth:

order deny,allow
deny from all
Allow from .cs.washington.edu
AuthName "Lamer"
AuthType CSENetID
Require valid-user
Satisfy any

Basic Auth:

order deny,allow
deny from all
Allow from .cs.washington.edu
AuthUserFile /cse/www/doggy/.htpasswd
AuthName "Lamer"
AuthType Basic
Require user measles
Satisfy any
Specifying all instead of any in these examples would mean that users must both browse from a local host and supply the credentials.
A word about the AuthUserFile directive: if the argument path starts with a /, it needs to be a full path to the file; otherwise, it's relative to the "server root." The server root for www.cs.washington.edu is /cse/www, so an equivalent to /cse/www/doggy/.htpasswd would be doggy/.htpasswd. The web server also understands all the locally-supported "canonical paths," such as /homes/june/, /homes/gws/, /homes/iws/, /cse/, and /projects/. The basic rule is that the web server needs to be able to find and read the file or web access to your resource will be denied to all users.
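In other words, given the server root of /cse/www stated above, the two directives below should name the same file. Use only one of them in a real .htaccess; the relative form is commented out here.

```apache
# Absolute path to the password file.
AuthUserFile /cse/www/doggy/.htpasswd
# Equivalent relative form, resolved against the server root (/cse/www):
# AuthUserFile doggy/.htpasswd
```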
Note these relevant facts:
Consider as an example the following directory tree:
    ~user/www/
        cseonly/
            .htaccess1
            content/
        basic/
            .htaccess2
            .htpasswd
            content@
We wish to restrict access to the URL http://www.cs.washington.edu/homes/user/cseonly/content/ to any user with CSENetID credentials. (On disk, both access control files are named .htaccess; the numbers only distinguish them here.) The contents of .htaccess1 would be:
authtype csenetid
require valid-user
Note that ~user/www/basic/content is a symlink to ~user/www/cseonly/content/.
We wish to restrict access to the URL http://www.cs.washington.edu/homes/user/basic/content/ to those who know the username/password "basicly"/"ylcisab". The contents of .htaccess2 would be:
authtype basic
authname "Basic Auth Required"
authuserfile /homes/june/user/www/basic/.htpasswd
require user basicly
(Of course, you would first need to create /homes/june/user/www/basic/.htpasswd with an entry for user basicly.) Because the two URLs are distinct but the content behind them is the same, the goal of mixing password-based authentication mechanisms is met: the same documents can be reached with either CSENetID or basic credentials.
Comments on this file to webmaint.