Controlling Access to Your Documents

If you are a CSE web content provider, you may find that the default access controls for your content aren't what you need. For example, home pages default to being served only to hosts in .cs.washington.edu - most people override that default to publish more widely. To override the defaults, you need to create an access control file. This document tells you how and will even build the file for you (but you have to copy it into place).

CSE web servers support a small variety of schemes for restricting web access to your documents.


[Last modified 09/30/08 at 07:04AM PDT.]

Authorization Files

System-wide access policy is controlled by system configuration files read by the server at startup (or when a running server is signalled to re-read its configuration). Since it is impractical for an administrator to edit those files to control access to resources maintained by other content providers, there are also mechanisms to permit you, as a web page author, to control certain aspects of access to your documents.

The server we currently run on www.cs.washington.edu (Apache/2.2.8 (Fedora) DAV/2 mod_pubcookie/3.3.3 mod_ssl/2.2.8 OpenSSL/0.9.8g) permits users to control read access to their documents on a per-directory tree basis - that is, you may specify who may read all the documents in a directory and its descendants - or a per-file basis.

Our server is configured to allow the default system-wide read policy to be overridden by rules in an access control file called .htaccess. In order to allow the policy established in such a file to apply to subdirectories, the server searches for such files at each level in a URL.

For example, when the URL http://www.cs.washington.edu/homes/turing/ is served, the server looks for .htaccess files in the server root /, in /homes/, and in /homes/turing/, opening and processing each such file each and every time that URL is requested. That can impact the performance of the server, so this user-level access control is a poor choice for files that are frequently requested.

Note that the server runs without any special privilege. So, just as is necessary for any content that is served, the .htaccess file must be world-readable. This is true of any other auxiliary authorization control files.

Besides .htaccess files, there are these types of auth control files:

password files
When basic auth is used, user names and passwords provided by users are matched against those stored in a password file. The passwords are encrypted in the file. Conventionally, this file is called .htpasswd, a good choice because our server will not serve files starting with .ht.
group files
When you wish to allow access to a subset of users in an authentication database, you can either list the users, or specify a name of a group. The names of and members of the groups are listed in a group file, conventionally called .htgroup. Group files are most useful with CSENetID auth.

Auth Schemes

There are three types of auth that users may specify in an access control file, and there are directives that allow you to combine them:

hostname-based auth
Only users connecting from a restricted set of DNS domain names are permitted access. For example, policy could be established that permits only hosts from .cs.washington.edu access.

CSENetID auth
Users must authenticate with their CSE Kerberos username and password. This requires an "SSL-enabled" browser (such as any late model of Netscape or Internet Explorer, but not Lynx) and results in a cookie that can be used to authenticate to any resource protected by CSENetID. The cookie is valid for a limited time (currently ten hours) or until the end of the browser session, whichever comes first. More information is here. This type of auth is a local extension to the web server. (NB: www.cs.washington.edu supports CSENetID authentication only on HTTP URLs, not HTTPS URLs.)

basic auth
Users are prompted by their browsers - if capable - for a username and password. These are checked against a database of users and encrypted passwords to control access. Typically, the browser will cache the username and password so that accesses to any document in the authentication realm will not require repeated authorization during that browser session. The password is passed over the wire unencrypted, which is why you should never use your login password with this mechanism.

hybrid auth
You can combine hostname-based auth with basic or CSENetID auth in either an "or" or an "and" relation to get finer control, for example to let local hosts in without a password while prompting everyone else for credentials.

In addition, we support a more powerful variant of CSENetID known as pubcookie (AKA "UWNetID") authentication. This type of authentication, which is supported by UW Computing and Communications, is used only with resources accessed via HTTPS (secure HTTP) URLs, and uses your UW Kerberos credentials instead of your CSE Kerberos credentials. For more information, see Pubcookie.

Methods

There are a variety of methods used to access web resources. Reading a file, the typical case, uses the GET method. Sending data to a script may use POST, GET, or (rarely) PUT methods. Each of these access methods may be separately controlled (but the authorization tool below generates a single control scheme for all the methods you specify).
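As a hedged illustration of per-method control, the fragment below uses Apache's <Limit> container to apply a hostname restriction to GET and POST requests only. It is a sketch under that assumption, not necessarily what the authorization tool generates; the domain is the one used elsewhere in this document.

```apache
# Sketch only: the rules inside <Limit> apply to GET and POST requests.
# In Apache, limiting GET also limits HEAD. Other methods are not
# affected by this block.
<Limit GET POST>
  Order deny,allow
  Deny from all
  Allow from .cs.washington.edu
</Limit>
```

Leaving out the <Limit> wrapper applies the same rules to every method, which is usually the safer choice when you are unsure which methods matter.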

Keeping It Cheap

Auth can get pretty expensive, in part because the authentication control files are opened and parsed for each request. Consider a page in a directory with a simple access control file. This hypothetical page links to ten images that live in the same directory. Typically, only the text in the HTML file is sensitive information, but all the files in the directory are protected, and each results in a unique request. That means that the .htaccess file is opened and parsed eleven times each time the page is read.

In the case of CSENetID auth, it's even more costly, because each request results in a substantial amount of computation as the cookie is validated.

Remember, too, that auth control applies to an entire directory tree. If there are subdirectories, each and every request for a resource in the tree causes I/O and computation to occur.

There are three main approaches to cheapening the use of auth:

  1. Keep all non-sensitive content outside of directories with auth. For example, put all your images in a distinct directory tree from your secured documents. That limits the number of times the auth control files must be read and processed.
  2. Use the <Files> and <FilesMatch> directives to limit the application of auth to truly sensitive files. That limits computation.
  3. For frequently-requested files, ask the web server administrator to add auth directives to the server configuration files. Those files are read once at server startup instead of for each request.
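The second approach might look like the following minimal sketch, which assumes basic auth and protects a single file while leaving the images and other files in the same directory open. The file name grades.html and the realm "Course Staff" are hypothetical; the password file path reuses the example path that appears later in this document.

```apache
# Sketch only: auth applies to one sensitive file, not the whole directory.
# "grades.html" and "Course Staff" are made-up names for illustration.
<Files "grades.html">
  AuthType Basic
  AuthName "Course Staff"
  AuthUserFile /cse/www/doggy/.htpasswd
  Require valid-user
</Files>
```

The .htaccess file is still opened for every request in the directory, so this saves the password check rather than the file read.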

Hostname-based Auth

This mechanism uses a list of hostname specifications to be denied access and a list of hostname specifications to be allowed access to decide whether to permit access to a particular host. There are two flavors to this: "first list those to be denied access, then explicitly allow access to others," and "first list those to be allowed access, then those to be denied access." These two approaches use the following incantations, respectively:

  order deny,allow
  order allow,deny

So, for example, to allow all hosts from the cs.washington.edu domain access to index.html, but no others, one specifies

    <Files index.html>
     order deny,allow
     deny from all
     allow from cs.washington.edu
    </Files>

To specifically exclude hosts from hacker.org from accessing .html files, specify

    <FilesMatch "\.html$">
     order allow,deny
     allow from all
     deny from hacker.org
    </FilesMatch>
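
One detail worth spelling out, based on Apache's documented behavior for the Order directive: a host that matches neither list is allowed under deny,allow but denied under allow,deny. Under that rule, the index.html example above can be sketched more compactly as follows.

```apache
# Sketch: same effect as the deny,allow example for index.html.
# Under "allow,deny", a host not matched by any Allow line is denied
# by default, so no explicit "deny from all" is needed.
<Files index.html>
  Order allow,deny
  Allow from cs.washington.edu
</Files>
```

Which form to use is mostly a matter of which default you want for hosts you did not think to list.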

N.B.: A very common error is to put a space after the comma in the order directive. That simple error will generate a server error and have the effect of denying access to all comers.

With our server, this hostname/IP-based access control is implemented by a server module called mod_authz_host (named mod_access in Apache releases before 2.2). Here is mod_access documentation.

Hostname Authorization Tool

This form creates two commonly needed types of hostname-based authorization files.

Fill in the form below, press Submit, then copy the temporary file we create for you in /cse/www/tmp/ to a file called .htaccess in the directory you wish to protect.

The form asks for the following:

  Basename of temporary file
  Methods to be controlled
  Allow all or Allow some
  Domain/hostname (up to four entries)

Password-based Auth

Password-based auth comes in these flavors here at CSE:

basic auth
Basic auth depends upon one or two files (besides .htaccess) to specify which users to authorize: a group file and a password file.

In the (very common) case where a single username is sufficient, or when you want to allow access to any user listed in the password file, only the latter (password) file is necessary. These files are conventionally called .htgroup and .htpasswd, respectively. The group file is generated with a text editor, while the password file is generated with a tool called htpasswd.
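
As a rough sketch of how the two files fit together, the .htaccess fragment below authorizes the members of one group. The realm "Project Docs", the group name kennel, and the username rover are hypothetical; the directory path reuses the example path that appears later in this document.

```apache
# Sketch only: basic auth restricted to members of a group.
AuthType Basic
AuthName "Project Docs"
AuthUserFile /cse/www/doggy/.htpasswd
AuthGroupFile /cse/www/doggy/.htgroup
Require group kennel

# The group file is plain text, one group per line, for example:
#   kennel: measles rover
# Every member must also have an entry in the password file.
```

If any user in the password file should be admitted, drop the AuthGroupFile line and use "Require valid-user" instead.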

It is considered a security hazard to place these files in any directory published by the web server, because they could be analyzed by a malicious user. For example, an attempt could be made to break a password by searching a dictionary for words that encrypt to one of the values in the password file. Nevertheless, this is how we typically manage the files at CSE, because it simplifies administration. Therefore, in the event that web resources to be controlled by basic auth are particularly sensitive, please contact the webmaster to arrange a more secure strategy, such as storing your authentication files in a directory not exported by the web server.

digest
Digest auth is similar to basic auth: there is a file of encrypted passwords on the server that is consulted to authenticate users, and the content provider provides a list of usernames from that file (or groups of usernames from that file) that are authorized to access the affected content. But unlike basic auth, the password does not cross the wire, and the passwords on the server are strongly encrypted. These are two key advantages. Digest auth uses the MD5 one-way hash algorithm on both the server and client side, while basic auth uses the deprecated crypt algorithm on the server side and effectively no encryption algorithm on the client side. On the downside, some very old web browsers don't support digest auth (Netscape 4 is an example). Use the htdigest program to generate the password file. See Digest authentication in the Apache documentation for details.
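
A minimal sketch of a digest-auth .htaccess file for an Apache 2.2 server follows. The realm "Project Docs" and the file name .htdigest are hypothetical, and the directory path reuses the example path that appears later in this document.

```apache
# Sketch only: digest auth backed by a file created with htdigest.
# The realm in AuthName must match the realm given to htdigest, e.g.
#   htdigest -c /cse/www/doggy/.htdigest "Project Docs" measles
AuthType Digest
AuthName "Project Docs"
AuthDigestProvider file
AuthUserFile /cse/www/doggy/.htdigest
Require valid-user
```

Because the stored hash covers the username, realm, and password together, changing the realm name later means regenerating the password file.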
CSENetID
With the CSENetID scheme, users authenticate to a "web login" service on an SSL-enabled "secure" server, using the Kerberos login names and passwords for their CSE accounts. There is therefore no need for a .htpasswd file, though you still need to specify which users are to be granted access: either your own list of users, or the name of a predefined group such as fac_cs, grad_cs, or one of a few others for which the web server administrator has created an entry in a system-wide web groups file (/www/auth/group). You have, then, these choices: list the usernames yourself, name one or more predefined groups, or allow all CSE users.

[Click here to see examples of CSENetID.]

UWNetID
UWNetID, developed at UW by C&C and now an open source project called "PubCookie," is quite similar to CSENetID. Key functional differences include its use only with HTTPS URLs and its reliance on UW Kerberos credentials instead of CSE ones. Most CSE web servers support UWNetID.

[Click here to see examples of UWNetID.]

Username/Password Authorization Tool

This tool creates auth control files for commonly needed types of basic and CSENetID auth.

To use this tool to generate auth files for password-based auth, fill in the form below and press Submit. We'll create temporary files for you to copy to your content directory.

The form asks for the following:

  Basename of temporary file(s)
  Authentication realm
  One of these choices:
    Basic auth, with the number of usernames
    CSENetID auth, specifying the usernames
    CSENetID auth with pre-defined groups
    CSENetID auth, allowing access to all CSE users
  Methods to be controlled

Hybrid Auth

To combine hostname-based auth with password-based auth - in other words, to use both the Require and Allow directives - our web server software provides the Satisfy directive. Satisfy any means to permit access to those users that satisfy any of the authorization requirements associated with the content - for example, either browsing from a local host or supplying the requested username/password - while Satisfy all requires that all the authorization requirements be met - for example, both browsing from a local host and supplying the requested username/password.

Below are examples of two .htaccess files that use the satisfy any construct to control access to the URL http://www.cs.washington.edu/doggy/. In these examples, users browsing from outside the .cs.washington.edu domain will be prompted for credentials, while those browsing from a local host will be allowed access without being shaken down for proof of their identities. (We don't have a tool that supports hybrid auth yet.)

CSENetID Auth

  order deny,allow
  deny from all
  Allow from .cs.washington.edu
  AuthName "Lamer"
  AuthType CSENetID
  Require valid-user
  Satisfy any

Basic Auth

  order deny,allow
  deny from all
  Allow from .cs.washington.edu
  AuthUserFile /cse/www/doggy/.htpasswd
  AuthName "Lamer"
  AuthType Basic
  Require user measles
  Satisfy any

Specifying all instead of any in these examples would mean that users must both browse from a local host and supply the credentials.
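
For comparison, here is a sketch of the basic auth example with Satisfy all. The directives are the ones already shown in this section; only the last line changes, and the comments describe the outcomes Apache is expected to produce.

```apache
# Sketch only: both the host check and the password check must pass.
#   local host + correct password   -> access granted
#   local host, no credentials yet  -> browser prompts for a password
#   remote host, any credentials    -> access forbidden
Order deny,allow
Deny from all
Allow from .cs.washington.edu
AuthUserFile /cse/www/doggy/.htpasswd
AuthName "Lamer"
AuthType Basic
Require user measles
Satisfy all
```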

A word about the AuthUserFile directive: if the argument path starts with a /, it needs to be a full path to the file; otherwise, it's relative to the "server root." The server root for www.cs.washington.edu is /cse/www, so an equivalent to /cse/www/doggy/.htpasswd would be doggy/.htpasswd. The web server also understands all the locally-supported "canonical paths," such as /homes/june/, /homes/gws/, /homes/iws/, /cse/, and /projects/. The basic rule is that the web server needs to be able to find and read the file or web access to your resource will be denied to all users.

Frequently Asked Questions
How effective is "security through obscurity" at protecting my web resources? That is, if it isn't linked, can others still find it?
There are two ways that people might find unlinked resources. First, local research users can often see your web files in the file system. Second, your URLs will appear in web logs, certainly locally and perhaps remotely. Our web logs are exported to all CSE research hosts as /cse/www/logs/. We also publish a nightly logfile analysis (http://www.cs.washington.edu/usage/) that is accessible to all users with a CSE account; the 200 most popular URLs for the past week are listed in that report. Also, if your document links to remote resources, the URLs of your documents are likely to appear in the HTTP_REFERER fields of web logfile entries at the sites where the linked documents are hosted.
My users don't all have CSE or UW accounts, so I'm forced to use basic auth. How can I restrict access to allow only HTTPS?
Use the SSLRequireSSL directive in your .htaccess file. Users who try to access your resource via HTTP will then get an error. Or consider using digest auth.
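
A minimal sketch of such a file, assuming basic auth with a password file at the example path used elsewhere in this document (the realm "Project Docs" is hypothetical):

```apache
# Sketch only: refuse plain-HTTP requests, then require a password.
# With SSLRequireSSL in place, the basic auth password is only ever
# sent over an encrypted connection to this resource.
SSLRequireSSL
AuthType Basic
AuthName "Project Docs"
AuthUserFile /cse/www/doggy/.htpasswd
Require valid-user
```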
I have a resource to share with both CSE and non-CSE users. How can I do that?
You can't mix password-based authentication mechanisms such as basic auth, CSENetID, and UWNetID to access the same URL. So your choices are (1) using basic or digest auth, (2) restricting access to certain named hosts, or (3) making the same resource appear at distinct URLs with distinct authentication policies for each. Below, I explain one way to implement option (3). (N.B.: the audience for this answer is technical users only.)

Note these relevant facts:

Consider as an example the following directory tree:

  ~user/www/
    cseonly/
      .htaccess1
      content/
    basic/
      .htaccess2
      .htpasswd
      content@

We wish to restrict access to the URL http://www.cs.washington.edu/homes/user/cseonly/content/ to any user with CSENetID credentials. The contents of .htaccess1 would be:

  authtype csenetid
  require valid-user

Note that ~user/www/basic/content is a symlink to ~user/www/cseonly/content/.

We wish to restrict access to the URL http://www.cs.washington.edu/homes/user/basic/content/ to those who know the username/password "basicly"/"ylcisab". The contents of .htaccess2 would be:

  authtype basic
  authname "Basic Auth Required"
  authuserfile /homes/june/user/www/basic/.htpasswd
  require user basicly

(Of course, you would need to create /homes/june/user/www/basic/.htpasswd with an entry for user basicly first.) Because the URLs are distinct, but the content is the same, the goal of mixing password-based authentication mechanisms is met.

Glossary
authentication
Authentication refers to the process of establishing the identity of users.
authorization
Authorization refers to the process of controlling access to resources.
authentication realm
An authentication realm is a symbolic name you may give to the resource being protected. This helps the user decide what username/password to provide when basic auth is used.
cookie
An HTTP cookie is a small chunk of data that's generated by a web server and passed back and forth between the server and browser on each request within a document tree. It's primarily a way to maintain "state."
credentials
Proof of identity. For example, in the case of CSENetID auth, login name and password.
SSL
SSL is the Secure Sockets Layer, the technology used to encrypt the conversation between a web browser and a web server.
Don't Take Our Word for It

Comments on this file to webmaint.