From: Andreas Gruenbacher <agruen@suse.de>,
      "Michael Kerrisk" <michael.kerrisk@gmx.net>

After Andreas Gruenbacher suggested a change to vfs_permission() 
( http://marc.theaimsgroup.com/?l=linux-kernel&m=106631710402696&w=2 ) 
and thus also the access() system call, we've done some further 
research on what happens with access() on other Unix 
implementations.  The conclusions we've reached are that on Linux 
the operation of access():

a) doesn't conform to SUSv3, and
b) isn't consistent with other Unix implementations (results from 
   various implementations are shown below).

First a little delving into SUSv3:

[[
>From the specification for <unistd.h>:

    The constants F_OK, R_OK, W_OK, and X_OK and the expressions 
    R_OK|W_OK, R_OK|X_OK, and R_OK|W_OK|X_OK shall all have 
    distinct values.

>From the specification for access():
    If any access permissions are checked, each shall be checked 
    individually, as described in the Base Definitions volume of 
    IEEE Std 1003.1-2001, Chapter 3, Definitions. If the process 
    has appropriate privileges, an implementation may indicate 
    success for X_OK even if none of the execute file permission 
    bits are set.

    ERRORS
    [EACCES] Permission bits of the file mode do not 
    permit the requested access, or search permission is 
    denied on a component of the path prefix. 

>From the rationale for access():
    In early proposals, some inadequacies in the access() 
    function led to the creation of an eaccess() function 
    because:

    1. Historical implementations of access() do not test 
       file access correctly when the process' real user 
       ID is superuser. In particular, they always return
       zero when testing execute permissions without regard 
       to whether the file is executable.

    [...]

    However, the historical model of eaccess() does not 
    resolve problem (1), so this volume of IEEE Std 
    1003.1-2001 now allows access() to behave in the desired 
    way because several implementations have corrected the 
    problem. 
    [...]

    The sentence concerning appropriate privileges and 
    execute permission bits reflects the two possibilities 
    implemented by historical implementations when checking
    superuser access for X_OK.

    New implementations are discouraged from returning X_OK 
    unless at least one execution permission bit is set.
]]

The implication of all this is that for the call:

    access(files, mode)

it is bits/masks that are relevant for the check, rather than a 
direct test for equality of the form `mode == X_OK', which is 
currently what vfs_permission() in effect implements.

Furthermore, when testing for X_OK **for a privileged process**, 
an implementation of access() is permitted two possibilities:

[a] always say that X_OK is allowed
[b] allow X_OK only if execute permission is granted to at 
    least one permission class (user/group/other) for the 
    file.  SUSv3 encourages this latter implementation.

Using a test program (see below) run under the superuser 
account, we tested what access() returns on various 
implementations when applied to a file on which all 
permissions are disabled:

----------  1 500  100  0 Oct 26 10:39 /tmp/t_access

On FreeBSD 4.8, AIX 4.3 and 5.1, and Solaris 9, we see the following:

access(/tmp/t_access, 0) returns 0
access(/tmp/t_access, R_OK) returns 0
access(/tmp/t_access, W_OK) returns 0
access(/tmp/t_access, X_OK) returns 0
access(/tmp/t_access, R_OK | W_OK) returns 0
access(/tmp/t_access, R_OK | X_OK) returns 0
access(/tmp/t_access, W_OK | X_OK) returns 0
access(/tmp/t_access, R_OK | W_OK | X_OK) returns 0

These results are consistent with interpretation [a] above.

On HP-UX 11 and Tru64 4.0f and 5.1B, we see:

access(/tmp/t_access, 0) returns 0
access(/tmp/t_access, R_OK) returns 0
access(/tmp/t_access, W_OK) returns 0
access(/tmp/t_access, X_OK) returns -1
access(/tmp/t_access, R_OK | W_OK) returns 0
access(/tmp/t_access, R_OK | X_OK) returns -1
access(/tmp/t_access, W_OK | X_OK) returns -1
access(/tmp/t_access, R_OK | W_OK | X_OK) returns -1

This is consistent with the (SUSv3 preferred) interpretation [b] above.

On Linux 2.4.21 and 2.6.0-test9, we see the following:

access(/tmp/t_access, 0) returns 0
access(/tmp/t_access, R_OK) returns 0
access(/tmp/t_access, W_OK) returns 0
access(/tmp/t_access, X_OK) returns -1
access(/tmp/t_access, R_OK | W_OK) returns 0
access(/tmp/t_access, R_OK | X_OK) returns 0
access(/tmp/t_access, W_OK | X_OK) returns 0
access(/tmp/t_access, R_OK | W_OK | X_OK) returns 0

In other words, Linux�s access() / vfs_permission() is treating the 
case �mode == X_OK� as a special case, distinct from all of the 
other bit-mask combinations that include X_OK.  Really it should 
operate using either interpretation [a] or (preferably) [b] above.  
I have appended Andreas� patch against 2.6.0-test9, which 
implements interpretation [b].  Please apply.

Cheers,

Michael Kerrisk

##### Test program #####

/* t_access_root.c */

#include <limits.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/stat.h>

#define UID 500
#define GID 100
#define PERM 0
#define TESTPATH "/tmp/t_access"

static void
errExit(char *msg)
{
    perror(msg);
    exit(EXIT_FAILURE);
} /* errExit */

static void
accessTest(char *file, int mask, char *mstr)
{
    printf("access(%s, %s) returns %d\n", file, mstr, access(file, mask));
} /* accessTest */

int
main(int argc, char *argv[])
{
    int fd, perm, uid, gid;
    char *testpath;
    char cmd[PATH_MAX + 20];

    testpath =  (argc > 1) ? argv[1]                    : TESTPATH;
    perm =      (argc > 2) ? strtoul(argv[2], NULL, 8)  : PERM;
    uid =       (argc > 3) ? atoi(argv[3])              : UID;
    gid =       (argc > 4) ? atoi(argv[4])              : GID;

    unlink(testpath);

    fd = open(testpath, O_RDWR | O_CREAT, 0);
    if (fd == -1) errExit("open");

    if (fchown(fd, uid, gid) == -1) errExit("fchown");
    if (fchmod(fd, perm) == -1) errExit("fchown");
    close(fd);

    snprintf(cmd, sizeof(cmd), "ls -l %s", testpath);
    system(cmd);

    if (seteuid(uid) == -1) errExit("seteuid");

    accessTest(testpath, 0, "0");
    accessTest(testpath, R_OK, "R_OK");
    accessTest(testpath, W_OK, "W_OK");
    accessTest(testpath, X_OK, "X_OK");
    accessTest(testpath, R_OK | W_OK, "R_OK | W_OK");
    accessTest(testpath, R_OK | X_OK, "R_OK | X_OK");
    accessTest(testpath, W_OK | X_OK, "W_OK | X_OK");
    accessTest(testpath, R_OK | W_OK | X_OK, "R_OK | W_OK | X_OK");

    exit(EXIT_SUCCESS);
} /* main */




 25-akpm/fs/ext2/acl.c      |    2 +-
 25-akpm/fs/ext3/acl.c      |    2 +-
 25-akpm/fs/jfs/acl.c       |    2 +-
 25-akpm/fs/namei.c         |    2 +-
 25-akpm/fs/xfs/xfs_inode.c |    2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff -puN fs/ext2/acl.c~access-vfs_permission-fix fs/ext2/acl.c
--- 25/fs/ext2/acl.c~access-vfs_permission-fix	Wed Nov 12 11:46:11 2003
+++ 25-akpm/fs/ext2/acl.c	Wed Nov 12 11:46:11 2003
@@ -322,7 +322,7 @@ check_groups:
 
 check_capabilities:
 	/* Allowed to override Discretionary Access Control? */
-	if ((mask & (MAY_READ|MAY_WRITE)) || (inode->i_mode & S_IXUGO))
+	if (!(mask & MAY_EXEC) || (inode->i_mode & S_IXUGO))
 		if (capable(CAP_DAC_OVERRIDE))
 			return 0;
 	/* Read and search granted if capable(CAP_DAC_READ_SEARCH) */
diff -puN fs/ext3/acl.c~access-vfs_permission-fix fs/ext3/acl.c
--- 25/fs/ext3/acl.c~access-vfs_permission-fix	Wed Nov 12 11:46:11 2003
+++ 25-akpm/fs/ext3/acl.c	Wed Nov 12 11:46:11 2003
@@ -327,7 +327,7 @@ check_groups:
 
 check_capabilities:
 	/* Allowed to override Discretionary Access Control? */
-	if ((mask & (MAY_READ|MAY_WRITE)) || (inode->i_mode & S_IXUGO))
+	if (!(mask & MAY_EXEC) || (inode->i_mode & S_IXUGO))
 		if (capable(CAP_DAC_OVERRIDE))
 			return 0;
 	/* Read and search granted if capable(CAP_DAC_READ_SEARCH) */
diff -puN fs/jfs/acl.c~access-vfs_permission-fix fs/jfs/acl.c
--- 25/fs/jfs/acl.c~access-vfs_permission-fix	Wed Nov 12 11:46:11 2003
+++ 25-akpm/fs/jfs/acl.c	Wed Nov 12 11:46:11 2003
@@ -191,7 +191,7 @@ check_capabilities:
 	 * Read/write DACs are always overridable.
 	 * Executable DACs are overridable if at least one exec bit is set.
 	 */
-	if ((mask & (MAY_READ|MAY_WRITE)) || (inode->i_mode & S_IXUGO))
+	if (!(mask & MAY_EXEC) || (inode->i_mode & S_IXUGO))
 		if (capable(CAP_DAC_OVERRIDE))
 			return 0;
 
diff -puN fs/namei.c~access-vfs_permission-fix fs/namei.c
--- 25/fs/namei.c~access-vfs_permission-fix	Wed Nov 12 11:46:11 2003
+++ 25-akpm/fs/namei.c	Wed Nov 12 11:46:11 2003
@@ -190,7 +190,7 @@ int vfs_permission(struct inode * inode,
 	 * Read/write DACs are always overridable.
 	 * Executable DACs are overridable if at least one exec bit is set.
 	 */
-	if ((mask & (MAY_READ|MAY_WRITE)) || (inode->i_mode & S_IXUGO))
+	if (!(mask & MAY_EXEC) || (inode->i_mode & S_IXUGO))
 		if (capable(CAP_DAC_OVERRIDE))
 			return 0;
 
diff -puN fs/xfs/xfs_inode.c~access-vfs_permission-fix fs/xfs/xfs_inode.c
--- 25/fs/xfs/xfs_inode.c~access-vfs_permission-fix	Wed Nov 12 11:46:11 2003
+++ 25-akpm/fs/xfs/xfs_inode.c	Wed Nov 12 11:46:11 2003
@@ -3722,7 +3722,7 @@ xfs_iaccess(
 	 * Read/write DACs are always overridable.
 	 * Executable DACs are overridable if at least one exec bit is set.
 	 */
-	if ((orgmode & (S_IRUSR|S_IWUSR)) || (inode->i_mode & S_IXUGO))
+	if (!(orgmode & S_IXUSR) || (inode->i_mode & S_IXUGO))
 		if (capable_cred(cr, CAP_DAC_OVERRIDE))
 			return 0;
 

_