Monday, May 14, 2007

Continuing the OpenAFS debugging.

The problem is that when starting up AFS, the salvager runs continuously, because the fileserver exits with (from FileLog):

VL_RegisterAddrs rpc failed; The IP address exists on a different server; repair it
VL_RegisterAddrs rpc failed; See VLLog for details
Fatal error in library initialization, exiting!!

I'm guessing it has something to do with my server having two IP-addresses: one intranet and one Internet (it acts as a firewall/router). In order to prevent having the intranet address sent to extranet servers, I've specified the extranet address in the NetRestrict configuration file.

I'm also experiencing the strangeness reported in this post, that even though it looks like I have only one IP address (the extranet one) when I do vos listvldb, when I do changeaddr, only some of them change.

I had a similar problem recently, when the fix in the post worked, but it doesn't now.

I ran volserver in the debugger, and it seems to have a list of addresses which is incorrect.

And then, all of a sudden, it worked. The fileserver is up and running. I guess I'll have to revisit this next time I get this problem. Argh.
Everything is giving me trouble. Now it's OpenAFS, latest stable version 1.4.4.

One bugfix: in src/bozo/bosserver.c:1109, the fclose should be moved up into the first part of the preceding if statement. As it stands, it segfaults if the log file can't be opened.
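The general pattern behind that fix, as a minimal sketch: fclose is only called on the path where fopen succeeded, because passing a NULL stream to fclose is undefined behaviour and typically crashes. This is not the bosserver.c code; the function name append_log_line and its arguments are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper (not from OpenAFS): append one line to a log file.
 * Returns 0 on success, -1 if the file could not be opened. */
static int append_log_line(const char *path, const char *line)
{
    FILE *fp = fopen(path, "a");
    if (fp == NULL) {
        /* Open failed: there is no stream, so there is nothing to close. */
        return -1;
    }
    fprintf(fp, "%s\n", line);
    fclose(fp); /* only reached when fp is a valid stream */
    return 0;
}
```

Calling append_log_line with a path in a directory that does not exist returns -1 instead of crashing.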

Sunday, May 13, 2007

I'm now debugging a problem where Subversion (the Subclipse Eclipse Subversion extension) gets a "PROPFIND request failed on '/'" error because the Apache httpd server crashes in the mod_auth_kerb authentication.

Relevant versions are: Apache 2.0.54, Heimdal 0.8.1, a hacked mod_auth_kerb 5.3, Eclipse 3.2.2 and Subversion JavaHL Win32 Binaries 1.2.0 for the Eclipse plugin.

A core dump examination shows that the Heimdal GSSAPI library crashes in _gss_spnego_display_name (spnego/context_stubs.c:313) because input_name is NULL, and it tries to do a name->mech and gets a segmentation fault.

Looking at the input_name that is passed to gss_display_name gives:

(gdb) print *((struct _gss_name *) input_name)
$8 = {gn_type = {length = 0, elements = 0x84a8a00}, gn_value = {
length = 139102728, value = 0x0}, gn_mn = {slh_first = 0x75}}

This name is passed from mod_auth_kerb, authenticate_user_gss:
major_status = gss_display_name(&minor_status, client_name, &output_token, NULL);

It in turn came from gss_accept_sec_context.

Debugging (gdb /usr/sbin/httpd, run -X), break in gss_accept_sec_context:
*src_name = GSS_C_NO_NAME

I'm tracing through to see where it gets set. Currently in:
_gss_spnego_accept_sec_context
In spnego/accept_sec_context.c:1025, *src_name is set to GSS_C_NO_NAME=NULL.
Line 1041, acceptor_start called.
gss_decapsulate_token - looks like len == 0 in gdb, but doesn't return

Trace:

acceptor_start (spnego/accept_sec_context:578) calls gss_decapsulate_token
gss_decapsulate_token (mech/gss_decapsulate_token.c:50) calls der_get_oid


Weird: second parameter to gss_decapsulate_token in spnego/accept_sec_context:578 is GSS_SPNEGO_MECHANISM, which is the parameter oid. Has length 6. Returns a heim_oid with:
length=7, components[] = {1, 3, 6, 1, 5, 5, 2}, size=6
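The 6-versus-7 mismatch is probably not a bug. In DER, the first content octet of an OID packs the first two arcs as 40*X + Y, so the 6 encoded bytes of the SPNEGO OID expand to 7 components. I'm assuming here that the length 6 and size=6 are byte counts and that length=7 is the component count. A minimal sketch of the decoding (decode_oid_arcs is a made-up name, not Heimdal's der_get_oid):

```c
#include <stddef.h>

/* Hypothetical helper: decode the content octets of a DER OBJECT IDENTIFIER
 * into numeric arcs. Returns the number of arcs written, or -1 on malformed
 * input or if the output array is too small. Simplifications: the first
 * sub-identifier must fit in one octet, and arc overflow is not checked. */
static int decode_oid_arcs(const unsigned char *der, size_t len,
                           unsigned *arcs, size_t max_arcs)
{
    size_t n = 0, i = 1;
    unsigned x;

    if (len == 0 || max_arcs < 2 || (der[0] & 0x80))
        return -1;

    /* First octet encodes two arcs: 40*X + Y, with X limited to 0, 1 or 2. */
    x = der[0] < 80 ? der[0] / 40u : 2u;
    arcs[n++] = x;
    arcs[n++] = der[0] - 40u * x;

    /* Remaining arcs are base-128, high bit set on all but the last octet. */
    while (i < len) {
        unsigned v = 0;
        unsigned char b;
        do {
            if (i >= len)
                return -1; /* truncated in the middle of an arc */
            b = der[i++];
            v = (v << 7) | (b & 0x7fu);
        } while (b & 0x80);
        if (n >= max_arcs)
            return -1;
        arcs[n++] = v;
    }
    return (int)n;
}
```

Fed the bytes 2b 06 01 05 05 02 (0x2b = 43 = 40*1 + 3), this yields the seven arcs 1, 3, 6, 1, 5, 5, 2.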

gss_decapsulate_token, line 54 calls decode_GSSAPIContextToken.
Then goes into the if (der_heim_oid_cmp(&ct.thisMech, &o) == 0)

Back in acceptor_start, calls der_match_tag_and_length, then decode_NegTokenInit.
Line 628 calls select_mech.

Goes into the "Translate broken MS Kerberos OID"? Does ret = gss_duplicate_oid(minor_status, &_gss_spnego_mskrb_mechanism_oid_desc, mech_p)? or gss_duplicate_oid(minor_status, &mech->gm_mech_oid, mech_p);?
select_mech returns 0.

Think mech_cred is set to acceptor_cred->negotiated_cred_id, but unsure.
acceptor_start calls gss_accept_sec_context (recursively).
Line 222 calls m->gm_accept_sec_context (=_gsskrb5_accept_sec_context), returns major_status 851968. Calls _gss_mg_error. Calls m->gm_display_status (=_gsskrb5_display_status).
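To read a raw major_status like 851968, it helps to split it into the fields defined by the GSS-API C bindings (RFC 2744): calling error in bits 24-31, routine error in bits 16-23, supplementary info in the low 16 bits. A minimal sketch; the function names are my own, chosen so they don't clash with the real GSS_* macros in gssapi.h:

```c
/* Field layout per RFC 2744; helper names are hypothetical. */
static unsigned status_calling_error(unsigned long s)
{
    return (unsigned)((s >> 24) & 0xffu);
}

static unsigned status_routine_error(unsigned long s)
{
    return (unsigned)((s >> 16) & 0xffu);
}

static unsigned status_supplementary(unsigned long s)
{
    return (unsigned)(s & 0xffffu);
}

/* Mirrors what the GSS_ERROR() macro tests: any calling or routine error. */
static int status_is_error(unsigned long s)
{
    return status_calling_error(s) != 0 || status_routine_error(s) != 0;
}
```

851968 is 0xD0000, so the routine error field is 13, which is GSS_S_FAILURE. That matches the "Miscellaneous failure" text from display_status. By the same layout, a status of 1 has only a supplementary bit set (GSS_S_CONTINUE_NEEDED) and does not count as an error.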

Line 196, buf=" Miscellaneous failure (see text)"
"Decrypt integrity check failed"

Line 713: sets preferred_mech_type despite all failed?

Returns ret(1) on line 772.

Back at first gss_accept_sec_context, line 233.

New name made at line 246:
print *name
$115 = {gn_type = {length = 0, elements = 0x0}, gn_value = {length = 0,
value = 0x0}, gn_mn = {slh_first = 0x84a9638}}
Line 289, returns status 1.

Blows up on line 313 of _gss_spnego_display_name (spnego/context_stubs.c:313) because name is NULL. Called from:
#0 _gss_spnego_display_name (minor_status=0xbfad6f1c, input_name=0x0,
output_name_buffer=0xd0000, output_name_type=0xd0000)
at spnego/context_stubs.c:313
#1 0xb6fdaa01 in gss_display_name (minor_status=0xbfad6f1c,
input_name=0x84a9638, output_name_buffer=0xbfad6f28, output_name_type=0x0)
at mech/gss_display_name.c:67
#2 0xb7367f55 in authenticate_user_gss (r=0x84a6130, conf=0x8230930,
auth_line=0x84a7625 "", negotiate_ret_value=0xbfad7478)
at src/mod_auth_kerb.c:1437
#3 0xb7368834 in kerb_authenticate_user (r=0x84a6130)
at src/mod_auth_kerb.c:1614
#4 0x0809b145 in ap_run_check_user_id (r=0x84a6130) at request.c:69





acceptor_start, line 664: Call on line 653 failed.
Gets to line 698

Thursday, May 10, 2007

I'd like to get Apache 2 to access files on my AFS file system. Ideally both when no user is authenticated, and with user's credentials when a user has authenticated via Kerberos 5.

I have authentication via Kerberos 5 working with the auth_kerb_module (mod_auth_kerb.so).

I can't quite wrap my head around how ticket caches and afs PAGs (Process Authentication Groups) work.

Here's a writeup on PAGs and web server authentication.

Questions: How are PAGs related to the actual credentials? By the KRB5CCNAME environment variable? (I'm running the Heimdal Kerberos implementation).

Ideally the auth_kerb_module could for each access make sure the apache process/thread has tokens/PAG for either the authenticated user or the www kerberos/afs user.

In theory the mod would call k_hasafs to initialize the library, k_setpag() to create a new (empty) PAG, get the proper Kerberos 5 credentials (either for the user or Apache's srvtab), convert them to AFS tokens with krb5_afslog, perform the request, and then after the request drop the tokens in that PAG with k_unlog.

More to come (I hope)...