
Re: Problems with iprop



As a suggestion for what to test: 1) disk-full conditions on the /var
partition, and 2) (intermittent?) network problems.  I also once saw
a problem on one slave feed back to the master and cause excessive
growth in the master's (binary change) log file.  The master's state
in turn caused other anomalous behavior on the other slaves.

I apologize for not being able to give useful debugging information,  
but that's what I have.  We will love you for fixing the problems,  
but we understand that these second-order error conditions are hard  
to fix.  I've spent more time than I wanted reading the iprop code  
and I can only say that the basic logic is very good.  Of course more  
comprehensive error checking and logging might be helpful, but that's  
almost as open ended as asking for bug fixes for all the remaining  
bugs.  ;-)

As a side note, one of the UMICH guys cornered me after my discussion
of these issues at SLAC and said they had seen all the same problems
with their own incremental propagation solution for MIT Kerberos.

On Jul 26, 2007, at 12:39 PM, Love Hörnquist Åstrand wrote:

> Hello,
>
> I'm sorry iprop is in such a sorry state. I'm going to look it over in
> the next few months, and would be very happy if there were ways I
> could reproduce the problems you are seeing.
>
> If you can't find one, I'll have to start running it on a large
> production realm close to me and see if that reproduces the problems
> for me.
>
> Love
>
>
> On 25 Jul 2007, at 09:55, Dr A V Le Blanc wrote:
>
>> Until recently I've been running heimdal 0.7.2.dfsg.1-10 on Debian
>> etch systems, and I have had occasional problems with iprop, which
>> have been getting worse.
>>
>> First, I am getting thousands of error messages on the slaves:
>>
>>      ipropd-slave[8760]: kadm5_log_replay: 2469: Entry already exists in database
>>
>> There are so many of these that the disks are filling up; for example,
>> yesterday at 06:25:32 there were 172 of them, at 06:25:33 there were
>> 457, and at 06:25:34 there were 475.  These messages are in
>> /var/log/auth.log.
>> Moreover, the binary log file in /var/lib/heimdal-kdc/log seems to grow
>> enormously large on the slave machines, filling the 4 GB partition in
>> a few days.  In addition, on the master machine, the ipropd-master
>> process keeps getting killed by the kernel, which logs this message in
>> /var/log/kern.log:
>>
>>      kernel: Out of memory: Killed process 1545 (ipropd-master)
>>
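The per-second counts quoted above can be gathered from a syslog file with a short pipeline. This is a sketch, assuming the traditional syslog line format "Mon DD HH:MM:SS host prog[pid]: message" and a file in chronological order; the function name count_replay_errors is hypothetical.

```shell
# Hypothetical helper: print "<timestamp> <count>" for every second in
# which ipropd-slave logged the "Entry already exists" replay error.
# Assumes traditional syslog lines: "Mon DD HH:MM:SS host prog[pid]: msg".
count_replay_errors() {
    grep 'ipropd-slave.*kadm5_log_replay:.*Entry already exists' "$1" |
        awk '{ print $1, $2, $3 }' |   # keep only the timestamp fields
        uniq -c |                      # merges adjacent lines only, so the
                                       # log must be in time order
        awk '{ print $2, $3, $4, $1 }' # move the count to the end
}
```

For example, count_replay_errors /var/log/auth.log | sort -k4 -n | tail lists the seconds with the most messages.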
>> Since the database synchronisation gets lost so frequently, I have
>> cron jobs which check every ten minutes on all machines and
>> restart the iprop master or slave processes, at least when they are
>> not running; for example, this script runs every ten minutes
>> on the slave:
>>
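>>      # Do nothing unless the KDC itself is running.
>>      if [ ! -r /var/run/heimdal-kdc.pid ]; then exit 0; fi
>>      if [ ! -r "/proc/$(cat /var/run/heimdal-kdc.pid)/stat" ]; then exit 0; fi
>>      # Do nothing if ipropd-slave is already running.
>>      if [ -r /var/run/ipropd-slave.pid ]; then
>>        if [ -r "/proc/$(cat /var/run/ipropd-slave.pid)/stat" ]; then exit 0; fi
>>      fi
>>      # Otherwise restart it with the options from the Debian defaults file.
>>      . /etc/default/heimdal-kdc
>>      # SLAVE_PARAMS is left unquoted so it can expand to several
>>      # arguments (or to none at all).
>>      start-stop-daemon --start --quiet --background --make-pidfile \
>>        --pidfile /var/run/ipropd-slave.pid --exec /usr/sbin/ipropd-slave \
>>        -- $SLAVE_PARAMS
>>      exit $?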
>>
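The restart check above only notices processes that have died; it does not catch the steady growth of the binary log on the slaves. One possible extra check, sketched here on the assumption that a size threshold makes a useful early warning, compares the log's size against a limit. The helper name log_size_exceeds and the limit used in the example are illustrative, not Heimdal settings.

```shell
# Hypothetical helper (not part of Heimdal): succeed if FILE exists and
# is larger than LIMIT_MB mebibytes.
# Usage: log_size_exceeds FILE LIMIT_MB
log_size_exceeds() {
    [ -f "$1" ] || return 1
    # tr strips the leading blanks some wc implementations print
    _size=$(wc -c < "$1" | tr -d ' ')
    [ "$_size" -gt $(( $2 * 1024 * 1024 )) ]
}
```

For example, inside the same cron script: if log_size_exceeds /var/lib/heimdal-kdc/log 1024; then logger -t iprop-check "iprop log over 1024 MB"; fi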
>> Anyway, the situation is getting worse, and so I decided to backport
>> the available Heimdal 0.8.1 Debian packages to etch and try them.
>> Now the master iprop process is dying without giving an error
>> message, but the logs are filling up with messages like this:
>>
>>      ipropd-master[5151]: send_diffs: failed to find previous entry: kadm5_log_previous: log entry have consistency failure, length wrong
>>
>> Clearly, whatever else is going on, neither version of iprop is
>> succeeding in replaying the log entries properly on the slaves.  Has
>> anyone any insight to offer before I try reporting this as a bug on
>> the Debian lists?  Am I missing something obvious?
>>
>>      -- Owen
------------------------------------------------------------------------
The opinions expressed in this message are mine,
not those of Caltech, JPL, NASA, or the US Government.
Henry.B.Hotz@jpl.nasa.gov, or hbhotz@oxy.edu