State tables and state transition graphs

Tue Sep 8 06:02:54 CEST 2009

On Mon, Sep 7, 2009 at 10:23 PM, Teddy Hogeborn <teddy at fukt.bsnet.se> wrote:
> We're pleased that you find the project interesting.  Nominally, the
> information you seek should be available on the web site by combining
> the README file with the manual pages, which all explain in great
> detail the inner workings of both server and client, but maybe I can
> be of some help by trying to distill what you want.

I can combine the online sources available and compose a rough outline
of the system, but an accurate description requires that many details
be nailed down explicitly rather than just taken for granted, assumed,
or determined by personal/idiosyncratic interpretations.

I could also review the source code, but that would tell me what the
system actually does rather than what the developers intended it to
do.

>
> See below for my attempt at a listing of server states, but let me
> first answer your direct questions.
>
> Lee Winter <lee.j.i.winter at gmail.com> writes:
>
>> Case 1:  Client boots first.
>>     In this case I think the client simply waits for the server to
>> boot and announce itself.  The fundamental assumption is that the
>> server can handle a request from the client before it has
>> established that the client is up and running.  Is that true?
>
> Yes.

Good.

>
>> Case 2:  Server first, client boots immediately thereafter.  This is
>> quite similar to case #1, but the first message received here is the
>> client's broadcast request, while in case #1 the first message
>> received would be the server's announcement.
>
> We avoid having to deal with all this directly in our code by using
> the Avahi library (for the ZeroConf protocol).

I am quite familiar with zeroconf and avahi.  But there are lots of
ways to (mis-)use their capabilities (cf. microso~1 window~1
implementations).  So I had to ask.

>  The Mandos server
> starts to listen on a port, and says to the Avahi library: "Hey, I'm a
> Mandos server, tell anyone who wants to find Mandos servers where to
> find me.", and the Avahi daemon then does the rest.
>
> The client, in effect, says to the Avahi library "Look for Mandos
> servers, and when you find one, call this function, OK?".  That
> function, which connects to a Mandos server and tries to get a
> password from it, will then be called once for every Mandos server
> found until the function succeeds in getting a password.  If no Mandos
> servers provide a password, the client will therefore automatically
> wait until such servers appear.
>
> So you see, Avahi does a lot of the heavy lifting for us.  We don't
> maintain any significant state ourselves for the sake of clients
> finding servers and server announcing themselves.

Right.  At the app level there is no difference between the two
cases.  Watching the transactions on the wire, the only difference
would be whose message got through first to start the exchange; after
that the exchanges would be identical.
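
To make that concrete, here is a minimal Python sketch of the client
side as I understand it.  The names (PasswordClient, on_server_found,
fetch) are my own placeholders, not Mandos or Avahi identifiers, and
the discovery library is replaced by plain method calls.  The point is
only that the callback neither knows nor cares which side spoke first.

```python
from typing import Callable, Optional


class PasswordClient:
    """Hypothetical stand-in for the client's discovery callback."""

    def __init__(self, fetch: Callable[[str], Optional[bytes]]) -> None:
        # fetch(address) asks one server for the password; None means refused.
        self.fetch = fetch
        self.password: Optional[bytes] = None

    def on_server_found(self, address: str) -> bool:
        """Called once per discovered server; True once a password is held."""
        if self.password is None:
            self.password = self.fetch(address)
        return self.password is not None
```

Servers that refuse are simply skipped, and once a password is held
any later announcements are ignored.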

> In regards to the server keeping track of the state of clients, you
> should note that this does not affect the clients;

Good.

> Mandos clients do
> not run anything special once they have gotten the password and
> started booting off the unlocked device.  It's all done on the server,
> and the design is fairly simple:  The server periodically tries to run
> a so-called "checker" program (by default "fping").

Understood.

From the existing documentation it is clear that you already know that
this approach is weak.  In fact by moving a cable from the client to a
pacifier machine responding to the client IP address an adversary can
give himself an unlimited amount of time to diddle with the client
before trying to boot it.

Do you have stronger alternatives in mind for the checker/heartbeat?

> If the program
> succeeds, it means that a client is alive and a timer for the
> "disabling" of that client is reset.  If the timer is allowed to run
> out, the client will be marked as "disabled", which means that it will
> subsequently be denied passwords from the server.  The timer for each
> client is constantly running, and only the periodic success of the
> checker is keeping the timer from running out and the client from
> being disabled.  Again, this has nothing to do with the Mandos client,
> which only exists in the initrd and whose job is finished after
> booting.

OK.  The doc has to state explicitly somewhere that when the server
disables a client, that state change is permanent and a manual
override is required to re-enable the client.  I didn't see anything
in the doc that addressed that issue.  (So I had to ask.)

>
>> Case 6.  Server boots, client boots, goes down long enough for the
>> server to notice but not long enough to trigger the timeout.  Is any
>> state lost?
>
> Each time a checker succeeds, the disabling timer will be reset and the
> client will be viewed as just as good a client as it was before.  Please
> note: When a checker *fails*, this does exactly *nothing* to the
> internal state of the server!  The timer will, however, not be reset,
> and it will inexorably go on ticking and eventually trigger a
> disabling of the client.  But a failing checker does not trigger a
> state change.

OK, but it might be worth logging the failure as a way to keep
explicit track of possibly surreptitious reboots (few people actually
check the uptime of their machines).

>
>> Case 7.  Server boots, client boots, goes down longer than the
>> timeout, and then reboots with manual intervention to overcome the
>> timeout.  Some time later the client reboots within the timeout
>> interval.  Does the server provide service to the client
>
> No, it is still "disabled".
>
>> or is some manual intervention necessary on the server to reset the
>> expired timeout?
>
> Yes.  In the current version (1.0.11), you'd restart the server to get
> all clients "enabled" again.  For the next major version (tentatively
> numbered "1.1"), we've implemented a D-Bus API to the server, and plan
> to provide both a command line tool and a text GUI tool to manipulate
> server internals, so you'll be able to go in and see all the clients
> and their status, re-enable clients or do "one-off" approval of
> reboots.
>
> But for now, you'll have to restart the Mandos server to enable a
> disabled client.

So every server restart re-enables all clients?  That looks to me like
a security hole.  Combined with the simplicity of the "checker" (in
clusters this is called the heartbeat, a term you might want to
consider), I am certain that there is a hole.  And multiple servers
won't block it.  In fact they make it easier to exploit.

I know that the current threat model is not one that includes a
sophisticated seizure.  But it needs to.  When non-seizable servers
were first described on sci.crypt over 10 years ago people liked the
idea, but it turns out that only a few failed seizures lead the
seizers to develop more sophisticated seizure techniques.  So a year
or so after mandos makes it into stable the threat model will be
radically different.  After all, this is an open-source project, so the
adversaries will be able to plan exactly how to circumvent the system.

Trivial example:
Step 1:  Splice into the network cable to the target machine with a
hub, two patch cords, and a laptop running wireshark.  Connection time
lost = ~1.0 seconds.
Step 2:  Learn the target machine's IP address.
Step 3:  Detach the laptop from the hub and reset its IP to match the
target machine's IP.
Step 4:  Move the cable from the target machine to the laptop.
Additional connection time lost = ~1.0 seconds.  Splice complete.
Step 5:  Take the target machine to the lab and boot it.
Step 6:  VPN from the on-site laptop to the lab.
Step 7:  Have the laptop forward the checker/heartbeat packets through
the VPN to the target machine.
Step 8:  Watch the target machine complete the boot process.

If there are lots of machines to seize and multiple mandos servers
with cross-boot support the above technique can be used to move one of
the servers to the lab so all of the seized machines can be accessed.

So I suggest that the server has to authenticate the clients during
checking or the timeout will be ineffective.  Perhaps this could be a
more advanced version with a heartbeat daemon on the client.  Or maybe
it would be enough to run the checking over the IPsec AH protocol.
I believe this issue bears further discussion.
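
As a starting point for that discussion, here is a minimal Python
sketch of an authenticated check: the server sends a fresh random
challenge and the client must return an HMAC of it under a key shared
with the server.  The function names, the 16 byte nonce, and the choice
of HMAC-SHA256 are all assumptions on my part, and it only helps if the
key is stored where it cannot be read until the disk has been unlocked.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Server side: a fresh random nonce for each check (size is assumed)."""
    return secrets.token_bytes(16)


def answer_challenge(key: bytes, challenge: bytes) -> bytes:
    """Client side: prove possession of the shared key."""
    return hmac.new(key, challenge, hashlib.sha256).digest()


def verify_answer(key: bytes, challenge: bytes, answer: bytes) -> bool:
    """Server side: constant-time comparison against the expected MAC."""
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, answer)
```

A pacifier that merely answers on the client's IP address cannot
produce a valid answer, so the timeout starts to bite as soon as the
real client is powered off.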

>
>> I'm sure the list above does not visit every possible state, but it
>> is my starting point.  Are there other states of a server-to-client
>> relationship besides
>> a. server down
>> b. server just booted (no client status info yet)
>> c. server sees client alive
>> d. server sees client dead < timeout
>> e. server sees client dead >= timeout?
>
> All of "b", "c", and "d" are the same state (except that in state "c",
> the server also resets the timer for the client, but the timer always
> continues to tick.).  It's also a bit of a misnomer to say that in
> state "e" the server "sees client dead", because the server doesn't
> "see" the client dead.  It's just that when the server hasn't seen the
> client *alive* for some time it automatically disables the client from
> getting passwords, irrespectively of what happens with the actual
> checking of the client.

OK.

>
> Let's see if I can take a stab at enumerating the different states:
>
> 0. Server stopped.
> 1. Server running, client enabled.
> 2. Server running, client enabled, checker running.
> 3. Server running, client disabled.
>
> Rules:
>  i) In state 0, when changing to state 1, start a timer with a
>     timeout.
>  ii) In states 1 or 2, a timer timeout will cause a change to state 3.
> iii) In state 1, wait for a bit and then change to state 2.
>  iv) In state 2, when a checker completes successfully, reset the
>     timer before changing to state 1.  If the checker is
>     unsuccessful, just change to state 1 without touching the timer.

Rephrased in pseudo code:

Timers
------
request_to  -- interval between check requests, default 300 sec.
response_to -- window for keep-alive response, default 3600 sec.

Server processing
-----------------
logical thread #1:
     while ( is_enabled( client ) ) {
          wait( request_to );
          check( client );
     }

logical thread #2:
     while ( is_enabled( client ) ) {
          /* listen_for_response() is false if the listen timed out */
          if ( !listen_for_response( client, response_to ) )
               disable( client );
     }
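
The same rules can also be written as an explicit state table.  Below
is a minimal Python sketch of rules i) to iv) for a single client.  The
class and method names are my own invention, the clock is passed in as
a plain number of seconds so the logic can be exercised without
waiting, and the 3600 second default is the response window I assumed
above, so treat it as an illustration rather than the server's actual
code.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    STOPPED = 0    # 0. Server stopped.
    ENABLED = 1    # 1. Server running, client enabled.
    CHECKING = 2   # 2. Server running, client enabled, checker running.
    DISABLED = 3   # 3. Server running, client disabled.


class ClientRecord:
    """Hypothetical per-client record kept by the server."""

    def __init__(self, timeout: float = 3600.0) -> None:
        self.timeout = timeout
        self.state = State.STOPPED
        self.deadline: Optional[float] = None

    def start(self, now: float) -> None:
        # Rule i: leaving state 0 starts the timer.
        self.state = State.ENABLED
        self.deadline = now + self.timeout

    def tick(self, now: float) -> None:
        # Rule ii: in states 1 or 2 a timer timeout leads to state 3.
        if self.state in (State.ENABLED, State.CHECKING) and now >= self.deadline:
            self.state = State.DISABLED

    def begin_check(self, now: float) -> None:
        # Rule iii: after waiting a bit, state 1 moves to state 2.
        self.tick(now)
        if self.state is State.ENABLED:
            self.state = State.CHECKING

    def end_check(self, now: float, success: bool) -> None:
        # Rule iv: only a successful checker resets the timer.
        self.tick(now)
        if self.state is State.CHECKING:
            if success:
                self.deadline = now + self.timeout
            self.state = State.ENABLED

    def may_boot_unattended(self) -> bool:
        return self.state in (State.ENABLED, State.CHECKING)
```

With the 3600 second timeout: after start(0), a successful check that
ends at 301 moves the deadline to 3901, a failed check at 601 leaves it
there, and tick(3901) disables the client.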

>
> I hope this will be enlightening.

Yes.  More importantly, it excludes many alternative possibilities.

>
>> For each of the above usage cases please tell me whether I should
>> expect the client to reboot unattended
>
> In states "a" and "e" (and "0" and "3"), the client can not boot
> unattended.  In all other states, it will be able to do so.

Good, thanks.

>
> /Teddy Hogeborn
>
> P.S.  Do I have your permission to re-send your mails and mine to the
> public mandos-dev mailing list?

Of course.  I did not see where to subscribe.  Can you provide a sign-up link?

Lee Winter
NP Engineering
Nashua, New Hampshire