DCI 4 networking : MbufManager

Overview

0 Currently outstanding

C is acceptable as an illustrative language but is not ideally suited to a definition language. Language neutral versions of all structures, etc, need producing. Not all macros contained within the reference mbuf.h file are documented although their behaviour is readily determinable from this specification.

More should probably be said about unsafe data shadowing safe data.

The dci4 client interface contract appendix needs tightening up.

1 Conventions

'quoted strings' should be envisaged in an italic font.

"quoted strings" indicate phrases with specific technical interpretation, typically only when first introduced.

[bracketed text] is an aside to reviewers. Comments on such text are encouraged.

77 hyphen (-) characters denote table and figure delimitation.

2 Basics of operation

A "client" is a program (typically a module, and in practice probably constrained to be a module) that uses the facilities of the "MbufManager".

Clients establish a session with the MbufManager on initialisation, use memory management facilities from the mbuf manager during their normal operation, and then close the session with the MbufManager just prior to their shutdown.

Memory is manipulated through the use of a descriptor structure, called an mbuf. An mbuf describes a number of contiguous bytes in memory.

The MbufManager imposes some restrictions, requirements and conventions upon the use of mbufs. A group of clients typically implements a further interface contract of its own - the DCI4 client interface contract is such an interface (see Appendix).

3 MbufManager goals

Hide any mechanics not necessary for client operation
Provide single mbuf pool, rather than one per protocol
Provide facilities suitable for device drivers and protocol modules
Provide a balanced compromise between conflicting design goals
Provide negligible long term fragmentation
Provide efficient implementations of facilities offered
Permit pre-allocation of packet storage space (DCI2 doesn't)
Permit modular component upgrade path (DCI2 doesn't)

4 The mbuf structure and its uses

Mbufs provide a descriptive structure layer that describes and dictates access to a conceptual block of memory.

In conventional memory management, the user manipulates a pointer to a block of memory. With mbuf memory management, the user manipulates a pointer to a descriptive structure (an "mbuf"), which itself provides the means to obtain a pointer to the block of memory that it "describes". Further, these structures are chained together to form a linked list, providing a form of scatter/gather memory description. The chain of mbufs describes a single conceptual block of memory, even though the actual memory used might well be scattered throughout real memory.

Thus, an extra layer of structuring is inserted between the user and the block of memory being manipulated when mbuf memory management is used.

By accessing a block of memory through an mbuf structure, it is possible to record size, type and some degree of linkage information. By manipulating the fields within an mbuf, it is possible to efficiently add and remove data from a "described" block of memory in a variety of fashions.

Mbufs are allocated and freed by the MbufManager. Programs that request allocation and freeing operations are called clients.

A client never has a "struct mbuf" itself, only pointers that at some stage came from the MbufManager. (This allows the size of an mbuf to grow later; in particular MbufManagers may differ in the amount of private data stored with the mbuf.)

The C definition of an mbuf structure:

struct ifnet;
struct pkthdr {
    int len; /* total packet length */
    struct ifnet *rcvif; /* receiving interface */
};
typedef struct mbuf {
    struct mbuf *m_next; /* next mbuf in chain */
    struct mbuf *m_list; /* next mbuf in list (clients only) */
    ptrdiff_t m_off; /* current offset to data from mbuf itself */
    size_t m_len; /* current byte count */
    const ptrdiff_t m_inioff; /* original offset to data from mbuf itself */
    const size_t m_inilen; /* original byte count (for underlying data) */
    unsigned char m_type; /* client use only */
    const unsigned char m_sys1; /* MbufManager use only */
    const unsigned char m_sys2; /* MbufManager use only */
    unsigned char m_flags; /* client use only */
    struct pkthdr m_pkthdr; /* client use only */
} dci4_mbuf;

struct mbuf *m_next;

Although mbufs may be manipulated individually, they are almost always used as an mbuf chain. The 'm_next' field of the mbuf structure points at the next mbuf in an mbuf chain. There is no back pointer. Successive mbufs describe conceptually "later" bytes of memory, even if the underlying blocks of actual memory used to hold these bytes are not stored consecutively or "later" in memory. The end of an mbuf chain is indicated by the 'm_next' field containing the NULL pointer.
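The chain walk described above can be sketched as follows. This is an illustration only, not the MbufManager's own 'count_bytes' entry point: 'chain_bytes' is a hypothetical helper name, and the structure is cut down to the two fields that the walk needs.

```c
#include <stddef.h>

/* Cut-down mbuf for illustration only; the real structure has more fields. */
struct mbuf {
    struct mbuf *m_next; /* next mbuf in chain, NULL at the end */
    size_t m_len;        /* bytes described by this mbuf */
};

/* Hypothetical helper: total number of bytes described by a chain. */
static size_t chain_bytes(const struct mbuf *mp)
{
    size_t total = 0;

    /* There is no back pointer, so a chain is only ever walked forwards. */
    for ( ; mp != NULL; mp = mp->m_next)
        total += mp->m_len;
    return total;
}
```

An empty chain (the NULL pointer) describes zero bytes under this sketch.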

A client may allocate an mbuf chain for internal use or for communicating data to another (MbufManager) client. An mbuf chain may exist only for a small fraction of a second or it may be used and retained for weeks. It is the task of the MbufManager to ensure that such usage does not cause anything other than negligible memory fragmentation.

ptrdiff_t m_off;

The allocation of an mbuf is always accompanied by the allocation of an additional block of memory (see the discussion on unsafe data later for the exceptions to this). It is this additional block of memory that the mbuf is said to "describe". Access to this described memory is through manipulation of the address of the mbuf itself and fields contained within the mbuf. The location of this memory, relative to the mbuf itself, is not defined, other than that the 'm_off' field contains a suitable bias to access this memory.

size_t m_len;

The 'm_off' field is the bias to add to the address of the mbuf to obtain the address of the first byte of data described by that mbuf. The 'm_len' field specifies the number of bytes contained in the described data. This might be envisaged as follows:

Relationships between the address of an mbuf, and the 'm_off', 'm_next' and 'm_len' fields.

                      Mbufs                      Described
                                                    data
     mbuf
     pointer ------> ========= --------+-------> =========
                     |       |        / \        |       |
                     | m_off | ------/   \       |       |
                     |       |            \      |       |
                     | m_len | -----------+      |       |
                     |       |             \     |       |
                     | m_next| --\          \    |       |
                     |       |    \          \   |       |
                     =========     \         --> =========
                                   /
            /---------------------/
            |
            \------> ========= --------+-------> =========
                     |       |        / \        |       |
                     | m_off | ------/   \       |       |
                     |       |            \      |       |
                     | m_len | -----------+      |       |
                     |       |             \     |       |
                     | m_next| -->NULL      \    |       |
                     |       |               \   |       |
                     =========               --> =========

Notes:

Only three of the fields of an mbuf are indicated. This is for clarity purposes. The example illustrates an mbuf chain formed from two mbufs.

Example C code to flatten an mbuf chain into the single sequence of bytes that it conceptually describes:

void flatten_mbuf_chain(struct mbuf *mp, char *buffer)
{
    for ( ; mp != NULL; mp = mp->m_next)
    {
        memcpy(buffer, mtod(mp, char *), mp->m_len);
        buffer += mp->m_len;
    }
}

Notes:

As with most example programs, no thought has been given to the handling of exceptional circumstances.

This algorithm applies equally to safe and unsafe mbuf chains.

'mtod' is a macro, and adds the address of the first byte of the mbuf to the value of the 'm_off' field of the mbuf, yielding the address of the first byte of data described by the mbuf. It also performs a type cast. 'mtod' is supplied as a C macro for convenience.
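One possible definition of 'mtod' that matches the description above is sketched here. The definition in the reference mbuf.h may differ in detail, and the structure shown is cut down to the fields of interest.

```c
#include <stddef.h>

/* Cut-down mbuf for illustration only; the real structure has more fields. */
struct mbuf {
    struct mbuf *m_next; /* next mbuf in chain */
    ptrdiff_t m_off;     /* current offset to data from mbuf itself */
    size_t m_len;        /* current byte count */
};

/*
 * Sketch only: add the byte offset 'm_off' to the address of the mbuf
 * itself, then cast the result to the requested pointer type.
 */
#define mtod(m, t) ((t)((char *)(m) + (m)->m_off))
```

The cast to 'char *' before the addition matters: 'm_off' is a byte offset, not a count of mbuf structures.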

Only byte alignment is required for the data described by the 'm_off' and 'm_len' fields. All clients must fully cope with non-word aligned addresses and lengths when manipulating the data described by an mbuf, although optimisations for aligned data are encouraged, as is the generation of aligned data. An mbuf structure itself is always at least word aligned in memory.

When the MbufManager performs an allocation for a client and returns an mbuf chain to a client, that client is deemed to have taken "ownership" of that mbuf chain. The precise implications of ownership form part of the interface contract between clients of the MbufManager. For example, the DCI4 specification specifies an interface contract between compliant modules using mbufs. One of the prime responsibilities of ownership is to free the mbuf chain at some stage (or transfer ownership). Whenever the MbufManager returns an mbuf chain, it transfers ownership of this chain at the same time.

Whenever the MbufManager receives an mbuf chain it also takes ownership of that chain. A summary of ownership transfer for direct entry points is given later. Exceptions to these rules are detailed individually throughout the text.

In order to minimise long term fragmentation (and for various other implementation reasons), the sizes of the underlying memory blocks that may be allocated for an mbuf to describe are constrained to a number of sizes. This permits the situation where an mbuf describes only some of the underlying memory actually allocated.

const ptrdiff_t m_inioff;

const size_t m_inilen;

The 'm_inioff' and 'm_inilen' fields provide a description of the underlying block of memory in the same way as 'm_off' and 'm_len' fields (respectively) provide a description of the described block of memory (which resides somewhere within the underlying block, although not necessarily always at the start or end of it). Any values of 'm_len' and 'm_off' are permitted provided that the described block of memory is fully contained within the underlying block described by 'm_inioff' and 'm_inilen' (but see the field validity table below).
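The containment rule can be expressed as a small check. This is a sketch for debugging purposes, assuming a safe mbuf; 'described_within_underlying' is a hypothetical helper name and is not part of the MbufManager interface.

```c
#include <stddef.h>

/* Cut-down mbuf for illustration only; the real structure has more fields. */
struct mbuf {
    ptrdiff_t m_off;          /* current offset to data from mbuf itself */
    size_t m_len;             /* current byte count */
    const ptrdiff_t m_inioff; /* original offset to data from mbuf itself */
    const size_t m_inilen;    /* original byte count (for underlying data) */
};

/* Hypothetical helper: 1 if the described block lies wholly inside the
 * underlying block, 0 otherwise. */
static int described_within_underlying(const struct mbuf *m)
{
    size_t lead;

    if (m->m_off < m->m_inioff)
        return 0; /* starts before the underlying block */

    lead = (size_t)(m->m_off - m->m_inioff); /* unused bytes at the front */
    if (lead > m->m_inilen)
        return 0; /* starts beyond the underlying block */

    return m->m_len <= m->m_inilen - lead; /* must not run off the end */
}
```

The comparison is arranged as a subtraction from 'm_inilen' so that a very large 'm_len' cannot wrap around and pass the test.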

struct mbuf *m_list;

unsigned char m_type;

The 'm_list' and 'm_type' fields are provided for the convenience of the client. The contents of these fields when an mbuf is passed from one client to another is part of the interface contract between clients. When a client owns an mbuf chain, it can set these fields to whatever values it requires.

If these fields are used, a client must explicitly initialise them. The MbufManager never examines these fields.

const unsigned char m_sys1;

const unsigned char m_sys2;

The 'm_sys1' and 'm_sys2' fields are private fields maintained by the MbufManager. They should never be read or written by a client under any circumstances, even transiently. The MbufManager is entitled to asynchronously examine these fields if it requires.

unsigned char m_flags;

struct pkthdr m_pkthdr;

The 'm_flags' field, and the 'm_pkthdr' field in version 0.15 or later of the MbufManager, are provided for the convenience of the client. The contents of these fields when an mbuf is passed from one client to another are part of the interface contract between clients. When a client owns an mbuf chain, it can set these fields to whatever values it requires.

If these fields are used, a client must explicitly initialise them. The MbufManager never examines these fields.

When an mbuf chain is freed, the storage required for the mbuf(s) comprising the chain and the underlying memory associated with each mbuf is placed back into the free pool and becomes available for subsequent re-allocation. It is not possible to free only one of the mbuf and the underlying storage associated with that mbuf. This ensures that, as long as an mbuf chain is allocated, the data it describes is also correctly allocated.

5 Unsafe data

"Unsafe data" is a concept that breaks some of the rules just outlined. In particular, the memory described by an unsafe mbuf is not underlying memory allocated by the MbufManager in the fashion just described.

When an unsafe mbuf is allocated, the MbufManager does not allocate associated underlying storage. Rather, the mbuf is available for the client to set the 'm_off' and 'm_len' fields such that a portion of memory beyond the control of the MbufManager is described by the mbuf. The only requirement of the MbufManager on the 'm_off' and 'm_len' fields for an unsafe mbuf is that the data they describe must be valid whenever an MbufManager operation implicitly or explicitly accesses them. In practice, the only time when these fields may hold random values is when the mbuf (chain) is being freed or transiently during update within a client.

The MbufManager can tell whether an mbuf describes safe or unsafe data from examination of the mbuf. When an unsafe mbuf is freed, there is no freeing action performed on the associated data.

It is because there is no directly enforceable relationship (by the MbufManager) between the lifetime of an unsafe mbuf and the data it describes that the data is termed "unsafe". "Unsafe data" does not imply incorrect behaviour. The phrase is used as a reminder of the additional constraints on the described data, and serves to encourage the programmer to take the necessary extra precautions.

Unsafe data is often used when it is not necessary to copy the data into a safe mbuf chain, eliminating a copy operation and increasing performance. A client may arrange its internal strategies to permit the use of unsafe data as an optimisation.

In order for the concept of unsafe data to be useful, some statement about the lifetime and validity of the described data must be possible. The client interface contract normally adds further detail to the requirements for unsafe mbufs.

Whenever an unsafe mbuf is supplied to the MbufManager in a context that the MbufManager may examine the data described, then the client pledges that this data will remain valid until that call into the MbufManager returns.

In practice there are two useful ways in which unsafe mbuf chains may be manipulated:

All use of the data is completed before the recipient (client or MbufManager) returns execution control to the supplying client. It is still the responsibility of the recipient client to free the mbuf chain.
The recipient client copies the unsafe data described into a safe mbuf chain before control returns back to the supplying client. Responsibility for freeing the unsafe mbuf chain still lies with the recipient client.

Ideally, all recipient clients would be capable of processing all mbuf chains they receive prior to returning control. Such clients could always use unsafe data in an efficient manner.

As a worst case fallback, whenever a client is supplied with an mbuf chain it always performs an 'ensure_safe' operation to ensure that the data is safe; this always entails data copying for unsafe mbuf chains.

In practice, "ensuring" (see the description of "ensuring" later) potentially unsafe mbuf chains into safe mbuf chains only when necessary is a reasonable compromise.

It is possible (indeed permitted) to allocate a safe mbuf and then generate a second reference to the same data with an unsafe mbuf. Data described in an mbuf chain may thus be "referenced" in almost the same way any other data may be used in an unsafe mbuf. The difference is that an unsafe mbuf will be required for each describing mbuf in the chain, rather than a single unsafe mbuf to describe a single conceptual region of memory. Additional constraints are also imposed to ensure that the original mbuf chain is not freed before the unsafe mbuf chain. Freeing the safe mbuf chain after the return of the call where the unsafe mbuf chain is used will achieve correct operation.

In some traditional mbuf implementations, use is made of native memory management facilities to provide the ability to remap memory into "mbuf visible" regions, thus avoiding memory copying. This facility is not available in this specification. To some degree, unsafe data lessens this lack.

The 'm_inioff' and 'm_inilen' fields of an unsafe mbuf are initialised according to the allocation method used. See later for details.

Summary of mbuf field validity: client <=> mbuf manager

Field       From mbuf   To mbuf
=====       =========   =======
m_next      valid       valid
m_list      NULL        invalid
m_off       valid       invalid
m_len       valid       invalid
m_inioff    valid       valid*
m_inilen    valid       valid*
m_type      invalid     invalid
m_sys1      opaque      opaque
m_sys2      opaque      opaque
m_sys3      opaque      opaque
m_pkthdr    invalid     invalid

Notes:

Field: the name of a field within an mbuf structure.

From mbuf: an mbuf chain being passed from the MbufManager to a client.

To mbuf: an mbuf chain being passed from a client to the mbuf manager.

valid: the field meets the criteria stated within this document.

valid*: unsafe mbufs need not describe real memory in the underlying storage.

There is never any freeing of the underlying storage of an unsafe mbuf.

NULL: the NULL pointer.

invalid: any value may be present. No manipulation of such a value should ever be made. Either the field should be ignored or it should be initialised prior to use.

opaque: never read and never written by any client.

The 'm_inioff' and 'm_inilen' fields of a safe mbuf are never altered by a client - only read. The 'm_inioff' and 'm_inilen' fields of an unsafe mbuf may be altered by the client.

6 MbufManager sessions

The period of time when a client may allocate (and otherwise use) mbufs from the MbufManager is termed a "session". A session is initiated (opened) and terminated (closed) with SWI calls. A client must allocate and maintain an MbufManager control structure (an "mbctl" structure) for a duration encompassing a session (typically, the client has a static structure within its data area for this purpose). A pointer to this structure is supplied to the MbufManager during both initialisation and termination calls and all direct entry points.

This structure contains, amongst other things, a set of function pointers. These function pointers provide direct entry points into individual routines within the MbufManager. They are initialised by the MbufManager during session initialisation and remain valid until session termination. All of the performance critical routines of the MbufManager (such as mbuf allocation and freeing) are accessed through these direct entry points, which incur considerably less overhead than SWI routines. These entry points are designed to permit the easy inter-operation of assembler and APCS code (such as that generated by the NorCroft C compiler), and roughly obey APCS. A list of entry/exit characteristics follows (using APCS register naming convention):

a1 always points at an mbctl structure for all direct entry calls
the processor must be in supervisor mode (but see MBC_USERMODE)
a1-a4 are the only parameter registers
a2-a4 and ip are corrupted by the call
a1 is either the call result or corrupted
other registers preserved by call
the processor flags are preserved by the call
no V set error convention (incompatible with APCS)
in general, an error results in a1=0 on exit
IRQ state preserved across call
IRQs may be disabled during calls
IRQs may be enabled during calls ONLY if specifically documented
FIQs assumed enabled on entry
FIQs preserved across calls

Currently, no direct entry point routine will enable interrupts if they are disabled on entry.

C definition of an MbufManager control structure:

typedef struct mbctl
{
    /* reserved for MbufManager use in establishing context */
    int opaque; /* MbufManager use only */
    /* Client initialises before session is established */
    size_t mbcsize; /* size of mbctl structure from client */
    unsigned int mbcvers; /* client version of MbufManager spec */
    unsigned long flags; /* */
    size_t advminubs; /* Advisory desired minimum underlying block size */
    size_t advmaxubs; /* Advisory desired maximum underlying block size */
    size_t mincontig; /* client required min ensure_contig value */
    unsigned long spare1; /* Must be set to zero on initialisation */
    /* MbufManager initialises during session establishment */
    size_t minubs; /* Minimum underlying block size */
    size_t maxubs; /* Maximum underlying block size */
    size_t maxcontig; /* Maximum contiguify block size */
    unsigned long spare2; /* Reserved for future use */
    /* Allocation routines */
    struct mbuf * /* MBC_DEFAULT */
    (* alloc)
    (struct mbctl *, size_t bytes, void *ptr);
    struct mbuf * /* Parameter driven */
    (* alloc_g)
    (struct mbctl *, size_t bytes, void *ptr, unsigned long flags);
    struct mbuf * /* MBC_UNSAFE */
    (* alloc_u)
    (struct mbctl *, size_t bytes, void *ptr);
    struct mbuf * /* MBC_SINGLE */
    (* alloc_s)
    (struct mbctl *, size_t bytes, void *ptr);
    struct mbuf * /* MBC_CLEAR */
    (* alloc_c)
    (struct mbctl *, size_t bytes, void *ptr);
    /* Ensuring routines */
    struct mbuf *
    (* ensure_safe)
    (struct mbctl *, struct mbuf *mp);
    struct mbuf *
    (* ensure_contig)
    (struct mbctl *, struct mbuf *mp, size_t bytes);
    /* Freeing routines */
    void
    (* free)
    (struct mbctl *, struct mbuf *mp);
    void
    (* freem)
    (struct mbctl *, struct mbuf *mp);
    void
    (* dtom_free)
    (struct mbctl *, struct mbuf *mp);
    void
    (* dtom_freem)
    (struct mbctl *, struct mbuf *mp);
    /* Support routines */
    struct mbuf * /* No ownership transfer though */
    (* dtom)
    (struct mbctl *, void *ptr);
    int /* Client retains mp ownership */
    (* any_unsafe)
    (struct mbctl *, struct mbuf *mp);
    int /* Client retains mp ownership */
    (* this_unsafe)
    (struct mbctl *, struct mbuf *mp);
    size_t /* Client retains mp ownership */
    (* count_bytes)
    (struct mbctl *, struct mbuf *mp);
    struct mbuf * /* Client retains old, new ownership */
    (* cat)
    (struct mbctl *, struct mbuf *old, struct mbuf *new);
    struct mbuf * /* Client retains mp ownership */
    (* trim)
    (struct mbctl *, struct mbuf *mp, int bytes, void *ptr);
    struct mbuf * /* Client retains mp ownership */
    (* copy)
    (struct mbctl *, struct mbuf *mp, size_t off, size_t len);
    struct mbuf * /* Client retains mp ownership */
    (* copy_p)
    (struct mbctl *, struct mbuf *mp, size_t off, size_t len);
    struct mbuf * /* Client retains mp ownership */
    (* copy_u)
    (struct mbctl *, struct mbuf *mp, size_t off, size_t len);
    struct mbuf * /* Client retains mp ownership */
    (* import)
    (struct mbctl *, struct mbuf *mp, size_t bytes, void *ptr);
    struct mbuf * /* Client retains mp ownership */
    (* export)
    (struct mbctl *, struct mbuf *mp, size_t bytes, void *ptr);
} dci4_mbctl;

Prior to establishing a session, the client initialises the following fields of the mbctl structure:

size_t mbcsize;
unsigned int mbcvers;
unsigned long flags;
size_t advminubs;
size_t advmaxubs;
size_t mincontig;
unsigned long spare1;

The values a client initialises these fields to are defined as follows:

mbcsize: The size of the mbctl structure. This is the size of the structure understood by the compiler/assembler at compilation time. Future versions of this specification may add other fields. In C, one might use "sizeof(struct mbctl)".

mbcvers: The version of the MbufManager specification that the client is implemented against. This is the major version times one hundred plus the minor version. Minor version number changes indicate bug fixes and the possible introduction of small and upwardly compatible changes. Major revision number changes indicate major and possibly not entirely backwardly compatible changes.
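The version encoding can be sketched as below. The macro and function names are hypothetical and are not taken from the reference mbuf.h; only the arithmetic (major times one hundred plus minor) comes from the description above.

```c
/* Hypothetical helper: encode a specification version for 'mbcvers'. */
#define MBC_VERSION(major, minor) ((unsigned int)((major) * 100u + (minor)))

/* Hypothetical helpers: recover the two parts from an encoded value. */
static unsigned int mbc_version_major(unsigned int mbcvers)
{
    return mbcvers / 100u;
}

static unsigned int mbc_version_minor(unsigned int mbcvers)
{
    return mbcvers % 100u;
}
```

Under this encoding, version 0.15 is stored as 15. The encoding only round-trips while the minor version stays below one hundred.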

flags: This bitset supplies various pieces of information to the MbufManager. See the description of SWI Mbuf_OpenSession later on for details of suitable values to enter in this field.

advminubs: Advisory minimum underlying block size. This value advises the MbufManager of the smallest underlying block size that the client thinks appropriate for its requirements. Traditional mbuf clients might well use the original value of MLEN (112 in most cases) for this value. If no particular value seems appropriate, a client should set this field to zero.

advmaxubs: Advisory maximum underlying block size. This value advises the MbufManager of the largest underlying block size that the client thinks appropriate for its requirements. An ethernet device driver client might well use the ethernet MTU (maximum transmission unit) value of 1500. If no particular value seems appropriate, a client should set this field to zero.

mincontig: This specifies the maximum size the client will ever specify to the "contiguify" routine. If the MbufManager can never meet the value specified, it will refuse to open the session. If no particular value seems appropriate, a client should set this field to zero.

spare1: This field must be initialised to zero.

The contents of all other fields of the mbctl structure are irrelevant at the start of session initiation.
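A client might prepare the structure along the following lines. This is a sketch under stated assumptions: only the leading fields of the mbctl structure are reproduced, 'mbctl_prepare' is a hypothetical helper name, and the values chosen (specification version 0.15, no session flags, an ethernet MTU of 1500 as the advisory maximum) are examples rather than recommendations.

```c
#include <stddef.h>
#include <string.h>

/* Leading fields only; the fields that the MbufManager fills in during
 * session establishment are omitted from this sketch. */
struct mbctl {
    int opaque;            /* MbufManager use only */
    size_t mbcsize;        /* size of mbctl structure from client */
    unsigned int mbcvers;  /* client version of MbufManager spec */
    unsigned long flags;   /* session flags */
    size_t advminubs;      /* advisory minimum underlying block size */
    size_t advmaxubs;      /* advisory maximum underlying block size */
    size_t mincontig;      /* client required min ensure_contig value */
    unsigned long spare1;  /* must be zero on initialisation */
};

/* Hypothetical helper: fill in the client-initialised fields. */
static void mbctl_prepare(struct mbctl *mbc)
{
    /* Other fields are irrelevant before the session opens, so zeroing
     * the whole structure first is harmless. */
    memset(mbc, 0, sizeof *mbc);

    mbc->mbcsize = sizeof *mbc; /* a real client uses the full structure */
    mbc->mbcvers = 15;          /* assumed: specification version 0.15 */
    mbc->flags = 0;             /* assumed: no session flags requested */
    mbc->advminubs = 0;         /* no particular preference */
    mbc->advmaxubs = 1500;      /* example: ethernet MTU */
    mbc->mincontig = 0;         /* no particular requirement */
    mbc->spare1 = 0;            /* required to be zero */
}
```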

The next stage of session initiation is the issuing of an SWI Mbuf_OpenSession call, supplying the address of this mbctl structure to the MbufManager as a parameter. If the session requested can be supported by the MbufManager then it will initialise all other fields of the mbctl structure before returning. If the session cannot be supported, then no fields of the mbctl structure will be modified by the MbufManager and an error will be returned.

If no error is returned, the session is established and the client may use the direct entry points now available.

A session is terminated with the SWI Mbuf_CloseSession call, supplying it with the address of the same mbctl structure used to establish the session.

int opaque;

The 'opaque' field is for the use of the MbufManager. It is initialised during session establishment. It must never be read or written by a client during a session.

All the direct entry points take a fixed first parameter of the address of the mbctl structure used to establish the session. This permits the MbufManager to establish any necessary context.

7 Allocation routines

struct mbuf * /* Parameter driven */
(* alloc_g)
(struct mbctl *, size_t bytes, void *ptr, unsigned long flags);

There is one general purpose allocation routine (alloc_g), and a number of more specialised allocation routines. All of these are accessed through the direct entry addresses contained in the initialised mbctl structure. The particular values the MbufManager supplies for the addresses of the direct entry point routines are chosen to be as optimal as possible for the client's indicated requirements. The functionality of these specific routines may be accessed through the general purpose routine; they are provided solely for performance reasons.

A successful allocation returns a chain of mbufs that satisfy all the criteria of the allocation. An unsuccessful allocation returns the NULL pointer. A NULL pointer indicates either a lack of some resource (typically mbufs or underlying storage) or a set of criteria that cannot be satisfied. If, for whatever reason, an allocation is constrained to a single mbuf, then the 'm_next' field of that mbuf will always be zero. In other words, whatever the entry flags may suggest, effectively, an mbuf chain is always returned.

The 'flags' bitset provides a list of constraints and deviations that are to be applied to an allocation.

The default allocation has all bits of the 'flags' bitset clear. In particular, the default allocation is for safe data. The defined bits are as follows:

MBC_DEFAULT 0x00000000ul

Bit(s)   Name         Meaning
0        MBC_UNSAFE
1        MBC_SINGLE
2        MBC_CLEAR
3-31                  Reserved, must be zero
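The bit assignments can be written as C definitions. Only the bit positions of MBC_UNSAFE, MBC_SINGLE and MBC_CLEAR and the value of MBC_DEFAULT come from the text; 'MBC_ALLOC_RESERVED' and 'alloc_flags_valid' are hypothetical names introduced for this sketch, and the definitions in the reference mbuf.h may be spelt differently.

```c
#define MBC_DEFAULT 0x00000000ul
#define MBC_UNSAFE  (1ul << 0) /* bit 0 */
#define MBC_SINGLE  (1ul << 1) /* bit 1 */
#define MBC_CLEAR   (1ul << 2) /* bit 2 */

/* Hypothetical: every bit above the three defined ones, up to bit 31. */
#define MBC_ALLOC_RESERVED 0xfffffff8ul

/* Hypothetical check: 1 if no reserved bit is set in 'flags'. */
static int alloc_flags_valid(unsigned long flags)
{
    return (flags & MBC_ALLOC_RESERVED) == 0;
}
```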

Allocation consists of up to three internal phases:

allocation
clearing
copying

Roughly speaking; the allocation phase always happens, the clearing phase happens when the MBC_CLEAR bit is set, and the copying phase happens when 'ptr' is not the NULL pointer.

If the allocation phase fails the clearing and copying phases are always skipped and the NULL pointer returned. The clearing and copying phases are not capable of failing (merely not happening).

8 The allocation phase

Table of different allocation options

MBC_UNSAFE   bytes   MBC_SINGLE   type
0            0       ?            1
0            \=0     0            2
0            \=0     1            3
1            0       ?            4
1            \=0     ?            5

Notes:

Column headings:

MBC_UNSAFE: the value of the MBC_UNSAFE bit

bytes: the value of the 'bytes' parameter

MBC_SINGLE: the value of the MBC_SINGLE bit

type: reference to detailed description

Column contents:

0: equal to zero, or flag clear

1: flag set

\=0: not equal to zero

?: any value (0 or 1 for bits)
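The table can be read as a decision procedure. The sketch below returns the 'type' column for a given 'flags' and 'bytes'; 'alloc_type' is a hypothetical name used only for illustration, and the flag values assume bit 0 for MBC_UNSAFE and bit 1 for MBC_SINGLE as listed earlier.

```c
#include <stddef.h>

#define MBC_UNSAFE (1ul << 0)
#define MBC_SINGLE (1ul << 1)

/* Hypothetical helper: map an allocation request to the table's type. */
static int alloc_type(unsigned long flags, size_t bytes)
{
    if (flags & MBC_UNSAFE)
        return bytes == 0 ? 4 : 5; /* MBC_SINGLE is irrelevant here */

    if (bytes == 0)
        return 1;                  /* MBC_SINGLE is irrelevant here */

    return (flags & MBC_SINGLE) ? 3 : 2;
}
```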

1) MBC_UNSAFE = 0, bytes = 0, MBC_SINGLE = ?

The first available mbuf is chosen (so the setting of MBC_SINGLE is irrelevant). The actual size of the described data returned is unknown in advance, other than it is equal to or larger than the minimum underlying block size of the MbufManager (the 'minubs' field of the mbctl structure). 'm_len' and 'm_off' are set to reflect the underlying block (ie 'm_len' = 'm_inilen' and 'm_off' = 'm_inioff', respectively). Should a clearing or copying phase occur, then the value used for 'bytes' will be the value of 'm_len' in the newly allocated mbuf. Such copying might be the start of a variable sized mbuf chain building algorithm.

m_next: NULL - always only one mbuf

m_list: NULL

m_off: describes underlying block

m_len: size of underlying block

m_inioff: describes underlying block

m_inilen: size of underlying block

2) MBC_UNSAFE = 0, bytes \= 0, MBC_SINGLE = 0

A chain of an arbitrary number of mbufs is allocated, with a total described data size of 'bytes' bytes.

m_next: chain of mbufs returned

m_list: NULL

m_off: describes allocated memory

m_len: summed over the chain, gives 'bytes'

m_inioff: describes underlying block

m_inilen: size of underlying block

3) MBC_UNSAFE = 0, bytes \= 0, MBC_SINGLE = 1

Precisely one mbuf is allocated to describe the required number of bytes. It is possible for such allocations to fail due to not being able to locate an mbuf and underlying block with sufficient size.

m_next: NULL - always only one mbuf

m_list: NULL

m_off: describes allocated memory

m_len: 'bytes'

m_inioff: describes underlying block

m_inilen: size of underlying block

4) MBC_UNSAFE = 1, bytes = 0, MBC_SINGLE = ?

A single unsafe mbuf is allocated and set to describe no data. The value of 'ptr' is irrelevant. The MBC_CLEAR flag will be forced clear. 'm_off' and 'm_inioff' will describe the same value as the NULL pointer, and 'm_len' and 'm_inilen' will be zero. This is the only circumstance in which an mbuf (anywhere in an allocated chain) is allocated with zero in the 'm_len' field and returned directly from an allocation routine. (Note the anomaly for the 'copy' routine when asked to duplicate zero bytes.) The data described by an unsafe mbuf is not suitable for 'dtom', unless the data described is a "shadow" or "reference" to some previously allocated safe data, in which case 'dtom' will return the mbuf pointer for the original, safe, mbuf.

m_next: NULL - always only one mbuf

m_list: NULL

m_off: describes the NULL pointer

m_len: zero

m_inioff: describes the NULL pointer

m_inilen: zero

5) MBC_UNSAFE = 1, bytes \= 0, MBC_SINGLE = ?

A single unsafe mbuf is allocated. The 'm_len' field is set to the value of 'bytes'. 'm_off' is set to describe 'ptr', whatever the value of 'ptr'. This means supplying 'ptr' as the NULL pointer will cause the unsafe mbuf returned to have a non-zero byte count but for the data described to occur from address zero onwards. This is only useful when 'm_off' is later initialised to describe real memory. The data described by an unsafe mbuf is not suitable for 'dtom', unless the data described is a "shadow" or "reference" to some previously allocated safe data, in which case 'dtom' will return the mbuf pointer for the original, safe, mbuf.

m_next: NULL - always only one mbuf

m_list: NULL

m_off: describes 'ptr'

m_len: 'bytes'

m_inioff: describes 'ptr'

m_inilen: 'bytes'
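The field settings for the two unsafe cases (4 and 5) can be sketched as follows. This is an illustration only: 'toy_mbuf' and 'toy_alloc_u' are hypothetical names, the offset fields are modelled as plain pointers, and 'malloc' stands in for whatever the MbufManager really uses to obtain an mbuf.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for struct mbuf. Modelling the offset fields as
 * plain pointers is an assumption made only for this sketch. */
struct toy_mbuf {
    struct toy_mbuf *m_next;
    struct toy_mbuf *m_list;
    void            *off_ptr;     /* what 'm_off' describes */
    size_t           m_len;
    void            *inioff_ptr;  /* what 'm_inioff' describes */
    size_t           m_inilen;
};

/* Hypothetical helper mirroring cases 4 and 5 (MBC_UNSAFE set). */
struct toy_mbuf *toy_alloc_u(size_t bytes, void *ptr)
{
    struct toy_mbuf *mp = malloc(sizeof *mp);
    if (mp == NULL)
        return NULL;              /* allocation phase failed */
    mp->m_next = NULL;            /* always only one mbuf */
    mp->m_list = NULL;
    if (bytes == 0)
        ptr = NULL;               /* case 4: the value of 'ptr' is irrelevant */
    mp->off_ptr = ptr;            /* case 5: describes 'ptr', even if NULL */
    mp->m_len = bytes;
    mp->inioff_ptr = ptr;
    mp->m_inilen = bytes;
    return mp;
}
```

Note that no underlying storage is allocated in either case; the mbuf merely describes memory owned elsewhere.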

9 The clearing phase

The clearing phase will set to zero all the underlying bytes in the allocated mbuf chain. The number of bytes zeroed is directly independent of the 'bytes' and 'ptr' values ('bytes' as zero indirectly dictates the number of bytes allocated). The clearing phase only occurs if the following conditions are all met:

The allocation phase succeeded
The MBC_CLEAR bit is set *
The MBC_UNSAFE bit is clear

*: The MBC_CLEAR bit can be cleared during the allocation phase. This clearing overrides any value the bit may have had on entry to the allocation routine and prevents the clearing phase from occurring.

10 The copying phase

The copying phase copies data from 'ptr' into the described data. The number of bytes copied is the number of bytes described by the mbuf chain. This is the value of 'bytes' supplied to the allocation routine if 'bytes' was non-zero, and the underlying block size if 'bytes' was zero. The copying phase may be viewed as importing data into an mbuf chain from "raw" memory.

A client cannot determine if copied bytes were cleared during the clearing phase or not. The copying phase only occurs if the following conditions are all met:

The allocation phase succeeded
The MBC_UNSAFE bit is clear
'ptr' is not the NULL pointer
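The conditions for the clearing and copying phases can be expressed as two predicates. This is a sketch only: the bit values given to the MBC_ flags here are assumptions (the real values are in mbuf.h), and 'flags' is taken to be the value left after the allocation phase, which may already have forced MBC_CLEAR clear.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit assignments; only the names come from this document. */
#define MBC_DEFAULT 0u
#define MBC_UNSAFE  (1u << 0)
#define MBC_SINGLE  (1u << 1)
#define MBC_CLEAR   (1u << 2)

/* True if the clearing phase would run. */
bool clearing_phase_runs(bool alloc_ok, unsigned flags)
{
    return alloc_ok
        && (flags & MBC_CLEAR) != 0
        && (flags & MBC_UNSAFE) == 0;
}

/* True if the copying phase would run. */
bool copying_phase_runs(bool alloc_ok, unsigned flags, const void *ptr)
{
    return alloc_ok
        && (flags & MBC_UNSAFE) == 0
        && ptr != NULL;
}
```

Both phases may run for the same allocation, in which case the copied bytes simply overwrite the zeroed ones.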

Summary of allocator routines and implicit or explicit 'flags' settings

Allocator Control over flags

alloc MBC_DEFAULT (0)

alloc_g parameter to the call

alloc_s MBC_SINGLE

alloc_u MBC_UNSAFE

alloc_c MBC_CLEAR

11 Freeing mbufs

void
(* free)
(struct mbctl *, struct mbuf *mp);
void
(* freem)
(struct mbctl *, struct mbuf *mp);
void
(* dtom_free)
(struct mbctl *, struct mbuf *mp);
void
(* dtom_freem)
(struct mbctl *, struct mbuf *mp);

Once an mbuf chain or an individual mbuf is finished with, it is freed and its resources become available for re-allocation. Variants of the free call are available that free either just a single mbuf (without examining the 'm_next' field of the mbuf supplied) or the entire chain it describes. The routine that frees a single mbuf is called 'free' and the routine that potentially frees multiple mbufs is called 'freem'.

Additionally, routines that perform the equivalent of a 'dtom' call followed by a freeing call are provided.

No action is performed by a free call if supplied the NULL pointer.

Summary of mbuf and mbuf chain freeing routines:

free Free single mbuf (ignores 'm_next' field)

freem Free entire mbuf chain (uses 'm_next' field)

dtom_free Performs 'dtom' action then behaves the same as 'free'

dtom_freem Performs 'dtom' action then behaves the same as 'freem'
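The difference between 'free' and 'freem' can be illustrated with a toy model. The names 'toy_mbuf', 'toy_free', 'toy_freem' and 'toy_chain' are hypothetical, the 'mbctl' parameter is omitted, and 'malloc'/'free' stand in for the MbufManager's own pool.

```c
#include <stddef.h>
#include <stdlib.h>

struct toy_mbuf {
    struct toy_mbuf *m_next;
};

static int toy_freed;   /* number of mbufs released, for demonstration */

/* Frees exactly one mbuf; 'm_next' is never examined. */
void toy_free(struct toy_mbuf *mp)
{
    if (mp == NULL)
        return;          /* no action for the NULL pointer */
    free(mp);
    toy_freed++;
}

/* Frees every mbuf reachable through 'm_next'. */
void toy_freem(struct toy_mbuf *mp)
{
    while (mp != NULL) {
        struct toy_mbuf *next = mp->m_next;  /* read before freeing */
        toy_free(mp);
        mp = next;
    }
}

/* Builds a chain of n mbufs, or returns NULL if memory runs out. */
struct toy_mbuf *toy_chain(int n)
{
    struct toy_mbuf *head = NULL;
    while (n-- > 0) {
        struct toy_mbuf *mp = malloc(sizeof *mp);
        if (mp == NULL) {
            toy_freem(head);
            return NULL;
        }
        mp->m_next = head;
        head = mp;
    }
    return head;
}
```

A client that calls the single-mbuf variant on the head of a chain must already hold a pointer to the remainder, otherwise the rest of the chain is lost.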

12 Support and ensuring routines

struct mbuf *
(* dtom)
(struct mbctl *, void *ptr);

Under the right circumstances, it is possible to perform a transformation from the address of any byte described by an mbuf to the address of the mbuf describing that byte. This transformation is performed with the 'dtom' routine. The presence of a 'm_next' field, and the lack of a hypothetical 'm_prev' field means that 'dtom' provides access to a portion of the conceptually described data, starting with the first byte described by the mbuf that describes the supplied address, and extending to the end of the mbuf chain. The client cannot directly determine if the mbuf returned by 'dtom' is the first mbuf in a chain or an mbuf part-way along a chain.

The required circumstances for 'dtom' to operate correctly are that the described data is safe.

Applying 'dtom' to an unsuitable address will return the NULL pointer. The NULL pointer is always an unsuitable address for 'dtom'.

The 'dtom' and 'mtod' transformations are not entirely symmetrical. 'dtom' will always return the address of the mbuf owning the underlying storage referenced, independent of the number of unsafe mbufs also referencing that storage.

The transformation performed by 'dtom' is necessarily based on the address supplied; this includes whether the address is within the region(s) of memory controlled by the MbufManager or not. For this reason, if a safe mbuf chain has been "shadowed" with an unsafe mbuf chain, then 'dtom' will always return the original safe mbuf. Further, freeing the chain returned by 'dtom' will free the original, safe mbuf chain, leaving the unsafe mbuf chain describing (through reference) now unknown data (this certainly warrants the description "unsafe data" and is one of the reasons for the 'ensure_safe' routine, although in this particular case it would be far too late to make the call to 'ensure_safe'). It is the client's responsibility to ensure that such problems do not occur (typically through appropriate interface contracts between clients).
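The address-based nature of 'dtom' can be sketched with a toy lookup. How the real MbufManager locates the owning mbuf is not specified here; the table of safe mbufs and the names 'toy_register_safe' and 'toy_dtom' are assumptions made purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_mbuf {
    unsigned char *data;   /* first byte described */
    size_t         m_len;  /* number of bytes described */
};

#define TOY_MAX_SAFE 8
static struct toy_mbuf *toy_safe[TOY_MAX_SAFE];  /* safe mbufs only */
static size_t toy_nsafe;

/* Records a safe mbuf so that toy_dtom can find it; 0 on success. */
int toy_register_safe(struct toy_mbuf *mp)
{
    if (mp == NULL || toy_nsafe == TOY_MAX_SAFE)
        return -1;
    toy_safe[toy_nsafe++] = mp;
    return 0;
}

/* Maps an address to the safe mbuf describing it, or NULL if unsuitable. */
struct toy_mbuf *toy_dtom(const void *ptr)
{
    uintptr_t p = (uintptr_t)ptr;
    if (ptr == NULL)
        return NULL;       /* NULL is always unsuitable */
    for (size_t i = 0; i < toy_nsafe; i++) {
        uintptr_t lo = (uintptr_t)toy_safe[i]->data;
        if (p >= lo && p - lo < toy_safe[i]->m_len)
            return toy_safe[i];
    }
    return NULL;
}
```

Because only the address is consulted, an unsafe mbuf that shadows part of a safe mbuf's data resolves to the safe mbuf, never to the shadow.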

struct mbuf *
(* ensure_contig)
(struct mbctl *, struct mbuf *mp, size_t bytes);

For a protocol client, the removal of protocol layers (headers or trailers) when a packet passes up a protocol stack is often made easier if all of the bytes constituting a particular header level are contiguous. (This permits the protocol to conceptually overlay the received packet with a structure describing the protocol header.) The 'ensure_contig' routine is used to ensure such "contiguousness" requirements are met by an mbuf chain, and "contiguifies" the specified number of bytes at the head of the mbuf chain supplied. Described data that is contiguified is also always at least word aligned for the first byte. This helps with the overlaying of word orientated structures.

struct mbuf *
(* ensure_safe)
(struct mbctl *, struct mbuf *mp);

If safe data is required and the safeness of an mbuf chain is uncertain, then the 'ensure_safe' routine may be used to ensure that data in the returned mbuf chain is safe.

In both cases (that is, 'ensure_contig' and 'ensure_safe'), "ensure" is used as a technical term meaning:

IF the data (mbuf chain) meets the required condition

THEN

return the data unmodified

ELSE

return some data that does meet the required condition

The process of generating data that does meet the required condition involves allocating one or more mbufs with appropriate constraints (equivalent to MBC_SINGLE, and MBC_DEFAULT allocation constraints) and replacing existing mbufs in the chain with these new mbufs. Any existing mbuf that is replaced is automatically freed.

Any of the ensure routines may fail if they have to perform allocations and there is a lack of resource. If this happens, then the entire mbuf chain supplied is freed and the NULL pointer is returned.

If the ensure operation does not fail, then the returned mbuf chain will meet the desired criteria. Whether an ensure operation fails or succeeds, the client must use the returned mbuf chain. There is an ownership transfer to the MbufManager whilst the ensure operation is performed and then another ownership transfer back to the client of the resulting mbuf chain that meets the criteria - these two mbuf chains may happen to be the same, but the loss and regain of ownership means a client cannot tell.

If the mbuf chain meets the indicated criteria on entry, then the ensure routine cannot fail and will always return the supplied pointer without modification.

Under some circumstances the 'ensure_contig' routine may be able to avoid an allocation by moving data around within the existing mbuf chain (this requires some underlying bytes not described by the mbuf chain itself). If these circumstances apply, they cannot generate a failure condition themselves. Thus, if only shuffling is required, then 'ensure_contig' cannot fail, but if shuffling and allocation are required, then 'ensure_contig' can fail through lack of resources for the allocation.

Note that an mbuf chain may contain a mixture of mbufs, each with its own characteristics. For example, an individual mbuf may contain safe or unsafe data, it may meet some contiguity requirement or not and there may or may not be underlying bytes described. An mbuf chain may be composed of any mixture of such mbufs. The ensure routines arrange that all necessary mbufs within a chain meet the desired criteria.

'ensure_contig' cannot operate on unsafe mbufs and will fail (ie free the supplied chain and return NULL) if asked to do so.

'ensure_safe' returns an mbuf chain where every byte described by it is known to be safe.

'ensure_contig' returns an mbuf chain where the first 'N' bytes are known to be contiguous.
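The "ensure" behaviour described above can be sketched for the 'ensure_safe' case. This is a toy model under stated assumptions: 'toy_mbuf' and the 'toy_' routines are hypothetical, a safe mbuf is modelled as one that owns a 'malloc' block, and an unsafe mbuf as one that borrows its data.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct toy_mbuf {
    struct toy_mbuf *m_next;
    unsigned char   *data;
    size_t           m_len;
    int              unsafe;   /* non-zero: 'data' is borrowed, not owned */
};

struct toy_mbuf *toy_alloc_safe(size_t len)
{
    struct toy_mbuf *mp = malloc(sizeof *mp);
    if (mp == NULL)
        return NULL;
    mp->data = malloc(len ? len : 1);
    if (mp->data == NULL) {
        free(mp);
        return NULL;
    }
    mp->m_next = NULL;
    mp->m_len = len;
    mp->unsafe = 0;
    return mp;
}

void toy_free(struct toy_mbuf *mp)
{
    if (mp == NULL)
        return;
    if (!mp->unsafe)
        free(mp->data);        /* only safe mbufs own their storage */
    free(mp);
}

void toy_freem(struct toy_mbuf *mp)
{
    while (mp != NULL) {
        struct toy_mbuf *next = mp->m_next;
        toy_free(mp);
        mp = next;
    }
}

/* Returns a chain in which every mbuf is safe, or NULL on failure.
 * On failure the whole supplied chain has already been freed. */
struct toy_mbuf *toy_ensure_safe(struct toy_mbuf *head)
{
    struct toy_mbuf **link;
    for (link = &head; *link != NULL; link = &(*link)->m_next) {
        struct toy_mbuf *old = *link;
        struct toy_mbuf *fresh;
        if (!old->unsafe)
            continue;          /* already meets the condition */
        fresh = toy_alloc_safe(old->m_len);
        if (fresh == NULL) {
            toy_freem(head);
            return NULL;
        }
        if (old->m_len != 0)
            memcpy(fresh->data, old->data, old->m_len);
        fresh->m_next = old->m_next;
        *link = fresh;         /* replace the unsafe mbuf in the chain */
        toy_free(old);         /* replaced mbufs are freed automatically */
    }
    return head;
}
```

A caller would always continue with the returned pointer, for example 'mp = toy_ensure_safe(mp);' followed by a check for NULL, and would never touch the pointer it passed in again.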

int
(* any_unsafe)
(struct mbctl *, struct mbuf *mp);
int
(* this_unsafe)
(struct mbctl *, struct mbuf *mp);

A client may determine if an mbuf chain has any unsafe data in it with the 'any_unsafe' routine. This returns 0 if there is no unsafe data (ie all data is safe) or if supplied the NULL pointer. It returns 1 if the mbuf chain supplied contains unsafe data. The 'this_unsafe' routine returns the same values but only examines the mbuf supplied - that is, it does not follow the 'm_next' field.
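The relationship between the two routines can be sketched as below. The 'unsafe' member is a hypothetical stand-in for however the MbufManager really records safeness, and the 'mbctl' parameter is omitted.

```c
#include <stddef.h>

struct toy_mbuf {
    struct toy_mbuf *m_next;
    int              unsafe;   /* non-zero if this mbuf describes unsafe data */
};

/* Examines only the mbuf supplied; 'm_next' is not followed. */
int toy_this_unsafe(const struct toy_mbuf *mp)
{
    return (mp != NULL && mp->unsafe) ? 1 : 0;
}

/* Examines every mbuf in the chain. */
int toy_any_unsafe(const struct toy_mbuf *mp)
{
    for (; mp != NULL; mp = mp->m_next) {
        if (mp->unsafe)
            return 1;
    }
    return 0;
}
```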

size_t
(* count_bytes)
(struct mbctl *, struct mbuf *mp);

The number of bytes described by an mbuf chain may be quickly determined with the 'count_bytes' routine. If supplied the NULL pointer, then 0 is returned.

struct mbuf *
(* cat)
(struct mbctl *, struct mbuf *old, struct mbuf *new);

One mbuf chain may be appended to the end of another mbuf chain with the 'cat' routine. The mbuf chain that gets appended to is the first mbuf parameter ('old'). The mbuf chain to be appended is the second mbuf parameter ('new'). Note that there is no ownership transfer of the second mbuf parameter. If 'old' is the NULL pointer, then 'new' is returned without examination. If 'old' is not the NULL pointer and 'new' is the NULL pointer then 'old' is returned without any modifications made to it.
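A toy model of 'count_bytes' and 'cat' follows. The 'toy_' names are hypothetical and the 'mbctl' parameter is omitted. Returning 'old' when both chains are non-NULL is an assumption; the text above only states the return value for the NULL cases.

```c
#include <stddef.h>

struct toy_mbuf {
    struct toy_mbuf *m_next;
    size_t           m_len;
};

/* Sums 'm_len' along the chain; 0 for the NULL pointer. */
size_t toy_count_bytes(const struct toy_mbuf *mp)
{
    size_t total = 0;
    for (; mp != NULL; mp = mp->m_next)
        total += mp->m_len;
    return total;
}

/* Links 'new_chain' onto the last mbuf of 'old'. */
struct toy_mbuf *toy_cat(struct toy_mbuf *old, struct toy_mbuf *new_chain)
{
    struct toy_mbuf *tail;
    if (old == NULL)
        return new_chain;      /* returned without examination */
    if (new_chain == NULL)
        return old;            /* 'old' is left unmodified */
    for (tail = old; tail->m_next != NULL; tail = tail->m_next)
        ;
    tail->m_next = new_chain;
    return old;
}
```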

struct mbuf *
(* trim)
(struct mbctl *, struct mbuf *mp, int bytes, void *ptr);

The 'trim' routine is used to remove bytes from either the head or the tail of an mbuf chain. It may optionally copy the bytes described to another piece of memory (performing a "flattening" operation in the process). All trimming is performed by adjusting 'm_off' and 'm_len'. No mbufs are removed (unlinked and freed) from either the head or the tail of the chain. If 'bytes' is greater than zero, then it specifies the number of bytes to remove from the head of the mbuf chain. If 'bytes' is zero then no alterations are performed and no data is copied. If 'bytes' is less than zero, then the absolute value of 'bytes' is the number of bytes to remove from the tail of the mbuf chain.

If the NULL pointer is supplied for the mbuf chain, no operations are performed. If 'ptr' is not the NULL pointer, then any bytes "trimmed" from the mbuf chain will be copied into the supplied area of memory. If 'ptr' is the NULL pointer, then no data copying is performed and only a trimming operation occurs. The magic value M_COPYALL may be used for the 'bytes' parameter to indicate the entire mbuf chain. This is only useful if a copy is also being performed, although it will always correctly set the mbuf chain to describe zero bytes. If the number of bytes to trim (after tail adjustment if applicable) is greater than the number of bytes described by the mbuf chain, then behaviour is as if M_COPYALL was supplied for a trim byte count.
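The trimming rules can be sketched with a toy chain in which a plain data pointer stands in for 'm_off'. The 'toy_' names are hypothetical. Two points are assumptions rather than statements from this specification: that the supplied chain is the return value, and that bytes trimmed from the tail are copied out in chain order.

```c
#include <stddef.h>
#include <string.h>

#define TOY_COPYALL 0x7f000000   /* value quoted for M_COPYALL */

struct toy_mbuf {
    struct toy_mbuf *m_next;
    unsigned char   *data;   /* stands in for what 'm_off' describes */
    size_t           m_len;
};

struct toy_mbuf *toy_trim(struct toy_mbuf *mp, int bytes, void *ptr)
{
    struct toy_mbuf *m;
    unsigned char *out = ptr;
    size_t total = 0, start = 0, n, skip;
    int from_tail = bytes < 0;

    if (mp == NULL || bytes == 0)
        return mp;                        /* nothing to do */
    for (m = mp; m != NULL; m = m->m_next)
        total += m->m_len;
    n = from_tail ? (size_t)(0u - (unsigned)bytes) : (size_t)bytes;
    if (n > total)
        n = total;                        /* behaves as M_COPYALL */
    skip = from_tail ? total - n : 0;     /* first byte to remove */

    for (m = mp; m != NULL; m = m->m_next) {
        size_t len = m->m_len;            /* length before trimming */
        size_t end = start + len;
        size_t lo = start > skip ? start : skip;
        size_t hi = end < skip + n ? end : skip + n;
        if (hi > lo) {
            size_t k = hi - lo;           /* bytes removed from this mbuf */
            const unsigned char *src =
                from_tail ? m->data + (len - k) : m->data;
            if (out != NULL) {
                memcpy(out, src, k);      /* optional flattening copy */
                out += k;
            }
            if (!from_tail)
                m->data += k;             /* head trim moves the offset */
            m->m_len -= k;                /* mbufs are never unlinked */
        }
        start = end;
    }
    return mp;
}
```

After a trim, the chain may contain mbufs whose 'm_len' is zero, since no mbuf is ever unlinked or freed by this operation.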

struct mbuf *
(* copy)
(struct mbctl *, struct mbuf *mp, size_t off, size_t len);
struct mbuf *
(* copy_p)
(struct mbctl *, struct mbuf *mp, size_t off, size_t len);
struct mbuf *
(* copy_u)
(struct mbctl *, struct mbuf *mp, size_t off, size_t len);

The 'copy' routine is used to duplicate a portion of an mbuf chain. An mbuf chain is always allocated, even if it eventually describes zero bytes. This distinguishes successful allocations that required no data from unsuccessful allocations and makes the behaviour of the 'len' parameter more orthogonal. The returned mbuf chain will have a minimum byte count of zero and a maximum byte count equal to the byte count of the supplied mbuf chain. The portion of the mbuf chain copied is the intersecting region between the described data of the supplied mbuf chain and the region starting 'off' bytes into the supplied chain and continuing for 'len' bytes. The first byte described by an mbuf chain is byte 0.

The 'alloc' allocator routine is used. This means the returned mbuf chain will be safe, may have any number of mbufs, is directly suitable for the 'dtom' routine and any unused underlying storage may hold random values. If a lack of resources occurs, the NULL pointer is returned. If 'mp' is the NULL pointer, then the NULL pointer is returned. If 'len' holds the magic value M_COPYALL, then all remaining bytes in the mbuf chain from 'off' onwards are copied. M_COPYALL has the value 0x7f000000 (hexadecimal, in C notation). The mbuf chain supplied will not be altered.
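The number of bytes described by the chain returned from 'copy' follows from the intersection rule. A minimal sketch, where 'toy_copy_span' is a hypothetical helper and 'total' is the byte count of the supplied chain:

```c
#include <stddef.h>

#define TOY_COPYALL ((size_t)0x7f000000)   /* value quoted for M_COPYALL */

/* Bytes that a copy of ('off', 'len') would describe for a chain of
 * 'total' bytes: the overlap of [off, off + len) with [0, total). */
size_t toy_copy_span(size_t total, size_t off, size_t len)
{
    size_t avail;
    if (off >= total)
        return 0;              /* region starts beyond the data */
    avail = total - off;
    if (len == TOY_COPYALL || len > avail)
        return avail;          /* clipped to the end of the chain */
    return len;
}
```

For a chain describing 10 bytes, an offset of 8 and a length of 4 gives 2 bytes, and an offset of 12 gives a successfully allocated chain describing zero bytes.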

The 'copy_p' routine behaves as 'copy' does, except that the m_type, m_flags and m_pkthdr fields are assumed to contain significant information. This prevents two small mbufs with different m_type values from being merged into a single larger mbuf. This does not prevent a single mbuf being copied into more than one mbuf; this would replicate the m_type field. If the usage of m_type is more sophisticated than the simple 'tagging' discussed above, then it is likely that the client will require a custom copying routine.

The 'copy_u' routine produces an unsafe copy of an mbuf chain. This means no new underlying storage will be allocated. The chain returned will have the same number of mbufs as that supplied. The m_type, m_flags and m_pkthdr fields are replicated from the old chain to the new chain. The 'alloc_u' routine is used to perform the new allocations.

struct mbuf *
(* import)
(struct mbctl *, struct mbuf *mp, size_t bytes, void *ptr);

Data may be copied from raw memory into an mbuf chain with the 'import' routine. Copying starts at 'ptr' for reading and the first byte described by the mbuf chain for writing. Copying proceeds until either the entire mbuf chain has been filled or 'bytes' bytes have been read. The magic value M_COPYALL may be used to indicate that the entire mbuf chain should be filled. The return value is the mbuf chain supplied. If either the mbuf chain or 'ptr' is the NULL pointer then no operation is performed.

struct mbuf *
(* export)
(struct mbctl *, struct mbuf *mp, size_t bytes, void *ptr);

Data may be copied from an mbuf chain into raw memory with the 'export' routine. Copying starts at the first byte described by the mbuf chain for reading and 'ptr' for writing. Copying proceeds until either the entire mbuf chain has been copied or 'bytes' bytes have been written to raw memory. The magic value M_COPYALL may be used to indicate that the entire mbuf chain should be copied to raw memory. The return value is the mbuf chain supplied. If either the mbuf chain or 'ptr' is the NULL pointer then no operation is performed.
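The 'import' and 'export' operations are mirror images, which a toy model makes plain. The 'toy_' names are hypothetical, a plain data pointer stands in for 'm_off', and the 'mbctl' parameter is omitted. In this sketch M_COPYALL needs no special handling, on the assumption that no chain describes that many bytes.

```c
#include <stddef.h>
#include <string.h>

#define TOY_COPYALL ((size_t)0x7f000000)   /* value quoted for M_COPYALL */

struct toy_mbuf {
    struct toy_mbuf *m_next;
    unsigned char   *data;
    size_t           m_len;
};

/* Flattens up to 'bytes' bytes of the chain into raw memory at 'ptr'. */
struct toy_mbuf *toy_export(struct toy_mbuf *mp, size_t bytes, void *ptr)
{
    unsigned char *out = ptr;
    struct toy_mbuf *m;
    if (mp == NULL || out == NULL)
        return mp;                         /* no operation */
    for (m = mp; m != NULL && bytes > 0; m = m->m_next) {
        size_t k = m->m_len < bytes ? m->m_len : bytes;
        if (k > 0)
            memcpy(out, m->data, k);
        out += k;
        bytes -= k;
    }
    return mp;
}

/* Fills the chain from raw memory at 'ptr', up to 'bytes' bytes. */
struct toy_mbuf *toy_import(struct toy_mbuf *mp, size_t bytes, const void *ptr)
{
    const unsigned char *in = ptr;
    struct toy_mbuf *m;
    if (mp == NULL || in == NULL)
        return mp;                         /* no operation */
    for (m = mp; m != NULL && bytes > 0; m = m->m_next) {
        size_t k = m->m_len < bytes ? m->m_len : bytes;
        if (k > 0)
            memcpy(m->data, in, k);
        in += k;
        bytes -= k;
    }
    return mp;
}
```

Neither routine alters 'm_len' or the chain linkage; only the bytes themselves move.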

Summary of mbuf chain ownership transfer for direct entry points

Routine Category

alloc Fresh

alloc_g Fresh

alloc_u Fresh

alloc_s Fresh

alloc_c Fresh

ensure_safe Release and gain

ensure_contig Release and gain

free Release

freem Release

dtom_free Release

dtom_freem Release

dtom No transfer

any_unsafe No transfer

this_unsafe No transfer

count_bytes No transfer

cat No transfer

trim No transfer

copy Fresh *

copy_p Fresh *

copy_u Fresh *

import No transfer

export No transfer

Notes:

Fresh: A new mbuf chain is generated and the client receives ownership of this new chain when it receives the chain itself.

Fresh *: Ownership of the supplied mbuf chain remains with the caller.

Release: Ownership of the mbuf chain supplied is passed to the MbufManager.

Release and gain: Ownership of the mbuf chain supplied is passed to the MbufManager. Ownership of the returned chain is passed to the client at the same time as the mbuf chain itself.

No transfer: No ownership transfer occurs (and hence no linkage changes are performed), but the MbufManager does expect a valid mbuf chain that remains static during the period of the call.

13 SWI Entry points

All SWIs defined here obey the RISC OS convention of indicating success by returning with the V flag clear and failure by returning with the V flag set and r0 pointing at a standard RISC OS error block. For convenience, this is omitted from the definition of each SWI individually. All SWIs defined here also obey the standard convention regarding interrupts, unless otherwise specified. That is: IRQ interrupt state is preserved across the SWI, although it may be enabled during the SWI. FIQ interrupt state is assumed enabled and not altered. This is described as the normal behaviour in the text below.

Mbuf_OpenSession (SWI &4A580)

Establishes a session with the MbufManager

On entry: address of an 'mbctl' structure

On exit: R0 - R9 preserved

Interrupts are preserved, but may be enabled during call

Fast interrupts are preserved

Processor is in undefined mode

Re-entrancy: not defined

This SWI is used by clients to establish a session with the MbufManager. This informs the MbufManager of the client's mbuf requirements, and informs the client of the direct entry point addresses into the MbufManager that are appropriate for its mbuf requirements. A certain amount of validation of the proposed session is performed by the MbufManager. This may result in modifications of behaviour within the MbufManager and it may also result in a refusal to accept a session, with an error being returned in the normal fashion.

The flags field of the mbctl structure provides additional information to the MbufManager about the required session. The only flag with a defined meaning at present is the MBC_USERMODE flag.

The MBC_USERMODE flag may be specified in the flags field to request that the direct entry points be suitable for user mode calling. If this is not specified, then the direct entry points must be called in supervisor mode. If MBC_USERMODE is specified, then the direct entry points supplied must be called in user mode. This permits normal user mode applications to interact with the MbufManager. Care must be taken with unsafe data to ensure that the memory described is valid when the user mode application is not the current application, and hence might not have its memory currently 'mapped in'.

All other bits in the flags bitset should be zero.

MBC_USERMODE

0 - This client requires supervisor mode direct entry points.

1 - This client requires user mode direct entry points.

Errors:

As normal.

"MbufManager unsuitable for client"

Notes:

The addresses of the routines supplied for the direct entry points may vary according to the requirements indicated by a client. In some circumstances, the MbufManager is able to supply a "null" routine: ie an immediate return.

Further details of the fields and their uses are found elsewhere in this document.

Mbuf_CloseSession (SWI &4A581)

Terminates a session with the MbufManager

On entry: address of an 'mbctl' structure previously supplied

On exit: R0 - R9 preserved

Interrupts are preserved, but may be enabled during call

Fast interrupts are preserved

Processor is in undefined mode

Re-entrancy: not defined

This SWI is used to terminate a session that has previously been successfully created with the Mbuf_OpenSession SWI.

Errors:

As normal.

"No such session"

Notes:

This SWI is used to terminate a session. The address supplied must be the same as that supplied to Mbuf_OpenSession when the session was created. Whether an error is returned or not, the client must consider the session closed after issuing this SWI.

Mbuf_Memory (SWI &4A582)

Controls maximum memory usage

On entry: new limit to use in bytes, or 0 to read current limit

On exit:

R0	=	Approximate limit active when SWI issued
R1 - R9	preserved

Interrupts are preserved, but may be disabled during call

Fast interrupts are preserved

Processor is in undefined mode

Re-entrancy: not defined

Provides a means to limit the maximum amount of memory that may be claimed by the MbufManager for mbuf and underlying storage.

Errors:

As normal.

Notes:

If zero is supplied as the new desired limit, then the limit is not altered and only an examination is performed. Limits are specified in bytes. They are approximate figures only (due to the underlying allocations being performed in granularities larger than a single byte). They are normally within about one kilobyte of the actual value. A new, larger limit does not cause more memory to be automatically claimed. The MbufManager may attempt to dynamically maintain an appropriately sized free pool. This limit provides a ceiling to any dynamic fluctuations. A user interface might well use a granularity of four kilobytes.

Mbuf_Statistic (SWI &4A583)

Returns statistics for the MbufManager

None

Interrupts are undefined

Fast interrupts are undefined

Processor is in undefined mode

Re-entrancy: not defined

This SWI provides an entry point that conforms to the DCI4 Statistic Interface. Please refer to that document for further details.

FIXME: Provide links to the Statistics documentation FIXME: Provide at least the register call information

Mbuf_Control (SWI &4A584)

General purpose control interface for MbufManager

Reason code:

Value	Meaning
0	SWI Mbuf_Control 0

R1 - R9 dependent on reason code

Interrupts are preserved

Fast interrupts are preserved

Processor is in undefined mode

Re-entrancy: not defined

This is a general purpose control interface to the MbufManager. Different implementations may implement different control calls.

Notes:

Issuing this SWI with a reason code of 0 is a good method of checking for the presence of the MbufManager.

Mbuf_Control 0 - Version (SWI &4A584)

Read version number of the MbufManager

On entry: 0 (reason code)

On exit:

R0	=	MbufManager version × 100
R1 - R9	preserved

Interrupts are preserved

Fast interrupts are preserved

Processor is in undefined mode

Re-entrancy: not defined

This SWI reason is used to read the version of the MbufManager.
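Since the version is returned multiplied by 100, a client can recover the conventional two-part form by integer division. A minimal sketch, where 'toy_split_version' is a hypothetical helper and 'r0' is the value returned by the SWI:

```c
/* Splits a "version x 100" value into its whole and hundredths parts,
 * so that 123 is reported as version 1.23. */
void toy_split_version(unsigned r0, unsigned *major, unsigned *minor)
{
    *major = r0 / 100;
    *minor = r0 % 100;
}
```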

14 Service calls

Service_MbufManagerStatus (Service Call &A2)

MbufManager state change notifications

On entry:

Reason code (&A2)

Sub-reason code:

Value	Meaning
0	MbufManager has started
1	MbufManager is shutting down
2	MbufManager wishes to reclaim buffers

On exit: preserved

The MbufManager issues service calls to notify clients and potential clients of desired and actual state changes. The service call used is Service_MbufManagerStatus (service call number 0xa2, in C). A reason code is passed in r0 to indicate the reason for the service call.

Service_MbufManagerStatus 0 - Started (Service Call &A2)

MbufManager has started

On entry:

R1	=	reason code (&A2)
R2	=	Sub-reason code (0)

On exit: preserved

MbufManager has started and is now available for use. It is possible to issue SWIs to the MbufManager as soon as this service call has been seen (it is issued from a callback to ensure this is possible).

Service_MbufManagerStatus 1 - Stopping (Service Call &A2)

MbufManager is shutting down

On entry:

R1	=	reason code (&A2)
R2	=	Sub-reason code (1)

On exit: preserved

The MbufManager is finishing. There are no open sessions if this reason code is used. The MbufManager will refuse to die if there are any open sessions.

Service_MbufManagerStatus 2 - Scavenge (Service Call &A2)

MbufManager wishes to reclaim buffers

On entry:

R1	=	reason code (&A2)
R2	=	Sub-reason code (2)

On exit: preserved

This reason code is used to indicate that the MbufManager is running short of allocatable memory and any clients with allocated data that may be easily recreated (such as cached data) should release this memory (ideally) before returning from the service call.

15 The life and death of an MbufManager

The components necessary to form a working DCI4 environment may be loaded in a number of different orders. To permit sensible, defined behaviour for all of these orders, the MbufManager and all device drivers provide mechanisms to announce their arrival and departure, and for their presence to be determined through a polled action.

The MbufManager announces its arrival with the Service_MbufManagerStatus service call with a reason code of MbufManagerStatus_Started. A client that cannot detect the presence of the MbufManager when it (the client) loads (via an Mbuf_Control SWI call, for example) should place itself in a 'pre-active' state and await this service call. The MbufManager will respond to SWIs when the service call is issued.

Any attempt to kill the MbufManager when there are open sessions will be refused.

If the MbufManager is requested to die and there are no open sessions, it will issue the Service_MbufManagerStatus 1 reason code in a Service_MbufManagerStatus service call to notify potential clients of this.

Clients that can remain inactive without an open session with the MbufManager should do so, to give the user more flexibility should it be necessary to upgrade the MbufManager. Either way, killing all modules with open sessions should always permit the MbufManager to be killed.
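A client's reaction to the three sub-reason codes can be sketched as a small state machine. The sub-reason values (0, 1, 2) are those given for Service_MbufManagerStatus; the 'toy_' names, the two-state model and the 'cached_chains' counter are assumptions made for illustration, and the real session handling is reduced to comments.

```c
enum {
    TOY_STATUS_STARTED  = 0,   /* MbufManager has started */
    TOY_STATUS_STOPPING = 1,   /* MbufManager is shutting down */
    TOY_STATUS_SCAVENGE = 2    /* MbufManager wishes to reclaim buffers */
};

enum toy_state { TOY_PREACTIVE, TOY_ACTIVE };

struct toy_client {
    enum toy_state state;
    int            cached_chains;  /* easily recreated data held in mbufs */
};

/* Hypothetical handler called with the sub-reason code. */
void toy_on_status(struct toy_client *c, int sub_reason)
{
    switch (sub_reason) {
    case TOY_STATUS_STARTED:
        /* A real client would open its session here. */
        c->state = TOY_ACTIVE;
        break;
    case TOY_STATUS_STOPPING:
        /* Only issued when no sessions are open, so nothing to close. */
        c->state = TOY_PREACTIVE;
        break;
    case TOY_STATUS_SCAVENGE:
        /* Stands in for freeing cached mbuf chains before returning. */
        c->cached_chains = 0;
        break;
    default:
        break;                     /* unknown sub-reasons are ignored */
    }
}
```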

16 Glossary

A "NULL pointer" is a pointer with all bits clear. It is not a valid address for examination or modification for virtually all programs under virtually all circumstances.

A linked list of mbufs constructed with the 'm_next' field is referred to as a "chain of mbufs".

A linked list of mbufs constructed with the 'm_list' field is referred to as a "list of mbufs" or a "list of mbuf chains".

A chain and a list may consist of just one mbuf. The ends of the chain and list are indicated by the NULL pointer.

Lists of chains are constructed, but not vice versa (certainly within this specification).

A1 Appendix - DCI4 MbufManager client interface contract

Ownership of an mbuf chain grants permission to examine and modify the described data, alter the order of the chain, etc. It also brings the responsibility to either pass ownership to another client or to ensure that the mbuf chain is (eventually) freed.

In short, ownership permits useful things to be done with an mbuf chain, and carries with it the responsibility to free the data.

When a client obtains a pointer to an mbuf chain from outside itself (ie from another client or from the MbufManager), it is deemed to have taken ownership of that chain. When it supplies that pointer to another client or the MbufManager, it has lost ownership. An mbuf chain is never owned by more than one client.

The presence of asynchronous execution mechanisms (such as interrupts) requires a more precise definition. During ownership transfer, there is a transient period where the relinquishing client has called the recipient client, but the call has not yet returned. Depending upon the precise definition, during this period of time, the mbuf chain could be viewed as being owned by zero, one or two clients. The definition for this specification is that the mbuf chain is owned by zero clients.

This requires the relinquishing client to take whatever steps are necessary to ensure that it cannot continue to access an mbuf chain before calling the recipient client. An example of such steps might be to remove the mbuf chain pointer from a list of such pointers examined by an interrupt routine of the relinquishing client or to set a semaphore of some form.

The mbuf pointer supplied to the call that transfers ownership should be the only copy of that pointer value that the relinquishing client has.

In short, once ownership transfer is committed to, the original owner has already lost ownership.
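The rule that the relinquishing client gives up its only copy of the pointer before calling the recipient can be sketched as below. The 'toy_' names are hypothetical, and the sketch ignores the interrupt masking or semaphore that a real client would need to make the hand-over atomic with respect to its own interrupt routines.

```c
#include <stddef.h>

struct toy_mbuf {
    struct toy_mbuf *m_next;
};

typedef void (*toy_recipient)(struct toy_mbuf *mp);

/* Transfers the chain held in '*slot' to 'recipient'. The slot is emptied
 * first, so during the call the relinquishing client holds no reference. */
void toy_hand_over(struct toy_mbuf **slot, toy_recipient recipient)
{
    struct toy_mbuf *mp = *slot;
    *slot = NULL;                  /* ownership is lost from here on */
    if (mp != NULL)
        recipient(mp);
}

/* Demonstration recipient: records what it saw when it was called. */
static struct toy_mbuf **toy_watched_slot;
static struct toy_mbuf  *toy_received;
static int               toy_slot_was_empty;

void toy_demo_recipient(struct toy_mbuf *mp)
{
    toy_received = mp;
    toy_slot_was_empty = (*toy_watched_slot == NULL);
}
```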

[Is this adequate. Does it supply the necessary degree of precision?]

Whenever one client passes an mbuf chain that describes unsafe data (an "unsafe mbuf chain") to another client, it pledges that the data will remain valid until the recipient client returns through the thread of control that supplied the unsafe mbuf chain.

If the recipient client were to retain the supplied mbuf chain beyond this point in time, then it might describe invalid data (it is possible that exceptions will arise if an attempt is made to access the described data, for example).

A device driver never supplies a protocol module an unsafe mbuf chain (this is a DCI4 protocol to device driver restriction only).

[I dislike this restriction, but it is necessary for the sweeping 'dtom' statement to apply.]

When a device driver supplies a received packet chain to a protocol module, the m_type field of the first mbuf holds MT_HEADER (2) and all the other m_type fields hold MT_DATA (1).
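A protocol module could check this tagging on a received chain as follows. The values of MT_HEADER and MT_DATA are those quoted above; 'toy_mbuf' and 'toy_rx_types_ok' are hypothetical names, and treating the NULL pointer as a failure is an assumption.

```c
#include <stddef.h>

#define MT_DATA   1
#define MT_HEADER 2

struct toy_mbuf {
    struct toy_mbuf *m_next;
    int              m_type;
};

/* Returns 1 if the first mbuf is MT_HEADER and all others are MT_DATA. */
int toy_rx_types_ok(const struct toy_mbuf *mp)
{
    if (mp == NULL || mp->m_type != MT_HEADER)
        return 0;
    for (mp = mp->m_next; mp != NULL; mp = mp->m_next) {
        if (mp->m_type != MT_DATA)
            return 0;
    }
    return 1;
}
```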

When a protocol module supplies an mbuf chain to a device driver, the m_type field of all described mbufs is invalid and must not influence the behaviour of the device driver.

Summary of mbuf field validity: dci4 client <=> dci4 client

Field From protocol From device driver

m_next valid valid

m_list valid* valid*

m_off valid valid

m_len valid valid

m_inioff valid valid

m_inilen valid valid

m_type invalid valid

m_sys1 opaque opaque

m_sys2 opaque opaque

m_flags invalid invalid

m_pkthdr invalid invalid

Notes:

Field: the name of a field within an mbuf structure.

From protocol: an mbuf chain being passed from a protocol to a device driver for transmission.

From device driver: a newly received mbuf chain for protocol processing.

valid: the field meets the criteria stated within this document.

valid*: This is a list of mbuf chains. A protocol can avoid fragmenting datagrams down to the device driver MTU by using a list of unsafe mbufs to shadow the real data. The device driver uses 'ensure_safe' if it needs to retain an mbuf beyond the transmit call returning.

invalid: any value may be present. No manipulation of such a value should ever be made. Either the field should be ignored or it should be initialised prior to use.

opaque: never read and never written by any client.

The 'm_inioff' and 'm_inilen' fields of a safe mbuf are never altered by a client - only read. The 'm_inioff' and 'm_inilen' fields of an unsafe mbuf may be altered by the client.

A2 Appendix - Supplementary clarifications

There are no supplementary clarifications known to be required at present.

