Fixed MTP to work with TWRP

This commit is contained in:
awab228 2018-06-19 23:16:04 +02:00
commit f6dfaef42e
50820 changed files with 20846062 additions and 0 deletions

View file

@ -0,0 +1,114 @@
INFINIBAND MIDLAYER LOCKING
This guide is an attempt to make explicit the locking assumptions
made by the InfiniBand midlayer. It describes the requirements on
both low-level drivers that sit below the midlayer and upper level
protocols that use the midlayer.
Sleeping and interrupt context
With the following exceptions, a low-level driver implementation of
all of the methods in struct ib_device may sleep. The exceptions
are any methods from the list:
create_ah
modify_ah
query_ah
destroy_ah
bind_mw
post_send
post_recv
poll_cq
req_notify_cq
map_phys_fmr
which may not sleep and must be callable from any context.
The corresponding functions exported to upper level protocol
consumers:
ib_create_ah
ib_modify_ah
ib_query_ah
ib_destroy_ah
ib_bind_mw
ib_post_send
ib_post_recv
ib_req_notify_cq
ib_map_phys_fmr
are therefore safe to call from any context.
In addition, the function
ib_dispatch_event
used by low-level drivers to dispatch asynchronous events through
the midlayer is also safe to call from any context.
Reentrancy
All of the methods in struct ib_device exported by a low-level
driver must be fully reentrant. The low-level driver is required to
perform all synchronization necessary to maintain consistency, even
if multiple function calls using the same object are run
simultaneously.
The IB midlayer does not perform any serialization of function calls.
Because low-level drivers are reentrant, upper level protocol
consumers are not required to perform any serialization. However,
some serialization may be required to get sensible results. For
example, a consumer may safely call ib_poll_cq() on multiple CPUs
simultaneously. However, the ordering of the work completion
information between different calls of ib_poll_cq() is not defined.
Callbacks
A low-level driver must not perform a callback directly from the
same callchain as an ib_device method call. For example, it is not
allowed for a low-level driver to call a consumer's completion event
handler directly from its post_send method. Instead, the low-level
driver should defer this callback by, for example, scheduling a
tasklet to perform the callback.
The low-level driver is responsible for ensuring that multiple
completion event handlers for the same CQ are not called
simultaneously. The driver must guarantee that only one CQ event
handler for a given CQ is running at a time. In other words, the
following situation is not allowed:
CPU1 CPU2
low-level driver ->
consumer CQ event callback:
/* ... */
ib_req_notify_cq(cq, ...);
low-level driver ->
/* ... */ consumer CQ event callback:
/* ... */
return from CQ event handler
The context in which completion event and asynchronous event
callbacks run is not defined. Depending on the low-level driver, it
may be process context, softirq context, or interrupt context.
Upper level protocol consumers may not sleep in a callback.
Hot-plug
A low-level driver announces that a device is ready for use by
consumers when it calls ib_register_device(), all initialization
must be complete before this call. The device must remain usable
until the driver's call to ib_unregister_device() has returned.
A low-level driver must call ib_register_device() and
ib_unregister_device() from process context. It must not hold any
semaphores that could cause deadlock if a consumer calls back into
the driver across these calls.
An upper level protocol consumer may begin using an IB device as
soon as the add method of its struct ib_client is called for that
device. A consumer must finish all cleanup and free all resources
relating to a device before returning from the remove method.
A consumer is permitted to sleep in its add and remove methods.

View file

@ -0,0 +1,105 @@
IP OVER INFINIBAND
The ib_ipoib driver is an implementation of the IP over InfiniBand
protocol as specified by RFC 4391 and 4392, issued by the IETF ipoib
working group. It is a "native" implementation in the sense of
setting the interface type to ARPHRD_INFINIBAND and the hardware
address length to 20 (earlier proprietary implementations
masqueraded to the kernel as ethernet interfaces).
Partitions and P_Keys
When the IPoIB driver is loaded, it creates one interface for each
port using the P_Key at index 0. To create an interface with a
different P_Key, write the desired P_Key into the main interface's
/sys/class/net/<intf name>/create_child file. For example:
echo 0x8001 > /sys/class/net/ib0/create_child
This will create an interface named ib0.8001 with P_Key 0x8001. To
remove a subinterface, use the "delete_child" file:
echo 0x8001 > /sys/class/net/ib0/delete_child
The P_Key for any interface is given by the "pkey" file, and the
main interface for a subinterface is in "parent."
Child interface create/delete can also be done using IPoIB's
rtnl_link_ops, where childs created using either way behave the same.
Datagram vs Connected modes
The IPoIB driver supports two modes of operation: datagram and
connected. The mode is set and read through an interface's
/sys/class/net/<intf name>/mode file.
In datagram mode, the IB UD (Unreliable Datagram) transport is used
and so the interface MTU has is equal to the IB L2 MTU minus the
IPoIB encapsulation header (4 bytes). For example, in a typical IB
fabric with a 2K MTU, the IPoIB MTU will be 2048 - 4 = 2044 bytes.
In connected mode, the IB RC (Reliable Connected) transport is used.
Connected mode takes advantage of the connected nature of the IB
transport and allows an MTU up to the maximal IP packet size of 64K,
which reduces the number of IP packets needed for handling large UDP
datagrams, TCP segments, etc and increases the performance for large
messages.
In connected mode, the interface's UD QP is still used for multicast
and communication with peers that don't support connected mode. In
this case, RX emulation of ICMP PMTU packets is used to cause the
networking stack to use the smaller UD MTU for these neighbours.
Stateless offloads
If the IB HW supports IPoIB stateless offloads, IPoIB advertises
TCP/IP checksum and/or Large Send (LSO) offloading capability to the
network stack.
Large Receive (LRO) offloading is also implemented and may be turned
on/off using ethtool calls. Currently LRO is supported only for
checksum offload capable devices.
Stateless offloads are supported only in datagram mode.
Interrupt moderation
If the underlying IB device supports CQ event moderation, one can
use ethtool to set interrupt mitigation parameters and thus reduce
the overhead incurred by handling interrupts. The main code path of
IPoIB doesn't use events for TX completion signaling so only RX
moderation is supported.
Debugging Information
By compiling the IPoIB driver with CONFIG_INFINIBAND_IPOIB_DEBUG set
to 'y', tracing messages are compiled into the driver. They are
turned on by setting the module parameters debug_level and
mcast_debug_level to 1. These parameters can be controlled at
runtime through files in /sys/module/ib_ipoib/.
CONFIG_INFINIBAND_IPOIB_DEBUG also enables files in the debugfs
virtual filesystem. By mounting this filesystem, for example with
mount -t debugfs none /sys/kernel/debug
it is possible to get statistics about multicast groups from the
files /sys/kernel/debug/ipoib/ib0_mcg and so on.
The performance impact of this option is negligible, so it
is safe to enable this option with debug_level set to 0 for normal
operation.
CONFIG_INFINIBAND_IPOIB_DEBUG_DATA enables even more debug output in
the data path when data_debug_level is set to 1. However, even with
the output disabled, enabling this configuration option will affect
performance, because it adds tests to the fast path.
References
Transmission of IP over InfiniBand (IPoIB) (RFC 4391)
http://ietf.org/rfc/rfc4391.txt
IP over InfiniBand (IPoIB) Architecture (RFC 4392)
http://ietf.org/rfc/rfc4392.txt
IP over InfiniBand: Connected Mode (RFC 4755)
http://ietf.org/rfc/rfc4755.txt

View file

@ -0,0 +1,66 @@
SYSFS FILES
For each InfiniBand device, the InfiniBand drivers create the
following files under /sys/class/infiniband/<device name>:
node_type - Node type (CA, switch or router)
node_guid - Node GUID
sys_image_guid - System image GUID
In addition, there is a "ports" subdirectory, with one subdirectory
for each port. For example, if mthca0 is a 2-port HCA, there will
be two directories:
/sys/class/infiniband/mthca0/ports/1
/sys/class/infiniband/mthca0/ports/2
(A switch will only have a single "0" subdirectory for switch port
0; no subdirectory is created for normal switch ports)
In each port subdirectory, the following files are created:
cap_mask - Port capability mask
lid - Port LID
lid_mask_count - Port LID mask count
rate - Port data rate (active width * active speed)
sm_lid - Subnet manager LID for port's subnet
sm_sl - Subnet manager SL for port's subnet
state - Port state (DOWN, INIT, ARMED, ACTIVE or ACTIVE_DEFER)
phys_state - Port physical state (Sleep, Polling, LinkUp, etc)
There is also a "counters" subdirectory, with files
VL15_dropped
excessive_buffer_overrun_errors
link_downed
link_error_recovery
local_link_integrity_errors
port_rcv_constraint_errors
port_rcv_data
port_rcv_errors
port_rcv_packets
port_rcv_remote_physical_errors
port_rcv_switch_relay_errors
port_xmit_constraint_errors
port_xmit_data
port_xmit_discards
port_xmit_packets
symbol_error
Each of these files contains the corresponding value from the port's
Performance Management PortCounters attribute, as described in
section 16.1.3.5 of the InfiniBand Architecture Specification.
The "pkeys" and "gids" subdirectories contain one file for each
entry in the port's P_Key or GID table respectively. For example,
ports/1/pkeys/10 contains the value at index 10 in port 1's P_Key
table.
MTHCA
The Mellanox HCA driver also creates the files:
hw_rev - Hardware revision number
fw_ver - Firmware version
hca_type - HCA type: "MT23108", "MT25208 (MT23108 compat mode)",
or "MT25208"

View file

@ -0,0 +1,153 @@
USERSPACE MAD ACCESS
Device files
Each port of each InfiniBand device has a "umad" device and an
"issm" device attached. For example, a two-port HCA will have two
umad devices and two issm devices, while a switch will have one
device of each type (for switch port 0).
Creating MAD agents
A MAD agent can be created by filling in a struct ib_user_mad_reg_req
and then calling the IB_USER_MAD_REGISTER_AGENT ioctl on a file
descriptor for the appropriate device file. If the registration
request succeeds, a 32-bit id will be returned in the structure.
For example:
struct ib_user_mad_reg_req req = { /* ... */ };
ret = ioctl(fd, IB_USER_MAD_REGISTER_AGENT, (char *) &req);
if (!ret)
my_agent = req.id;
else
perror("agent register");
Agents can be unregistered with the IB_USER_MAD_UNREGISTER_AGENT
ioctl. Also, all agents registered through a file descriptor will
be unregistered when the descriptor is closed.
2014 -- a new registration ioctl is now provided which allows additional
fields to be provided during registration.
Users of this registration call are implicitly setting the use of
pkey_index (see below).
Receiving MADs
MADs are received using read(). The receive side now supports
RMPP. The buffer passed to read() must be at least one
struct ib_user_mad + 256 bytes. For example:
If the buffer passed is not large enough to hold the received
MAD (RMPP), the errno is set to ENOSPC and the length of the
buffer needed is set in mad.length.
Example for normal MAD (non RMPP) reads:
struct ib_user_mad *mad;
mad = malloc(sizeof *mad + 256);
ret = read(fd, mad, sizeof *mad + 256);
if (ret != sizeof mad + 256) {
perror("read");
free(mad);
}
Example for RMPP reads:
struct ib_user_mad *mad;
mad = malloc(sizeof *mad + 256);
ret = read(fd, mad, sizeof *mad + 256);
if (ret == -ENOSPC)) {
length = mad.length;
free(mad);
mad = malloc(sizeof *mad + length);
ret = read(fd, mad, sizeof *mad + length);
}
if (ret < 0) {
perror("read");
free(mad);
}
In addition to the actual MAD contents, the other struct ib_user_mad
fields will be filled in with information on the received MAD. For
example, the remote LID will be in mad.lid.
If a send times out, a receive will be generated with mad.status set
to ETIMEDOUT. Otherwise when a MAD has been successfully received,
mad.status will be 0.
poll()/select() may be used to wait until a MAD can be read.
Sending MADs
MADs are sent using write(). The agent ID for sending should be
filled into the id field of the MAD, the destination LID should be
filled into the lid field, and so on. The send side does support
RMPP so arbitrary length MAD can be sent. For example:
struct ib_user_mad *mad;
mad = malloc(sizeof *mad + mad_length);
/* fill in mad->data */
mad->hdr.id = my_agent; /* req.id from agent registration */
mad->hdr.lid = my_dest; /* in network byte order... */
/* etc. */
ret = write(fd, &mad, sizeof *mad + mad_length);
if (ret != sizeof *mad + mad_length)
perror("write");
Transaction IDs
Users of the umad devices can use the lower 32 bits of the
transaction ID field (that is, the least significant half of the
field in network byte order) in MADs being sent to match
request/response pairs. The upper 32 bits are reserved for use by
the kernel and will be overwritten before a MAD is sent.
P_Key Index Handling
The old ib_umad interface did not allow setting the P_Key index for
MADs that are sent and did not provide a way for obtaining the P_Key
index of received MADs. A new layout for struct ib_user_mad_hdr
with a pkey_index member has been defined; however, to preserve binary
compatibility with older applications, this new layout will not be used
unless one of IB_USER_MAD_ENABLE_PKEY or IB_USER_MAD_REGISTER_AGENT2 ioctl's
are called before a file descriptor is used for anything else.
In September 2008, the IB_USER_MAD_ABI_VERSION will be incremented
to 6, the new layout of struct ib_user_mad_hdr will be used by
default, and the IB_USER_MAD_ENABLE_PKEY ioctl will be removed.
Setting IsSM Capability Bit
To set the IsSM capability bit for a port, simply open the
corresponding issm device file. If the IsSM bit is already set,
then the open call will block until the bit is cleared (or return
immediately with errno set to EAGAIN if the O_NONBLOCK flag is
passed to open()). The IsSM bit will be cleared when the issm file
is closed. No read, write or other operations can be performed on
the issm file.
/dev files
To create the appropriate character device files automatically with
udev, a rule like
KERNEL=="umad*", NAME="infiniband/%k"
KERNEL=="issm*", NAME="infiniband/%k"
can be used. This will create device nodes named
/dev/infiniband/umad0
/dev/infiniband/issm0
for the first port, and so on. The InfiniBand device and port
associated with these devices can be determined from the files
/sys/class/infiniband_mad/umad0/ibdev
/sys/class/infiniband_mad/umad0/port
and
/sys/class/infiniband_mad/issm0/ibdev
/sys/class/infiniband_mad/issm0/port

View file

@ -0,0 +1,69 @@
USERSPACE VERBS ACCESS
The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS,
enables direct userspace access to IB hardware via "verbs," as
described in chapter 11 of the InfiniBand Architecture Specification.
To use the verbs, the libibverbs library, available from
http://www.openfabrics.org/, is required. libibverbs contains a
device-independent API for using the ib_uverbs interface.
libibverbs also requires appropriate device-dependent kernel and
userspace driver for your InfiniBand hardware. For example, to use
a Mellanox HCA, you will need the ib_mthca kernel module and the
libmthca userspace driver be installed.
User-kernel communication
Userspace communicates with the kernel for slow path, resource
management operations via the /dev/infiniband/uverbsN character
devices. Fast path operations are typically performed by writing
directly to hardware registers mmap()ed into userspace, with no
system call or context switch into the kernel.
Commands are sent to the kernel via write()s on these device files.
The ABI is defined in drivers/infiniband/include/ib_user_verbs.h.
The structs for commands that require a response from the kernel
contain a 64-bit field used to pass a pointer to an output buffer.
Status is returned to userspace as the return value of the write()
system call.
Resource management
Since creation and destruction of all IB resources is done by
commands passed through a file descriptor, the kernel can keep track
of which resources are attached to a given userspace context. The
ib_uverbs module maintains idr tables that are used to translate
between kernel pointers and opaque userspace handles, so that kernel
pointers are never exposed to userspace and userspace cannot trick
the kernel into following a bogus pointer.
This also allows the kernel to clean up when a process exits and
prevent one process from touching another process's resources.
Memory pinning
Direct userspace I/O requires that memory regions that are potential
I/O targets be kept resident at the same physical address. The
ib_uverbs module manages pinning and unpinning memory regions via
get_user_pages() and put_page() calls. It also accounts for the
amount of memory pinned in the process's locked_vm, and checks that
unprivileged processes do not exceed their RLIMIT_MEMLOCK limit.
Pages that are pinned multiple times are counted each time they are
pinned, so the value of locked_vm may be an overestimate of the
number of pages pinned by a process.
/dev files
To create the appropriate character device files automatically with
udev, a rule like
KERNEL=="uverbs*", NAME="infiniband/%k"
can be used. This will create device nodes named
/dev/infiniband/uverbs0
and so on. Since the InfiniBand userspace verbs should be safe for
use by non-privileged processes, it may be useful to add an
appropriate MODE or GROUP to the udev rule.