iWARP
iWARP is a computer networking protocol that implements remote direct memory access (RDMA) for efficient data transfer over Internet Protocol networks. Contrary to some accounts,[1] iWARP is not an acronym.[2]
Because iWARP is layered on Internet Engineering Task Force (IETF)-standard congestion-aware protocols such as Transmission Control Protocol (TCP) and Stream Control Transmission Protocol (SCTP), it makes few requirements on the network, and can be successfully deployed in a broad range of environments.
History
[edit]In 2007, the IETF published five Request for Comments (RFCs) that define iWARP:
- RFC 5040 A Remote Direct Memory Access Protocol Specification is layered over Direct Data Placement Protocol (DDP). It defines how RDMA Send, Read, and Write operations are encoded using DDP into headers on the network.
- RFC 5041 Direct Data Placement over Reliable Transports is layered over MPA/TCP or SCTP. It defines how received data can be directly placed into an upper layer protocols receive buffer without intermediate buffers.
- RFC 5042 Direct Data Placement Protocol (DDP) / Remote Direct Memory Access Protocol (RDMAP) Security analyzes security issues related to iWARP DDP and RDMAP protocol layers.
- RFC 5043 Stream Control Transmission Protocol (SCTP) Direct Data Placement (DDP) Adaptation defines an adaptation layer that enables DDP over SCTP.
- RFC 5044 Marker PDU Aligned Framing for TCP Specification defines an adaptation layer that enables preservation of DDP-level protocol record boundaries layered over the TCP reliable connected byte stream.
These RFCs are based on the RDMA Consortium's specifications for RDMA over TCP.[3] The RDMA Consortium's specifications are influenced by earlier RDMA standards, including Virtual Interface Architecture (VIA) and InfiniBand (IB).
Since 2007, the IETF has published three additional RFCs that maintain and extend iWARP:
- RFC 6580 IANA Registries for the Remote Direct Data Placement (RDDP) Protocols published in 2012 defines IANA registries for Remote Direct Data Placement (RDDP) error codes, operation codes, and function codes.
- RFC 6581 Enhanced Remote Direct Memory Access (RDMA) Connection Establishment published in 2011 fixes shortcomings with iWARP connection setup.
- RFC 7306 Remote Direct Memory Access (RDMA) Protocol Extensions published in 2014 extends RFC 5040 with atomic operations and RDMA Write with Immediate Data.
Protocol
[edit]The main component in the iWARP protocol is the Direct Data Placement Protocol (DDP), which permits the actual zero-copy transmission. DDP itself does not perform the transmission; the underlying protocol (TCP or SCTP) does.
However, TCP does not respect message boundaries; it sends data as a sequence of bytes without regard to protocol data units (PDU). In this regard, DDP itself may be better suited for SCTP, and indeed the IETF proposed a standard RDMA over SCTP.[4] To run DDP over TCP requires a tweak known as marker PDU aligned (MPA) framing to guarantee boundaries of messages.
Furthermore, DDP is not intended to be accessed directly. Instead, a separate RDMA protocol (RDMAP) provides the services to read and write data. Therefore, the entire RDMA over TCP specification is really RDMAP over DDP over either MPA/TCP or SCTP. All of these protocols can be implemented in hardware.
Unlike IB, iWARP only has reliable connected communication, as this is the only service that TCP and SCTP provide. The iWARP specification omits other features of IB, such as Send with Immediate Data operations. With RFC 7306, the IETF is working to reduce these omissions.
Implementation
[edit]Because a kernel implementation of the TCP stack can be seen as a bottleneck, the protocol is typically implemented in hardware RDMA network interface controllers (rNICs). As simple data losses are rare in tightly coupled network environments, the error-correction mechanisms of TCP may be performed by software while the more frequently performed communications are handled strictly by logic embedded on the rNIC. Similarly, connections are often established entirely by software and then handed off to the hardware. Furthermore, the handling of iWARP specific protocol details is typically isolated from the TCP implementation, allowing rNICs to be used for both as RDMA offload and TCP offload (in support of traditional sockets based TCP/IP applications). The portion of the hardware implementation used for implementing the TCP protocol is known as the TCP Offload Engine (TOE).
TOE itself does not prevent copying on the reception side, and must be combined with RDMA hardware for zero-copy results. The RDMA / TCP specification is a set of different wire protocols intended to be implemented in hardware (though it seems feasible to emulate it in software for compatibility but without the performance benefits).
Interfaces
[edit]iWARP is a protocol, not an implementation, but defines protocol behavior in terms of the operations that are legal for the protocol, known as Verbs. As such, iWARP does not have any single standard programming interface. However, programming interfaces tend to very closely correspond to the Verbs.
Several programmatic interfaces have been proposed, including OpenFabrics Verbs, Network Direct, uDAPL, kDAPL, IT-API, and RNICPI. Implementations of some of these interfaces are available for different platforms, including Windows and Linux.
Services available
[edit]Networking services implemented over iWARP include those offered in the OpenFabrics Enterprise Distribution (OFED) by the OpenFabrics Alliance for Linux operating systems, and by Microsoft Windows via Network Direct.
- NVMe over Fabrics (NVMEoF)
- iSCSI Extensions for RDMA (iSER)
- Server Message Block Direct (SMB Direct)
- Sockets Direct Protocol (SDP)
- SCSI RDMA Protocol (SRP)
- Network File System over RDMA (NFS over RDMA)
- GPUDirect
Vendors
[edit]Popular vendors of iWarp enabled equipment include:
See also
[edit]References
[edit]- ^ "Understanding iWARP: Delivering Low Latency to Ethernet" (PDF). Intel. 2015-11-24. Retrieved 2018-09-07.
- ^ "RDMA Consortium FAQs".
- ^ "RDMA Consortium". 2009-12-17. Retrieved 2017-08-23.
- ^ Rashti, Mohammad J.; Afsahi, Ahmad (March 2007). "10-Gigabit iWARP Ethernet: Comparative Performance Analysis with InfiniBand and Myrinet-10G". 2007 IEEE International Parallel and Distributed Processing Symposium. pp. 1–8. doi:10.1109/IPDPS.2007.370480. ISBN 978-1-4244-0909-9. S2CID 2279387.
External links
[edit]- OpenFabrics Alliance at the University of New Hampshire InterOperability Laboratory — Testing on iWARP devices
- Remote Direct Data Placement Charter (IETF)
- MPI-SCTP: Using the Stream Control Transmission Protocol for parallel programs written using the Message Passing Interface Archived 2009-10-02 at the Wayback Machine (2008-09-01)
- SMB2 Remote Direct Memory Access (RDMA) Transport Protocol (2017-06-01)