Trading in 24.2 Billionths of a Second

Schweitzer Scott
Jun 1, 2020

Establishing a world record is one way of saying, “This is the best we can do as humans; now, let’s all try to beat it.” From baseball and the Guinness Book to the Olympics and track and field, we are continually striving to lift more, jump higher, or move faster. In business, the ability to react and adapt more quickly than your competitors has value. Electronic trading, like no other market, lets firms assign a precise dollar value to that increase in speed. A trader can run the same strategy on two systems in parallel, at the same time and in the same market, changing only one variable, such as the system’s trade execution time, and see dramatically different profits. One billionth of a second, a nanosecond, could easily be worth hundreds or even thousands of dollars, depending on the market, order type, volume, pricing, and market conditions. In electronic trading, the Securities Technology Analysis Center (STAC®) validates and publishes the world records, and its STAC-T0® benchmark is the gold standard for tick-to-trade measurement.

“In electronic trading, being first to the market allows you to be top in the order queue. Time is literally money, and in this competitive environment, every nanosecond counts. 25 Nanosecond data in, data out latency for LDA’s solution means the impact of the framework is virtually eliminated, allowing our ultra-low latency Raptor FPGA pre-trade risk management solution to reach previously unheard-of latencies.” — William Dallyn, Raptor, a product of Fusion Systems

In the previous century, people traded stocks in “pits” on the floors of stock exchanges around the world. Trading was subjective and built on personal relationships; traders used hand signals to exchange messages across loud pits that were often overrun with shouted offers, and trades were handwritten on slips of paper. By today’s standards, this model was slow, inefficient, and ripe for abuse. Today, computers trade commodities at blindingly fast speeds and with incredible efficiency, often in millionths of a second (microseconds). This type of computer-based trading is analogous to NASCAR, where cars race for extended periods around a closed track; there, the speed record is 216 MPH. There is a whole other class of computer-based trading, ultra-low-latency trading, that measures speed in billionths of a second, or nanoseconds. It is similar to NHRA drag racing, where the track is a straight ¼ mile and the speed record is 330 MPH. Both forms of racing use four-wheeled vehicles, but their engines, tires, and control systems are entirely different. Each is purpose-built, and the same is true of electronic trading versus ultra-low-latency trading.

In 2017, typical electronic trading systems had a tick-to-trade time, the network delay or latency between receiving market data and sending an order, of just over a microsecond; this is the NASCAR model mentioned above. These systems are built from off-the-shelf computer components and often use network interface cards (NICs) based on application-specific integrated circuits (ASICs) that let trading application traffic bypass the operating system. Earlier in the decade, Solarflare Communications filed several patents around a new method for reducing network latency, and these changes led to substantial improvements, so much so that several years ago Solarflare, now a Xilinx company, worked with LDA Technologies and STAC to develop a new benchmark, STAC-T0®.

“The STAC-T0 is modeled after the Chicago Mercantile Exchange and it is the absolute measure of how fast a trading platform can respond to a signal from the market over Ethernet. STAC-T0 measures the time between the transmission of a simulated UDP market data message into the system and receipt of simulated TCP order from the system, without the system performing any trading logic or market-specific protocol handling.” — Peter Lankford, STAC Director
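To make that definition concrete, here is a minimal Python sketch of the arithmetic behind a tick-to-trade number: the latency for each event is the time the order is received back minus the time the market-data message was sent, and a run is then summarized with the statistics the STAC-T0 report publishes (minimum, mean, 99th percentile, maximum). The function names, the nearest-rank percentile method, and the sample timestamps are hypothetical. This is not STAC’s harness or methodology, which also involves hardware timestamping and an uncertainty analysis that is not modeled here.

```python
import math
from statistics import mean


def tick_to_trade_ns(md_sent_ns: list[float], order_received_ns: list[float]) -> list[float]:
    """Per-event latency: order received by the harness minus market-data message sent.

    Hypothetical helper; assumes each market-data message triggers exactly one
    order and that both lists are in the same event order.
    """
    if len(md_sent_ns) != len(order_received_ns):
        raise ValueError("every market-data message needs exactly one matching order")
    return [rx - tx for tx, rx in zip(md_sent_ns, order_received_ns)]


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(latencies_ns: list[float]) -> dict[str, float]:
    """Summary statistics of the kind a STAC-T0 report lists for each test case."""
    return {
        "min": min(latencies_ns),
        "mean": mean(latencies_ns),
        "p99": percentile_nearest_rank(latencies_ns, 99),
        "max": max(latencies_ns),
    }


if __name__ == "__main__":
    # Made-up timestamps in nanoseconds, for illustration only.
    sent = [0.0, 1_000.0, 2_000.0]
    received = [24.2, 1_024.5, 2_026.0]
    print(summarize(tick_to_trade_ns(sent, received)))
```

With only three samples, the nearest-rank 99th percentile is simply the maximum; real benchmark runs collect far more events, which is what makes the tail statistics meaningful.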

Once the methodology and process for measuring the benchmark were agreed upon and finalized, Solarflare and LDA Technologies set the first world record, 98 nanoseconds, in May 2017. This is the NHRA (National Hot Rod Association) dragster model of trading, where every bit of delay has been removed.

Clay Millican holds the Top Fuel record of 330 MPH.

Like many world records, this one didn’t stand for long. In the fall of 2019, an Exablaze solution, powered by a Xilinx FPGA and using the patents mentioned above under license from Solarflare, set a new STAC-T0 record of 44 nanoseconds. Xilinx and LDA Technologies were not willing to stand by quietly, so this past winter they teamed up again to recapture the record. LDA has computed that, with current technology, the theoretical limit is 22 nanoseconds. We’re proud to announce that on June 1, 2020, STAC validated our STAC-T0 performance at a world-record 24.2 nanoseconds. For perspective, 24.2 nanoseconds is the time it takes light to pass two parked cars: about 7.3 meters in a vacuum.
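The figures in that paragraph reduce to simple arithmetic, sketched here in Python. Light covers about 0.3 meters per nanosecond in a vacuum, so 24.2 ns corresponds to roughly 7.25 m. The script also expresses each earlier record as a multiple of the new one and shows how far 24.2 ns sits above LDA’s computed 22 ns limit. The record values come from the text; the variable and function names are only for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact by definition, in a vacuum


def light_distance_m(nanoseconds: float) -> float:
    """Distance light travels in a vacuum during the given number of nanoseconds."""
    return nanoseconds * 1e-9 * SPEED_OF_LIGHT_M_PER_S


# STAC-T0 records quoted in the text, in nanoseconds.
RECORDS_NS = {
    "May 2017, Solarflare + LDA": 98.0,
    "Fall 2019, Exablaze": 44.0,
    "June 2020, Xilinx + LDA": 24.2,
}
THEORETICAL_LIMIT_NS = 22.0  # LDA's computed limit with current technology
NEW_RECORD_NS = RECORDS_NS["June 2020, Xilinx + LDA"]


def times_slower(record_ns: float, reference_ns: float = NEW_RECORD_NS) -> float:
    """How many times longer a record is than the reference record."""
    return record_ns / reference_ns


def headroom_fraction(record_ns: float, limit_ns: float = THEORETICAL_LIMIT_NS) -> float:
    """How far a record sits above the theoretical limit, as a fraction of the limit."""
    return (record_ns - limit_ns) / limit_ns


if __name__ == "__main__":
    print(f"Light travels {light_distance_m(NEW_RECORD_NS):.2f} m in {NEW_RECORD_NS} ns")
    for name, ns in RECORDS_NS.items():
        print(f"{name}: {ns} ns, {times_slower(ns):.2f}x the 24.2 ns record")
    print(f"Headroom above the 22 ns limit: {headroom_fraction(NEW_RECORD_NS):.1%}")
```

Running it shows that the 2017 record was about 4.05 times the new one, the 2019 record about 1.82 times, and that 24.2 ns is within 10% of the 22 ns theoretical limit.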

While most of this post is about world-record latency, we should not overlook the critical role that determinism plays in this market. A system that can deliver trades in 24.2 nanoseconds most of the time is valuable, but the real value lies in one that can do it nearly all the time. The variation caused by events outside of what is predicted is what we call the system’s “jitter.” Imagine that your commute to work used to take 30 minutes, and you planned your departure time around those 30 minutes. On days with bad weather, red lights at every intersection, or an accident, your commute could take 40 minutes or even an hour, which means there are 10 to 30 minutes of jitter in the system. Now suppose you woke up, checked the traffic, and heard there would be a ten-minute weather delay that morning, so you adjusted, or attenuated, your departure time by leaving 10 minutes earlier. Then, as you’re leaving, you fire up Waze on your phone, which might make subtle last-minute changes to your route to account for other unforeseen factors. At this point, you’ve made two changes to your normal commute, the morning forecast and Waze, and both attenuate the jitter in your commute so that you consistently arrive at work on time.

LDA has done essentially the same on its SBM09P-3: it includes two jitter attenuators and a highly accurate clock to increase the determinism of any trading system built on this platform. The more typical NICs used for trading may contain a highly precise clock, but they do not include circuitry for jitter attenuation.
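The commute analogy maps onto a simple calculation. In the minimal Python sketch below, jitter is each trip’s deviation from the planned time, and “attenuation” subtracts any delay you knew about in advance (the morning forecast) before measuring what is left. The helper names and the forecast values are hypothetical, and the code models only the analogy: the jitter attenuators on a board like the SBM09P-3 operate on clock signals in hardware, which this sketch does not attempt to simulate.

```python
def jitter(observed: list[float], planned: float) -> list[float]:
    """Deviation of each observation from the time you planned around."""
    return [t - planned for t in observed]


def attenuated_jitter(observed: list[float], planned: float,
                      known_delays: list[float]) -> list[float]:
    """Jitter remaining after adjusting each day's plan for a delay known in advance.

    known_delays plays the role of the morning forecast: leaving that much
    earlier absorbs the delay, so only the unforeseen part is left.
    """
    if len(known_delays) != len(observed):
        raise ValueError("need one known-delay value per observation")
    return [t - (planned + d) for t, d in zip(observed, known_delays)]


def worst_case(deviations: list[float]) -> float:
    """Largest deviation: the margin you must leave to be on time every day."""
    return max(deviations)


if __name__ == "__main__":
    planned_minutes = 30
    commutes = [30, 40, 60]  # a normal day, a weather day, a weather-plus-accident day
    forecast = [0, 10, 10]   # hypothetical: the forecast caught the weather both days

    raw = jitter(commutes, planned_minutes)
    residual = attenuated_jitter(commutes, planned_minutes, forecast)
    print("raw jitter (min):", raw, "worst case:", worst_case(raw))
    print("after forecast  :", residual, "worst case:", worst_case(residual))
```

With these made-up numbers, the forecast alone cuts the worst-case jitter from 30 to 20 minutes; a second correction, like Waze in the analogy, would work on whatever remains. The same functions apply unchanged to nanosecond latency samples, with `planned` set to the expected tick-to-trade time.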

The STAC-T0 report produces an extensive volume of data. It measures how latency varies with packet size and data rate, and it reports actionable latency under various queue models. For each case, it gives the minimum, mean, 99th-percentile, and maximum latencies, along with the measurement uncertainty. Two frame sizes are used, 68 bytes and 507 bytes, which are representative of the UDP packets typically transmitted by an exchange. The data rate, expressed as inbound packets per second, is tested at three levels: low, medium, and high. All these variations exist because the T0 benchmark is designed to model the exchange in a variety of ways, so that it represents a fair approximation of actual data flows.

ASIC-based and FPGA-based NICs differ in one significant way: an ASIC-based NIC must read the entire network packet into a buffer before it can operate on it, while an FPGA-based NIC can process a packet in four-byte chunks as it arrives. This subtle difference in data handling has huge performance implications. When the first round of testing was completed several years ago, the tools initially used the ASIC model and started measuring time only once the whole packet had arrived. As a result, FPGA-based NICs reported negative trading latencies, as if they were trading into the future: the FPGA submitted the trade as soon as it saw the signal it was waiting for, a few bytes into the arriving packet, so the trade was issued before the clock measuring the latency had even started.

The point is that the STAC-T0 benchmark is designed to show potential customers precisely how things are measured and what they can expect if they select and build the same solution stack as the system under test. Here is a diagram showing how data moves through this platform.

How Data Flows in a 24.2ns Tick-to-Trade System
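The difference between ASIC-style store-and-forward and FPGA-style cut-through handling, and how it produced “negative” latencies, can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions: a 10 Gb/s line rate (the rate used in the tests is not stated here), no preamble or inter-frame gap, a hypothetical fixed 20 ns of processing time, and a trigger that is fully visible after the first 64 bytes of the frame. The store-and-forward NIC acts only after the whole frame is buffered; the cut-through NIC acts once the four-byte chunk containing the trigger has arrived. Measuring from the end of the frame, the ASIC-model clock start, makes the cut-through order appear to leave before the measurement began.

```python
import math

LINE_RATE_GBPS = 10.0  # assumption: 10 Gb/s, i.e. 10 bits per nanosecond
CHUNK_BYTES = 4        # FPGA datapath width described in the text


def wire_time_ns(num_bytes: int, rate_gbps: float = LINE_RATE_GBPS) -> float:
    """Time for num_bytes to arrive at line rate (ignores preamble and inter-frame gap)."""
    return num_bytes * 8 / rate_gbps


def store_and_forward_order_ns(frame_bytes: int, processing_ns: float) -> float:
    """ASIC-style NIC: buffers the entire frame, then processes it."""
    return wire_time_ns(frame_bytes) + processing_ns


def cut_through_order_ns(trigger_last_byte: int, processing_ns: float) -> float:
    """FPGA-style NIC: acts once the 4-byte chunk holding the trigger's last byte arrives.

    trigger_last_byte is the zero-based offset of the last byte the logic must see.
    """
    chunks = math.ceil((trigger_last_byte + 1) / CHUNK_BYTES)
    return wire_time_ns(chunks * CHUNK_BYTES) + processing_ns


def reported_latency_ns(order_ns: float, clock_start_ns: float) -> float:
    """Latency as a measurement tool sees it, relative to when its clock started."""
    return order_ns - clock_start_ns


if __name__ == "__main__":
    frame = 507        # the larger STAC-T0 frame size, in bytes
    trigger = 63       # hypothetical: signal fully visible after 64 bytes
    processing = 20.0  # hypothetical fixed decision time, in ns

    first_byte = 0.0
    frame_end = wire_time_ns(frame)

    asic = store_and_forward_order_ns(frame, processing)
    fpga = cut_through_order_ns(trigger, processing)

    print(f"frame takes {frame_end:.1f} ns to arrive")
    print(f"ASIC order at {asic:.1f} ns; FPGA order at {fpga:.1f} ns")
    print(f"FPGA latency, clock from frame end : {reported_latency_ns(fpga, frame_end):.1f} ns")
    print(f"FPGA latency, clock from first byte: {reported_latency_ns(fpga, first_byte):.1f} ns")
```

Under these assumptions the 507-byte frame takes 405.6 ns to arrive, so the cut-through order goes out about 334 ns before an end-of-frame clock would even start, which is the “trading into the future” artifact; starting the clock at the first byte gives a sensible positive figure of 71.2 ns.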

The complete system under test was purpose-built to achieve this fantastic world record, but it can easily be reproduced using a server platform and components that are now readily available. A world-class trading system has to start with a substantial computational foundation; in this case, it’s Lenovo’s newly announced SR665 dual-socket AMD server. This server represents the absolute best in class in system engineering, coming from a team that has been designing systems for high-performance computing (HPC) for well over two decades. Add a pair of AMD® EPYC™ 7742 (Rome) processors, each offering 64 cores and 128 threads clocked at up to 3.4 GHz, along with unmatched I/O performance, and you have a serious system for trading. To connect with the exchange and set up the necessary TCP sessions, we use a Xilinx XtremeScale X2522-25G-PLUS adapter with Onload. Finally, to execute trades on the absolute lowest-latency platform, we installed an LDA Technologies SBM09P-3 FPGA board, which uses a Xilinx Virtex UltraScale+ VU9P-3 chip. This LDA board was purpose-built for trading, with two jitter attenuators, a Stratum-3 class clock, and 576MB of ultra-fast SRAM.

So, as we ponder the significance of moving actionable data through a system in 24.2 nanoseconds, it might be best to consider how Peter Lankford summarized the industry in his first STAC-T0 report in 2017:

“Tick-to-trade latency has several components. The speed of the firm’s business logic is crucial, but so is the speed with which the trading platform’s underlying hardware and software can get information to and from the business logic. This is the platform’s I/O latency. Just as innovations in telecommunications are creeping incrementally closer to light speed, innovations in networking and computing are shaving I/O latency down to smaller and smaller fractions of a second. While the communications effort is bounded by the speed of light, the computing effort is limited by the minimum possible scale of transistors and, perhaps ultimately, the divisibility of time itself. Thus, trading technology is simultaneously pushing toward the limits of both of Einstein’s great discoveries: relativity and quantum reality.”

While zero-latency trading will never be possible, we’re excited to see what the future holds as technology continues to push the limits of silicon and even physics itself.

Scott is a Technology Evangelist on the product management team at Achronix Semiconductor, focused on DPUs and security. LinkedIn: https://bit.ly/2vdK4DY