Limit Order Books and High-Frequency Trading in Crypto: An Introduction

Written by Sylvanus

In the rapidly evolving landscape of cryptocurrency trading, advanced order and execution management systems (OEMS) are becoming mission critical. Beyond equipping themselves with a strong OEMS and analytical technology, traders must develop a robust understanding of limit order books (LOBs) and high-frequency trading (HFT) to navigate these complex yet high-potential markets.

This article delves into the intricacies of LOBs and the significance of data integrity in trading strategies. By examining the intersection of technology and finance, we hope to provide insights that help traders and stakeholders make informed decisions in an era where speed and accuracy are vital. Through an exploration of data cleaning methodologies and a case study, we focus on the foundations of sustainable trading practices and offer a glimpse into the future of algorithmic trading in crypto.

Understanding the Limit Order Book (LOB)

The LOB is a fundamental component of modern financial markets, serving as a record of all outstanding limit orders for a security. Exchanges maintain LOBs to automate order matching, and LOB data is typically distributed at three levels of detail. Level 1 (L1) provides only the top of the book (the best bid and best ask), Level 2 (L2) provides aggregated quantities at multiple price levels (usually 10), and Level 3 (L3) provides order-by-order detail for the entire book. In general, order book data feeds are updated for three operations: order creation, order modification, and order deletion. Trade data, which records each executed trade, can add further granularity to the LOB.
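To make the L1/L2 distinction concrete, here is a minimal Python sketch of a price-level (L2) book. The class name `OrderBookL2` and its methods are hypothetical. It assumes updates arrive as (side, price, new quantity), where a quantity of zero deletes the level, a convention used by L2 feeds such as Binance's depth stream. Because quantities are aggregated per price, an L2 book like this cannot reproduce the per-order detail of L3 data.

```python
from dataclasses import dataclass, field


@dataclass
class OrderBookL2:
    """Hypothetical L2 book: price -> aggregated resting quantity, per side."""
    bids: dict = field(default_factory=dict)
    asks: dict = field(default_factory=dict)

    def apply(self, side: str, price: float, qty: float) -> None:
        """Creation/modification set the level's quantity; qty == 0 deletes it."""
        levels = self.bids if side == "bid" else self.asks
        if qty == 0:
            levels.pop(price, None)
        else:
            levels[price] = qty

    def level1(self):
        """Top of book: (best bid, best ask), or None for an empty side."""
        best_bid = max(self.bids) if self.bids else None
        best_ask = min(self.asks) if self.asks else None
        return best_bid, best_ask

    def level2(self, depth: int = 10):
        """Up to `depth` (price, qty) levels per side, best price first."""
        bids = sorted(self.bids.items(), reverse=True)[:depth]
        asks = sorted(self.asks.items())[:depth]
        return bids, asks


book = OrderBookL2()
book.apply("bid", 100.0, 2.0)
book.apply("bid", 99.5, 1.0)
book.apply("ask", 100.5, 3.0)
book.apply("bid", 100.0, 0)  # the 100.0 bid level is deleted
print(book.level1())  # (99.5, 100.5)
```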

In most cases, traders can fetch this data through the WebSocket API that each exchange provides. For example, in the Coinbase Advanced Trade WebSocket API, level 2 order book data can be retrieved from the “level2” channel and trade data from the “market_trades” channel.
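As a rough illustration, the snippet below only builds the JSON subscribe requests for the two channels named above. The message shape (`type`, `product_ids`, `channel`) and the endpoint URL reflect our reading of Coinbase's Advanced Trade documentation and should be checked against the current docs, which may also require a JWT for some channels. Opening the WebSocket itself is left to a client library of your choice.

```python
import json

# Assumption: endpoint as we understand it from the Advanced Trade docs; verify before use.
ADVANCED_TRADE_WS_URL = "wss://advanced-trade-ws.coinbase.com"


def subscribe_message(channel: str, product_ids: list) -> str:
    """Build one subscribe request; as we read the docs, each message names one channel."""
    return json.dumps({
        "type": "subscribe",
        "product_ids": product_ids,
        "channel": channel,
    })


# One message per channel: L2 book updates and the public trade feed.
messages = [
    subscribe_message("level2", ["BTC-USD"]),
    subscribe_message("market_trades", ["BTC-USD"]),
]
for m in messages:
    print(m)
```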

The Important Role of Data Cleaning

LOB data is high frequency, with updates occurring hundreds of times per minute for liquid securities. Because these updates arrive at irregular intervals, LOB data often has to be transformed into fixed-length feature vectors before it can be used for deep learning, which makes cleaning essential for handling dynamic changes in order volumes. Cleaning prepares raw data for analysis by addressing errors and inconsistencies. Common techniques include:

  • Removing duplicates: Ensuring no order is listed multiple times, which can skew analysis.
  • Handling missing values: Addressing gaps in data, such as missing timestamps or quantities.
  • Ensuring chronological order: Sorting data by time to maintain sequence, critical for time-series analysis.
  • Normalizing data: Standardizing price and quantity formats for consistency.
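The four techniques above can be sketched with the Python standard library alone. The function `clean_updates` and the row layout (`ts`, `side`, `price`, `qty`) are hypothetical. Parsing prices and quantities into `Decimal` means "100.50" and "100.5" normalize to the same value, so they are also caught as duplicates.

```python
from decimal import Decimal, InvalidOperation


def clean_updates(rows):
    """Clean raw order book rows (hypothetical dicts with ts, side, price, qty)."""
    seen = set()
    cleaned = []
    for row in rows:
        # Handling missing values: skip rows with an absent or empty field.
        if any(row.get(k) in (None, "") for k in ("ts", "side", "price", "qty")):
            continue
        # Normalizing data: integer timestamps, exact Decimal prices/quantities;
        # rows that cannot be parsed are dropped.
        try:
            ts = int(row["ts"])
            price = Decimal(str(row["price"]))
            qty = Decimal(str(row["qty"]))
        except (ValueError, InvalidOperation):
            continue
        record = (ts, row["side"], price, qty)
        # Removing duplicates: identical normalized records are kept once.
        if record in seen:
            continue
        seen.add(record)
        cleaned.append(record)
    # Ensuring chronological order: stable sort by timestamp.
    cleaned.sort(key=lambda r: r[0])
    return cleaned
```

Exact-match deduplication will also drop a genuinely repeated event; if the feed carries update IDs, deduplicating on those is safer.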

Cleaning Binance’s Limit Order Book (A Case Study)

To demonstrate how to manage real-world LOB data, we use Binance, one of the largest centralized exchanges (CEXs) in the cryptocurrency market, as an example. This case study focuses on processing, cleaning, and analyzing high-frequency quote and trade data from Binance’s Level 2 WebSocket streams for BTC trading pairs. The objective is to ensure data accuracy and reliability for statistical analysis and trading model development, particularly in algorithmic and high-frequency trading. The process involves collecting raw data, reformatting it into .CSV files, cleaning it to address inconsistencies, and producing deliverables that include a cleaned dataset, descriptive statistics, performance logs, and cleaning scripts.

Purpose and Importance

The case study emphasizes the critical role of data integrity in backtesting trading strategies, real-time decision-making, and market microstructure analysis. Raw market data often contains errors like outliers, misaligned timestamps, and erroneous entries, which can skew results if not corrected. By employing advanced cleaning techniques, we can enhance data quality for more accurate statistical insights and a better grasp of market behavior.

Methodology

Data Extraction
Data is downloaded from Binance’s Level 2 WebSocket streams as twenty .GZ files, in which each line is a JSON document. Three streams are used: “@depth10” (the top 10 levels of market depth), “@depth” (incremental bid/ask updates), and “@trade” (individual trade details). The JSON data is parsed and consolidated into three .CSV files: “depth_list.csv”, “depth10_list.csv”, and “trades_list.csv”.
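A minimal sketch of the extraction step follows. It assumes each line is a Binance combined-stream envelope (`{"stream": ..., "data": ...}`) and uses the payload fields Binance documents for these streams (`T`, `p`, `q`, `m` for trades; `lastUpdateId`, `bids`, `asks` for partial depth; `E`, `b`, `a` for diff depth). The CSV column names and the helper functions `parse_line` and `convert` are our own choices, not the case study's scripts.

```python
import csv
import gzip
import json
from contextlib import ExitStack
from pathlib import Path

# Hypothetical column layouts for the three output files.
HEADERS = {
    "trades_list.csv": ["trade_time", "symbol", "price", "qty", "buyer_is_maker"],
    "depth_list.csv": ["event_time", "symbol", "side", "price", "qty"],
    "depth10_list.csv": ["last_update_id", "symbol", "side", "level", "price", "qty"],
}


def parse_line(line):
    """Return (csv_name, rows) for one JSON document, or (None, []) if unrecognized."""
    msg = json.loads(line)
    symbol, kind = msg["stream"].split("@", 1)  # e.g. "btcusdt", "depth10@100ms"
    data = msg["data"]
    if kind == "trade":
        return "trades_list.csv", [[data["T"], symbol, data["p"], data["q"], data["m"]]]
    if kind.startswith("depth10"):  # partial book: top 10 levels per side
        rows = [[data["lastUpdateId"], symbol, side, i, p, q]
                for side, key in (("bid", "bids"), ("ask", "asks"))
                for i, (p, q) in enumerate(data[key])]
        return "depth10_list.csv", rows
    if kind.startswith("depth"):  # diff depth: only the levels that changed
        rows = [[data["E"], symbol, side, p, q]
                for side, key in (("bid", "b"), ("ask", "a"))
                for p, q in data[key]]
        return "depth_list.csv", rows
    return None, []


def convert(gz_paths, out_dir):
    """Stream every .gz file line by line into the three CSV files."""
    out_dir = Path(out_dir)
    with ExitStack() as stack:
        writers = {}
        for name, header in HEADERS.items():
            fh = stack.enter_context(open(out_dir / name, "w", newline=""))
            writers[name] = csv.writer(fh)
            writers[name].writerow(header)
        for path in gz_paths:
            with gzip.open(path, "rt", encoding="utf-8") as src:
                for line in src:
                    if line.strip():
                        name, rows = parse_line(line)
                        if name:
                            writers[name].writerows(rows)
```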

Data Cleaning
Cleaning follows procedures similar to those in R’s highfrequency package. Trades with a zero price or zero quantity are removed, and trades that share a timestamp are merged into a single trade at their volume-weighted average price. Quotes with zero values, negative bid-ask spreads, or significant outliers (beyond ten times the rolling median absolute deviation) are excluded. Because there were far fewer quotes than trades, the step that checks trades against the prevailing bid-ask quotes was skipped to avoid over-correction.
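The trade and quote rules above can be sketched in Python as follows. This is a simplified re-implementation written for illustration, not code from the highfrequency package; the defaults `window=50` and `maxi=10` mirror what we understand to be that package's defaults for its quote outlier filter. Trades are assumed to be `(ts, price, qty)` tuples and quotes `(ts, bid, ask)` tuples.

```python
from statistics import median


def clean_trades(trades):
    """Drop zero prices/quantities, then merge same-timestamp trades into one
    trade at their volume-weighted average price with summed quantity."""
    merged = {}  # ts -> [notional, qty]; dicts keep first-seen (time) order
    for ts, price, qty in trades:
        if price <= 0 or qty <= 0:
            continue
        acc = merged.setdefault(ts, [0.0, 0.0])
        acc[0] += price * qty
        acc[1] += qty
    return [(ts, notional / qty, qty) for ts, (notional, qty) in merged.items()]


def clean_quotes(quotes, window=50, maxi=10):
    """Drop zero values and negative spreads, then drop quotes whose mid-quote
    lies more than `maxi` MADs from a centred rolling median of neighbouring
    mid-quotes (the quote itself is excluded from its own window)."""
    valid = [(ts, b, a) for ts, b, a in quotes if b > 0 and a > 0 and a >= b]
    mids = [(b + a) / 2 for _, b, a in valid]
    half = window // 2
    kept = []
    for i, quote in enumerate(valid):
        neighbours = mids[max(0, i - half):i] + mids[i + 1:i + 1 + half]
        if not neighbours:
            kept.append(quote)
            continue
        med = median(neighbours)
        mad = median(abs(m - med) for m in neighbours)
        # Conservative choice for this sketch: keep the quote when MAD is 0.
        if mad == 0 or abs(mids[i] - med) <= maxi * mad:
            kept.append(quote)
    return kept
```

Keeping quotes whose window has zero MAD is a choice made for the sketch; in a very flat market it lets an isolated jump through.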

Efficiency
To manage large datasets effectively, this project employs parallel processing and vectorized operations during data cleaning, optimizing both speed and resource use. The mclapply function in R enables simultaneous processing across multiple cores, while the Polars library in Python leverages high-performance vectorized computations. Together, these tools significantly streamline data preprocessing, ensuring rapid and efficient handling of high-frequency financial data.
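In Python, the closest standard-library analogue to R's `mclapply` is a process pool from `concurrent.futures`. In this sketch `clean_file` is a hypothetical stand-in job that only counts non-empty lines; in practice it would run the cleaning steps on one file. The executor class is a parameter so the same code can run on a thread pool where process pools are awkward, for example in tests.

```python
from concurrent.futures import ProcessPoolExecutor


def clean_file(path):
    """Hypothetical per-file job: here it only counts non-empty lines."""
    with open(path, encoding="utf-8") as fh:
        return path, sum(1 for line in fh if line.strip())


def run_parallel(paths, job=clean_file, executor_cls=ProcessPoolExecutor, workers=4):
    """Apply `job` to every path on a pool of workers, like R's mclapply.
    executor.map returns results in the same order as the inputs."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(job, paths))
```

With `ProcessPoolExecutor`, the job must be a picklable module-level function, and on platforms that spawn rather than fork worker processes the call should sit under an `if __name__ == "__main__":` guard.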

Key Statistics and Insights

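Data integrity is preserved even though nearly half of the trade observations are removed during cleaning. In the BTC/FDUSD example below, the trade count falls from 6,543 to 3,391, yet the price (p) statistics barely change: the mean moves from 60,842.40 to 60,841.87 and the median from 60,841.73 to 60,840.19. The quantity (q) statistics rise, which is consistent with trades that share a timestamp being merged into single trades with summed volume. Removing trades with zero prices or outlier values should improve accuracy.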

Descriptive Statistics for BTC/FDUSD Trades (p = trade price, q = trade quantity)

Statistic   Pre-cleaning p   Pre-cleaning q   Post-cleaning p   Post-cleaning q
Count       6543             6543             3391              3391
Mean        60842.40204      0.021345         60841.86732       0.041185
STD         35.015594        0.040609         34.811867         0.087269
Min         60773.27         0.000010         60773.98706       0.000010
25%         60814.025        0.002255         60814.13          0.003050
50%         60841.73         0.011410         60840.19          0.020000
75%         60872.44         0.021080         60872.33558       0.043060
Max         60914.05         1.26917          60914.05          1.809590

Source: Sylvanus Technologies. Above analysis is based on data from 2024/4/30 16:00 to 16:02.
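The rows in the table appear to follow the layout of pandas' `describe()`. A standard-library sketch that produces the same set of statistics for a list of prices or quantities is shown below; the function name `describe` is our own. `statistics.quantiles` with `method="inclusive"` uses the same linear interpolation as pandas' default percentiles, and `statistics.stdev` is the sample (n - 1) standard deviation that pandas reports.

```python
from statistics import mean, quantiles, stdev


def describe(values):
    """Count, mean, sample std, min, quartiles and max for one column."""
    q25, q50, q75 = quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": mean(values),
        "std": stdev(values),  # sample standard deviation (divides by n - 1)
        "min": min(values),
        "25%": q25,
        "50%": q50,
        "75%": q75,
        "max": max(values),
    }
```

Running it on the price and quantity columns before and after cleaning gives a side-by-side comparison like the one above.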


After cleaning, several observations stand out.

  1. Price and volatility jumps

In the graph below, we can see that price jumps are evident in less liquid markets such as AVAX/BTC and OP/BTC. These markets, with lower volumes and thinner order books, are prone to significant swings from single or clustered trades, which can make them extremely challenging for a market-making strategy.

Source: Sylvanus Technologies. Above analysis is based on data from 2024/4/30 16:00 to 16:02.

  2. Significant deviation of trade and quote counts

A significant gap between trade and quote counts is observed. For example, there are 6,000 trades versus 300 quotes for BTCUSDT in the given timeframe. The gap may stem from Bitcoin’s market microstructure: in liquid, volatile conditions, trades can outnumber quotes when rapid executions or large orders split into smaller fills occur while the quotes stay unchanged. It may also stem from data collection issues such as missed WebSocket events or latency, which may affect quote capture more than trade capture. Investigating whether the gap reflects market behavior or technical flaws is key to ensuring the dataset is reliable for analysis.

  3. Liquidity differences

A third noteworthy observation is the varying liquidity across stablecoin pairs. BTC/FDUSD has become one of the most liquid stablecoin pairs thanks to strong support on Binance, where a market maker consistently provides liquidity. In contrast, BTC/USDC exhibits greater volatility and more erratic trading patterns, reflecting fewer market participants. FDUSD has served as a substitute for Binance USD (BUSD), especially after regulatory actions curtailed BUSD's availability, which has supported the pair's stability. Conversely, USDC's more limited appeal outside the U.S. has resulted in lower BTC/USDC trading volumes on major exchanges like Binance. Liquidity and trading behavior can therefore vary significantly among stablecoin pairs, affecting strategies and decisions.
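One way to investigate the trade/quote count gap from the second observation is to check sequence numbers. Binance's documentation states that each diff-depth event's first update ID (`U`) should equal the previous event's final update ID (`u`) plus one, and, as far as we know, trade IDs (`t`) increase by one per trade for a given symbol. The sketch below flags breaks in either sequence; a break points to missed WebSocket events rather than market behavior. The function names are hypothetical.

```python
def find_depth_gaps(events):
    """events: diff-depth messages in arrival order, each with 'U' and 'u'.
    Returns (previous u, next U) pairs wherever U != previous u + 1."""
    gaps = []
    prev_u = None
    for ev in events:
        if prev_u is not None and ev["U"] != prev_u + 1:
            gaps.append((prev_u, ev["U"]))
        prev_u = ev["u"]
    return gaps


def find_trade_id_gaps(trade_ids):
    """Consecutive trade IDs for one symbol should differ by exactly 1."""
    return [(a, b) for a, b in zip(trade_ids, trade_ids[1:]) if b != a + 1]
```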

Next Steps

The procedure above illustrates how to fetch and clean LOB data. The cleaned dataset can then be fed into a predictive model. For LOB mid-price prediction, common deep learning models include CNN+LSTM architectures, transformers, and their variants. A mid-price prediction model can also supply the benchmark price around which a market-making strategy quotes.
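Before training, each cleaned snapshot is typically flattened into a fixed-length vector and paired with a label. The sketch below shows one common arrangement (alternating ask and bid price/quantity per level, 40 features for 10 levels) and a smoothed mid-price labeling scheme often used in the LOB deep-learning literature: compare the mean of the next `k` mid-prices with the current one and assign up/down/stationary using a threshold `alpha`. Both `k` and `alpha` are placeholder values, and the function names are hypothetical.

```python
def snapshot_features(bids, asks, depth=10):
    """Flatten one L2 snapshot (best level first) into
    [ask_p1, ask_q1, bid_p1, bid_q1, ask_p2, ...]; 4 * depth values
    when both sides have at least `depth` levels."""
    vec = []
    for (ap, aq), (bp, bq) in zip(asks[:depth], bids[:depth]):
        vec.extend([ap, aq, bp, bq])
    return vec


def mid_prices(best_bids, best_asks):
    return [(b + a) / 2 for b, a in zip(best_bids, best_asks)]


def smoothed_labels(mids, k=5, alpha=0.0002):
    """+1 (up), -1 (down) or 0 (stationary) for each t that has k future mids."""
    labels = []
    for t in range(len(mids) - k):
        future = sum(mids[t + 1:t + 1 + k]) / k
        change = (future - mids[t]) / mids[t]
        labels.append(1 if change > alpha else -1 if change < -alpha else 0)
    return labels
```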

It is also important to note that the data cleaning process and the choice of model should be consistent with the goal. If detecting outliers is the objective, for example, outliers must be kept in the dataset rather than removed. Additionally, while this case study is meant as an introduction to backtesting with LOB data, the choice of programming language matters a great deal in live trading. Compiled languages such as Rust or C++ are often preferred for live trading because of their faster execution, so interpreted languages such as R and Python may not be the best choice.

Decoding High Frequency Trading (HFT)

With an overview of LOB data established, we now explore the realm of high-frequency trading (HFT). HFT is a form of algorithmic trading characterized by high speeds, high turnover rates, and high order-to-trade ratios, leveraging advanced technology for rapid execution. It aims to capture small profit margins from numerous trades, often holding positions for milliseconds to seconds.

In general, HFT requires highly efficient software so that all analysis and execution can be performed in a low-latency, high-frequency environment. It also demands substantial fixed-cost investment in hardware, including Field-Programmable Gate Arrays (FPGAs), multi-core processors, high-speed memory, and GPUs for real-time data handling. Placing servers as close to the exchange as possible (co-location) lowers latency further. HFT is not just about data and prediction accuracy but also a race for computing power and physical resources.
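Before investing in hardware, it helps to measure the latency you currently have. A minimal sketch, assuming each message carries an exchange event timestamp in milliseconds (Binance's `E` field is one example) and you record a local receive time: the difference approximates feed latency, but only if your clock is tightly synchronized (for example via NTP or PTP). The function `latency_profile` is hypothetical.

```python
from statistics import quantiles


def latency_profile(exchange_ts_ms, local_ts_ms):
    """Median, 99th percentile and max of (local receive - exchange event) in ms."""
    lat = sorted(local - exch for exch, local in zip(exchange_ts_ms, local_ts_ms))
    cuts = quantiles(lat, n=100, method="inclusive")  # 99 percentile cut points
    return {"p50": cuts[49], "p99": cuts[98], "max": lat[-1]}
```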

Popular HFT Strategies

Market Making
HFT firms provide liquidity by continuously posting buy and sell orders, profiting from the bid-ask spread. In the crypto world, there are two types of market makers. The first resembles market makers in other asset classes such as equities: they focus on centralized exchanges (CEXs) and trade on the LOB. The second is the automated market maker (AMM), the mechanism used by decentralized exchanges (DEXs) such as Uniswap. In contrast to traditional market makers, AMMs provide liquidity to DEXs through liquidity pools instead of an LOB.
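To contrast the two models, here is a toy sketch. `mm_quotes` is a hypothetical CEX-style quoting rule that centers a bid and ask on a reference mid and skews both to reduce inventory. `amm_amount_out` is the constant-product (x * y = k) swap formula used by Uniswap v2-style pools, with the fee (30 basis points by default here) deducted from the input before the invariant is applied. Neither is a production strategy.

```python
def mm_quotes(mid, half_spread_bps, inventory, max_inventory, skew_bps):
    """Hypothetical LOB quoting rule: bid/ask around `mid`, both shifted down
    when long (inventory > 0) and up when short, to encourage mean reversion."""
    skew = skew_bps * inventory / max_inventory
    bid = mid * (1 - (half_spread_bps + skew) / 10_000)
    ask = mid * (1 + (half_spread_bps - skew) / 10_000)
    return bid, ask


def amm_amount_out(amount_in, reserve_in, reserve_out, fee_bps=30):
    """Constant-product pool: output such that
    (reserve_in + in_after_fee) * (reserve_out - out) == reserve_in * reserve_out."""
    in_after_fee = amount_in * (10_000 - fee_bps) / 10_000
    return reserve_out * in_after_fee / (reserve_in + in_after_fee)
```

The LOB maker chooses its own prices, while the pool's price is implied entirely by its reserves: each swap moves the reserve ratio and therefore the next quote.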

Directional Trading
This involves taking short-term positions based on anticipated price movements, using speed to act on shifts before others do. Traders use real-time LOB updates or other market signals to predict whether a security’s price will rise or fall. The strategy relies on rapid execution and low latency to act before slower market participants, exploiting fleeting trends or momentum. Its strength lies in its adaptability to volatile conditions, though it carries risk if predictions fail or market conditions shift unexpectedly.
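A widely used short-term signal derived from real-time LOB updates is order book imbalance: the relative difference between resting bid and ask volume near the top of the book. The sketch below computes it over the top `levels` price levels; the threshold rule in `directional_signal` is a hypothetical placeholder, not a tested strategy.

```python
def book_imbalance(bids, asks, levels=5):
    """(bid volume - ask volume) / (bid volume + ask volume) over the top levels.
    bids/asks are (price, qty) lists, best first. Ranges from -1 to +1."""
    bid_vol = sum(q for _, q in bids[:levels])
    ask_vol = sum(q for _, q in asks[:levels])
    total = bid_vol + ask_vol
    return 0.0 if total == 0 else (bid_vol - ask_vol) / total


def directional_signal(imbalance, threshold=0.3):
    """Hypothetical rule: lean long or short only when imbalance is pronounced."""
    if imbalance > threshold:
        return "buy"
    if imbalance < -threshold:
        return "sell"
    return "hold"
```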

Arbitrage
The backbone of this strategy is exploiting mispricing caused by market inefficiencies. Because the cryptocurrency market is still highly segmented across many exchanges, the same asset can trade at different prices on different venues. Arbitrage traders can use LOB data to detect such mispricing, buying low on one exchange and selling high on another.
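A minimal check for cross-exchange mispricing compares one venue's best ask with another's best bid after fees. This sketch assumes proportional taker fees of 0.1% on each venue (a placeholder) and ignores slippage, available depth, transfer time, and withdrawal costs, all of which matter in practice. The function names are hypothetical.

```python
def cross_exchange_edge(ask_buy, bid_sell, fee_buy=0.001, fee_sell=0.001):
    """Per-unit profit from buying at one venue's best ask and selling at the
    other venue's best bid, after proportional taker fees on both legs."""
    cost = ask_buy * (1 + fee_buy)
    proceeds = bid_sell * (1 - fee_sell)
    return proceeds - cost


def best_direction(top_a, top_b, fee_a=0.001, fee_b=0.001):
    """top_x = (best_bid, best_ask). Returns (direction, edge) for the better leg."""
    buy_a = cross_exchange_edge(top_a[1], top_b[0], fee_a, fee_b)
    buy_b = cross_exchange_edge(top_b[1], top_a[0], fee_b, fee_a)
    if buy_a >= buy_b:
        return "buy A, sell B", buy_a
    return "buy B, sell A", buy_b
```

A positive edge only says the quoted prices leave room for profit; whether it survives depends on how much size is available at those prices and how quickly both legs can be filled.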

Summary

Understanding LOBs and HFT is vital to navigating the complexities of the fast-moving crypto market effectively, and analyzing LOB data can help lay the groundwork for crafting informed trading strategies and ensuring data integrity. As the market continues to evolve, leveraging advanced technologies may enhance trading precision and efficiency as traders focus on data quality and advanced analytics to position themselves for ongoing success.

  

Investing in cryptocurrencies carries significant risks, including volatility, liquidity risk, and potential loss of capital. This information is for educational purposes only and should not be considered financial advice. Always conduct thorough research and consult with a qualified financial advisor before making investment decisions.


Contributed by Ken Lau – Quantitative Trading Associate

Ken Lau is a Quantitative Developer Associate with a Master's in Quantitative Finance from Rutgers University. With a data-driven approach, he specializes in quantitative trading, position management, and risk management, aiming to optimize profitability with prudent oversight. His dedication to learning keeps him at the forefront of rapidly moving markets.

