SumoDB in Neo4j: Chaining Multiple Graph Algorithms in Snowflake for Advanced Performance Analysis

The intersection of ancient athletic tradition and modern computational science has reached a new milestone as data analysts leverage the combined power of Neo4j Graph Analytics and Snowflake SQL to decode the complexities of professional sumo wrestling. By applying sophisticated graph algorithms to years of tournament data, researchers are moving beyond traditional win-loss records to uncover the hidden hierarchies and non-transitive rivalries that define the sport’s modern era. This analytical approach provides a composite view of the competitive landscape, identifying not just the most successful wrestlers, but those who serve as the structural pillars of the Makuuchi division.

The Evolution of Sports Analytics in the Dohyō
Sumo wrestling, a sport steeped in Shinto ritual and centuries of tradition, has long relied on the "Banzuke"—the official hierarchy of rikishi (wrestlers)—to determine status and matchups. However, the Banzuke is a reactive system, updated after each basho (tournament) based on recent performance. It often fails to capture the underlying momentum or the "prestige" of specific victories. To address this, a new study utilizing the SumoDB dataset has applied graph theory to Makuuchi and Juryou division bouts occurring between January 2021 and November 2025.
The analysis focuses on the top 42 rikishi slated for the 2026 Haru Basho. To keep the results representative of current form and avoid skew from very long careers (notably that of veteran Tamawashi, who holds the record for the most top-division bouts in history), the researchers narrowed the scope to the last five years of competition. In addition, only rikishi with 20 or more bouts were included, to provide a stable baseline for fighting styles and dominance records.
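The two filters described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's pipeline: the record layout `(bout_date, winner, loser)`, the wrestler names, and the choice to count bouts inside the window are all assumptions made for the example.

```python
from collections import Counter
from datetime import date

# Hypothetical bout records: (bout_date, winner, loser). Names are placeholders.
bouts = [
    (date(2020, 11, 8), "A", "B"),   # before the window, so it is dropped
    (date(2021, 1, 10), "A", "B"),
    (date(2023, 5, 14), "B", "C"),
    (date(2025, 11, 9), "C", "A"),
]

WINDOW_START = date(2021, 1, 1)    # January 2021
WINDOW_END = date(2025, 11, 30)    # November 2025


def filter_bouts(bouts, min_bouts=20):
    """Keep in-window bouts whose two rikishi both meet the bout minimum."""
    in_window = [b for b in bouts if WINDOW_START <= b[0] <= WINDOW_END]

    # Count appearances (wins and losses) per rikishi inside the window.
    appearances = Counter()
    for _, winner, loser in in_window:
        appearances[winner] += 1
        appearances[loser] += 1

    eligible = {r for r, n in appearances.items() if n >= min_bouts}
    # Single pass only: counts are not recomputed after bouts are removed.
    return [b for b in in_window if b[1] in eligible and b[2] in eligible]


# A tiny threshold is used here only because the sample data is tiny.
kept = filter_bouts(bouts, min_bouts=2)
```

With real data the default `min_bouts=20` would apply, matching the threshold reported in the article.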

Quantifying Prestige: The PageRank Methodology
In traditional sports reporting, the primary metric for dominance is the raw win count. In sumo, however, the quality of an opponent is paramount. A win against a Yokozuna (grand champion) or an Ozeki (champion) carries significantly more weight than a victory over a lower-ranked Maegashira. To quantify this, the study employed the PageRank algorithm, originally developed by Google's founders, Larry Page and Sergey Brin, to rank web pages.
In the context of a sumo-bout network, PageRank measures the quality of a wrestler’s victories. Prestige is treated as a fluid commodity that flows through the network: beating a wrestler who has defeated many strong opponents propagates more prestige than accumulating wins against weaker competition. To implement this in Snowflake, the researchers built a directed edge table where each row represents a bout, flowing from the winner to the loser.

The algorithm was configured with a damping factor of 0.85, the conventional default for PageRank. This models an 85% probability that prestige continues to flow through the victory chain, while the remaining 15% "reset" probability redistributes prestige across all wrestlers, so that it cannot become trapped in a closed loop among a small elite. After 20 iterations, the scores had settled into a stable ranking of competitive prestige.
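To make the mechanics concrete, here is a minimal power-iteration sketch in plain Python using the damping factor and iteration count mentioned above. It is not the Neo4j Graph Analytics implementation. Bouts are stored as `(winner, loser)` pairs, as in the edge table described earlier, but because PageRank hands score from an edge's source to its target, the sketch reads each pair in the loser-to-winner direction so that wins raise a wrestler's score. How the study oriented its edges when calling the algorithm is not stated, so that orientation, the function name, and the sample names are assumptions.

```python
from collections import defaultdict


def pagerank(bouts, damping=0.85, iterations=20):
    """Power-iteration PageRank over (winner, loser) pairs.

    Assumption: each loss passes a share of the loser's score to the winner.
    """
    nodes = sorted({r for bout in bouts for r in bout})
    n = len(nodes)

    # beaten_by[loser] lists the winners of that loser's bouts. Repeats are
    # kept, so repeated wins over the same opponent carry more weight.
    beaten_by = defaultdict(list)
    for winner, loser in bouts:
        beaten_by[loser].append(winner)

    rank = {r: 1.0 / n for r in nodes}
    for _ in range(iterations):
        # Undefeated wrestlers have nowhere to send score; spread it evenly.
        dangling = sum(rank[r] for r in nodes if r not in beaten_by)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {r: base for r in nodes}
        for loser, winners in beaten_by.items():
            share = damping * rank[loser] / len(winners)
            for winner in winners:
                new_rank[winner] += share
        rank = new_rank
    return rank


# Placeholder wrestlers: A beats B and C, B beats C.
scores = pagerank([("A", "B"), ("A", "C"), ("B", "C")])
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy network the undefeated wrestler ends up with the highest score and the winless one with the lowest, and the scores always sum to 1.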
The results showed a notable divergence between raw wins and PageRank scores. While top-tier wrestlers like Hoshoryu, Kirishima, and Kotozakura led in both metrics, others like Abi, Wakatakakage, and Takayasu showed high PageRank scores despite lower total bout counts—often due to injury-related absences. Their high rankings suggest that when they do compete, they consistently defeat high-prestige opponents, maintaining their status as elite threats despite lower volume.

Identifying the Structural Pillars via Betweenness Centrality
While PageRank identifies the "best" winners, it does not necessarily identify the wrestlers who are most critical to the division’s competitive structure. For this, the researchers turned to Betweenness Centrality. This algorithm measures how often a node lies on the shortest paths between other pairs of nodes, so a high score marks a node that acts as a bridge between different parts of the network.
In the sumo hierarchy, high-betweenness wrestlers are those who sit on the most paths between the elite "Sanyaku" ranks and the mid-tier "Maegashira." These rikishi are the load-bearing pillars of the division; they are the ones who consistently beat the lower ranks to maintain the hierarchy but also pose a constant threat to the elite. If these wrestlers were removed, the competitive chain would fracture, making it harder to distinguish between the different tiers of talent.
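The idea of "sitting on the most paths" can be illustrated with Brandes' algorithm, the standard method for computing betweenness on an unweighted graph. The sketch below is a plain-Python illustration under stated assumptions: edges run from winner to loser, duplicate bouts collapse into a single edge, and the wrestler labels are placeholders. The study's actual graph projection and settings are not described in the text.

```python
from collections import defaultdict, deque


def betweenness(edges):
    """Brandes' algorithm for unweighted, directed betweenness centrality."""
    adj = defaultdict(set)          # a set collapses repeated bouts
    nodes = set()
    for src, dst in edges:
        adj[src].add(dst)
        nodes.update((src, dst))

    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Placeholder chain: an elite wrestler beats a "pillar", who beats a mid-tier one.
result = betweenness([("elite", "pillar"), ("pillar", "mid")])
```

Here only the middle wrestler lies on a shortest path between two others, so it is the only node with a non-zero score.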

The data identified specific rikishi whose losses are as significant as their wins. When an elite wrestler defeats a high-betweenness opponent, it validates their dominance over the entire tier below that opponent. Conversely, when a mid-tier wrestler upsets a high-betweenness pillar, it signals a potential shift in the division’s power dynamics.

The Chaos Score: Mapping Non-Transitive Rivalries
One of the most intriguing findings of the graph analysis is the presence of "rock-paper-scissors" cycles, formally known as non-transitive rivalries. In a perfectly linear hierarchy, if Wrestler A beats Wrestler B, and Wrestler B beats Wrestler C, then Wrestler A should beat Wrestler C. However, sumo is famously unpredictable.

By analyzing the "DOMINATES" graph—where a directed edge is created for rikishi who hold a significant net winning record over another—researchers found numerous 3-cycles. For instance, Wrestler A might dominate Wrestler B, who dominates Wrestler C, yet Wrestler C consistently defeats Wrestler A.
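A rough sketch of how such a graph and its 3-cycles might be derived from raw bouts follows. The article does not say what counts as a "significant" net record, so the `min_net_wins` threshold below is a placeholder assumption, as are the function names and wrestler labels.

```python
from collections import Counter, defaultdict


def dominates_edges(bouts, min_net_wins=2):
    """Return (a, b) pairs where a leads b head-to-head by the given margin."""
    wins = Counter(bouts)           # (winner, loser) -> number of wins
    edges = set()
    for (a, b), won in wins.items():
        if won - wins.get((b, a), 0) >= min_net_wins:
            edges.add((a, b))
    return edges


def three_cycles(edges):
    """List each directed cycle a -> b -> c -> a exactly once."""
    out = defaultdict(set)
    for a, b in edges:
        out[a].add(b)

    cycles = []
    for a in sorted(out):
        for b in sorted(out[a]):
            for c in sorted(out.get(b, ())):
                # Requiring a to be the smallest label avoids reporting the
                # same cycle three times from its three starting points.
                if a < b and a < c and a in out.get(c, ()):
                    cycles.append((a, b, c))
    return cycles


# Placeholder data: A leads B, B leads C, C leads A, and A also leads D.
bouts = [("A", "B")] * 2 + [("B", "C")] * 2 + [("C", "A")] * 2 + [("A", "D")] * 2
cycles = three_cycles(dominates_edges(bouts))

# Number of cycles each wrestler takes part in.
cycle_counts = Counter(r for cycle in cycles for r in cycle)
```

Wrestler D is dominated by A but belongs to no cycle, so it does not appear in the per-wrestler counts.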
To quantify this phenomenon, the researchers developed a "Chaos Score." This metric combines PageRank (prestige), Betweenness Centrality (structure), and 3-cycle counts (unpredictability). The Chaos Score highlights wrestlers who are not only successful but are deeply embedded in the most volatile and competitive segments of the division. These rikishi are the primary drivers of the "chaos" that makes sumo a compelling spectator sport, as their matches are the least likely to follow a predictable trajectory based on rank alone.
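The article does not give the formula behind the Chaos Score, so the following is only one plausible way to combine the three ingredients: each metric is min-max normalized to the range 0 to 1 and the results are averaged with equal weights. The normalization, the weights, the function names, and the input numbers are all assumptions chosen for illustration, not the researchers' values.

```python
def min_max(values):
    """Rescale a {name: value} mapping to the range 0..1."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def chaos_score(pagerank, betweenness, cycle_counts, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of the three normalized metrics (assumed formula)."""
    parts = [min_max(pagerank), min_max(betweenness), min_max(cycle_counts)]
    return {
        r: sum(w * part.get(r, 0.0) for w, part in zip(weights, parts))
        for r in pagerank
    }


# Made-up inputs for three placeholder wrestlers, not results from the study.
scores = chaos_score(
    pagerank={"A": 0.5, "B": 0.3, "C": 0.2},
    betweenness={"A": 0.0, "B": 2.0, "C": 1.0},
    cycle_counts={"A": 1, "B": 1, "C": 0},
)
most_chaotic = max(scores, key=scores.get)
```

In this toy input, B has only middling prestige but scores highest overall because it is both structurally central and part of a cycle, which is the kind of wrestler the composite is meant to surface.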

Chronology of Data and Analytical Milestones
The research followed a structured timeline to ensure the integrity of the findings:
- January 2021 – November 2025: The primary data collection window, capturing all Makuuchi and Juryou division bouts.
- December 2025: Data cleaning and filtering. The team removed "outlier" data points, such as rikishi with fewer than 20 bouts, and accounted for the "Tamawashi effect" to prevent historical longevity from overshadowing current performance trends.
- January 2026: Implementation of the Neo4j-Snowflake connector. This allowed for the execution of weighted PageRank and Betweenness Centrality algorithms directly on the data stored in Snowflake’s cloud environment.
- February 2026: Development of the Chaos Score and final visualization of the dominance graph.
- March 2026 (Upcoming): The findings are set to be tested against the live results of the 2026 Haru Basho to determine the predictive power of the Chaos Score.

Broader Impact and Industry Implications
The implications of this study extend far beyond the dohyō. The ability to chain multiple graph algorithms within a cloud data warehouse like Snowflake represents a significant leap in sports analytics and organizational data management.

For the sports world, this methodology offers a blueprint for analyzing any head-to-head competitive environment, from tennis and chess to mixed martial arts. By moving beyond simple win-loss ratios, organizations can better understand the "value" of an athlete’s performance, which has direct applications in scouting, contract negotiations, and the development of betting markets.
In the tech sector, the integration of Neo4j’s graph capabilities with Snowflake’s SQL-based architecture addresses a long-standing problem: the "siloing" of relational and graph data. Traditionally, analysts had to export data from a warehouse to a specialized graph database to run these types of algorithms. Running the algorithms directly on data held in Snowflake, as the SumoDB study did, suggests a future where complex network analysis is a standard feature of business intelligence.

Conclusion: The Complexity of the Dohyō
The study concludes that the "best" wrestlers in the current era of sumo are not necessarily those with the most trophies, but those who maintain high prestige while anchoring the division’s structural integrity. Hoshoryu, for example, emerged as a leader not because he is unbeatable, but because his victories carry immense weight, and his presence in the network connects various competitive tiers.
As the sport moves toward the 2026 Haru Basho, these analytical tools provide fans and officials with a new lens through which to view the matches. The next phase of the research will delve into "kimarite" (winning techniques) to see how specific fighting styles influence the Chaos Score and dominance patterns over decades. For now, the data confirms what sumo enthusiasts have long felt: the sport is a beautiful, chaotic system where prestige and structure are constantly in flux. The dohyō has always rewarded complexity; now, for the first time, science can measure it.






