Google has revealed technical details of its in-house data transfer tool, called Effingo, and bragged that it uses the project to move an average of 1.2 exabytes every day.
As explained in a paper [PDF] and video to be presented on Thursday at the SIGCOMM 2024 conference in Sydney, bandwidth constraints and the stubbornly steady speed of light mean that not even Google is immune to the need to replicate data so it is located close to where it is processed or served.
Indeed, the paper describes managed data transfer as “an unsung hero of large-scale, globally-distributed systems” because it “reduces the network latency from across-globe hundreds to in-continent dozens of milliseconds.”
The paper also points out that data transfer tools are not hard to find, and asks why a management layer like Effingo is needed.
The answer is that the tools Google could find either optimized for transfer time or handled point-to-point data streams – and weren’t up to the job of handling the 1.2 exabytes Effingo moves on an average day, at 14 terabytes per second.
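Those two figures line up: a quick back-of-the-envelope check (decimal units, our arithmetic rather than anything from the paper) turns 1.2 exabytes a day into roughly 14 terabytes a second.

```go
package main

import "fmt"

func main() {
	const (
		exabyte       = 1e18  // bytes, decimal units
		terabyte      = 1e12  // bytes
		secondsPerDay = 86400
	)
	// 1.2 EB spread evenly over a day, expressed in TB/s.
	perSecond := 1.2 * exabyte / secondsPerDay / terabyte
	fmt.Printf("1.2 EB/day ≈ %.1f TB/s\n", perSecond) // prints ≈ 13.9 TB/s
}
```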
To shift all those bits, Effingo “balances infrastructure efficiency and users’ needs” and recognizes that “some users and some transfers are more important than the others: eg, disaster recovery for a serving database, compared to migrating data from a cluster with maintenance scheduled a week from now.”
The tool must ensure that jobs get the performance they need, and that jobs of equal priority each get a fair share of resources rather than battling it out for IOPS, bandwidth, or compute.
All while the rest of Google keeps humming along at hyperscale.
Deep in the G-stack
How does Google do that?
The paper explains that Effingo is optimized for the Colossus filesystem Google designed and uses in-house, which is typically deployed across clusters comprising thousands of machines.
Effingo is deployed in each cluster. Its software stack comprises a control plane to manage the lifecycle of a copy, and a data plane that transfers bytes and reports status. “The code and resource consumption are uneven: the data plane uses 99 percent of CPU but is less than seven percent of lines of code,” the paper states.
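To make that split concrete, here's a minimal Go sketch of a control plane that owns copy lifecycles and a data plane that just moves bytes – the type and method names are our illustration, not Effingo's actual API.

```go
package effingo

import "context"

// CopySpec describes one managed transfer between clusters.
// Field names are illustrative, not taken from the paper.
type CopySpec struct {
	SourceCluster string
	DestCluster   string
	Files         []string
	Priority      int // higher means more important, eg disaster recovery
}

// ControlPlane owns the lifecycle of a copy: admission, scheduling,
// retries, and completion tracking – the small-but-complex part.
type ControlPlane interface {
	Submit(ctx context.Context, spec CopySpec) (copyID string, err error)
	Status(ctx context.Context, copyID string) (bytesDone, bytesTotal int64, err error)
	Cancel(ctx context.Context, copyID string) error
}

// DataPlane moves the bytes and reports progress back up. Per the paper,
// this is where roughly 99 percent of the CPU goes, despite being a small
// fraction of the code.
type DataPlane interface {
	Transfer(ctx context.Context, spec CopySpec, progress func(bytesCopied int64)) error
}
```

The imbalance the paper describes falls out of that shape: lifecycle logic is fiddly but cheap to run, while the byte-moving path is simple code that burns nearly all the CPU.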
Each cluster is connected to every other – some links run within a datacenter over a “low-latency, high-bandwidth CLOS” network, while others rely on WAN connections that use a mix of Google-owned and third-party infrastructure.
Whatever the tech underpinning a WAN, a tool called Bandwidth Enforcer (BWe) is present.
BWe is another Google project designed to allocate capacity based on service priority and the value derived by adding extra bandwidth. BWe defines “network service classes” (NSCs), and all traffic flows at Google are “marked with one of the predefined NSCs.”
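The article doesn't list the classes themselves, so the Go sketch below only illustrates the shape of the idea: a small set of predefined classes, and every flow tagged with exactly one. The class names are invented for illustration.

```go
package effingo

// NetworkServiceClass mirrors BWe's idea of predefined service classes.
// The names below are our invention; the paper does not list them.
type NetworkServiceClass int

const (
	BestEffort NetworkServiceClass = iota
	BulkCopy
	LatencySensitive
	DisasterRecovery
)

// Flow is a traffic flow marked with exactly one service class, as the
// article says every flow at Google is.
type Flow struct {
	Src, Dst  string
	Class     NetworkServiceClass
	DemandBps float64 // requested bandwidth in bits per second
}
```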
When a user initiates a data shift, Effingo requests a traffic allocation from BWe, and gets to work shifting data as fast as possible.
That traffic allocation could come from pre-determined quotas that Google defines using metrics for ingress and egress bandwidth per cluster, IOPS, and available Effingo workers – code that handles data movements and runs as “Borg” jobs. A reminder: Borg is Google’s container cluster manager, the system that inspired Kubernetes.
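A rough Go sketch of what a quota-bounded grant might look like – the struct fields follow the metrics named above, but the shape is ours, not Google's planning system:

```go
package effingo

import "math"

// ClusterQuota captures the per-cluster limits the article lists:
// ingress/egress bandwidth, IOPS, and available worker slots.
// Field names are illustrative.
type ClusterQuota struct {
	IngressBps float64
	EgressBps  float64
	IOPS       float64
	Workers    int
}

// grantRate clamps a requested transfer rate to whatever the source's
// egress quota and the destination's ingress quota both allow.
func grantRate(requestedBps float64, src, dst ClusterQuota) float64 {
	return math.Min(requestedBps, math.Min(src.EgressBps, dst.IngressBps))
}
```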
Effingo can bid to use quota-defined resources for workloads that need certain network performance. The tool can also rely on best-effort resources for less critical flows.
The paper reveals that quotas “are budgeted upfront (typically, for months) in a central planning system in which Effingo is just one of many resources.”
Best-effort resources “are typically harvested from underused quotas and shared equally.” But if a quota user needs capacity, they can reclaim it “at a very short notice.”
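Read literally, that harvesting rule is easy to sketch in Go – assuming, and this is our assumption, that spare capacity is simply split evenly across best-effort transfers until a quota holder wants it back:

```go
package effingo

// shareBestEffort splits capacity left unused by quota holders equally
// among best-effort transfers – a literal reading of "harvested from
// underused quotas and shared equally". Real reclamation would hand
// capacity back to the quota owner on demand.
func shareBestEffort(totalBps, usedByQuotasBps float64, bestEffortTransfers int) float64 {
	spare := totalBps - usedByQuotasBps
	if spare <= 0 || bestEffortTransfers == 0 {
		return 0
	}
	return spare / float64(bestEffortTransfers)
}
```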
Even with all of that resource allocation work, Effingo has a mean global backlog of around 12 million files – usually about eight petabytes of data. Even on its best day, around two million files are queued. Backlogs spike by around 12 petabytes and nine million files when the service’s top ten users initiate new transfers.
The paper offers plenty of detail on how Effingo allows for parallel operations, handles the 4.2 percent of transfers that end with an error, and sometimes slows down copy jobs to save resources.
It also reveals that Effingo is a work in progress: Google plans to improve integration with resource management systems and CPU usage during cross-datacenter transfers. Enhancements to the control loop to scale out transfers faster are also on the agenda. ®