May 19, 2025

How one leaked Git token can wreck multi-cloud security

Pearson and Internet Archive breaches

On May 6, Pearson, the U.K.-based education and publishing giant, disclosed that an “unauthorised actor” accessed part of its network and copied large amounts of corporate and customer data—described as “largely legacy” by the company.

Attackers located a GitLab Personal Access Token (PAT) that had slipped into a public .git/config file, cloned the company’s private repos, and pulled out a grab-bag of hard-coded AWS, GCP, Snowflake and Salesforce keys.

From there they pivoted into production, quietly siphoning terabytes of “legacy” customer and financial data for months. The root faults are painfully familiar: hard-coded secrets living alongside code instead of in a secrets manager; long-lived PATs with broad scopes that bypass MFA; zero real-time visibility into who owns a token, how old it is, or where it’s being used; and no DevOps-level threat detection, so a massive repo clone looked just like another build job.

Swap Pearson’s logo with that of the Internet Archive, and the plot barely changes; only the supporting cast is different: this time, a Zendesk API key exposed since 2018. Two organizations, two timelines, one root cause: unguarded secrets traveling through the DevOps pipeline.

The root faults your DevOps pipeline keeps repeating

Pearson’s May 2025 GitLab token breach was déjà vu for DevOps security pros

Attackers uncovered a GitLab Personal Access Token (PAT) in a public .git/config, cloned Pearson’s private repositories, and harvested hard-coded AWS, Google Cloud Platform, Snowflake, and Salesforce secrets.

With those credentials in hand, they jumped from the developer environment into production and quietly siphoned terabytes of so-called “legacy” customer and financial data—undetected for months. Take a look at what went wrong:

Hard-coded secrets embedded in source code rather than stored in a secrets manager.
Long-lived PATs that carry broad, admin-level scopes and effortlessly bypass multi-factor authentication.
Zero real-time visibility into token ownership, age, or usage context.
No DevOps-native threat detection, so a sudden full-repository clone looks identical to a routine CI/CD build job.

Why a single stray token still crushes global brands

Modern software development runs in minutes, not quarters. Every Git push can mint a new identity: a bot, a service account, or a temporary API key. Human nature whispers, “I’ll delete it later,” and the secret stays permanently baked into version control. Over time that credential becomes a zombie: nobody claims it, nobody rotates it, yet it still grants production-level access. When an attacker finally stumbles across the forgotten token, it behaves exactly as it did on day one, complete with admin scopes and no MFA.
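To make the "zombie credential" idea concrete, here is a minimal Python sketch of the check an inventory job might run. Everything in it is an assumption for illustration: the TokenRecord shape, the zombie_reasons helper, and the 90-day rotation policy are hypothetical, not taken from any particular platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed policy for illustration: rotate at least every 90 days.
MAX_TOKEN_AGE = timedelta(days=90)


@dataclass(frozen=True)
class TokenRecord:
    """Hypothetical inventory entry for one credential."""
    name: str
    owner: Optional[str]            # None means nobody claims the token
    created_at: datetime
    last_rotated_at: Optional[datetime] = None


def zombie_reasons(token: TokenRecord, now: datetime) -> list[str]:
    """Return the reasons a token looks abandoned (empty list if healthy)."""
    reasons = []
    if not token.owner:
        reasons.append("no owner")
    # Age is measured from the last rotation, or from creation if never rotated.
    reference = token.last_rotated_at or token.created_at
    if now - reference > MAX_TOKEN_AGE:
        reasons.append("not rotated within policy")
    return reasons
```

A token with no owner that was created years ago and never rotated would come back with both reasons, which is exactly the profile of the credentials in these breaches.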

What went wrong at Pearson and the Internet Archive

Investigators trace the Pearson breach back to a single exposed GitLab PAT visible to anyone who viewed .git/config. That token unlocked read access to private repos; inside those repos lurked plain-text cloud keys for AWS, GCP, Snowflake, and Salesforce.

With multi-cloud doors flung open, attackers pivoted laterally and downloaded customer records, financial statements, support logs, and proprietary code, all without triggering alarms. One neglected secret toppled an enterprise-scale, multi-cloud stack.

The Internet Archive received warnings about its exposed GitLab token weeks before news of the breach became public. Yet its Zendesk API key remained active, retaining full agent permissions dating back to 2018.

Attackers could read, and even reply to, user tickets from the official support address. This episode drives home a second truth: disclosure is meaningless without enforced, rapid token rotation and a strict SLA for completion.

The cure lives where the code lives

Traditional Cloud Infrastructure Entitlement Management (CIEM) solutions stop at the cloud console, and classic IAM programs end at HR’s joiner-mover-leaver checklist. Neither toolset monitors what developers commit to GitHub or GitLab every hour. The fix is to embed identity security in the Git workflow itself, make protection continuous, and automate every repetitive task.

Continuous secret discovery must scan every commit, every branch, and every historical blob the moment your security platform connects through OAuth. If a developer accidentally pushes an aws_secret_access_key at 02:00, the system needs to flag it by 02:01. Tokens older than the intern who wrote the code should be quarantined automatically and tied to a blocking ticket until rotation is complete.
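As a rough illustration of what per-commit scanning does under the hood, the Python sketch below checks text line by line against a few credential patterns. The rule set is a small assumed sample (the AKIA prefix for AWS access key IDs, the aws_secret_access_key assignment, and the default glpat- prefix for GitLab PATs); real scanners carry far more rules plus entropy checks. The names PATTERNS, Finding, and scan_text are hypothetical.

```python
import re
from typing import NamedTuple

# Illustrative rules only; a production scanner would carry many more.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "aws_secret_access_key": re.compile(
        r"(?i)aws_secret_access_key\s*[=:]\s*['\"]?[A-Za-z0-9/+=]{40}"
    ),
    "gitlab_pat": re.compile(r"\bglpat-[0-9A-Za-z_\-]{20,}"),
}


class Finding(NamedTuple):
    line_no: int   # 1-based line number in the scanned text
    rule: str      # which pattern matched; the secret itself is never returned


def scan_text(text: str) -> list[Finding]:
    """Scan a blob of text (a file, a diff, a commit) for likely secrets."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append(Finding(line_no, rule))
    return findings
```

The function reports only the line number and the rule name, so the findings can go into a ticket or an alert without copying the secret a second time.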

Zero-standing-privilege flips the “access by default” model upside down. Instead of long-lived PATs with god-mode scopes, developers request short-lived, repo-scoped tokens for a defined duration. When the window closes, the permission evaporates. Harvest that token tomorrow and you get nothing but digital dust.
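The mechanics are simple enough to sketch in a few lines of Python. This is a toy model under stated assumptions, not how any specific Git host issues tokens: ScopedToken, issue_token, and is_allowed are hypothetical names, and a real system would also store the token server-side and log every check.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ScopedToken:
    """Hypothetical short-lived token bound to exactly one repository."""
    value: str
    repo: str
    expires_at: datetime


def issue_token(repo: str, ttl: timedelta, now: datetime) -> ScopedToken:
    """Mint a random token that is valid for one repo until now + ttl."""
    return ScopedToken(secrets.token_urlsafe(32), repo, now + ttl)


def is_allowed(token: ScopedToken, repo: str, now: datetime) -> bool:
    """Grant access only for the token's own repo and only before expiry."""
    return token.repo == repo and now < token.expires_at
```

A token issued for one hour on a single repository fails both checks a day later or against any other repo, which is what makes a harvested copy worthless.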

Identity Threat Detection and Response (ITDR) for DevOps rounds out the triad. Your stack already baselines login patterns across email, VPN, and Okta; it should also baseline Git-clone volume per user, per repo, and per geography. A sudden full-repo clone from an IP address on another continent is the DevSecOps equivalent of someone wheeling a server out the door. Best practice: kill the token first, investigate second.
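A baseline check of this kind can be sketched in Python as follows. The inputs are assumptions: a list of a user's recent daily clone counts and the set of countries they normally work from. The function name clone_anomaly_reasons and the three-standard-deviation threshold are illustrative choices, not a recommended tuning.

```python
from statistics import mean, pstdev


def clone_anomaly_reasons(
    history: list[int],          # past daily clone counts for this user
    today: int,                  # clone count observed today
    known_countries: set[str],   # countries this user normally clones from
    country: str,                # country of the current request
    z_threshold: float = 3.0,    # assumed sensitivity, for illustration
) -> list[str]:
    """Return the reasons today's clone activity deviates from the baseline."""
    reasons = []
    if country not in known_countries:
        reasons.append("new geography")
    if history:
        mu = mean(history)
        # Floor the spread at 1 so a perfectly flat history does not
        # turn a one-clone difference into an alert.
        sigma = max(pstdev(history), 1.0)
        if today > mu + z_threshold * sigma:
            reasons.append("clone volume spike")
    return reasons
```

For a user who normally clones two to four repos a day from one country, forty clones from somewhere new would return both reasons. Under the "kill the token first" rule, any non-empty result would trigger revocation before a human looks at it.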

How Unosecur’s GitHub Integration closes every gap

Unosecur tackles the problem exactly the way an incident responder would: start where secrets live, tie each key to a human owner, and watch for drift in real time. The integration uses an agentless connection via GitHub OAuth, inventories humans, bots, tokens, and webhooks within minutes, and benchmarks them against least-privilege policies. If anything breaks those policies (a public .git folder, a non-MFA login, a token born before Kubernetes), Unosecur fires an immediate, actionable alert. Optional playbooks can revoke the token automatically and open a Jira ticket so the developer knows what happened and why.

Because the loop runs continuously, there’s no window for a Pearson-style months-long dwell time or an Internet-Archive-style rotation delay. A bad token is blocked at commit or neutralized minutes after detection, long before attackers can pivot.

Your next steps

First, audit your own repos for public .git/config exposure. If you find even one path, rotate every token inside and treat the event like a live breach. Next, adopt a platform that builds secret scanning, owner mapping, and token rotation into your daily engineering rhythm—rather than a quarterly scramble that always runs late. Implement CI/CD gates that integrate secret scanning into pull-request workflows so issues never merge to main.
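For the first step, the Python sketch below shows one way to inspect the text of a .git/config you have already retrieved and list any remotes whose URL embeds a username, password, or token. It makes no network requests. The helper name remotes_with_credentials is hypothetical, and the check covers only http and https remote URLs, which is where tokens most often end up.

```python
import re
from urllib.parse import urlsplit

# Matches "url = <value>" lines inside [remote "..."] sections.
URL_LINE = re.compile(r"^\s*url\s*=\s*(\S+)", re.MULTILINE)


def remotes_with_credentials(git_config_text: str) -> list[str]:
    """Return the hostnames of remote URLs that carry embedded credentials."""
    hosts = []
    for url in URL_LINE.findall(git_config_text):
        parts = urlsplit(url)
        # Any userinfo on an http(s) remote is treated as a credential,
        # because tokens are often supplied as the username or the password.
        if parts.scheme in ("http", "https") and (parts.username or parts.password):
            hosts.append(parts.hostname or "")
    return hosts
```

An SSH-style remote such as git@host:org/repo.git is not flagged, since it carries no secret in the URL. Any hostname the function does return means the token in that URL should be rotated immediately.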

Ready for proof? The Unosecur team can surface every key, token, and entitlement hiding in your GitHub or GitLab organization, usually before your next coffee finishes brewing. Email sales@unosecur.com, connect the integration, and watch your blind spots disappear.

Continuous Git security has to become as routine as unit tests. Securing GitHub tokens, deploying GitHub secret-scanning tools, and adopting DevSecOps practices are all necessary steps in protecting source code, CI/CD pipelines, and cloud workloads. Pearson and the Internet Archive show that even revered brands can fall to one forgotten token. Don’t let the next breaking-news banner feature your logo.
