The database was available for anyone to access without a password.
On October 16, 2019, two dark web researchers, Bob Diachenko and Vinny Troia, discovered a database containing personal records of more than 1.2 billion people.
While looking for exposures through BinaryEdge and Shodan, they stumbled upon a server whose IP address could be traced to Google Cloud Services. In total, the database held over 4 terabytes of data sitting in plain sight, open to public access.
The records were found on an exposed Elasticsearch server. The good news is that they did not include login credentials, social security numbers or payment card details. Details shared by the researchers indicate that the data was scraped from social media platforms including Twitter, Facebook, LinkedIn and GitHub, a Git repository hosting service.
Additionally, the data contains approximately 50 million unique phone numbers and 622 million unique email addresses.
As for the structure of the data, four different data sets appear to have been combined: three are labeled as originating from a San Francisco-based data broker called People Data Labs (PDL), and one from OxyData.
However, PDL has denied that it owns the server, with co-founder Sean Thorne stating:
“The owner of this server likely used one of our enrichment products, along with a number of other data-enrichment or licensing services.”
OxyData, which boasts of having 4 TB of user data including 380 million profiles, also denied owning the server. Most of the data associated with OxyData comes from LinkedIn and includes recruiter information.
Despite the denials by both companies, a comparison of the exposed data with their databases shows that the two are nearly identical, which suggests that the data did at least originate from them. The researchers elaborate on the PDL data specifically in their blog post:
“The data discovered on the open Elasticsearch server was almost a complete match to the data being returned by the People Data Labs API. The only difference being the data returned by the PDL also contained education histories. There was no education information in any of the data downloaded from the server. Everything else was exactly the same, including accounts with multiple email addresses and multiple phone numbers.”