When you sign up for a newsletter, book a hotel, or check out online, you probably take it for granted that if you type your email address wrong three times, or change your mind and X out of the page, it doesn't matter. Nothing actually happens until you hit the Submit button, right? Well, maybe not. As with so many assumptions about the web, that's not always the case, according to new research: A surprising number of websites are collecting some or all of your data as you type it into a form, before you ever submit it.
Researchers from KU Leuven, Radboud University, and the University of Lausanne crawled and analyzed the top 100,000 websites, looking at two scenarios: a user visiting each site from the European Union and a user visiting it from the United States. They found that 1,844 websites collected an EU user's email address without their consent, and a staggering 2,950 logged a US user's email in some form. Many of the sites don't seem to intend to collect the data themselves; instead, they embed third-party marketing and analytics services that cause the behavior.
After a separate crawl in May 2021 designed to detect password leaks, the researchers also found 52 websites where third parties, including the Russian tech giant Yandex, were inadvertently collecting password data before the form was submitted. The group disclosed its findings to these sites, and all 52 cases have since been resolved.
"If there's a Submit button on a form, the reasonable expectation is that it's doing something, that it's going to send your data when you click on it," says Güneş Acar, a professor and researcher in Radboud University's Digital Security Group and one of the study's leaders. "We were super surprised by these results. We thought maybe we'd find a few hundred websites where your email is collected before you submit, but this far exceeded our expectations."
The researchers, who will present their findings at the Usenix Security conference in August, say they were inspired to investigate what they call "leaky forms" by media reports, particularly from Gizmodo, about third parties collecting form data regardless of whether the form was submitted. They point out that the behavior is essentially similar to that of keyloggers, which are typically malicious programs that record everything a target types. But on a mainstream site in the top 1,000, users probably wouldn't expect their information to be captured keylogger-style. In practice, the researchers saw several variations in behavior. Some sites logged data keystroke by keystroke, but many grabbed the full contents of a field as users clicked to the next one.
"In some cases, when you click on the next field, they collect the previous one: You click on the password field and they collect the email, or you just click anywhere and they collect all the information immediately," says Asuman Senol, a privacy and identity researcher at KU Leuven and one of the study's co-authors. "We didn't expect to find thousands of websites, and in the US the numbers are really high, which is interesting."