Exponential data growth isn’t new, and the rapid emergence of artificial intelligence (AI) is only compounding it. But it’s not just the sheer volume of small data that’s creating challenges; it’s high volumes of larger files that are putting increasing pressure on storage systems.
The concept of massive storage isn’t new either, but the amount of data that needs to be kept and accessed quickly and effectively has grown dramatically, thanks in part to greater storage capacity and the many tools and devices that let people and businesses create and manipulate data.
Jeff Janukowicz, research VP at IDC, said the term “massive storage” is evolving in the age of generative artificial intelligence (GenAI). “We're clearly creating a lot of data. We're trying to store a lot of data.” He looks at data storage through the lens of flash storage and SSDs and noted that enterprises are still trying to figure out how to capture, store, protect and analyze data as part of their digital transformation efforts.
AI is not only generating huge amounts of data, but also requiring that organizations store more high-quality data for training and inference, Janukowicz said. This data needs to be accessed efficiently and stored for longer periods of time.
Massive storage must account for the fact that data is quite fragmented – it’s spread across many devices, on-premises systems and the cloud, and large data sets must reside close by if they are to be used for AI workloads. “That fragmentation creates challenges but also presents itself with opportunities as well,” Janukowicz said.
He said current projections peg the amount of data created to hit nearly 400 zettabytes by 2028, up from 130 zettabytes in 2023. “We're creating a lot of data, but there's a gap between the data that we're creating and what we're able to store.”
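Those figures imply roughly 25% compound annual growth. As a back-of-envelope illustration (not an IDC calculation), the implied rate works out as follows:

```python
# Back-of-envelope growth rate implied by the projection cited above:
# ~130 ZB of data created in 2023, growing to ~400 ZB by 2028.
data_2023_zb = 130
data_2028_zb = 400
years = 2028 - 2023

cagr = (data_2028_zb / data_2023_zb) ** (1 / years) - 1
print(f"Implied compound annual growth: {cagr:.1%}")  # ~25.2%
```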

Massive storage encompasses the efforts to come up with new ways to store more of that data, tap it for AI and machine learning, and use it effectively to drive better business decisions, Janukowicz said, which means it’s not just a capacity problem. “AI is only just the tip of the iceberg as people are realizing just how much value is in data.”
AI workloads drive capacity, access speed requirements
On the storage technology front, hard drives remain a cost-effective medium for storing massive amounts of data while keeping it accessible. Although flash SSDs are much faster, heat-assisted magnetic recording (HAMR) enables higher storage capacity and density on hard drives by using a laser to briefly heat the medium so magnetic bits can be flipped. Janukowicz said HAMR is one way of increasing areal density to drive down the cost while storing more and more data.
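To see why areal density matters, consider a rough sketch of how it translates into drive capacity. The density, platter area and platter count below are illustrative assumptions, not vendor specifications:

```python
# Illustrative only: how areal density translates into drive capacity.
# All figures are rough assumptions, not actual product specs.
areal_density_tb_per_sq_in = 1.4   # assumed density in terabits per square inch
platter_surface_sq_in = 9.5        # assumed usable area of one 3.5" platter surface
surfaces = 20                      # assumed 10 platters, 2 recording surfaces each

capacity_terabits = areal_density_tb_per_sq_in * platter_surface_sq_in * surfaces
capacity_tb = capacity_terabits / 8  # terabits -> terabytes
print(f"Approximate drive capacity: {capacity_tb:.0f} TB")  # ~33 TB
```

Raising areal density lifts every surface in the drive at once, which is why it is the main lever for lowering cost per terabyte.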
On the flash side, there are many conversations around QLC NAND, which stores four bits per cell and is denser and more cost-effective than TLC NAND, but it’s still in the early stages of adoption, he said.
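The density advantage follows directly from the bits-per-cell arithmetic, as this quick sketch shows:

```python
# QLC stores 4 bits per cell vs. TLC's 3, so the same number of cells
# holds one-third more data (before overprovisioning, ECC and the like).
tlc_bits_per_cell = 3
qlc_bits_per_cell = 4

density_gain = qlc_bits_per_cell / tlc_bits_per_cell - 1
print(f"QLC capacity gain over TLC per cell: {density_gain:.0%}")  # ~33%
```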
The impact of AI on data volumes is that, for a model to be useful, it needs a massive amount of data to train on. “The more data you’re able to put into your model, the better the quality of the model,” Janukowicz said, noting that GPT-4 reportedly has trillions of parameters. If an organization is extracting value from its data, it has more incentive to store that data for long periods and mine it to make better informed business decisions.
Compliance requirements are also driving massive storage, and storing that data comes with costs, as well as risk from security breaches, Janukowicz said.
Ravi Pendekanti, senior vice president of HDD product management and marketing at Western Digital, said regulatory needs are just one of many trends that have led to an era of massive storage; several developments on the hardware side have contributed as well. The first hard drive, in 1956, held only five megabytes, but capacities today have hit 28 terabytes.
But it’s not just capacity that has led to more data being stored; it’s the many tools created in the last several decades to help with regulatory compliance and track scientific progress, as well as the countless pictures people can create and store, Pendekanti said.
Faster wireless networks and the internet of things (IoT) have also sparked a whole new realm of data collection, he said. “As IoT and edge computing takes off, more data is going to be created at the edge.”
Storing and preserving data is one problem, but if you can’t find it when you need it, it doesn’t help anyone, Pendekanti said.
Duplication drives data accumulation
The massive storage challenge is made even more complex by the need for redundancy: prevailing backup philosophy holds that you need three copies of your data. Data is not only being continuously created and stored, but stored in multiple ways and locations.
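A quick sketch shows what that multiplication means for raw capacity; the primary data size below is an arbitrary assumption for illustration:

```python
# Illustrative footprint of the "three copies" backup philosophy.
# The 2 PB of primary data is an arbitrary example, not a cited figure.
primary_data_pb = 2.0
copies = 3  # the primary copy plus two backups

total_footprint_pb = primary_data_pb * copies
print(f"{primary_data_pb} PB of primary data consumes "
      f"{total_footprint_pb} PB of raw capacity across {copies} copies")
```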
And with data storage having become relatively cheap, there’s not a lot of incentive to get rid of unnecessary data, Pendekanti said. “My wife is happy to take 20 different pictures of the same thing because it doesn't cost her much.”
He said this is an example of how tools are contributing to massive storage – it’s become a lot easier and cheaper to take a picture thanks to digital technology. It’s also generally easier to create information that you’re not likely to look at again.
Massive storage means not only having a lot of files to store; the files themselves are getting more massive, which capacity increases have made possible.
At the low end of the spectrum are text messages, Pendekanti said, followed by audio and images, with video carrying far more weight when it comes to individual file sizes. That weighting can be seen in autonomous vehicles, where a lot of video capture is happening, he said.
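To put that weighting in perspective, here is a rough sketch of how quickly multi-camera video accumulates. The camera count and bitrate are illustrative assumptions, not measured figures from any vehicle:

```python
# Illustrative only: how fast multi-camera video accumulates.
# Camera count and bitrate are assumptions, not measured figures.
cameras = 6        # assumed number of cameras on a vehicle
bitrate_mbps = 8   # assumed compressed bitrate per camera, in Mbit/s
hours = 1

total_gb = cameras * bitrate_mbps * hours * 3600 / 8 / 1000  # Mbit -> GB
print(f"{cameras} cameras at {bitrate_mbps} Mbit/s: ~{total_gb:.0f} GB per hour")
```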
But autonomous vehicles are nothing compared to the scientific community, where genomics and pharmaceutical research drive even more massive storage volumes with large files, Pendekanti said. These areas are also examples of how databases are evolving from relational to vector, which emphasizes the need not only to store massive amounts of data but to retrieve it more quickly. “That's the next frontier that we continuously have to pursue as an industry.”
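Unlike a relational lookup, a vector database retrieves records by similarity between numeric embeddings. Here is a minimal sketch of that core operation; the vectors are random stand-ins for real embeddings:

```python
import numpy as np

# Minimal sketch of a vector database's core operation: find the stored
# embeddings most similar to a query. Vectors here are random stand-ins.
rng = np.random.default_rng(0)
stored = rng.normal(size=(10_000, 128))  # 10k stored 128-dim embeddings
query = rng.normal(size=128)

# Cosine similarity between the query and every stored vector
stored_norm = stored / np.linalg.norm(stored, axis=1, keepdims=True)
query_norm = query / np.linalg.norm(query)
scores = stored_norm @ query_norm

top3 = np.argsort(scores)[-3:][::-1]  # indices of the 3 nearest vectors
print("Closest matches:", top3, "scores:", scores[top3].round(3))
```

Production systems replace the brute-force scan with approximate nearest-neighbor indexes, which is what makes similarity search fast over massive data sets.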
Outside of technology, Pendekanti said there is a good housekeeping mindset that should be applied to data storage so that useless data doesn’t accumulate as it’s continually created. “You want to have some of the data, but you don't need to have a lot of data that is sitting there for no reason.”