Storing the immense amounts of data produced in our increasingly digital world is quickly becoming a serious scientific problem. A new project launched by a team of chemists and engineers at Brown University aims to store and manipulate data in a way that has never been done before: by representing it with molecules dissolved in solution. Such a system could potentially store billions of terabytes of data in a single flask of liquid.
The project, dubbed "Chemical CPUs: Computational Processing via Ugi Reactions," will be backed by a $4.1 million award from the Defense Advanced Research Projects Agency (DARPA) Molecular Informatics program.
"Collectively, people produce millions of terabytes of data every day, and it's getting harder and harder to store all that data in small devices," said Brenda Rubenstein, an assistant professor of chemistry at Brown and the project's principal investigator. "The aim of this project is to come up with a new form of storage that is many times more compact than what we currently have. One obvious candidate is molecules."
Other research groups have started investigating the possibility of using DNA molecules to store information. After all, DNA naturally carries biological data. But the approach Rubenstein and her colleagues are pursuing is different. They aim to use synthetic molecules, produced in millions of unique combinations, as a means of encoding data, which could be stored in immense quantities in solutions. The data will then be read back out using a high-performance mass spectrometer capable of identifying the molecular combinations.
The approach enables information densities even higher than those of DNA, Rubenstein says, and could also enable computation through chemical reactions: actual data processing in solution, something that has never been done before.
"For this project, we want to show that we can read and write information, as well as do some very basic calculations, all in solution," Rubenstein said. "Later, we'd like to go beyond that and think about how we could hook that up to larger systems."
Jacob Rosenstein, an assistant professor in Brown's School of Engineering and co-principal investigator of the project, says that while the complexity of performing such computations is daunting, the potential computing power is immense.
"We can start to think about ways in which the complexity of molecules in solution might be an advantage for some computations," Rosenstein said. "Fluids are three-dimensional. That dimensionality could potentially be an advantage for things like pattern recognition and search algorithms, which don't always scale well in two-dimensional circuits."
As a proof of concept, the team showed that they could successfully encode and read out a small black and white image comprising 81 pixels. Under the DARPA contract, the team will scale that process up, encoding images from machine learning databases, audio files from a speech database, and weather data from the National Oceanic and Atmospheric Administration. At the end of the first phase of the project, the team aims to have the ability to read and write 100 megabytes of chemical information per day.
There are numerous challenges to meeting those goals. For the 81-bit image, the researchers only needed to synthesize 25 unique molecules. For larger data sets, they'll need many more distinct molecules, perhaps millions. The team plans to synthesize their molecules using Ugi reactions, which are often used in pharmaceutical development to merge several components into one molecule. The technique has not been used, however, at the scale that the team is proposing for information storage.
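The reason Ugi chemistry can plausibly reach such numbers is combinatorial. The classic Ugi reaction joins four building blocks (an amine, an aldehyde or ketone, a carboxylic acid and an isocyanide) into a single product, so the number of possible products is the product of the number of choices for each component. The sketch below is a back-of-the-envelope illustration, not the team's synthesis plan: the building-block labels and the figure of 40 choices per component are hypothetical, and it assumes every combination reacts and gives a distinct product with its own mass, which real chemistry does not guarantee.

```python
from itertools import product
from math import prod


def ugi_library(amines, carbonyls, acids, isocyanides):
    """Enumerate every amine/carbonyl/acid/isocyanide combination.

    Each tuple stands for one potential Ugi four-component product. This
    assumes every combination reacts and yields a distinct product.
    """
    return list(product(amines, carbonyls, acids, isocyanides))


def library_size(*component_counts):
    """Number of combinations: the product of the building-block counts."""
    return prod(component_counts)


if __name__ == "__main__":
    # Hypothetical building-block labels, not real reagents.
    small = ugi_library(["A1", "A2"], ["C1", "C2", "C3"], ["X1"], ["I1", "I2"])
    print(len(small))                    # 12 (2 * 3 * 1 * 2)
    print(library_size(40, 40, 40, 40))  # 2560000
```

Forty choices for each of the four components already gives about 2.6 million possible molecules, which shows how a modest set of reagents could in principle support millions of distinct molecules. The hard part, as the researchers note, is making and verifying them at that scale.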
Jason Sello and Eunsuk Kim, both faculty in Brown's Department of Chemistry who have expertise in molecular synthesis, will work with Rosenstein and another engineering faculty member, Sherief Reda, to automate and optimize the strategies to synthesize molecules in those dizzying quantities. In addition to the chemistry development, this will involve writing computer-aided design software to optimize the mapping of digital data into mixtures of chemicals.
Another challenge is efficiently detecting distinct signals from all of those molecules during the read-out process. The DARPA contract will support the purchase of a powerful mass spectrometer able to resolve those signals. Peter Weber, a professor of chemistry with expertise in spectroscopy, and research scientist Joseph Geiser will work with the team to optimize the readout system. A group led by Rosenstein, Reda and engineering professor Chris Rose will develop software tools to decode the original digital data from these mass spectrometer readings.
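At its core, turning a spectrum back into data means deciding, for each molecule that might be present, whether a peak appears at that molecule's expected mass. A minimal sketch of that matching step follows, assuming presence/absence coding and a fixed mass tolerance in parts per million (ppm). The function name, the masses and the 5 ppm tolerance are hypothetical, and the team's decoding software may well work differently.

```python
def detect_bits(peaks, expected_masses, tol_ppm=5.0):
    """Return one bit per library molecule for a single mixture.

    A bit is 1 if any observed peak lies within tol_ppm (parts per million)
    of that molecule's expected mass, else 0. Peak positions are compared
    directly with the expected values; a real decoder would also have to
    account for adducts, charge states and isotope peaks.
    """
    bits = []
    for expected in expected_masses:
        window = expected * tol_ppm * 1e-6  # absolute tolerance for this mass
        hit = any(abs(peak - expected) <= window for peak in peaks)
        bits.append(1 if hit else 0)
    return bits


if __name__ == "__main__":
    expected_masses = [412.2231, 438.2388, 466.2701]  # hypothetical library masses
    observed_peaks = [412.2235, 466.2690, 300.1000]   # hypothetical spectrum; 300.1 is noise
    print(detect_bits(observed_peaks, expected_masses))  # [1, 0, 1]
```

The tolerance is the key design choice: too tight and real signals are missed because of instrument error, too loose and molecules with nearly identical masses blur together. That trade-off is one reason a high-resolution mass spectrometer matters for this kind of readout.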
Rubenstein, a theoretical chemist, will lead the effort to find the right molecules to use in solution and to develop computational schemes for those molecules. Rose, a communication theorist whose work includes molecular communication, will add his expertise to the theoretical side of the project as well.
In addition to demonstrating a new way to store data, the researchers say that the tools developed for this project could have impact in other domains as well. The high-throughput synthesis, analysis and informatics that will be developed could find use in proteomics and other fields. The research could also be useful in analyzing other complex chemical mixtures and in understanding the molecular signaling that occurs in natural systems.
"There are some really daunting challenges involved here, but there is also immense potential for creating the information storage density we'll need in the future along with other useful technologies," Rubenstein said. "We think we have the right team assembled to make real progress."
-Kevin Stacey