Study: Transparency is often lacking in datasets used to train large language models

In order to train more powerful large language models, researchers use vast dataset collections that blend diverse data from thousands of web sources.

But as these datasets are combined and recombined into multiple collections, important information about their origins and restrictions on how they can be used is often lost or confounded in the shuffle.

Not only does this raise legal and ethical concerns, it can also damage a model’s performance. For instance, if a dataset is miscategorized, someone training a machine-learning model for a certain task may end up unwittingly using data that are not designed for that task.

In addition, data from unknown sources could contain biases that cause a model to make unfair predictions when deployed.

To improve data transparency, a team of multidisciplinary researchers from MIT and elsewhere launched a systematic audit of more than 1,800 text datasets on popular hosting sites. They found that more than 70 percent of these datasets omitted some licensing information, while about 50 percent had information that contained errors.

Building off these insights, they developed a user-friendly tool called the Data Provenance Explorer that automatically generates easy-to-read summaries of a dataset’s creators, sources, licenses, and allowable uses.

“These types of tools can help regulators and practitioners make informed decisions about AI deployment, and further the responsible development of AI,” says Alex “Sandy” Pentland, an MIT professor, leader of the Human Dynamics Group in the MIT Media Lab, and co-author of a new open-access paper about the project.

The Data Provenance Explorer could help AI practitioners build more effective models by enabling them to select training datasets that fit their model’s intended purpose. In the long run, this could improve the accuracy of AI models in real-world situations, such as those used to evaluate loan applications or respond to customer queries.

“One of the best ways to understand the capabilities and limitations of an AI model is understanding what data it was trained on. When you have misattribution and confusion about where data came from, you have a serious transparency issue,” says Robert Mahari, a graduate student in the MIT Human Dynamics Group, a JD candidate at Harvard Law School, and co-lead author on the paper.

Mahari and Pentland are joined on the paper by co-lead author Shayne Longpre, a graduate student in the Media Lab; Sara Hooker, who leads the research lab Cohere for AI; as well as others at MIT, the University of California at Irvine, the University of Lille in France, the University of Colorado at Boulder, Olin College, Carnegie Mellon University, Contextual AI, ML Commons, and Tidelift. The research is published today in Nature Machine Intelligence.

Focus on fine-tuning

Researchers often use a technique called fine-tuning to improve the capabilities of a large language model that will be deployed for a specific task, like question-answering. For fine-tuning, they carefully build curated datasets designed to boost a model’s performance for this one task.

The MIT researchers focused on these fine-tuning datasets, which are often developed by researchers, academic organizations, or companies and licensed for specific uses.

When crowdsourced platforms aggregate such datasets into larger collections for practitioners to use for fine-tuning, some of that original license information is often left behind.

“These licenses ought to matter, and they should be enforceable,” Mahari says.

For instance, if the licensing terms of a dataset are wrong or missing, someone could spend a great deal of money and time developing a model they might be forced to take down later because some training data contained private information.

“People can end up training models where they don’t even understand the capabilities, concerns, or risk of those models, which ultimately stem from the data,” Longpre adds.

To begin this study, the researchers formally defined data provenance as the combination of a dataset’s sourcing, creating, and licensing heritage, as well as its characteristics. From there, they developed a structured auditing procedure to trace the data provenance of more than 1,800 text dataset collections from popular online repositories.

After finding that more than 70 percent of these datasets contained “unspecified” licenses that omitted much information, the researchers worked backward to fill in the blanks. Through their efforts, they reduced the share of datasets with “unspecified” licenses to around 30 percent.

Their work also revealed that the correct licenses were often more restrictive than those assigned by the repositories.   
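The bookkeeping behind these audit figures can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' auditing procedure: the `AuditRow` fields, the four toy datasets, and the `RANK` ordering of licenses are all hypothetical, chosen only to show how a share of "unspecified" licenses could be computed before and after tracing, and how a repository label that understates the traced license could be flagged.

```python
from dataclasses import dataclass
from typing import Optional

UNSPECIFIED = "unspecified"

@dataclass
class AuditRow:
    name: str
    repo_license: str                      # label shown on the hosting site
    traced_license: Optional[str] = None   # label found by tracing the original source, if any

# Toy ordering, illustrative only: a higher number means more restrictive.
RANK = {"mit": 0, "apache-2.0": 0, "cc-by-nc-4.0": 1}

def unspecified_share(labels):
    """Fraction of license labels that are 'unspecified'."""
    labels = list(labels)
    return sum(1 for label in labels if label == UNSPECIFIED) / len(labels)

def resolved_license(row):
    """Prefer the traced license; fall back to the repository label."""
    return row.traced_license or row.repo_license

# Hypothetical audit rows, not data from the study.
rows = [
    AuditRow("qa-set", UNSPECIFIED, "cc-by-nc-4.0"),
    AuditRow("chat-set", "apache-2.0", "cc-by-nc-4.0"),
    AuditRow("summ-set", UNSPECIFIED, None),
    AuditRow("code-set", "mit", "mit"),
]

before = unspecified_share(r.repo_license for r in rows)
after = unspecified_share(resolved_license(r) for r in rows)

# Datasets whose traced license is more restrictive than the repository label.
understated = [
    r.name for r in rows
    if r.traced_license and r.repo_license in RANK
    and RANK[r.traced_license] > RANK[r.repo_license]
]
```

In this toy example, tracing resolves one of the two unspecified labels and flags one dataset whose repository label was more permissive than its source license.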

In addition, they found that nearly all dataset creators were concentrated in the global north, which could limit a model’s capabilities if it is trained for deployment in a different region. For instance, a Turkish language dataset created predominantly by people in the U.S. and China might not contain any culturally significant aspects, Mahari explains.

“We almost delude ourselves into thinking the datasets are more diverse than they actually are,” he says.

Interestingly, the researchers also saw a dramatic spike in restrictions placed on datasets created in 2023 and 2024, which might be driven by concerns from academics that their datasets could be used for unintended commercial purposes.

A user-friendly tool

To help others obtain this information without the need for a manual audit, the researchers built the Data Provenance Explorer. In addition to sorting and filtering datasets based on certain criteria, the tool allows users to download a data provenance card that provides a succinct, structured overview of dataset characteristics.
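As a rough picture of what filtering datasets and producing a provenance card might involve, here is a small Python sketch. It is not the Data Provenance Explorer's schema or interface: the `ProvenanceRecord` fields, the `filter_by_use` and `provenance_card` helpers, and the two sample records are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

@dataclass
class ProvenanceRecord:
    name: str
    creators: list
    sources: list
    license: str
    allowed_uses: frozenset

def filter_by_use(records, use):
    """Keep datasets whose recorded terms allow `use`, sorted by name."""
    return sorted((r for r in records if use in r.allowed_uses),
                  key=lambda r: r.name)

def provenance_card(record):
    """Render a short, structured plain-text summary of one dataset."""
    return "\n".join([
        f"Dataset: {record.name}",
        f"Creators: {', '.join(record.creators)}",
        f"Sources: {', '.join(record.sources)}",
        f"License: {record.license}",
        f"Allowed uses: {', '.join(sorted(record.allowed_uses))}",
    ])

# Hypothetical records, not entries from the real tool.
records = [
    ProvenanceRecord("support-chats", ["Example Lab"], ["forum posts"],
                     "cc-by-nc-4.0", frozenset({"research"})),
    ProvenanceRecord("open-qa", ["Example University"], ["encyclopedia articles"],
                     "apache-2.0", frozenset({"research", "commercial"})),
]

commercial_ok = filter_by_use(records, "commercial")
card = provenance_card(commercial_ok[0])
print(card)
```

Keeping the allowed uses as an explicit field, rather than inferring them from the license name at query time, is one way to make the filter step a simple membership check.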

“We are hoping this is a step, not just to understand the landscape, but also help people going forward to make more informed choices about what data they are training on,” Mahari says.

In the future, the researchers want to expand their analysis to investigate data provenance for multimodal data, including video and speech. They also want to study how terms of service on websites that serve as data sources are echoed in datasets.

As they expand their research, they are also reaching out to regulators to discuss their findings and the unique copyright implications of fine-tuning data.

“We need data provenance and transparency from the outset, when people are creating and releasing these datasets, to make it easier for others to derive these insights,” Longpre says.

“Many proposed policy interventions assume that we can correctly assign and identify licenses associated with data, and this work first shows that this is not the case, and then significantly improves the provenance information available,” says Stella Biderman, executive director of EleutherAI, who was not involved with this work. “In addition, section 3 contains relevant legal discussion. This is very valuable to machine learning practitioners outside companies large enough to have dedicated legal teams. Many people who want to build AI systems for public good are currently quietly struggling to figure out how to handle data licensing, because the internet is not designed in a way that makes data provenance easy to figure out.”

How MIT’s online resources provide a “highly motivating, even transformative experience”

Charalampos (Haris) Sampalis was well established in his career as a product manager at a telecommunications company in Greece. Yet, as someone who enjoys learning, he was on a mission to acquire more knowledge and develop new skills. That’s how he discovered MIT Open Learning resources.

With a bachelor’s degree in computer science from the University of Crete and a master’s in innovation management and entrepreneurship from Hellenic Open University — the only online/distance learning university in Greece — Sampalis had developed expertise in product management and digital strategy. In 2016, he turned to MITx within MIT Open Learning and found a wealth of knowledge and a community of learners who broadened his horizons.

“I’m a person who likes to be constantly absorbing educational information,” Sampalis says. “I strongly believe that education shouldn’t be under boundaries, or strictly belong to specific periods in our lives. I started with computer science, and it grew from there, following programs on a regular basis that may help me expand my horizons and strengthen my skills.”

Sampalis built his life and career in Athens, which makes MIT Open Learning’s digital resources more valuable. He completed courses in computer science, including 6.00.1x (Introduction to Computer Science and Programming Using Python), 11.155x (Design Thinking for Leading and Learning) and Becoming an Entrepreneur back in 2016 and 2017 through MITx, which offers hundreds of high-quality massive open online courses adapted from the MIT classroom for learners worldwide. Sampalis has also enrolled in Management in Engineering: Strategy and Leadership and Management in Engineering: Accounting and Planning, which are part of the MITx MicroMasters Program in Principles of Manufacturing.

“I really appreciate the fact that an established institution like MIT was offering programs online,” he says. “I work full time and it’s not easy at this period of my life to leave everything behind and move to another continent for education — something I might have done at another time in my life. So, this is a model that allows me to access MIT resources and grow myself as part of a community that shares similar interests and seeks further collaborations, even locally where I live, something that makes the overall experience really unique.” 

In 2022, Sampalis applied for and completed the MIT Innovation Leadership Bootcamp. Part of MIT Open Learning, MIT Bootcamps are intensive and immersive educational programs for the global community of innovators, entrepreneurs, and changemakers. The Innovation Leadership Bootcamp was offered online, and Sampalis jumped at the opportunity. 

“I was in collaborative mode, having daily interactions with a diverse group of individuals scattered around the world, and that took place during an intensive 10-week period of my life that really taught me a lot,” says Sampalis. “Working with a global team was extremely engaging. It was a highly motivating, even transformative experience.”

MITx and MIT Bootcamps are both hands-on and interactive experiences offered by MIT Open Learning, which is exactly what appealed to Sampalis. One of the best parts, he says, is that community and collaborations with those he met through MIT continued even after the bootcamp concluded. Participants remain in touch not only with their cohort, but with a broader community of over 1,800 other participants from around the world, and have access to continued coaching and mentorship.

Overall, the community of learners has been a highlight of Sampalis’ MIT Open Learning experience.

“What is so beneficial is not just that I get a certificate from MIT and access to a highly valuable repository of knowledge resources, but the fact that I have been exposed to the full umbrella of what Open Learning has to offer — and I share that with other learners,” he says. “I’m part of MIT now. I continue to learn for myself, and I also try to give back, by supporting Open Learning and sharing my story and resources.”

Students learn theater design through the power of play

As a mechanical engineering and theater double major, senior Alayo Oloko often finds herself at the western end of MIT’s campus in Building W97, where the academic program in theater at MIT is based.

During her time as an actor, designer, and technical crew member in student-driven theater at MIT, Oloko has overseen the chaos of “tech week,” where design decisions and rehearsals come together on a pressure-cooker timeline. She calls theater a team sport: “If you mess something up or you drop the ball, it doesn’t just impact you. It impacts the entire production and the entire end product,” she recounts.

But just like team sports, theater is, at its heart, a kind of play, whether under the limelight, backstage, or in the classroom. “We’re always laughing during rehearsals or technical meetings because you’re always surrounded by a bunch of other creative people. And you’re bouncing ideas off each other as you’re all bonded together by a common goal,” says Oloko.

Designing for theater

In the theater world, a team of designers, makers, and actors often bring a writer’s script to the stage with the help of a director. Traditionally, design responsibilities in theater are taken on by different people — set, sound, lighting, and costume designers form the core of the design team. Just as in a sport, each team member is entrusted with bringing out their best while cooperating with the whole team.

Whether it’s a rendition of Shakespeare’s “Macbeth” or a more contemporary script, each theater designer has an opportunity to contribute something unique: a design informed by their personal experience. “If you feel it personally, an audience will also feel it personally,” says Sara Brown, professional set designer, professor of theater at MIT, and a member of the Morningside Academy for Design (MAD) Faculty Advisory Council.

Theater designers can invoke their personal experiences to create worlds with “friction,” a metaphor for the emotional work of individuals needed to grapple with new ideas presented in an artistic piece. “It is a world that has friction that then the actors have to deal with, or a director has to manage, or an audience has to manage,” explains Brown.

This integration of personal experience in design proves critical for a cultural function of theater — to invite an audience to feel represented or empathize with different perspectives, and furthermore, to reflect the intricacies of real life.

However, digging into one’s personal experience can be challenging for young designers. As with children roughhousing or building sandcastles, play is an opportunity to experiment in a safe environment and build social and emotional skills, yet it is not effortless.

Play in practice — exploring sound

Although professional theater production is notoriously high-stakes in practice, subject to constraints such as strict timelines and budgets, the classroom setting, by contrast, allows students to set aside real-world concerns and better embrace the imaginative and expressive process of play.

“We call them plays for a reason. It’s not only sort of a play on words,” says Christian Frederickson, sound designer and technical instructor in music and theater at MIT. “The process of learning it should be fun,” he adds.

As a sound designer, Frederickson creates audio cues and music to accompany a live performance, making decisions on where to place these cues in time, and when it’s better to let silence speak.

“Sound design for theater is not creating or not trying to duplicate reality. It’s looking for ways to help the storytelling in — at least for me — the most direct and elegant way possible, and in our contemporary world there’s a lot of noise. If we try to duplicate that in the theater, we get a mess. So it’s about refining and looking for the most direct way to tell a story or help the audience have an emotional experience,” he says.

The first lesson in Frederickson’s class involves getting to know one’s personal style. In his courses 21T.223 (Sound Design) and 21T.232 (Producing Podcasts), Frederickson introduces students to the fields through a “game” he calls Everything is an Instrument. “The reason I call it a ‘game’ is that I think it’s fun, and I think my students think it’s fun because there are no particular rules,” he says.

In the game, Frederickson and his students take a short recording of a “mundane everyday object” such as a metal water bottle or sheet of paper. After demonstrating the capabilities of Adobe Audition (a digital audio workstation), he lets students loose to manipulate the audio sample and begin finding their own styles.

“If there are 20 students in the class, we get 20 completely different results from the same sample material,” Frederickson says. “I can tell this student makes these really sparse, interesting, textural pieces, and then this person is always trying to turn their sample into something from musical theater.”

Trained as a musician, Frederickson considers his sound designs to have a musical quality, though he may be composing with the sound of helicopters and explosions instead of instruments. By playing the game, students tap into their personal interests and experience to inform their sound designs, influencing the play.

Responding and resonating with design

“[Theater design] is not just asking you to fit yourself to a task. It’s actually asking you to bring yourself to that task,” says Sara Brown. This, to Brown, sets theater design apart from other design philosophies. To unlock one’s personal experience, Brown asks designers to consider “first and foremost, how do you intersect with the material physically, personally?”

Like in Frederickson’s game Everything is an Instrument, Brown introduces her classes to theater design by way of playing with mundane materials. During one of the first in-class exercises for class 21T.220 (Set Design), students in small teams rummage through bins full of scrap paper, fabric, and matboard, prompted by an evocative word to guide their vision and hands.

Set designers work from scripts and references to develop a plan for the overall set, from the type of flooring to the placement of walls and platforms. One traditional method of communicating a set design is to create a physical model. Working with a scale model of W97’s black box theater space, students place their scrap materials into the model, and as they evaluate the results, their designs begin to take shape. Brown elaborates: “we start to see that when you make design decisions, you’re making design decisions in response to a reality.”

The unpretentious choice of materials and use of a prompt inspire set design students like rising seniors Verose Agbing and Alayo Oloko to make design choices without hesitation, thwarting the dreaded “blank-page anxiety” caused by overthinking. 

For Oloko, this “quick-and-dirty prototyping” is essential to see if something works. “If it does, that’s great. If it doesn’t, OK, it didn’t take too much time,” she says.

But Brown’s mention of “reality” is not to be confused with “real life.” In fact, Brown encourages students to shed any notions of real-life constraints. Also involved with student theater outside of the classroom, Oloko prompts: “imagine what you could do if you could go crazy and then figure out which parts of that work within it … In your initial design, if you’re limiting yourself by budget, you might overconstrain yourself without even realizing it.”

“My catchphrase in the class became ‘this is not OSHA [Occupational Safety and Health Administration] certified’ because … in the beginning, I was definitely stuck on that notion of being able to stick with real life,” says Agbing. Inspired by modern and experimental theater sets, Agbing recounts gradually letting go of these preconceptions, finding software an even more rewarding and flexible platform for theater design projects.

Set design students learn Vectorworks, an architecture modeling program, in conjunction with Twinmotion, a 3D visualization program, in a modern approach to theater design. “With the software, I was able to create this beautiful blend of … contrasting lighting and being able to manipulate that intensity was really important,” observes Agbing. 

How play connects us

While MIT Theater takes this playful approach to design, it doesn’t mean its objectives are only fun and games. “I don’t think that the stakes are lower in theater by any means,” says Frederickson. As an educator, he sees theater at MIT as a safe setting for students to “explore individual expression” and “develop design skills that you didn’t know that you needed or were going to use.”

As theater aims not to replicate reality, it is a chance to “play pretend” for both designers and audiences to consider difficult ideas at a distance. The immersion into a fictionalized world is an opportunity for audiences to feel represented, entertain new ideas, and cultivate empathy. For theater designers, the process of designing a performance allows for the exploration of multifaceted personal experiences which may be challenging or complex.

Echoing Frederickson’s sentiment, technical instructor and video designer Josh Higgason — who offers courses in Lighting Design (21T.221) and Interactive Design and Projection for Live Performance (21T.320) — finds that with his students, “there’s a lot of learning of how to have empathy, how to have connection, how to foster connection, and how to talk about difficult things when we first start.”

By the end of the term, equipped with the tools to thoughtfully express “big ideas and big emotions,” theater designers and audiences become members of a larger community more able to handle friction and bridge differences. Higgason reflects: “One of [theater’s] many purposes is to try and tell stories of people and individuals. But it also gets to stand in for these bigger, universal stories or these bigger, universal experiences.”

A framework for solving parabolic partial differential equations

Computer graphics and geometry processing research provide the tools needed to simulate physical phenomena like fire and flames, aiding the creation of visual effects in video games and movies as well as the fabrication of complex geometric shapes using tools like 3D printing.

Under the hood, mathematical problems called partial differential equations (PDEs) model these natural processes. Among the many PDEs used in physics and computer graphics, a class called second-order parabolic PDEs explains how phenomena can become smooth over time. The most famous example in this class is the heat equation, which predicts how heat diffuses along a surface or in a volume over time.

Researchers in geometry processing have designed numerous algorithms to solve these problems on curved surfaces, but their methods often apply only to linear problems or to a single PDE. A more general approach by researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) tackles a general class of these potentially nonlinear problems. 

In a paper recently published in the Transactions on Graphics journal and presented at the SIGGRAPH conference, they describe an algorithm that solves different nonlinear parabolic PDEs on triangle meshes by splitting them into three simpler equations that can be solved with techniques graphics researchers already have in their software toolkit. This framework can help better analyze shapes and model complex dynamical processes.

“We provide a recipe: If you want to numerically solve a second-order parabolic PDE, you can follow a set of three steps,” says lead author Leticia Mattos Da Silva SM ’23, an MIT PhD student in electrical engineering and computer science (EECS) and CSAIL affiliate. “For each of the steps in this approach, you’re solving a simpler problem using simpler tools from geometry processing, but at the end, you get a solution to the more challenging second-order parabolic PDE.”

To accomplish this, Mattos Da Silva and her coauthors used Strang splitting, a technique that allows geometry processing researchers to break the PDE down into problems they know how to solve efficiently.

First, their algorithm advances a solution forward in time by solving the heat equation (also called the “diffusion equation”), which models how heat from a source spreads over a shape. Picture using a blow torch to warm up a metal plate: this equation describes how heat from that spot would diffuse over it. This step can be completed easily with linear algebra.

Now, imagine that the parabolic PDE has additional nonlinear behaviors that are not described by the spread of heat. This is where the second step of the algorithm comes in: it accounts for the nonlinear piece by solving a Hamilton-Jacobi (HJ) equation, a first-order nonlinear PDE. 

While generic HJ equations can be hard to solve, Mattos Da Silva and coauthors prove that their splitting method applied to many important PDEs yields an HJ equation that can be solved via convex optimization algorithms. Convex optimization is a standard tool for which researchers in geometry processing already have efficient and reliable software. In the final step, the algorithm solves the heat equation once more, which completes the time step for the more complex second-order parabolic PDE.
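The three steps above can be sketched on a toy problem. The Python below is a simplified stand-in, not the researchers' method: it works on a 1D interval rather than a triangle mesh, it assumes the usual Strang pattern of a half step, a full step, and a half step, and it replaces the convex-optimization solve of the HJ step with a plain explicit Lax-Friedrichs finite-difference update. The test equation v_t = v_xx + (v_x)^2 is chosen for illustration because it is what the heat equation u_t = u_xx becomes under the substitution v = log u, so an exact solution is available for comparison. All function names and grid parameters are assumptions.

```python
import math

def heat_step(v, dt, h):
    """Backward-Euler step of v_t = v_xx with zero-flux ends.

    Solves the tridiagonal system (I - dt * L) v_new = v with the Thomas algorithm.
    """
    n = len(v)
    r = dt / (h * h)
    diag = [1.0 + 2.0 * r] * n
    diag[0] = diag[-1] = 1.0 + r          # zero-flux boundary rows
    c = [0.0] * n
    d = [0.0] * n
    c[0] = -r / diag[0]
    d[0] = v[0] / diag[0]
    for i in range(1, n):                 # forward elimination
        m = diag[i] + r * c[i - 1]
        c[i] = -r / m
        d[i] = (v[i] + r * d[i - 1]) / m
    out = [0.0] * n
    out[-1] = d[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        out[i] = d[i] - c[i] * out[i + 1]
    return out

def hj_step(v, dt, h):
    """Explicit Lax-Friedrichs step of v_t = (v_x)^2 with reflecting ends."""
    n = len(v)
    ext = [v[0]] + list(v) + [v[-1]]      # ghost cells mirror the end values
    fwd = [(ext[i + 2] - ext[i + 1]) / h for i in range(n)]
    bwd = [(ext[i + 1] - ext[i]) / h for i in range(n)]
    # Dissipation coefficient: bound on |d/dp (p^2)| = 2|p| over the grid.
    alpha = 2.0 * max(max(abs(p) for p in fwd), max(abs(p) for p in bwd))
    return [
        v[i] + dt * (((fwd[i] + bwd[i]) / 2.0) ** 2
                     + 0.5 * alpha * (fwd[i] - bwd[i]))
        for i in range(n)
    ]

def strang_step(v, dt, h):
    """One split time step: half heat step, full HJ step, half heat step."""
    v = heat_step(v, 0.5 * dt, h)
    v = hj_step(v, dt, h)
    return heat_step(v, 0.5 * dt, h)

# Toy setup on [0, 1] with cell-centered samples (all values are assumptions).
n, dt, steps = 400, 2.5e-4, 200
h = 1.0 / n
xs = [(i + 0.5) * h for i in range(n)]
v0 = [math.log(1.0 + 0.5 * math.cos(math.pi * x)) for x in xs]

v = list(v0)
for _ in range(steps):
    v = strang_step(v, dt, h)

# Exact solution: u = 1 + 0.5 * exp(-pi^2 t) * cos(pi x) solves u_t = u_xx, and v = log u.
t_end = steps * dt
exact = [math.log(1.0 + 0.5 * math.exp(-math.pi ** 2 * t_end) * math.cos(math.pi * x))
         for x in xs]
max_err = max(abs(a - b) for a, b in zip(v, exact))
print("max error against the exact solution:", max_err)
```

Each substep is a problem with a standard solver: the heat steps are linear solves, and only the middle step touches the nonlinear term. The explicit update used here needs a small time step to stay stable, which is one reason a more robust solver for the HJ step is attractive.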

Among other applications, the framework could help simulate fire and flames more efficiently. “There’s a huge pipeline that creates a video with flames being simulated, but at the heart of it is a PDE solver,” says Mattos Da Silva. For these pipelines, an essential step is solving the G-equation, a nonlinear parabolic PDE that models the front propagation of the flame and can be solved using the researchers’ framework.

The team’s algorithm can also solve the diffusion equation in the logarithmic domain, where it becomes nonlinear. Senior author Justin Solomon, associate professor of EECS and leader of the CSAIL Geometric Data Processing Group, previously developed a state-of-the-art technique for optimal transport that requires taking the logarithm of the result of heat diffusion. Mattos Da Silva’s framework provided more reliable computations by doing diffusion directly in the logarithmic domain. This enabled a more stable way to, for example, find a geometric notion of average among distributions on surface meshes like a model of a koala.
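One common numerical reason for working in the logarithmic domain is floating-point underflow, illustrated below in Python. This is an assumption about the kind of instability involved, shown with the closed-form heat kernel on a flat 1D line rather than on a surface mesh; the function names and the sample values of `x` and `t` are hypothetical.

```python
import math

def heat_kernel_1d(x, t):
    """Heat kernel on the real line: exp(-x^2 / (4t)) / sqrt(4 * pi * t)."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def log_heat_kernel_1d(x, t):
    """The same quantity evaluated directly in the logarithmic domain."""
    return -x * x / (4.0 * t) - 0.5 * math.log(4.0 * math.pi * t)

x, t = 1.0, 1e-4                     # a point far from the source after a short time

direct = heat_kernel_1d(x, t)        # exp(-2500) underflows to exactly 0.0
try:
    log_of_direct = math.log(direct)
except ValueError:                   # math.log(0.0) is undefined
    log_of_direct = None

log_domain = log_heat_kernel_1d(x, t)
print(direct, log_of_direct, log_domain)
```

Taking the logarithm after diffusion fails here because the diffused value has already been rounded to zero, while the log-domain formula stays finite.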

Even though their framework focuses on general, nonlinear problems, it can also be used to solve linear PDEs. For instance, the method solves the Fokker-Planck equation, where heat diffuses in a linear way, but there are additional terms that drift in the same direction heat is spreading. In a straightforward application, the approach modeled how swirls would evolve over the surface of a triangulated sphere. The result resembles purple-and-brown latte art.

The researchers note that this project is a starting point for tackling the nonlinearity in other PDEs that appear in graphics and geometry processing head-on. For example, they focused on static surfaces but would like to apply their work to moving ones, too. Moreover, their framework solves problems involving a single parabolic PDE, but the team would also like to tackle problems involving coupled parabolic PDEs. These types of problems arise in biology and chemistry, where the equation describing the evolution of each agent in a mixture, for example, is linked to the others’ equations.

Mattos Da Silva and Solomon wrote the paper with Oded Stein, assistant professor at the University of Southern California’s Viterbi School of Engineering. Their work was supported, in part, by an MIT Schwarzman College of Computing Fellowship funded by Google, a MathWorks Fellowship, the Swiss National Science Foundation, the U.S. Army Research Office, the U.S. Air Force Office of Scientific Research, the U.S. National Science Foundation, MIT-IBM Watson AI Lab, the Toyota-CSAIL Joint Research Center, Adobe Systems, and Google Research.

Designing better delivery for medical therapies

Early in his undergraduate studies in bioengineering, Sayo Eweje was thinking of a career in medicine. He was inspired by the idea of harnessing medical knowledge to improve patients’ lives, having grown up seeing his father do so as a gastroenterologist. However, his research experiences in college made him appreciate how scientific advancement can lead to paradigm-shifting innovations. What if he could contribute to breakthroughs that improved lives on a much larger scale?

“That idea really captured me, and I realized that we’re only enabled to do that by really delving into the frontiers of science,” he says. In his junior year of college, he decided to aim for a career as a physician-scientist, splitting his time between caring for patients and conducting research. After graduating, he entered the Harvard-MIT MD/PhD program, which is affiliated with both Harvard Medical School and MIT’s Institute for Medical Engineering and Sciences.

Now, Eweje is entering his sixth year in the program, and the fourth year of his PhD studies in medical engineering through the Harvard-MIT Program in Health Sciences and Technology. Throughout his PhD, he has worked in the lab of Elliot Chaikof at Beth Israel Deaconess Medical Center, where his research has focused on the development of protein-based nanoparticle systems for delivering nucleic acid and protein therapies directly to cells inside the body.

Eweje’s interest in this area was sparked shortly after he entered the program: Initial reports describing a promising new gene editing-based treatment for inherited blood disorders were released, highlighting the curative potential of this approach. However, administering this therapy involves removing blood-forming stem cells from patients, editing them, then putting them back in. In order to accommodate the edited cells, recipients undergo heavy chemotherapy, which led to questions surrounding toxicity and scalability.

“The thought that I had, and that many others in the field had, is that if we could deliver these gene-editing therapies inside of the body without having to remove cells, without having to do this chemotherapy, this could be a much more effective and accessible therapy,” Eweje says.

“After thinking about problems like that and understanding that a lot of this ultimately comes down to drug delivery and engineering nanoparticles and delivery vehicles, I realized that’s where I want to spend my time,” he says. “There are so many challenges in treating disease where the bottleneck ultimately comes down to effective delivery.”

Striking disease at the source

A number of diseases are caused by mutations in hematopoietic (blood-forming) stem cells, and Eweje chose Chaikof’s lab in part because the team was looking for ways to deliver RNA and protein therapies directly to those cells in patients. The work has spun off in many interesting directions since then.

“It started there, but it has become a much broader platform-focused project,” Eweje explains. “We’re looking at things ranging from gene editing in the lungs to immunotherapy and thinking about new cancer treatments.”

This January, he published an article in Biomaterials that gave a progress update on the state of research using protein-based nanoparticles to deliver nucleic acid therapies to cells. Historically, scientists have found success with viral vectors for delivering gene-based therapies, but because of those viral origins, there remains the possibility of triggering a patient’s immune system.

“Protein materials, particularly human-derived protein materials, are far less likely to trigger that immune response, which is one major advantage,” he says. “The other thing that we’re actively working towards in the lab is this idea of leveraging programmability and precise structure in recombinant proteins.”

While much work remains to determine whether nonviral, protein-based nanoparticles can be used as effectively as those that are virus-derived, or lipid nanoparticles, he’s grateful to have learned valuable lessons during this process.

“I really appreciate the fact that I’ve had an opportunity to learn about what’s out there, better understand the challenges, and carry that knowledge forward,” he says.

Building opportunity for others

Outside the lab and the hospital, Eweje is engaged in education and outreach projects as close as Cambridge and as far as Nigeria, where his family traces their roots. He is a co-founder of the Program of Ragon and IMES in Science and Medicine (PRISM), which hosts weekly programs for high school students in the greater Boston area to learn directly from scientists and clinicians about various topics in STEM.

“I see kids as stem cells,” he says. “They have so much potential to differentiate into so many different things, but you have to put them in a proper environment and give them the exposure required to understand where they can go.”

He’s also a co-managing director of the Critical Healthcare Information Integration Network (CHIIN), a nonprofit that provides medical information to community health workers in rural and underdeveloped areas of Africa. It operates via a chatbot that can respond to queries over SMS text messaging and is therefore able to reach communities without internet access, indirectly assisting thousands of patients.

“Part of it was developing confidence in the users by giving them something to have in their back pocket as a reference tool,” he says.

As his time in the HST program draws to a close, Eweje aims to defend his PhD next year and return to full-time clinical work at Harvard Medical School. Ultimately, he envisions a career at the intersection of clinical medicine and biotech innovation.

He also intends to continue encouraging young people to explore STEM. “Everyone should have the right to explore their fullest potential,” he says.

“I find a lot of gratification in the impact that we can have on someone’s life just by giving them the opportunity to learn about something, which could change the trajectory of what they do,” he adds. “We have not only the pleasure of doing that, but also a little bit of an obligation.”

First AI + Education Summit is an international push for “AI fluency”

This summer, 350 participants came to MIT to dive into a question that is, so far, outpacing answers: How can education still create opportunities for all when digital literacy is no longer enough — a world in which students now need to have AI fluency?

The AI + Education Summit was hosted by the MIT RAISE Initiative (Responsible AI for Social Empowerment and Education) in Cambridge, Massachusetts, with speakers from the App Inventor Foundation, the Mayor’s Office of the City of Boston, the Hong Kong Jockey Club Charities Trust, and more. Highlights included an onsite “Hack the Climate” hackathon, where teams of beginner and experienced MIT App Inventor users had a single day to develop an app for fighting climate change.

In opening remarks, RAISE principal investigators Eric Klopfer, Hal Abelson, and Cynthia Breazeal emphasized what new goals for AI fluency look like. “Education is not just about learning facts,” Klopfer said. “Education is a whole developmental process. And we need to think about how we support teachers in being more effective. Teachers must be part of the AI conversation.” Abelson highlighted the empowerment aspect of computational action, namely its immediate impact, that “what’s different than in the decades of people teaching about computers [is] what kids can do right now.” And Breazeal, director of the RAISE Initiative, touched upon AI-supported learning, including the imperative to use technology like classroom robot companions as something supplementary to what students and teachers can do together, not as a replacement for one another. Or as Breazeal underlined in her talk: “We really want people to understand, in an appropriate way, how AI works and how to design it responsibly. We want to make sure that people have an informed voice of how AI should be integrated into society. And we want to empower all kinds of people around the world to be able to use AI, harness AI, to solve the important problems of their communities.”

The summit featured the invited winners of the Global AI Hackathon. Prizes were awarded for apps in two tracks: climate and sustainability, and health and wellness. Winning projects addressed issues like sign-language-to-audio translation, moving object detection for the vision impaired, empathy practice using interactions with AI characters, and personal health checks using tongue images. Attendees also participated in hands-on demos for MIT App Inventor, a “playground” for the Personal Robots Group’s social robots, and an educator professional development session on responsible AI.

By convening people of so many ages, professional backgrounds, and geographies, organizers were able to foreground a unique mix of ideas for participants to take back home. Conference papers included real-world case studies of implementing AI in school settings, such as extracurricular clubs, considerations for student data security, and large-scale experiments in the United Arab Emirates and India. And plenary speakers tackled funding AI in education, state government’s role in supporting its adoption, and — in the summit’s keynote speech by Microsoft’s principal director of AI and machine learning engineering Francesca Lazzeri — the opportunities and challenges of the use of generative AI in education. Lazzeri discussed the development of tool kits that enact safeguards around principles like fairness, security, and transparency. “I truly believe that learning generative AI is not just about computer science students,” Lazzeri said. “It’s about all of us.”

Trailblazing AI education from MIT

Critical to early AI education has been the Hong Kong Jockey Club Charities Trust, a longtime collaborator that helped MIT deploy computational action and project-based learning years before AI was even a widespread pedagogical challenge. A summit panel discussed the history of its CoolThink project, which brought such learning to grades 4-6 in 32 Hong Kong schools in an initial pilot and then met the ambitious goal of bringing it to over 200 Hong Kong schools. On the panel, CoolThink director Daniel Lai said that the trust, MIT, Education University of Hong Kong, and the City University of Hong Kong did not want to add a burden to teachers and students of another curriculum outside of school. Instead, they wanted “to mainstream it into our educational system so that every child would have equal opportunity to access these skills and knowledge.”

MIT worked as a collaborator from CoolThink’s start in 2016. Professor and App Inventor founder Hal Abelson helped Lai get the project off the ground. Several summit attendees and former MIT research staff members were leaders in the project development. Educational technologist Josh Sheldon directed the MIT team’s work on the CoolThink curriculum and teacher professional development. Karen Lang, then App Inventor’s education and business development manager, was the main curriculum developer for the initial phase of CoolThink, writing the lessons and accompanying tutorials and worksheets for the three levels in the curriculum, with editing assistance from the Hong Kong education team. And Mike Tissenbaum, now a professor at the University of Illinois at Urbana-Champaign, led the development of the project’s research design and theoretical grounding. Among other key tasks, they ran the initial teacher training for the first two cohorts of Hong Kong teachers, consisting of sessions totaling 40 hours with about 40 teachers each.

The ethical demands of today’s AI “funhouse mirror”

Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing, delivered the closing keynote. He described the current state of AI as a “funhouse mirror” that “distorts the world around us” and framed it as yet another technology that has presented humans with ethical demands to find its positive, empowering uses that complement our intelligence but also to mitigate its risks. 

“One of the areas I’m most excited about personally,” Huttenlocher said, “is people learning from AI,” with AI discovering solutions that people had not yet come upon on their own. As so much of the summit demonstrated, AI and education is something that must happen in collaboration. “[AI] is not human intellect. This is not human judgment. This is something different.”

Faces of MIT: Jessica Tam

The MIT Office of the Vice President for Finance (VPF) determines the best ways to allocate funds for the goods, resources, and services that support the research, education, and important work performed by students, staff, and faculty at MIT. The attention to detail and organization of VPF’s staff members help community members understand and use Institute financial resources. One of the 170 staff members in VPF who works hard behind the scenes to make life at MIT more effective is Jessica Tam, senior strategic sourcing analyst, travel and hospitality.

Tam has been in the travel and hospitality industry for over 20 years. She worked for hotels for 15 years before arriving at MIT, leaving one side of hospitality for the other. Tam is well-versed in forming and maintaining relationships with vendors, including travel companies and caterers. Those invaluable skills allowed her to comfortably pivot from what she refers to as “being a supplier” to “being a buyer.”

A member of the strategic sourcing and contracts team, Tam is responsible for everything related to travel and hospitality (catering, dining, tents, and events) that involves purchasing. Knowing how to connect with people is a significant part of her job, as she oversees reaching out to suppliers, both potential and preferred, managing requests for proposals (RFPs), negotiating contracts, securing concessions, and ensuring the best value for MIT travelers and event planners. When assisting with travel accommodations, she troubleshoots issues that a traveler may run into. Tam also answers vendor questions and works very closely with Institute Events.

Even though she is constantly meeting and speaking with new people, Tam notes that the hospitality industry is small. When she came to MIT there was a lot to learn, but knowing the major players in the industry helped her acclimate quickly to the role. With her expertise, Tam was immediately able to help streamline the hotel side of travel, rebalancing MIT’s negotiated rates so that they were competitive and in line with what she believed MIT should be paying.

A significant part of Tam’s job is vetting vendors to be included on the list of MIT preferred businesses. For example, when a staff member asks for VPF's list of preferred hotels, it comes with expected price points for each that have already been negotiated by Tam, eliminating the need for that staff member to carry out a selection of source — finding two or three other competitive quotes. Terms and conditions have also already been put in place so that after selecting one of the preferred hotels, it is simple to gain approval in the buy-to-pay process. 

In May 2024, Tam received an Excellence Award for Embracing Diversity, Equity, and Inclusion for a project she began in March 2020 that was put on hold due to the pandemic. The initiative's purpose was to bring diverse catering options to events taking place at MIT. The preferred catering services list in place when Tam started her job consisted mostly of well-known, big-box caterers. When she resumed work on the project, Tam issued RFPs to small, local, Black- and minority-owned catering businesses. At the project's conclusion, Tam had almost doubled the number of preferred caterers available to the community. In her award nomination, colleagues noted that Tam’s work “fosters inclusivity, contributes to the growth and success of our local economy, and brings new, diverse culinary options to our very global community.”


Q: What do you like the most about your job?

Tam: I enjoy introducing people to resources at MIT that they did not know existed. Sometimes there is a travel hiccup for a faculty member, and I get them on the next flight. If a catering order does not show up for an event, I check which preferred vendor has availability to come up with bagged lunches on a tight deadline. I'm here to answer questions that make my colleagues’ travel and events as seamless as possible. I want the community to know that I am here to be a resource. It's a little-known fact that the VPF website is a great tool available to the community that has every possible piece of information not just for travel planning and hospitality, but for expense reports, budget planning, and more. 

Q: What do you like the most about the people at MIT?

Tam: I am a member of the strategic sourcing and contracts team, and everyone is so friendly. When we come together on in-office days it feels like a family. Our Vice President of Finance Katie Hammer is approachable and will ask, “How was your weekend? How are your kids?” I can walk to her office and ask a question, which is nice and probably different from other universities where you might hear about your VP but you could never ask them a question directly or say hello.

I also love that at MIT you might not initially know the accomplishments of the person you are working with. I have been talking to Professor Tod Machover, who is a composer, and it turns out that the popular video games “Guitar Hero” and “Rock Band” grew out of Machover’s group at the Media Lab — something that never came up in our work conversations. My first year at MIT I had to reach out to Sir Tim Berners-Lee, who is the inventor of the World Wide Web. You never know who you’re going to meet or talk to.

Q: What advice would you give to a new staff member at MIT?

Tam: Try and meet the people you will work with in person, even if your job is hybrid. This is my first job in higher education, and I had heard that working at a university can feel like you work in a silo. In hospitality I learned that a five- or 10-minute conversation goes a long way, even if it is just to say, “I’m Jessica, I’m in this role, and I look forward to working with you.” When I first started, I found a list of departments and people that I knew I would be working with and visited their offices to introduce myself and have a brief conversation. Meeting in person gives you a good understanding of how people communicate.

President Kornbluth welcomes the Class of 2028

Addressing MIT’s newest students and their families yesterday, President Sally Kornbluth and several alumni faculty offered some tips about how to thrive at the Institute.

“You belong here,” Kornbluth and others assured the audience, while emphasizing the many ways that the 1,102 members of the Class of 2028 are connected and interdependent.

“All of us together are responsible for the character of our community,” Kornbluth said.

The President’s Convocation took place under a tent on Kresge lawn, on a warm, sunny morning. Kornbluth introduced several of MIT’s senior leaders — Provost Cynthia Barnhart, Chancellor Melissa Nobles, Vice Chancellor for Undergraduate and Graduate Education Dan Hastings, Vice Chancellor for Student Life Suzy Nelson — and then offered some guidance of her own.

Kornbluth advised students to take advantage of MIT’s “unmissable opportunity,” the Undergraduate Research Opportunities Program. She also encouraged students to try new activities and make time for fun, while also acknowledging that the MIT experience can be intense.

“But if you sometimes get frustrated or feel stuck,” Kornbluth said, “please know: We all do! And you don’t have to go it alone. It’s not always easy asking for help, but as everyone up here today will tell you, sometimes the only way to succeed in facing a big challenge or solving a tough problem is to admit there’s no way you can do it all yourself. You’re surrounded by a community of caring people. Please don’t be shy about asking for guidance or help.”

She urged students to care for each other even when disagreeing or having difficult conversations. “At MIT, the work we do is so important, and so hard, that it’s essential we treat each other with empathy and compassion, that we take care to express our own ideas with clarity and respect, and make room for sharply different points of view,” she said.

“Empathy and respect are central values here,” Kornbluth said. “And frankly, they are also skills — skills that we all have to practice, at every stage of life, because they turn out to be vital to every aspect of our success: as an institution, as a community, and as individual human beings.”

Kornbluth was joined by three MIT faculty who had also been students at the Institute.

Isaiah Smith Andrews PhD ’14, the Charles E. and Susan T. Harris Professor of Economics, described the MIT community’s commitment to making the world a better place through “concrete changes we can see, touch, and measure.”

He urged students to consider what they could do to make a better world, not just through new science and engineering advances, but also by figuring out how to ensure those advances benefit humanity. “You’re all here because you’ve excelled, and I know that you will excel here as well,” he said. “I challenge you to follow the MIT tradition and be more than just excellent: I challenge you to be good.”

Paula Hammond ’84, PhD ’93, Institute Professor and vice provost for faculty, recalled that before arriving at MIT, she was excited to join a community of people pursuing their interests in STEM with “true nerdy exuberance.” But, upon arriving, she was intimidated by some of the experiences of her peers. “I was sure I was an admissions mistake,” she said. However, she found her footing by connecting with other students and learning from them.

“You are all meant to be here. You’re all brilliant in a spectacularly diverse set of ways,” she said. “It’s exactly those differences that make MIT a place of excellence and a true foundry of learning and shared knowledge. Without the many perspectives that each of you are bringing here today we don’t learn about new ways to address old problems or how to adjust our lens to see new problems.”

Physics Professor Aram Harrow ’01, PhD ’05 reflected on how unpredictable an educational journey can be — and urged students to embrace that. Harrow wasn’t aware that his own field, quantum computing, even existed when he began college, but he became hooked after attending a seminar with a friend.

He acknowledged some contradictions within the guidance he gives to students: “You’ll notice that I’m saying sometimes you should be flexible and open to new experiences, and sometimes you should fanatically pursue your dreams. That’s why giving advice is hard,” he joked.

But he urged students to seriously consider studying topics they hadn’t expected to. “You never know what will happen,” he said.

Scientists find neurons that process language on different timescales

Using functional magnetic resonance imaging (fMRI), neuroscientists have identified several regions of the brain that are responsible for processing language. However, discovering the specific functions of neurons in those regions has proven difficult because fMRI, which measures changes in blood flow, doesn’t have high enough resolution to reveal what small populations of neurons are doing.

Now, using a more precise technique that involves recording electrical activity directly from the brain, MIT neuroscientists have identified different clusters of neurons that appear to process different amounts of linguistic context. These “temporal windows” range from just one word up to about six words.

The temporal windows may reflect different functions for each population, the researchers say. Populations with shorter windows may analyze the meanings of individual words, while those with longer windows may interpret more complex meanings created when words are strung together.

“This is the first time we see clear heterogeneity within the language network,” says Evelina Fedorenko, an associate professor of neuroscience at MIT. “Across dozens of fMRI experiments, these brain areas all seem to do the same thing, but it’s a large, distributed network, so there’s got to be some structure there. This is the first clear demonstration that there is structure, but the different neural populations are spatially interleaved so we can’t see these distinctions with fMRI.”

Fedorenko, who is also a member of MIT’s McGovern Institute for Brain Research, is the senior author of the study, which appears today in Nature Human Behaviour. MIT postdoc Tamar Regev and Harvard University graduate student Colton Casto are the lead authors of the paper.

Temporal windows

Functional MRI, which has helped scientists learn a great deal about the roles of different parts of the brain, works by measuring changes in blood flow in the brain. These measurements act as a proxy of neural activity during a particular task. However, each “voxel,” or three-dimensional chunk, of an fMRI image represents hundreds of thousands to millions of neurons and sums up activity across about two seconds, so it can’t reveal fine-grained detail about what those neurons are doing.

One way to get more detailed information about neural function is to record electrical activity using electrodes implanted in the brain. These data are hard to come by because this procedure is done only in patients who are already undergoing surgery for a neurological condition such as severe epilepsy.

“It can take a few years to get enough data for a task because these patients are relatively rare, and in a given patient electrodes are implanted in idiosyncratic locations based on clinical needs, so it takes a while to assemble a dataset with sufficient coverage of some target part of the cortex. But these data, of course, are the best kind of data we can get from human brains: You know exactly where you are spatially and you have very fine-grained temporal information,” Fedorenko says.

In a 2016 study, Fedorenko reported using this approach to study the language processing regions of six people. Electrical activity was recorded while the participants read four different types of language stimuli: complete sentences, lists of words, lists of non-words, and “jabberwocky” sentences — sentences that have grammatical structure but are made of nonsense words.

Those data showed that in some neural populations in language processing regions, activity would gradually build up over a period of several words when the participants were reading sentences. However, this did not happen when they read lists of words, lists of non-words, or jabberwocky sentences.

In the new study, Regev and Casto went back to those data and analyzed the temporal response profiles in greater detail. In their original dataset, they had recordings of electrical activity from 177 language-responsive electrodes across the six patients. Conservative estimates suggest that each electrode represents an average of activity from about 200,000 neurons. They also obtained new data from a second set of 16 patients, which included recordings from another 362 language-responsive electrodes.

When the researchers analyzed these data, they found that in some of the neural populations, activity would fluctuate up and down with each word. In others, however, activity would build up over multiple words before falling again, and yet others would show a steady buildup of neural activity over longer spans of words.

By comparing their data with predictions made by a computational model that the researchers designed to process stimuli with different temporal windows, the researchers found that neural populations from language processing areas could be divided into three clusters. These clusters represent temporal windows of either one, four, or six words.
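The idea of a temporal window can be illustrated with a toy sketch. This is not the computational model used in the study; the function name `windowed_response`, the unit input per word, and the simple summation rule are all assumptions made for illustration only. Each simulated population sums the input from the most recent words that fit inside its window, so a one-word window tracks each word individually while four- and six-word windows build up over longer spans before leveling off.

```python
def windowed_response(word_inputs, window):
    """Toy integrator (hypothetical, not the study's model).

    At each word position, sum the inputs of the most recent
    `window` words, including the current one.
    """
    responses = []
    for i in range(len(word_inputs)):
        start = max(0, i - window + 1)  # window cannot reach before the first word
        responses.append(sum(word_inputs[start:i + 1]))
    return responses


# Assumption: an eight-word sentence in which every word contributes one unit of input.
sentence_inputs = [1.0] * 8

# Window sizes of one, four, and six words, matching the three clusters in the article.
profiles = {w: windowed_response(sentence_inputs, w) for w in (1, 4, 6)}

for window, profile in profiles.items():
    print(window, profile)
```

In this sketch the one-word population stays flat across the sentence, while the longer-window populations rise until their window is full, which loosely mirrors the difference between word-by-word responses and gradual buildup.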

“It really looks like these neural populations integrate information across different timescales along the sentence,” Regev says.

Processing words and meaning

These differences in temporal window size would have been impossible to see using fMRI, the researchers say.

“At the resolution of fMRI, we don’t see much heterogeneity within language-responsive regions. If you localize in individual participants the voxels in their brain that are most responsive to language, you find that their responses to sentences, word lists, jabberwocky sentences and non-word lists are highly similar,” Casto says.

The researchers were also able to determine the anatomical locations where these clusters were found. Neural populations with the shortest temporal window were found predominantly in the posterior temporal lobe, though some were also found in the frontal or anterior temporal lobes. Neural populations from the two other clusters, with longer temporal windows, were spread more evenly throughout the temporal and frontal lobes.

Fedorenko’s lab now plans to study whether these timescales correspond to different functions. One possibility is that the shortest timescale populations may be processing the meanings of a single word, while those with longer timescales interpret the meanings represented by multiple words.

“We already know that in the language network, there is sensitivity to how words go together and to the meanings of individual words,” Regev says. “So that could potentially map to what we’re finding, where the longest timescale is sensitive to things like syntax or relationships between words, and maybe the shortest timescale is more sensitive to features of single words or parts of them.”

The research was funded by the Zuckerman-CHE STEM Leadership Program, the Poitras Center for Psychiatric Disorders Research, the Kempner Institute for the Study of Natural and Artificial Intelligence at Harvard University, the U.S. National Institutes of Health, an American Epilepsy Society Research and Training Fellowship, the McDonnell Center for Systems Neuroscience, Fondazione Neurone, the McGovern Institute, MIT’s Department of Brain and Cognitive Sciences, and the Simons Center for the Social Brain.

Pursuing the secrets of a stealthy parasite

Toxoplasma gondii, the parasite that causes toxoplasmosis, is believed to infect as much as one-third of the world’s population. Many of those people have no symptoms, but the parasite can remain dormant for years and later reawaken to cause disease in anyone who becomes immunocompromised.

Why this single-celled parasite is so widespread, and what triggers it to reemerge, are questions that intrigue Sebastian Lourido, an associate professor of biology at MIT and member of the Whitehead Institute for Biomedical Research. In his lab, research is unraveling the genetic pathways that help to keep the parasite in a dormant state, and the factors that lead it to burst free from that state.

“One of the missions of my lab is to improve our ability to manipulate the parasite genome, and to do that at a scale that allows us to ask questions about the functions of many genes, or even the entire genome, in a variety of contexts,” Lourido says.

There are drugs that can treat the acute symptoms of Toxoplasma infection, which include headache, fever, and inflammation of the heart and lungs. However, once the parasite enters the dormant stage, those drugs don’t affect it. Lourido hopes that his lab’s work will lead to potential new treatments for this stage, as well as drugs that could combat similar parasites such as a tickborne parasite known as Babesia, which is becoming more common in New England.

“There are a lot of people who are affected by these parasites, and parasitology often doesn’t get the attention that it deserves at the highest levels of research. It’s really important to bring the latest scientific advances, the latest tools, and the latest concepts to the field of parasitology,” Lourido says.

A fascination with microbiology

As a child in Cali, Colombia, Lourido was enthralled by what he could see through the microscopes at his mother’s medical genetics lab at the University of Valle del Cauca. His father ran the family’s farm and also worked in government, at one point serving as interim governor of the state.

“From my mom, I was exposed to the ideas of gene expression and the influence of genetics on biology, and I think that really sparked an early interest in understanding biology at a fundamental level,” Lourido says. “On the other hand, my dad was in agriculture, and so there were other influences there around how the environment shapes biology.”

Lourido decided to go to college in the United States, in part because at the time, in the early 2000s, Colombia was experiencing a surge in violence. He was also drawn to the idea of attending a liberal arts college, where he could study both science and art. He ended up going to Tulane University, where he double-majored in fine arts and cell and molecular biology.

As an artist, Lourido focused on printmaking and painting. One area he especially enjoyed was stone lithography, which involves etching images on large blocks of limestone with oil-based inks, treating the images with chemicals, and then transferring the images onto paper using a large press.

“I ended up doing a lot of printmaking, which I think attracted me because it felt like a mode of expression that leveraged different techniques and technical elements,” he says.

At the same time, he worked in a biology lab that studied Daphnia, tiny crustaceans found in fresh water that have helped scientists learn about how organisms can develop new traits in response to changes to their environment. As an undergraduate, he helped develop ways to use viruses to introduce new genes into Daphnia. By the time he graduated from Tulane, Lourido had decided to go into science rather than art.

“I had really fallen in love with lab science as an undergrad. I loved the freedom and the creativity that came from it, the ability to work in teams and to build on ideas, to not have to completely reinvent the entire system, but really be able to develop it over a longer period of time,” he says.

After graduating from college, Lourido spent two years in Germany, working at the Max Planck Institute for Infection Biology. In Arturo Zychlinsky’s lab, Lourido studied two bacteria known as Shigella and Salmonella, which can cause severe illnesses, including diarrhea. His studies there helped to reveal how these bacteria get into cells and how they modify the host cells’ own pathways to help them replicate inside cells.

As a graduate student at Washington University in St. Louis, Lourido worked in several labs focusing on different aspects of microbiology, including virology and bacteriology, but eventually ended up working with David Sibley, a prominent researcher specializing in Toxoplasma.

“I had not thought much about Toxoplasma before going to graduate school,” Lourido recalls. “I was pretty unaware of parasitology in general, despite some undergrad courses, which honestly very superficially treated the subject. What I liked about it was here was a system where we knew so little — organisms that are so different from the textbook models of eukaryotic cells.”

Toxoplasma gondii belongs to a group of parasites known as apicomplexans — a type of protozoans that can cause a variety of diseases. After infecting a human host, Toxoplasma gondii can hide from the immune system for decades, usually in cysts found in the brain or muscles. Lourido found the organism especially intriguing because as a 17-year-old, he had been diagnosed with toxoplasmosis. His only symptom was swollen glands, but doctors found that his blood contained antibodies against Toxoplasma.

“It is really fascinating that in all of these people, about a quarter to a third of the world’s population, the parasite persists. Chances are I still have live parasites somewhere in my body, and if I became immunocompromised, it would become a big problem. They would start replicating in an uncontrolled fashion,” he says.

A transformative approach

One of the challenges in studying Toxoplasma is that the organism’s genetics are very different from those of either bacteria or other eukaryotes such as yeast and mammals. That makes it harder to study parasitic gene functions by mutating or knocking out the genes.

Because of that difficulty, it took Lourido his entire graduate career to study the functions of just a couple of Toxoplasma genes. After finishing his PhD, he started his own lab as a fellow at the Whitehead Institute and began working on ways to study the Toxoplasma genome at a larger scale, using the CRISPR genome-editing technique.

With CRISPR, scientists can systematically knock out every gene in the genome and then study how each missing gene affects parasite function and survival.

“Through the adaptation of CRISPR to Toxoplasma, we’ve been able to survey the entire parasite genome. That has been transformative,” says Lourido, who became a Whitehead member and MIT faculty member in 2017. “Since its original application in 2016, we’ve been able to uncover mechanisms of drug resistance and susceptibility, trace metabolic pathways, and explore many other aspects of parasite biology.”

Using CRISPR-based screens, Lourido’s lab has identified a regulatory gene called BFD1 that appears to drive the expression of genes that the parasite needs for long-term survival within a host. His lab has also revealed many of the molecular steps required for the parasite to shift between active and dormant states.

“We’re actively working to understand how environmental inputs end up guiding the parasite in one direction or another,” Lourido says. “They seem to preferentially go into those chronic stages in certain cells like neurons or muscle cells, and they proliferate more exuberantly in the acute phase when nutrient conditions are appropriate or when there are low levels of immunity in the host.”

Study of disordered rock salts leads to battery breakthrough

For the past decade, disordered rock salt has been studied as a potential breakthrough cathode material for use in lithium-ion batteries and a key to creating low-cost, high-energy storage for everything from cell phones to electric vehicles to renewable energy storage.

A new MIT study is making sure the material fulfills that promise.

Led by Ju Li, the Tokyo Electric Power Company Professor in Nuclear Engineering and professor of materials science and engineering, a team of researchers describe a new class of partially disordered rock salt cathode, integrated with polyanions — dubbed disordered rock salt-polyanionic spinel, or DRXPS — that delivers high energy density at high voltages with significantly improved cycling stability.

“There is typically a trade-off in cathode materials between energy density and cycling stability … and with this work we aim to push the envelope by designing new cathode chemistries,” says Yimeng Huang, a postdoc in the Department of Nuclear Science and Engineering and first author of a paper describing the work published today in Nature Energy. “(This) material family has high energy density and good cycling stability because it integrates two major types of cathode materials, rock salt and polyanionic olivine, so it has the benefits of both.”

Importantly, Li adds, the new material family is primarily composed of manganese, an earth-abundant element that is significantly less expensive than elements like nickel and cobalt, which are typically used in cathodes today.

“Manganese is at least five times less expensive than nickel, and about 30 times less expensive than cobalt,” Li says. “Manganese is also one of the keys to achieving higher energy densities, so having that material be much more earth-abundant is a tremendous advantage.”

A possible path to renewable energy infrastructure

That advantage will be particularly critical, Li and his co-authors wrote, as the world looks to build the renewable energy infrastructure needed for a low- or no-carbon future.

Batteries are a particularly important part of that picture, not only for their potential to decarbonize transportation with electric cars, buses, and trucks, but also because they will be essential to addressing the intermittency issues of wind and solar power by storing excess energy, then feeding it back into the grid at night or on calm days, when renewable generation drops.

Given the high cost and relative rarity of materials like cobalt and nickel, they wrote, efforts to rapidly scale up electric storage capacity would likely lead to extreme cost spikes and potentially significant materials shortages.

“If we want to have true electrification of energy generation, transportation, and more, we need earth-abundant batteries to store intermittent photovoltaic and wind power,” Li says. “I think this is one of the steps toward that dream.”

That sentiment was shared by Gerbrand Ceder, the Samsung Distinguished Chair in Nanoscience and Nanotechnology Research and a professor of materials science and engineering at the University of California at Berkeley.

“Lithium-ion batteries are a critical part of the clean energy transition,” Ceder says. “Their continued growth and price decrease depends on the development of inexpensive, high-performance cathode materials made from earth-abundant materials, as presented in this work.”

Overcoming obstacles in existing materials

The new study addresses one of the major challenges facing disordered rock salt cathodes — oxygen mobility.

While these materials have long been recognized for offering very high capacity (as much as 350 milliampere-hours per gram, compared with 190 to 200 milliampere-hours per gram for traditional cathode materials), they are not very stable.

The high capacity comes partly from oxygen redox, which is activated when the cathode is charged to high voltages. But when that happens, oxygen becomes mobile, leading to reactions with the electrolyte and degradation of the material, eventually leaving it effectively useless after prolonged cycling.

To overcome those challenges, Huang added another element — phosphorus — that essentially acts like a glue, holding the oxygen in place to mitigate degradation.

“The main innovation here, and the theory behind the design, is that Yimeng added just the right amount of phosphorus, formed so-called polyanions with its neighboring oxygen atoms, into a cation-deficient rock salt structure that can pin them down,” Li explains. “That allows us to basically stop the percolating oxygen transport due to strong covalent bonding between phosphorus and oxygen … meaning we can both utilize the oxygen-contributed capacity, but also have good stability as well.”

That ability to charge batteries to higher voltages, Li says, is crucial because it allows for simpler systems to manage the energy they store.

“You can say the quality of the energy is higher,” he says. “The higher the voltage per cell, then the less you need to connect them in series in the battery pack, and the simpler the battery management system.”

Pointing the way to future studies

While the cathode material described in the study could have a transformative impact on lithium-ion battery technology, there are still several avenues for study going forward.

Among the areas for future study, Huang says, are efforts to explore new ways to fabricate the material, particularly for morphology and scalability considerations.

“Right now, we are using high-energy ball milling for mechanochemical synthesis, and … the resulting morphology is non-uniform and has small average particle size (about 150 nanometers). This method is also not quite scalable,” he says. “We are trying to achieve a more uniform morphology with larger particle sizes using some alternate synthesis methods, which would allow us to increase the volumetric energy density of the material and may allow us to explore some coating methods … which could further improve the battery performance. The future methods, of course, should be industrially scalable.”

In addition, he says, the disordered rock salt material by itself is not a particularly good conductor, so significant amounts of carbon — as much as 20 weight percent of the cathode paste — were added to boost its conductivity. If the team can reduce the carbon content in the electrode without sacrificing performance, there will be higher active material content in a battery, leading to an increased practical energy density.

“In this paper, we just used Super P, a typical conductive carbon consisting of nanospheres, but they’re not very efficient,” Huang says. “We are now exploring using carbon nanotubes, which could reduce the carbon content to just 1 or 2 weight percent, which could allow us to dramatically increase the amount of the active cathode material.”

Aside from decreasing carbon content, making thick electrodes, he adds, is yet another way to increase the practical energy density of the battery. This is another area of research that the team is working on.

“This is only the beginning of DRXPS research, since we only explored a few chemistries within its vast compositional space,” he continues. “We can play around with different ratios of lithium, manganese, phosphorus, and oxygen, and with various combinations of other polyanion-forming elements such as boron, silicon, and sulfur.”

With optimized compositions, more scalable synthesis methods, better morphology that allows for uniform coatings, lower carbon content, and thicker electrodes, he says, the DRXPS cathode family is very promising in applications of electric vehicles and grid storage, and possibly even in consumer electronics, where the volumetric energy density is very important.

This work was supported with funding from the Honda Research Institute USA Inc. and the Molecular Foundry at Lawrence Berkeley National Laboratory, and used resources of the National Synchrotron Light Source II at Brookhaven National Laboratory and the Advanced Photon Source at Argonne National Laboratory. 

Toward a code-breaking quantum computer

The most recent email you sent was likely encrypted using a tried-and-true method that relies on the idea that even the fastest computer would be unable to efficiently break a gigantic number into factors.

Quantum computers, on the other hand, promise to rapidly crack complex cryptographic systems that a classical computer might never be able to unravel. This promise is based on a quantum factoring algorithm proposed in 1994 by Peter Shor, who is now a professor at MIT.

But while researchers have taken great strides in the last 30 years, scientists have yet to build a quantum computer powerful enough to run Shor’s algorithm.

As some researchers work to build larger quantum computers, others have been trying to improve Shor’s algorithm so it could run on a smaller quantum circuit. About a year ago, New York University computer scientist Oded Regev proposed a major theoretical improvement. His algorithm could run faster, but the circuit would require more memory.

Building off those results, MIT researchers have proposed a best-of-both-worlds approach that combines the speed of Regev’s algorithm with the memory-efficiency of Shor’s. This new algorithm is as fast as Regev’s, requires fewer quantum building blocks known as qubits, and has a higher tolerance to quantum noise, which could make it more feasible to implement in practice.

In the long run, this new algorithm could inform the development of novel encryption methods that can withstand the code-breaking power of quantum computers.

“If large-scale quantum computers ever get built, then factoring is toast and we have to find something else to use for cryptography. But how real is this threat? Can we make quantum factoring practical? Our work could potentially bring us one step closer to a practical implementation,” says Vinod Vaikuntanathan, the Ford Foundation Professor of Engineering, a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL), and senior author of a paper describing the algorithm.

The paper’s lead author is Seyoon Ragavan, a graduate student in the MIT Department of Electrical Engineering and Computer Science. The research will be presented at the 2024 International Cryptology Conference.

Cracking cryptography

To securely transmit messages over the internet, service providers like email clients and messaging apps typically rely on RSA, an encryption scheme invented by MIT researchers Ron Rivest, Adi Shamir, and Leonard Adleman in the 1970s (hence the name “RSA”). The system is based on the idea that factoring a 2,048-bit integer (a number with 617 digits) is too hard for a computer to do in a reasonable amount of time.
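The 617-digit figure can be checked directly. The short Python sketch below counts the decimal digits of the smallest and largest 2,048-bit integers; the variable names are illustrative only and are not taken from any RSA library.

```python
# Every 2,048-bit integer lies between 2**2047 and 2**2048 - 1.
smallest_2048_bit = 2**2047
largest_2048_bit = 2**2048 - 1

# Count decimal digits by converting to a string.
digits_smallest = len(str(smallest_2048_bit))
digits_largest = len(str(largest_2048_bit))

print(digits_smallest, digits_largest)
```

Both ends of the range have the same number of decimal digits, so every 2,048-bit integer is written with that many digits.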

That idea was flipped on its head in 1994 when Shor, then working at Bell Labs, introduced an algorithm which proved that a quantum computer could factor quickly enough to break RSA cryptography.

“That was a turning point. But in 1994, nobody knew how to build a large enough quantum computer. And we’re still pretty far from there. Some people wonder if they will ever be built,” says Vaikuntanathan.

It is estimated that a quantum computer would need about 20 million qubits to run Shor’s algorithm. Right now, the largest quantum computers have around 1,100 qubits.

A quantum computer performs computations using quantum circuits, just like a classical computer uses classical circuits. Each quantum circuit is composed of a series of operations known as quantum gates. These quantum gates utilize qubits, which are the smallest building blocks of a quantum computer, to perform calculations.

But quantum gates introduce noise, so having fewer gates would improve a machine’s performance. Researchers have been striving to enhance Shor’s algorithm so it could be run on a smaller circuit with fewer quantum gates.

That is precisely what Regev did with the circuit he proposed a year ago.

“That was big news because it was the first real improvement to Shor’s circuit from 1994,” Vaikuntanathan says.

The quantum circuit Shor proposed has a size proportional to the square of the size of the number being factored, measured in bits. That means if one were to factor a 2,048-bit integer, the circuit would need millions of gates.
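As a rough check of the "millions of gates" figure, the Python sketch below assumes the quadratic growth is in the bit length of the integer. The function name and the proportionality constant of 1 are assumptions for illustration; the real constant depends on how the circuit is built and is not given here.

```python
def quadratic_gate_scale(bits: int, constant: float = 1.0) -> float:
    """Order-of-magnitude gate count for a circuit whose size grows as bits**2.

    `constant` is a hypothetical proportionality factor, not a measured value.
    """
    return constant * bits**2


scale_2048 = quadratic_gate_scale(2048)
print(f"{scale_2048:,.0f}")
```

With the constant set to 1, a 2,048-bit input gives a count in the low millions, which matches the order of magnitude quoted above.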

Regev’s circuit requires significantly fewer quantum gates, but it needs many more qubits to provide enough memory. This presents a new problem.

“In a sense, some types of qubits are like apples or oranges. If you keep them around, they decay over time. You want to minimize the number of qubits you need to keep around,” explains Vaikuntanathan.

He heard Regev speak about his results at a workshop last August. At the end of his talk, Regev posed a question: Could someone improve his circuit so it needs fewer qubits? Vaikuntanathan and Ragavan took up that question.

Quantum ping-pong

To factor a very large number, a quantum circuit would need to run many times, performing operations that involve computing powers, like 2 to the power of 100.

But computing such large powers is costly and difficult to perform on a quantum computer, since quantum computers can only perform reversible operations. Squaring a number is not a reversible operation, so each time a number is squared, more quantum memory must be added to compute the next square.
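A classical toy example can show why squaring is not reversible while multiplication is. The Python sketch below works modulo 15, a modulus chosen only for illustration. Squaring sends several different inputs to the same output, so the input cannot be recovered. Multiplying by a fixed number that shares no factor with the modulus only permutes the values and can be undone. This is ordinary modular arithmetic, not a quantum circuit.

```python
from math import gcd

N = 15  # toy modulus, chosen only for illustration

# Values that share no common factor with N.
units = [x for x in range(1, N) if gcd(x, N) == 1]

# Squaring: distinct inputs collide, so the map cannot be inverted.
squares = {x: (x * x) % N for x in units}
distinct_squares = set(squares.values())

# Multiplying by a fixed unit: the outputs are a rearrangement of the inputs.
a = 7
products = [(a * x) % N for x in units]

# Undo the multiplication with the modular inverse of a (Python 3.8+).
a_inv = pow(a, -1, N)
recovered = [(a_inv * y) % N for y in products]

print(len(units), len(distinct_squares))
print(recovered == units)
```

Because many inputs share one square, a reversible computation has to keep extra information around to square a value, which is the memory cost described above.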

The MIT researchers found a clever way to compute exponents using a series of Fibonacci numbers that requires simple multiplication, which is reversible, rather than squaring. Their method needs just two quantum memory units to compute any exponent.

“It is kind of like a ping-pong game, where we start with a number and then bounce back and forth, multiplying between two quantum memory registers,” Vaikuntanathan adds.
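The ping-pong idea can be sketched classically. In the Python example below, two registers start out holding the same number, and each step multiplies one register into the other, so the exponents follow the Fibonacci sequence. This is a minimal sketch with hypothetical names, using ordinary modular arithmetic rather than the authors' quantum circuit. It produces only Fibonacci-numbered powers and does not show how the researchers assemble an arbitrary exponent from them.

```python
def fibonacci_powers(a: int, modulus: int, steps: int) -> list:
    """Return a**F_1, a**F_2, ... (mod modulus) using two registers.

    Each step multiplies one register into the other, alternating sides,
    so no register is ever squared.
    """
    reg_x = a % modulus  # holds a**F_1, with F_1 = 1
    reg_y = a % modulus  # holds a**F_2, with F_2 = 1
    powers = [reg_x, reg_y]
    for step in range(steps):
        if step % 2 == 0:
            reg_x = (reg_x * reg_y) % modulus  # exponents add: F_k + F_(k+1)
            powers.append(reg_x)
        else:
            reg_y = (reg_y * reg_x) % modulus
            powers.append(reg_y)
    return powers


def fibonacci_numbers(count: int) -> list:
    """Return the first `count` Fibonacci numbers, starting 1, 1."""
    fibs = [1, 1]
    while len(fibs) < count:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs[:count]


modulus = 1_000_003
result = fibonacci_powers(3, modulus, steps=10)
expected = [pow(3, f, modulus) for f in fibonacci_numbers(12)]
print(result == expected)
```

Each update can be undone by multiplying by the inverse of the other register whenever that inverse exists modulo the chosen modulus, which is what makes this pattern compatible with reversible computation.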

They also tackled the challenge of error correction. The circuits proposed by Shor and Regev require every quantum operation to be correct for their algorithm to work, Vaikuntanathan says. But error-free quantum gates would be infeasible on a real machine.

They overcame this problem using a technique to filter out corrupt results and only process the right ones.

The end result is a circuit that is significantly more memory-efficient. Plus, their error correction technique would make the algorithm more practical to deploy.

“The authors resolve the two most important bottlenecks in the earlier quantum factoring algorithm. Although still not immediately practical, their work brings quantum factoring algorithms closer to reality,” adds Regev.

In the future, the researchers hope to make their algorithm even more efficient and, someday, use it to test factoring on a real quantum circuit.

“The elephant-in-the-room question after this work is: Does it actually bring us closer to breaking RSA cryptography? That is not clear just yet; these improvements currently only kick in when the integers are much larger than 2,048 bits. Can we push this algorithm and make it more feasible than Shor’s even for 2,048-bit integers?” says Ragavan.

This work is funded by an Akamai Presidential Fellowship, the U.S. Defense Advanced Research Projects Agency, the National Science Foundation, the MIT-IBM Watson AI Lab, a Thornton Family Faculty Research Innovation Fellowship, and a Simons Investigator Award.

Uphill battles: Across the country in 75 days

Amulya Aluru ’23, MEng ’24, will head to the University of California at Berkeley for a PhD in molecular and cell biology this fall. Aluru knows her undergraduate 6-7 major and MEng program, where she worked on a computational project in a biology lab, have prepared her for the next step of her academic journey.

“I’m a lot more comfortable with the unknown in terms of research — and also life,” she says. “While I’ve enjoyed what I’ve done so far, I think it’s equally valuable to try and explore new topics. I feel like there’s still a lot more for me to learn in biology.”

Unlike many of her peers, however, Aluru won’t reach the San Francisco Bay Area by car, plane, or train. She will arrive by bike — a journey she began in Washington just a few days after receiving her master’s degree.

Showing that science is accessible

Spokes is an MIT-based nonprofit that each year sends students on a transcontinental bike ride. Aluru worked for months with seven fellow MIT students on logistics and planning. Since setting out, the team has bonded over their love of memes and cycling-themed nicknames: Hank “Handlebar Hank” Stennes, Clelia “Climbing Cleo” Lacarriere, Varsha “Vroom Vroom Varsha” Sandadi, Rebecca “Railtrail Rebecca” Lizarde, JD “JDerailleur Hanger” Hagood, Sophia “Speedy Sophia” Wang, Amulya “Aero Amulya” Aluru, and Jessica “Joyride Jess” Xu. The support minivan, carrying food, luggage, and occasionally injured or sick cyclists, even earned its own nickname: “Chrissy”, short for Chrysler Pacifica.

“I really wanted to do something to challenge myself, but not in a strictly academic sense,” Aluru says of her decision to join the team and bike more than 3,000 miles this summer.

The Spokes team is not biking across the country solely to accomplish such a feat. Throughout their journey, they’ll be offering a variety of science demonstrations, including making concrete with Rice Krispies, demonstrating the physics of sound, using 3D printers, and, in Aluru’s case, extracting DNA from strawberries.   

“We’re going to be in a lot of really different learning environments,” she says. “I hope to demonstrate that science can be accessible, even if you don’t have a lab at your disposal.”

These demonstrations have been held in venues such as a D.C. jail, a space camp, and libraries and youth centers across the country; their learning festivals were even featured on a local news channel in Kentucky.

Some derailments

The team was beset with challenges from the first day they started their journey. Aluru’s first day on the road involved driving to every bike shop and REI store in the D.C. metro area to purchase bike computers for navigation because the ones the team had already purchased would only display maps of Europe.

Four days in and four Chrysler Pacificas later — the first was unsafe due to bald tires, the second made a weird sound as they pulled out of the rental lot, and the third’s gas pedal stopped working over 50 miles away from the nearest rental agency — the team was back together again in Waynesboro, Virginia, for the first time since they’d set out.

Since then, they’ve had run-ins with local fauna — including mean dogs and a meaner turtle — attempted to repair a tubeless bike that was not, in fact, tubeless, and slept in Chrissy the minivan after their tents got soaked and blew away.

Although it hasn’t all been smooth riding, the team has made time for fun. They’ve perfected the art of eating a Clif bar while on two wheels, played around on monkey bars in Colorado, met up with Stanford Spokes, enjoyed pounds of ice cream, and downed gallons of lattes.

The team prioritized routes on bike trails, rather than highways, as much as possible. Their teaching activities are scheduled between visits to National Parks like Tahoe, Zion, Bryce Canyon, and Arches, and touring and hiking places like Breaks Interstate Park, Mammoth Cave, and the Collegiate Peaks.

Aluru says she’s excited to see parts of the country she’s never visited before, and experience the terrain under her own power — except for breaks when it’s her turn to drive Chrissy.

Rolling with the ups and downs

Aluru was only a few weeks into her first Undergraduate Research Opportunities Program project in the late professor Angelika Amon’s lab when the Covid-19 pandemic hit, quickly transforming her wet lab project into a computational one. David Waterman, her postdoc mentor in the Amon Lab, was trained as a biologist, not a computational scientist. Luckily, Aluru had just taken two computer science classes.

“I was able to have a big hand in formulating my project and bouncing ideas off of him,” she recalls. “That helped me think about scientific questions, which I was able to apply when I came back to campus and started doing wet lab research again.”

When Aluru returned to campus, she began work in the Page Lab at the Whitehead Institute for Biomedical Research. She continued working there for the rest of her time at MIT, first as an undergraduate student and then as an MEng student.

The Page Lab’s work primarily concerns sex differences and how those differences play out in genetics, development, and disease. The Department of Electrical Engineering and Computer Science, which oversees the MEng program, allows students to pursue computational projects across disciplines, no matter the department.

For her MEng work, Aluru looked at sex differences in human height, a continuation of a paper that the Page Lab published in 2019. Height is an easily observable human trait and, from previous research, is known to be sex-biased across at least five species. Genes that have sex-biased expression patterns, or expression patterns that are higher or lower in males compared to females, may play a role in establishing or maintaining these sex differences. Through statistical genetics, Aluru replicated the findings of the earlier paper and expanded them using newly published datasets.

“Amulya has had an amazing journey in our department,” says David Page, professor of biology and core member of the Whitehead Institute. “There is simply no stopping her insatiable curiosity and zest for life.”

Working with the lab as a graduate student came with more day-to-day responsibility and independence than when she was an undergrad.

“It was a shift I quite appreciated,” Aluru says. “At times it was challenging, but I think it was a good challenge: learning how to structure my research on my own, while still getting a lot of support from lab members and my PI [principal investigator].”

Gearing up for the future

Since departing MIT, Aluru and the rest of the Spokes team have spent their nights camping, sleeping in churches, and staying with hosts. They enjoyed the longest day of the year in a surprisingly “Brooklyn chic” house, spent a lazy afternoon on a river, and pinky-promised to be in each other’s weddings. The team has also been hosted by, met up with, and run into MIT alums as they’ve crossed the country.

As Aluru looks to the future, she admits she’s not exactly sure what she’ll study — but when she reaches the West Coast, she knows she’s not leaving what she’s built through MIT far behind.

“There’s going to be a small MIT community even there — a lot of my friends are in San Francisco, and a few people I know are also going to be at Berkeley,” she says. “I have formed a community at MIT that I know will support me in all my future endeavors.”

