Monday, June 17, 2024

Researchers leverage shadows to model 3D scenes, including objects blocked from view

Imagine driving through a tunnel in an autonomous vehicle, but unbeknownst to you, a crash has stopped traffic up ahead. Normally, you’d need to rely on the car in front of you to know you should start braking. But what if your vehicle could see around the car ahead and apply the brakes even sooner?

Researchers from MIT and Meta have developed a computer vision technique that could someday enable an autonomous vehicle to do just that.

They have introduced a method that creates physically accurate, 3D models of an entire scene, including areas blocked from view, using images from a single camera position. Their technique uses shadows to determine what lies in obstructed portions of the scene.

They call their approach PlatoNeRF, based on Plato’s allegory of the cave, a passage from the Greek philosopher’s “Republic” in which prisoners chained in a cave discern the reality of the outside world based on shadows cast on the cave wall.

By combining lidar (light detection and ranging) technology with machine learning, PlatoNeRF can generate more accurate reconstructions of 3D geometry than some existing AI techniques. Additionally, PlatoNeRF is better at smoothly reconstructing scenes where shadows are hard to see, such as those with high ambient light or dark backgrounds.

In addition to improving the safety of autonomous vehicles, PlatoNeRF could make AR/VR headsets more efficient by enabling a user to model the geometry of a room without the need to walk around taking measurements. It could also help warehouse robots find items in cluttered environments faster.

“Our key idea was taking these two things that have been done in different disciplines before and pulling them together — multibounce lidar and machine learning. It turns out that when you bring these two together, that is when you find a lot of new opportunities to explore and get the best of both worlds,” says Tzofi Klinghoffer, an MIT graduate student in media arts and sciences, affiliate of the MIT Media Lab, and lead author of a paper on PlatoNeRF.

Klinghoffer wrote the paper with his advisor, Ramesh Raskar, associate professor of media arts and sciences and leader of the Camera Culture Group at MIT; senior author Rakesh Ranjan, a director of AI research at Meta Reality Labs; as well as Siddharth Somasundaram at MIT, and Xiaoyu Xiang, Yuchen Fan, and Christian Richardt at Meta. The research will be presented at the Conference on Computer Vision and Pattern Recognition.

Shedding light on the problem

Reconstructing a full 3D scene from one camera viewpoint is a complex problem.

Some machine-learning approaches employ generative AI models that try to guess what lies in the occluded regions, but these models can hallucinate objects that aren’t really there. Other approaches attempt to infer the shapes of hidden objects using shadows in a color image, but these methods can struggle when shadows are hard to see.

For PlatoNeRF, the MIT researchers built off these approaches using a new sensing modality called single-photon lidar. Lidars map a 3D scene by emitting pulses of light and measuring the time it takes that light to bounce back to the sensor. Because single-photon lidars can detect individual photons, they provide higher-resolution data.

The researchers use a single-photon lidar to illuminate a target point in the scene. Some light bounces off that point and returns directly to the sensor. However, most of the light scatters and bounces off other objects before returning to the sensor. PlatoNeRF relies on these second bounces of light.

By calculating how long it takes light to bounce twice and then return to the lidar sensor, PlatoNeRF captures additional information about the scene, including depth. The second bounce of light also contains information about shadows.

The system traces the secondary rays of light — those that bounce off the target point to other points in the scene — to determine which points lie in shadow (due to an absence of light). Based on the location of these shadows, PlatoNeRF can infer the geometry of hidden objects.
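
To make the geometry concrete, here is a minimal numerical sketch (not the authors' code) of the two quantities described above: the total two-bounce time of flight, and a simple shadow test based on whether any second-bounce photons return from a point. The function names and the photon-count threshold are illustrative assumptions.

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def two_bounce_time(laser, p_illuminated, p_secondary, sensor):
    """Total time of flight for light that travels laser -> illuminated
    point -> secondary scene point -> sensor (two bounces)."""
    d = (np.linalg.norm(p_illuminated - laser)
         + np.linalg.norm(p_secondary - p_illuminated)
         + np.linalg.norm(sensor - p_secondary))
    return d / C

def in_shadow(returned_photons, noise_floor=5):
    """A secondary point is flagged as shadowed if essentially no
    two-bounce photons come back from it."""
    return returned_photons <= noise_floor

# Toy example: laser and sensor are co-located at the origin.
laser = sensor = np.zeros(3)
p1 = np.array([0.0, 0.0, 2.0])   # illuminated target point
p2 = np.array([1.0, 0.5, 2.5])   # another scene point lit by the second bounce

t = two_bounce_time(laser, p1, p2, sensor)
print(f"two-bounce return after {t * 1e9:.2f} ns")
print("p2 shadowed:", in_shadow(returned_photons=0))
```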

The lidar sequentially illuminates 16 points, capturing multiple images that are used to reconstruct the entire 3D scene.

“Every time we illuminate a point in the scene, we are creating new shadows. Because we have all these different illumination sources, we have a lot of light rays shooting around, so we are carving out the region that is occluded and lies beyond the visible eye,” Klinghoffer says.

A winning combination

Key to PlatoNeRF is the combination of multibounce lidar with a special type of machine-learning model known as a neural radiance field (NeRF). A NeRF encodes the geometry of a scene into the weights of a neural network, which gives the model a strong ability to interpolate, or estimate, novel views of a scene.
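
As a rough illustration of what "encoding geometry in network weights" means, here is a minimal NeRF-style model, written as a PyTorch sketch rather than PlatoNeRF itself (positional encoding and the lidar-specific rendering are omitted): a small network maps a 3D point and viewing direction to a density and color, and querying it at arbitrary points is what allows interpolation of novel views.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Minimal NeRF-style field: maps a 3D position (plus view direction)
    to a density and an RGB color."""
    def __init__(self, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)   # how "occupied" the point is
        self.color_head = nn.Sequential(           # color also depends on view direction
            nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, xyz, view_dir):
        h = self.backbone(xyz)
        density = torch.relu(self.density_head(h))
        rgb = self.color_head(torch.cat([h, view_dir], dim=-1))
        return density, rgb

# Querying the field at arbitrary points is what lets a NeRF render novel views.
model = TinyNeRF()
points = torch.rand(4, 3)        # sampled 3D locations
dirs = torch.rand(4, 3)          # viewing directions
sigma, color = model(points, dirs)
print(sigma.shape, color.shape)  # torch.Size([4, 1]) torch.Size([4, 3])
```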

This ability to interpolate also leads to highly accurate scene reconstructions when combined with multibounce lidar, Klinghoffer says.

“The biggest challenge was figuring out how to combine these two things. We really had to think about the physics of how light is transporting with multibounce lidar and how to model that with machine learning,” he says.

They compared PlatoNeRF to two common alternative methods, one that only uses lidar and the other that only uses a NeRF with a color image.

They found that their method was able to outperform both techniques, especially when the lidar sensor had lower resolution. This would make their approach more practical to deploy in the real world, where lower resolution sensors are common in commercial devices.

“About 15 years ago, our group invented the first camera to ‘see’ around corners, which works by exploiting multiple bounces of light, or ‘echoes of light.’ Those techniques used special lasers and sensors, and used three bounces of light. Since then, lidar technology has become more mainstream, which led to our research on cameras that can see through fog. This new work uses only two bounces of light, which means the signal-to-noise ratio is very high, and 3D reconstruction quality is impressive,” Raskar says.

In the future, the researchers want to try tracking more than two bounces of light to see how that could improve scene reconstructions. In addition, they are interested in applying more deep learning techniques and combining PlatoNeRF with color image measurements to capture texture information.

“While camera images of shadows have long been studied as a means to 3D reconstruction, this work revisits the problem in the context of lidar, demonstrating significant improvements in the accuracy of reconstructed hidden geometry. The work shows how clever algorithms can enable extraordinary capabilities when combined with ordinary sensors — including the lidar systems that many of us now carry in our pocket,” says David Lindell, an assistant professor in the Department of Computer Science at the University of Toronto, who was not involved with this work.



from MIT News https://ift.tt/G4LyE5I

Technologies enable 3D imaging of whole human brain hemispheres at subcellular resolution

Observing anything and everything within the human brain, no matter how large or small, while it is fully intact has been an out-of-reach dream of neuroscience for decades. But in a new study in Science, an MIT-based team describes a technology pipeline that enabled them to finely process, richly label, and sharply image full hemispheres of the brains of two donors — one with Alzheimer’s disease and one without — at high resolution and speed.

“We performed holistic imaging of human brain tissues at multiple resolutions, from single synapses to whole brain hemispheres, and we have made that data available,” says senior and corresponding author Kwanghun Chung, associate professor in the MIT departments of Chemical Engineering and Brain and Cognitive Sciences and member of The Picower Institute for Learning and Memory and the Institute for Medical Engineering and Science. “This technology pipeline really enables us to analyze the human brain at multiple scales. Potentially this pipeline can be used for fully mapping human brains.”

The new study does not present a comprehensive map or atlas of the entire brain, in which every cell, circuit, and protein is identified and analyzed. But with full hemispheric imaging, it demonstrates an integrated suite of three technologies to enable that and other long-sought neuroscience investigations. The research provides a “proof of concept” by showing numerous examples of what the pipeline makes possible, including sweeping landscapes of thousands of neurons within whole brain regions; diverse forests of cells, each in individual detail; and tufts of subcellular structures nestled among extracellular molecules. The researchers also present a rich variety of quantitative analytical comparisons focused on a chosen region within the Alzheimer’s and non-Alzheimer’s hemispheres.

The importance of being able to image whole hemispheres of human brains intact and down to the resolution of individual synapses (the teeny connections that neurons forge to make circuits) is two-fold for understanding the human brain in health and disease, Chung says.

One brain is better than two

On one hand, it will enable scientists to conduct integrated explorations of questions using the same brain, rather than having to (for example) observe different phenomena in different brains, which can vary significantly, and then try to construct a composite picture of the whole system. A key feature of the new technology pipeline is that analysis doesn’t degrade the tissue. On the contrary, it makes the tissues extremely durable and repeatedly re-labelable to highlight different cells or molecules as needed for new studies for potentially years on end. In the paper, Chung’s team demonstrates using 20 different antibody labels to highlight different cells and proteins, but they are already expanding that to a hundred or more.

“We need to be able to see all these different functional components — cells, their morphology and their connectivity, subcellular architectures, and their individual synaptic connections — ideally within the same brain, considering the high individual variabilities in the human brain and considering the precious nature of human brain samples,” Chung says. “This technology pipeline really enables us to extract all these important features from the same brain in a fully integrated manner.”

On the other hand, the pipeline’s relatively high scalability and throughput (imaging a whole brain hemisphere once it is prepared takes 100 hours, rather than many months) means that it is possible to create many samples to represent different sexes, ages, disease states, and other factors that can enable robust comparisons with increased statistical power. Chung says he envisions creating a brain bank of fully imaged brains that researchers could analyze and re-label as needed for new studies to make more of the kinds of comparisons he and co-authors made with the Alzheimer’s and non-Alzheimer’s hemispheres in the new paper.

Three key innovations

Chung says the biggest challenge he faced in achieving the advances described in the paper was building a team at MIT that included three especially talented young scientists, each a co-lead author of the paper because of their key roles in producing the three major innovations. Ji Wang, a mechanical engineer and former postdoc, developed the “Megatome,” a device for slicing intact human brain hemispheres so finely that there is no damage to them. Juhyuk Park, a materials engineer and former postdoc, developed the chemistry that makes each brain slice clear, flexible, durable, expandable, and quickly, evenly, and repeatedly labelable — a technology called “mELAST.” Webster Guan, a former MIT chemical engineering graduate student with a knack for software development, created a computational system called “UNSLICE” that can seamlessly reunify the slabs to reconstruct each hemisphere in full 3D, down to the precise alignment of individual blood vessels and neural axons (the long strands they extend to forge connections with other neurons).

No technology allows for imaging whole human brain anatomy at subcellular resolution without first slicing it, because the brain is very thick (3,000 times the volume of a mouse brain) and opaque. But in the Megatome, tissue remains undamaged because Wang, who is now at a company Chung founded called LifeCanvas Technologies, engineered its blade to vibrate side-to-side faster, and yet sweep wider, than previous vibratome slicers. Meanwhile, she also crafted the instrument to stay perfectly within its plane, Chung says. The result is slices that don’t lose anatomical information at their separation or anywhere else. And because the vibratome cuts relatively quickly and can cut thicker (and therefore fewer) slabs of tissue, a whole hemisphere can be sliced in a day, rather than months.

A major reason why slabs in the pipeline can be thicker comes from mELAST. Park engineered the hydrogel that infuses the brain sample to make it optically clear, virtually indestructible, and compressible and expandable. Combined with other chemical engineering technologies developed in recent years in Chung’s lab, the samples can then be evenly and quickly infused with the antibody labels that highlight cells and proteins of interest. Using a light sheet microscope the lab customized, a whole hemisphere can be imaged down to individual synapses in about 100 hours, the authors report in the study. Park is now an assistant professor at Seoul National University in South Korea.

“This advanced polymeric network, which fine-tunes the physicochemical properties of tissues, enabled multiplexed multiscale imaging of the intact human brains,” Park says.

After each slab has been imaged, the task is then to restore an intact picture of the whole hemisphere computationally. Guan’s UNSLICE does this at multiple scales. For instance, at the middle, or “meso” scale, it algorithmically traces blood vessels coming into one layer from adjacent layers and matches them. But it also takes an even finer approach. To further register the slabs, the team purposely labeled neighboring neural axons in different colors (like the wires in an electrical fixture). That enabled UNSLICE to match layers up based on tracing the axons, Chung says. Guan is also now at LifeCanvas.
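
The matching step can be pictured with a simple sketch. Assuming vessel cross sections on the two cut faces have already been segmented down to centroid coordinates (the data below is synthetic), one straightforward approach is to pair them by minimizing total distance with the Hungarian algorithm; this only illustrates the idea, not the UNSLICE algorithm itself.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)

# Synthetic stand-in for segmentation output: centroids (in mm) of vessel
# cross sections on the facing surfaces of two adjacent slabs. Face B is
# face A shuffled and slightly jittered, mimicking the same vessels seen
# from the other side of the cut.
face_a = rng.uniform(0, 50, size=(30, 2))
perm = rng.permutation(30)
face_b = face_a[perm] + rng.normal(scale=0.05, size=(30, 2))

# Pair vessels across the cut by minimizing the total distance between
# matched centroids (Hungarian algorithm); the matched pairs then serve
# as anchor points for aligning the two slabs.
cost = np.linalg.norm(face_a[:, None, :] - face_b[None, :, :], axis=-1)
rows, cols = linear_sum_assignment(cost)

print("fraction of vessels correctly re-paired:", np.mean(perm[cols] == rows))
```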

In the study, the researchers present a litany of examples of what the pipeline can do. The very first figure demonstrates that the imaging allows one to richly label a whole hemisphere and then zoom in from the wide scale of brainwide structures to the level of circuits, then individual cells, and then subcellular components, such as synapses. Other images and videos demonstrate how diverse the labeling can be, revealing long axonal connections and the abundance and shape of different cell types including not only neurons but also astrocytes and microglia.

Exploring Alzheimer’s

For years, Chung has collaborated with co-author Matthew Frosch, an Alzheimer’s researcher and director of the brain bank at Massachusetts General Hospital, to image and understand Alzheimer’s disease brains. With the new pipeline established they began an open-ended exploration, first noticing where within a slab of tissue they saw the greatest loss of neurons in the disease sample compared to the control. From there, they followed their curiosity — as the technology allowed them to do — ultimately producing a series of detailed investigations described in the paper.

“We didn’t lay out all these experiments in advance,” Chung says. “We just started by saying, ‘OK, let’s image this slab and see what we see.’ We identified brain regions with substantial neuronal loss, so let’s see what’s happening there. ‘Let’s dive deeper.’ So we used many different markers to characterize and see the relationships between pathogenic factors and different cell types.

“This pipeline allows us to have almost unlimited access to the tissue,” Chung says. “We can always go back and look at something new.”

They focused most of their analysis in the orbitofrontal cortex within each hemisphere. One of the many observations they made was that synapse loss was concentrated in areas where there was direct overlap with amyloid plaques. Outside of areas of plaques, the synapse density was as high in the brain with Alzheimer’s as in the one without the disease.

With just two samples, Chung says, the team is not offering any conclusions about the nature of Alzheimer’s disease, of course, but the point of the study is that the capability now exists to fully image and deeply analyze whole human brain hemispheres to enable exactly that kind of research.

Notably, the technology applies equally well to many other tissues in the body, not just brains.

“We envision that this scalable technology platform will advance our understanding of the human organ functions and disease mechanisms to spur development of new therapies,” the authors conclude.

In addition to Park, Wang, Guan, Chung, and Frosch, the paper’s other authors are Lars A. Gjesteby, Dylan Pollack, Lee Kamentsky, Nicholas B. Evans, Jeff Stirman, Xinyi Gu, Chuanxi Zhao, Slayton Marx, Minyoung E. Kim, Seo Woo Choi, Michael Snyder, David Chavez, Clover Su-Arcaro, Yuxuan Tian, Chang Sin Park, Qiangge Zhang, Dae Hee Yun, Mira Moukheiber, Guoping Feng, X. William Yang, C. Dirk Keene, Patrick R. Hof, Satrajit S. Ghosh, and Laura J. Brattain.

The main funding for the work came from the National Institutes of Health, The Picower Institute for Learning and Memory, The JPB Foundation, and the NCSOFT Cultural Foundation.



from MIT News https://ift.tt/hXJTbGK

Understanding the visual knowledge of language models

You’ve likely heard that a picture is worth a thousand words, but can a large language model (LLM) get the picture if it’s never seen images before?

As it turns out, language models that are trained purely on text have a solid understanding of the visual world. They can write image-rendering code to generate complex scenes with intriguing objects and compositions — and even when that knowledge is not used properly, LLMs can refine their images. Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) observed this when prompting language models to self-correct their code for different images, where the systems improved on their simple clipart drawings with each query.

The visual knowledge of these language models is gained from how concepts like shapes and colors are described across the internet, whether in language or code. When given a direction like “draw a parrot in the jungle,” users jog the LLM to consider what it’s read in descriptions before. To assess how much visual knowledge LLMs have, the CSAIL team constructed a “vision checkup” for LLMs: using their “Visual Aptitude Dataset,” they tested the models’ abilities to draw, recognize, and self-correct these concepts. Collecting each final draft of these illustrations, the researchers trained a computer vision system that identifies the content of real photos.

“We essentially train a vision system without directly using any visual data,” says Tamar Rott Shaham, co-lead author of the study and an MIT electrical engineering and computer science (EECS) postdoc at CSAIL. “Our team queried language models to write image-rendering codes to generate data for us and then trained the vision system to evaluate natural images. We were inspired by the question of how visual concepts are represented through other mediums, like text. To express their visual knowledge, LLMs can use code as a common ground between text and vision.”

To build this dataset, the researchers first queried the models to generate code for different shapes, objects, and scenes. Then, they compiled that code to render simple digital illustrations, like a row of bicycles, showing that LLMs understand spatial relations well enough to draw the two-wheelers in a horizontal row. As another example, the model generated a car-shaped cake, combining two random concepts. The language model also produced a glowing light bulb, indicating its ability to create visual effects. 

“Our work shows that when you query an LLM (without multimodal pre-training) to create an image, it knows much more than it seems,” says co-lead author, EECS PhD student, and CSAIL member Pratyusha Sharma. “Let’s say you asked it to draw a chair. The model knows other things about this piece of furniture that it may not have immediately rendered, so users can query the model to improve the visual it produces with each iteration. Surprisingly, the model can iteratively enrich the drawing by improving the rendering code to a significant extent.”
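
The query-and-refine loop the researchers describe can be sketched in a few lines. The `query_llm` helper below is a hypothetical placeholder for whatever chat model you have access to, and the prompts are illustrative, not the ones used in the study.

```python
# Hypothetical helper: `query_llm(prompt)` stands in for whatever chat API
# is available to you; it is not part of the CSAIL codebase.
def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your own LLM client here")

def draw_and_refine(concept: str, rounds: int = 3) -> str:
    """Ask a text-only LLM for matplotlib code that draws `concept`, then
    repeatedly feed the code back and ask for an improved version."""
    code = query_llm(
        f"Write a self-contained Python matplotlib script that draws {concept}. "
        "Return only code."
    )
    for _ in range(rounds):
        code = query_llm(
            f"Here is a matplotlib script that tries to draw {concept}:\n\n"
            f"{code}\n\n"
            "Improve the drawing (better shapes, colors, composition). "
            "Return only the full revised script."
        )
    return code

# exec(draw_and_refine("a parrot in the jungle"))  # render the final draft
```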

The researchers gathered these illustrations, which were then used to train a computer vision system that can recognize objects within real photos (despite never having seen one before). With this synthetic, text-generated data as its only reference point, the system outperformed vision systems trained on other procedurally generated image datasets.

The CSAIL team believes that combining the hidden visual knowledge of LLMs with the artistic capabilities of other AI tools like diffusion models could also be beneficial. Systems like Midjourney sometimes lack the know-how to consistently tweak the finer details in an image, making it difficult for them to handle requests like reducing how many cars are pictured, or placing an object behind another. If an LLM sketched out the requested change for the diffusion model beforehand, the resulting edit could be more satisfactory.

The irony, as Rott Shaham and Sharma acknowledge, is that LLMs sometimes fail to recognize the same concepts that they can draw. This became clear when the models incorrectly identified human re-creations of images within the dataset. Such diverse representations of the visual world likely triggered the language models’ misconceptions.

While the models struggled to perceive these abstract depictions, they demonstrated the creativity to draw the same concepts differently each time. When the researchers queried LLMs to draw concepts like strawberries and arcades multiple times, they produced pictures from diverse angles with varying shapes and colors, hinting that the models might have actual mental imagery of visual concepts (rather than reciting examples they saw before).

The CSAIL team believes this procedure could be a baseline for evaluating how well a generative AI model can train a computer vision system. Additionally, the researchers look to expand the tasks they challenge language models on. As for their recent study, the MIT group notes that they don’t have access to the training set of the LLMs they used, making it challenging to further investigate the origin of their visual knowledge. In the future, they intend to explore training an even better vision model by letting the LLM work directly with it.

Sharma and Rott Shaham are joined on the paper by former CSAIL affiliate Stephanie Fu ’22, MNG ’23 and EECS PhD students Manel Baradad, Adrián Rodríguez-Muñoz ’22, and Shivam Duggal, who are all CSAIL affiliates; as well as MIT Associate Professor Phillip Isola and Professor Antonio Torralba. Their work was supported, in part, by a grant from the MIT-IBM Watson AI Lab, a LaCaixa Fellowship, the Zuckerman STEM Leadership Program, and the Viterbi Fellowship. They present their paper this week at the IEEE/CVF Computer Vision and Pattern Recognition Conference.



from MIT News https://ift.tt/z47ahOn

A smarter way to streamline drug discovery

The use of AI to streamline drug discovery is exploding. Researchers are deploying machine-learning models to help them identify molecules, among billions of options, that might have the properties they are seeking to develop new medicines.

But there are so many variables to consider — from the price of materials to the risk of something going wrong — that even when scientists use AI, weighing the costs of synthesizing the best candidates is no easy task.

The myriad challenges involved in identifying the best and most cost-efficient molecules to test are one reason new medicines take so long to develop, as well as a key driver of high prescription drug prices.

To help scientists make cost-aware choices, MIT researchers developed an algorithmic framework that automatically identifies optimal molecular candidates, minimizing synthetic cost while maximizing the likelihood that those candidates have the desired properties. The algorithm also identifies the materials and experimental steps needed to synthesize these molecules.

Their quantitative framework, known as Synthesis Planning and Rewards-based Route Optimization Workflow (SPARROW), considers the costs of synthesizing a batch of molecules at once, since multiple candidates can often be derived from some of the same chemical compounds.

Moreover, this unified approach captures key information on molecular design, property prediction, and synthesis planning from online repositories and widely used AI tools.

Beyond helping pharmaceutical companies discover new drugs more efficiently, SPARROW could be used in applications like the invention of new agrichemicals or the discovery of specialized materials for organic electronics.

“The selection of compounds is very much an art at the moment — and at times it is a very successful art. But because we have all these other models and predictive tools that give us information on how molecules might perform and how they might be synthesized, we can and should be using that information to guide the decisions we make,” says Connor Coley, the Class of 1957 Career Development Assistant Professor in the MIT departments of Chemical Engineering and Electrical Engineering and Computer Science, and senior author of a paper on SPARROW.

Coley is joined on the paper by lead author Jenna Fromer SM ’24. The research appears today in Nature Computational Science.

Complex cost considerations

In a sense, whether a scientist should synthesize and test a certain molecule boils down to a question of the synthetic cost versus the value of the experiment. However, determining either cost or value is a tough problem on its own.

For instance, an experiment might require expensive materials or it could have a high risk of failure. On the value side, one might consider how useful it would be to know the properties of this molecule or whether those predictions carry a high level of uncertainty.

At the same time, pharmaceutical companies increasingly use batch synthesis to improve efficiency. Instead of testing molecules one at a time, they use combinations of chemical building blocks to test multiple candidates at once. However, this means the chemical reactions must all require the same experimental conditions. This makes estimating cost and value even more challenging.

SPARROW tackles this challenge by considering the shared intermediary compounds involved in synthesizing molecules and incorporating that information into its cost-versus-value function.

“When you think about this optimization game of designing a batch of molecules, the cost of adding on a new structure depends on the molecules you have already chosen,” Coley says.

The framework also considers things like the costs of starting materials, the number of reactions that are involved in each synthetic route, and the likelihood those reactions will be successful on the first try.

To utilize SPARROW, a scientist provides a set of molecular compounds they are thinking of testing and a definition of the properties they are hoping to find.

From there, SPARROW collects information on the molecules and their synthetic pathways and then weighs the value of each one against the cost of synthesizing a batch of candidates. It automatically selects the best subset of candidates that meet the user’s criteria and finds the most cost-effective synthetic routes for those compounds.
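
A toy version of that selection problem, with made-up utilities and reaction costs, shows why batching matters: molecules that share intermediate steps are cheaper to make together, so the best subset is not simply the set of individually best molecules. This brute-force sketch is not the SPARROW algorithm, which handles hundreds of candidates with a proper optimization formulation.

```python
from itertools import combinations

# Toy candidates: each needs a set of reaction steps; the values and step
# costs below are made-up numbers, not outputs of SPARROW.
candidates = {
    "mol_A": {"steps": {"s1", "s2"},       "value": 5.0},
    "mol_B": {"steps": {"s2", "s3"},       "value": 4.0},   # shares step s2 with mol_A
    "mol_C": {"steps": {"s4", "s5", "s6"}, "value": 6.0},
}
step_cost = {"s1": 1.0, "s2": 2.0, "s3": 1.5, "s4": 2.5, "s5": 2.0, "s6": 2.0}

def batch_score(selection):
    """Value of the chosen molecules minus the cost of the *union* of their
    reaction steps: shared intermediates are only paid for once."""
    steps = set().union(*(candidates[m]["steps"] for m in selection))
    value = sum(candidates[m]["value"] for m in selection)
    return value - sum(step_cost[s] for s in steps)

best = max(
    (subset for r in range(1, len(candidates) + 1)
     for subset in combinations(candidates, r)),
    key=batch_score,
)
print(best, round(batch_score(best), 2))
```

In this example the pair mol_A and mol_B wins, because their shared step s2 is only paid for once, even though mol_C has the highest individual value.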

“It does all this optimization in one step, so it can really capture all of these competing objectives simultaneously,” Fromer says.

A versatile framework

SPARROW is unique because it can incorporate molecular structures that have been hand-designed by humans, those that exist in virtual catalogs, or never-before-seen molecules that have been invented by generative AI models.

“We have all these different sources of ideas. Part of the appeal of SPARROW is that you can take all these ideas and put them on a level playing field,” Coley adds.

The researchers evaluated SPARROW by applying it in three case studies. The case studies, based on real-world problems faced by chemists, were designed to test SPARROW’s ability to find cost-efficient synthesis plans while working with a wide range of input molecules.

They found that SPARROW effectively captured the marginal costs of batch synthesis and identified common experimental steps and intermediate chemicals. In addition, it could scale up to handle hundreds of potential molecular candidates.

“In the machine-learning-for-chemistry community, there are so many models that work well for retrosynthesis or molecular property prediction, for example, but how do we actually use them? Our framework aims to bring out the value of this prior work. By creating SPARROW, hopefully we can guide other researchers to think about compound downselection using their own cost and utility functions,” Fromer says.

In the future, the researchers want to incorporate additional complexity into SPARROW. For instance, they’d like to enable the algorithm to consider that the value of testing one compound may not always be constant. They also want to include more elements of parallel chemistry in its cost-versus-value function.

“The work by Fromer and Coley better aligns algorithmic decision making to the practical realities of chemical synthesis. When existing computational design algorithms are used, the work of determining how to best synthesize the set of designs is left to the medicinal chemist, resulting in less optimal choices and extra work for the medicinal chemist,” says Patrick Riley, senior vice president of artificial intelligence at Relay Therapeutics, who was not involved with this research. “This paper shows a principled path to include consideration of joint synthesis, which I expect to result in higher quality and more accepted algorithmic designs.”

“Identifying which compounds to synthesize in a way that carefully balances time, cost, and the potential for making progress toward goals while providing useful new information is one of the most challenging tasks for drug discovery teams. The SPARROW approach from Fromer and Coley does this in an effective and automated way, providing a useful tool for human medicinal chemistry teams and taking important steps toward fully autonomous approaches to drug discovery,” adds John Chodera, a computational chemist at Memorial Sloan Kettering Cancer Center, who was not involved with this work.

This research was supported, in part, by the DARPA Accelerated Molecular Discovery Program, the Office of Naval Research, and the National Science Foundation.



from MIT News https://ift.tt/kuWI9lr

Saturday, June 15, 2024

A new way to spot life-threatening infections in cancer patients

Chemotherapy and other treatments that take down cancer cells can also destroy patients’ immune cells. Every year, that leads tens of thousands of cancer patients with weakened immune systems to contract infections that can turn deadly if unmanaged.

Doctors must strike a balance between giving enough chemotherapy to eradicate the cancer and not giving so much that the patient’s white blood cell count drops dangerously low, a condition known as neutropenia. Neutropenia can also leave patients socially isolated between rounds of chemotherapy. Currently, the only way for doctors to monitor their patients’ white blood cells is through blood tests.

Now Leuko is developing an at-home white blood cell monitor to give doctors a more complete view of their patients’ health remotely. Rather than drawing blood, the device uses light to look through the skin at the top of the fingernail, and artificial intelligence to analyze and detect when white blood cells reach dangerously low levels.

The technology was first conceived of by researchers at MIT in 2015. Over the next few years, they developed a prototype and conducted a small study to validate their approach. Today, Leuko’s devices have accurately detected low white blood cell counts in hundreds of cancer patients, all without drawing a single drop of blood.

“We expect this to bring a clear improvement in the way that patients are monitored and cared for in the outpatient setting,” says Leuko co-founder and CTO Ian Butterworth, a former research engineer in MIT’s Research Laboratory of Electronics. “I also think there’s a more personal side of this for patients. These people can feel vulnerable around other people, and they don't currently have much they can do. That means that if they want to see their grandkids or see family, they’re constantly wondering, ‘Am I at high risk?’”

The company has been working with the Food and Drug Administration (FDA) over the last four years to design studies confirming their device is accurate and easy to use by untrained patients. Later this year, they expect to begin a pivotal study that will be used to register for FDA approval.

Once the device becomes an established tool for patient monitoring, Leuko’s team believes it could also give doctors a new way to optimize cancer treatment.

“Some of the physicians that we have talked to are very excited because they think future versions of our product could be used to personalize the dose of chemotherapy given to each patient,” says Leuko co-founder and CEO Carlos Castro-Gonzalez, a former postdoc at MIT. “If a patient is not becoming neutropenic, that could be a sign that you could increase the dose. Then every treatment could be based on how each patient is individually reacting.”

Monitoring immune health

Leuko co-founders Ian Butterworth, Carlos Castro-Gonzalez, Aurélien Bourquard, and Alvaro Sanchez-Ferro came to MIT in 2013 as part of the Madrid-MIT M+Vision Consortium, which was a collaboration between MIT and Madrid and is now part of MIT linQ. The program brought biomedical researchers from around the world to MIT to work on translational projects with institutions around Boston and Madrid.

The program, which was originally run out of MIT’s Research Laboratory of Electronics, challenged members to tackle huge unmet needs in medicine and connected them with MIT faculty members from across the Institute to build solutions. Leuko’s founders also received support from MIT’s entrepreneurial ecosystem, including the Venture Mentoring Service, the Sandbox Innovation Fund, the Martin Trust Center for Entrepreneurship, and the Deshpande Center. After its MIT spinout, the company raised seed and series A financing rounds led by Good Growth Capital and HTH VC.

“I didn’t even realize that entrepreneurship was a career option for a PhD [like myself],” Castro-Gonzalez says. “I was thinking that after the fellowship I would apply for faculty positions. That was the career path I had in mind, so I was very excited about the focus at MIT on trying to translate science into products that people can benefit from.”

Leuko’s founders knew people with cancer stood to benefit the most from a noninvasive white blood cell monitor. Currently, unless patients go to the hospital, they can monitor only their temperature from home. If they show signs of a fever, they’re advised to go to the emergency room immediately.

“These infections happen quite frequently,” Sanchez-Ferro says. “One in every six cancer patients undergoing chemotherapy will develop an infection where their white blood cells are critically low. Some of those infections unfortunately end in deaths for patients, which is particularly terrible because they’re due to the treatment rather than the disease. [Infections] also mean the chemotherapy gets interrupted, which increases negative clinical outcomes for patients.”

Leuko’s optical device works by imaging the capillaries, or small blood vessels, just above the fingernail, which are more visible and already used by doctors to assess other aspects of vascular health. The company’s portable device analyzes white blood cell activity to detect critically low levels for care teams.

In a study of 44 patients in 2019, Leuko’s team showed the approach was able to detect when white blood cell levels dropped below a critical threshold, with minimal false positives. The team has since developed a product that, in another, larger study, unsupervised patients were able to use at home to get immune information to their doctors.

“We work completely noninvasively, so you can perform white blood cell measurements at home and much more frequently than what’s possible today,” Bourquard says. “The key aspect of this is it allows doctors to identify patients whose immune systems become so weak they’re at high risk of infection. If doctors have that information, they can provide preventative treatment in the form of antibiotics and growth factors. Research estimates that would eliminate 50 percent of hospitalizations.”

Expanding applications

Leuko’s founders believe their device will help physicians make more informed care decisions for patients. They also believe the device holds promise for monitoring patient health across other conditions.

“The long-term vision for the company is making this available to other patient populations that can also benefit from increased monitoring of their immune system,” Castro-Gonzalez says. “That includes patients with multiple sclerosis, autoimmune diseases, organ transplants, and patients that are rushed into the emergency room.”

Leuko’s team even sees a future where their device could be used to monitor other biomarkers in the blood.

“We believe this could be a platform technology,” Castro-Gonzalez says. “We get these noninvasive videos of the blood flowing through the capillaries, so part of the vision for the company is measuring other parameters in the blood beyond white blood cells, including hemoglobin, red blood cells, and platelets. That’s all part of our roadmap for the future.”



from MIT News https://ift.tt/xfWecCs

Thursday, June 13, 2024

A creation story told through immersive technology

In the beginning, as one version of the Haudenosaunee creation story has it, there was only water and sky. According to oral tradition, when the Sky Woman became pregnant, she dropped through a hole in the clouds. While many animals guided her descent as she fell, she eventually found a place on the turtle’s back. They worked together, with the aid of other water creatures, to lift the land from the depths of these primordial waters to create what we now know as our earth.

The new immersive experience, “Ne:Kahwistará:ken Kanónhsa’kówa í:se Onkwehonwe,” is a vivid retelling of this creation story by multimedia artist Jackson 2bears, also known as Tékeniyáhsen Ohkwá:ri (Kanien’kehà:ka), the 2022–24 Ida Ely Rubin Artist in Residence at the MIT Center for Art, Science and Technology. “A lot of what drives my work is finding new ways to keep Haudenosaunee teachings and stories alive in our communities, finding new ways to tell them, but also helping with the transmission and transformation of those stories as they are for us, a living part of our cultural practice,” he says.

 

A virtual recreation of the traditional longhouse

2bears was first inspired to create a virtual reality version of a longhouse, a traditional Haudenosaunee structure, in collaboration with Thru the RedDoor, an Indigenous-owned media company in Six Nations of the Grand River that 2bears calls home. The longhouse is not only a “functional dwelling,” says 2bears, but an important spiritual and cultural center where creation myths are shared. “While we were developing the project, we were told by one of our knowledge keepers in the community that longhouses aren’t structures, they’re not the materials they’re made out of,” 2bears recalls. “They’re about the people, the Haudenosaunee people. And it’s about our creative cultural practices in that space that make it a sacred place.”

The virtual recreation of the longhouse connects storytelling to the physical landscape, while also offering a shared space for community members to gather. In the Haudenosaunee worldview, says 2bears, “stories are both durational, but they’re also dimensional.” With “Ne:Kahwistará:ken Kanónhsa’kówa í:se Onkwehonwe,” the longhouse was brought to life with drumming, dancing, knowledge-sharing, and storytelling. The immersive experience was designed to be communal. “We wanted to develop a story that we could work on with a bunch of other people rather than just having a story writer or director,” 2bears says. “We didn’t want to do headsets. We wanted to do something where we could be together, which is part of the longhouse mentality,” he says.

The power of collaboration

2bears produced the project with the support of Co-Creation Studio at MIT’s Open Documentary Lab. “We think of co-creation as a dance, as a way of working that challenges the notion of the singular author, the single one point of view,” says documentarian Kat Cizek, the artistic director and co-founder of the studio, who began her work at MIT as a CAST visiting artist. “And Jackson does that. He does that within the community at Six Nations, but also with other communities and other Indigenous artists.”

In an individualist society that so often centers the idea of the singular author, 2bears’s practice offers a powerful example of what it means to work as a collective, says Cizek. “It’s very hard to operate, I think, in any discipline without some level of collaboration,” she says. “What’s different about co-creation for us is that people enter the room with no set agenda. You come into the room and you come with questions and curiosity about what you might make together.”

2bears at MIT

At first, 2bears thought his time at MIT would help with the technical side of his work. But over time, he discovered a rich community at MIT, a place to explore the larger philosophical questions relating to technology, Indigenous knowledge, and artificial intelligence. “We think very often about not only human intelligence, but animal intelligence and the spirit of the sky and the trees and the grass and the living earth,” says 2bears, “and I’m seeing that kind of reflected here at the school.”

In 2023, 2bears participated in the Co-Creation Studio Indigenous Immersive Incubator at MIT, an historic gathering of 10 Indigenous artists, who toured MIT labs and met with Indigenous leaders from MIT and beyond. As part of the summit, he shared “Ne:Kahwistará:ken Kanónhsa’kówa í:se Onkwehonwe” as a work in progress. This spring, he presented the latest iteration of the work at MIT in smaller settings with groups of students, and in a large public lecture presented by CAST and the Art, Culture and Technology Program. His “experimental method of storytelling and communication really conveys the power of what it means to be a community as an Indigenous person, and the unique beauty of all of our people,” says Nicole McGaa, Oglala Lakota, co-president of MIT’s Native American Indigenous Association.

Storytelling in 360 degrees

2bears’s virtual recreation became even more important after the longhouse in the community unexpectedly burned down midway through the process, after the team had created 3D scans of the structure. With no building to project onto, they used ingenuity and creativity to pivot to the project’s current iteration.

The immersive experience was remarkable in its sheer size: 8-foot tall images played on a canvas screen 34 feet in diameter. With video mapping using multiple projectors and 14-channel surround sound, the story of Sky Woman coming down to Turtle Island was given an immense form. It premiered at the 2RO MEDIA Festival, and was met with an enthusiastic response from the Six Nations community. “It was so beautiful. You can look in any direction, and there was something happening,” says Gary Joseph, director of Thru the RedDoor. “It affects you in a way that you didn’t think you could be affected because you're seeing the things that are sacred to you being expressed in a way that you’ve never imagined.”

In the future, 2bears hopes to make the installation more interactive, so participants can engage with the experience in their own ways, creating multiple versions of the creation story. “I’ve been thinking about it as creating a living installation,” he says. “It really was a project made in community, and I couldn’t have been happier about how it turned out. And I’m really excited about where I see this project going in the future.”



from MIT News https://ift.tt/mL5xVDb

With programmable pixels, novel sensor improves imaging of neural activity

Neurons communicate electrically, so to understand how they produce such brain functions as memory, neuroscientists must track how their voltage changes — sometimes subtly — on the timescale of milliseconds. In a new open-access paper in Nature Communications, MIT researchers describe a novel image sensor with the capability to substantially increase that ability.

The invention, led by Jie Zhang, a postdoc in the lab of Matt Wilson, the Sherman Fairchild Professor at MIT and a member of The Picower Institute for Learning and Memory, is a new take on the standard “CMOS” (complementary metal-oxide semiconductor) technology used in scientific imaging. In that standard approach, all pixels turn on and off at the same time — a configuration with an inherent trade-off in which fast sampling means capturing less light. The new chip enables each pixel’s timing to be controlled individually. That arrangement provides a “best of both worlds” in which neighboring pixels can essentially complement each other to capture all the available light without sacrificing speed.

In experiments described in the study, Zhang and Wilson’s team demonstrates how “pixelwise” programmability enabled them to improve visualization of neural voltage “spikes,” which are the signals neurons use to communicate with each other, and even the more subtle, momentary fluctuations in their voltage that constantly occur between those spiking events.

“Measuring with single-spike resolution is really important as part of our research approach,” says senior author Wilson, a professor in MIT’s departments of Biology and Brain and Cognitive Sciences (BCS), whose lab studies how the brain encodes and refines spatial memories both during wakeful exploration and during sleep. “Thinking about the encoding processes within the brain, single spikes and the timing of those spikes is important in understanding how the brain processes information.”

For decades, Wilson has helped to drive innovations in the use of electrodes to tap into neural electrical signals in real time, but like many researchers he has also sought visual readouts of electrical activity because they can highlight large areas of tissue and still show which exact neurons are electrically active at any given moment. Being able to identify which neurons are active can enable researchers to learn which types of neurons are participating in memory processes, providing important clues about how brain circuits work.

In recent years, neuroscientists including co-senior author Ed Boyden, the Y. Eva Tan Professor of Neurotechnology in BCS and the McGovern Institute for Brain Research and a Picower Institute affiliate, have worked to meet that need by inventing “genetically encoded voltage indicators” (GEVIs) that make cells glow as their voltage changes in real time. But as Zhang and Wilson have tried to employ GEVIs in their research, they’ve found that conventional CMOS image sensors were missing a lot of the action. If they operated too fast, they wouldn’t gather enough light. If they operated too slowly, they’d miss rapid changes.

But image sensors have such fine resolution that many pixels are really looking at essentially the same place on the scale of a whole neuron, Wilson says. Recognizing that there was resolution to spare, Zhang applied his expertise in sensor design to invent an image sensor chip that would enable neighboring pixels to each have their own timing. Faster ones could capture rapid changes. Slower-working ones could gather more light. No action or photons would be missed. Zhang also cleverly engineered the required control electronics so they barely cut into the space available for light-sensitive elements on a pixel. This ensured the sensor’s high sensitivity under low-light conditions, Zhang says.

In the study the researchers demonstrated two ways in which the chip improved imaging of voltage activity of mouse hippocampus neurons cultured in a dish. They ran their sensor head-to-head against an industry standard scientific CMOS image sensor chip.

In the first set of experiments, the team sought to image the fast dynamics of neural voltage. On the conventional CMOS chip, each pixel had a zippy 1.25-millisecond exposure time. On the pixelwise sensor, each pixel in neighboring groups of four stayed on for 5 ms, but their start times were staggered so that each one turned on and off 1.25 ms later than the next. In the study, the team shows that each pixel, because it was on longer, gathered more light, but because each one was capturing a new view every 1.25 ms, the group was equivalent to sampling at the faster rate. The result was a doubling of the signal-to-noise ratio for the pixelwise chip. This achieves high temporal resolution at a fraction of the sampling rate of conventional CMOS chips, Zhang says.
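
The following numerical sketch (synthetic numbers, not data from the study) illustrates the trade-off: with a 5 ms exposure staggered across four neighboring pixels in 1.25 ms steps, a new readout still arrives every 1.25 ms, while each readout integrates roughly four times as many photons as a conventional 1.25 ms exposure, which is where the signal-to-noise gain comes from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fluorescence trace sampled on a fine 0.25 ms grid (made-up values).
dt = 0.25                                        # ms
t = np.arange(0, 200, dt)
signal = 1.0 + 4.0 * (np.abs(t % 50 - 1) < 1)    # brief "spikes" every 50 ms

def integrate(start_ms, exposure_ms):
    """Photon count collected by one pixel over one exposure window,
    with Poisson shot noise (the photon rate here is arbitrary)."""
    window = (t >= start_ms) & (t < start_ms + exposure_ms)
    expected = signal[window].sum() * dt * 100   # ~100 photons per unit*ms
    return rng.poisson(expected)

# Conventional chip: every pixel uses a short 1.25 ms exposure.
conventional = [integrate(s, 1.25) for s in np.arange(0, 195, 1.25)]

# Pixelwise chip: four neighboring pixels each expose for 5 ms, with start
# times staggered by 1.25 ms, so a new readout still arrives every 1.25 ms
# while each pixel collects roughly 4x the light.
staggered = [integrate(s, 5.0) for s in np.arange(0, 195, 1.25)]

print("mean photons per readout, conventional:", np.mean(conventional))
print("mean photons per readout, staggered   :", np.mean(staggered))
```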

Moreover, the pixelwise chip detected neural spiking activities that the conventional sensor missed. And when the researchers compared the performance of each kind of sensor against the electrical readings made with a traditional patch clamp electrode, they found that the staggered pixelwise measurements better matched that of the patch clamp.

In the second set of experiments, the team sought to demonstrate that the pixelwise chip could capture both the fast dynamics and also the slower, more subtle “subthreshold” voltage variances neurons exhibit. To do so they varied the exposure durations of neighboring pixels in the pixelwise chip, ranging from 15.4 ms down to just 1.9 ms. In this way, fast pixels sampled every quick change (albeit faintly), while slower pixels integrated enough light over time to track even subtle slower fluctuations. By integrating the data from each pixel, the chip was indeed able to capture both fast spiking and slower subthreshold changes, the researchers reported.

The experiments with small clusters of neurons in a dish were only a proof of concept, Wilson says. His lab’s ultimate goal is to conduct brain-wide, real-time measurements of activity in distinct types of neurons in animals even as they are freely moving about and learning how to navigate mazes. The development of GEVIs and of image sensors like the pixelwise chip that can successfully take advantage of what they show is crucial to making that goal feasible.

“That’s the idea of everything we want to put together: large-scale voltage imaging of genetically tagged neurons in freely behaving animals,” Wilson says.

To achieve this, Zhang adds, “We are already working on the next iteration of chips with lower noise, higher pixel counts, time-resolution of multiple kHz, and small form factors for imaging in freely behaving animals.”

The research is advancing pixel by pixel.

In addition to Zhang, Wilson, and Boyden, the paper’s other authors are Jonathan Newman, Zeguan Wang, Yong Qian, Pedro Feliciano-Ramos, Wei Guo, Takato Honda, Zhe Sage Chen, Changyang Linghu, Ralph-Etienne Cummings, and Eric Fossum.

The Picower Institute, The JPB Foundation, the Alana Foundation, The Louis B. Thalheimer Fund for Translational Research, the National Institutes of Health, HHMI, Lisa Yang, and John Doerr provided support for the research.



from MIT News https://ift.tt/WOjcyTH

Featured video: Researchers discuss queer visibility in academia

“My identity as a scientist and my identity as a gay man are not contradictory, but complementary,” says Jack Forman, PhD candidate in media arts and sciences and co-lead of LGBT Grad, a student group run by and for LGBTQ+ grad students and postdocs at MIT.

He and co-leads Miranda Dawson and Tunahan Aytas ’23 recently interviewed queer MIT faculty about their experiences and the importance of visibility in “Scientific InQueery,” a video meant to inspire young LGBTQ+ academics to take pride in the intersections of their identities and their academic work.

“In professional settings, people need to create spaces for researchers to be able to discuss their scientific work and also be queer,” says Nergis Mavalvala, the Curtis and Kathleen Marble Professor of Astrophysics and dean of the MIT School of Science. “That [space] gives a sense of safety [to say] ‘I can be successful in my profession; I can be queer; and I can be out here flying my rainbow flag.’”

“As queer graduate students, we find community in our peers. However, as one progresses up the academic ladder, it can be harder to find examples of queer people in higher positions. Bringing visibility to the queer faculty helps younger queer academics find a greater sense of community,” says Dawson, a PhD student in MIT’s Department of Biological Engineering. In her years as co-lead of LGBT Grad, she has been a visible advocate for LGBTQ+ graduate students across MIT.

“We would love it if a young queer person with curiosity and a love for learning saw this video and realized that they belong here, at a place like MIT,” says Dawson.

In addition to Aytas, Dawson, Forman, and Mavalvala, the video features Sebastian Lourido, associate professor of biology; Lorna Gibson, professor of materials science and engineering; and Bryan Bryson, associate professor of biological engineering.



from MIT News https://ift.tt/sq6iCSW

Wednesday, June 12, 2024

Scientists preserve DNA in an amber-like polymer

In the movie “Jurassic Park,” scientists extracted DNA that had been preserved in amber for millions of years, and used it to create a population of long-extinct dinosaurs.

Inspired partly by that film, MIT researchers have developed a glassy, amber-like polymer that can be used for long-term storage of DNA, whether entire human genomes or digital files such as photos.

Most current methods for storing DNA require freezing temperatures, so they consume a great deal of energy and are not feasible in many parts of the world. In contrast, the new amber-like polymer can store DNA at room temperature while protecting the molecules from damage caused by heat or water.

The researchers showed that they could use this polymer to store DNA sequences encoding the theme music from Jurassic Park, as well as an entire human genome. They also demonstrated that the DNA can be easily removed from the polymer without damaging it.

“Freezing DNA is the number one way to preserve it, but it’s very expensive, and it’s not scalable,” says James Banal, a former MIT postdoc. “I think our new preservation method is going to be a technology that may drive the future of storing digital information on DNA.”

Banal and Jeremiah Johnson, the A. Thomas Geurtin Professor of Chemistry at MIT, are the senior authors of the study, published yesterday in the Journal of the American Chemical Society. Former MIT postdoc Elizabeth Prince and MIT postdoc Ho Fung Cheng are the lead authors of the paper.

Capturing DNA

DNA, a very stable molecule, is well-suited for storing massive amounts of information, including digital data. Digital storage systems encode text, photos, and other kinds of information as a series of 0s and 1s. This same information can be encoded in DNA using the four nucleotides that make up the genetic code: A, T, G, and C. For example, G and C could be used to represent 0 while A and T represent 1.
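
As a concrete (and deliberately simplified) sketch of that mapping, the snippet below encodes text into a strand of A/G bases and decodes it back. Real DNA data-storage schemes add error-correcting codes and avoid sequences that are hard to synthesize, so treat this only as an illustration of the bits-to-bases idea.

```python
# Minimal sketch of the mapping described above (G/C -> 0, A/T -> 1).
# The choice of one representative base per bit is arbitrary here.
ENCODE = {"0": "G", "1": "A"}
DECODE = {"G": "0", "C": "0", "A": "1", "T": "1"}

def text_to_dna(text: str) -> str:
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return "".join(ENCODE[b] for b in bits)

def dna_to_text(dna: str) -> str:
    bits = "".join(DECODE[base] for base in dna)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

strand = text_to_dna("MIT")
print(strand)                        # 24 bases for 3 characters
assert dna_to_text(strand) == "MIT"  # round-trips with no errors
```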

DNA offers a way to store this digital information at very high density: In theory, a coffee mug full of DNA could store all of the world’s data. DNA is also very stable and relatively easy to synthesize and sequence.
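
A back-of-envelope calculation suggests why the coffee-mug claim is plausible. The mug volume, packing density, and per-nucleotide mass below are assumptions, not figures from the article, but they put the theoretical capacity in the hundreds of zettabytes, on the order of common estimates of all digital data ever created.

```python
AVOGADRO = 6.022e23
BITS_PER_NUCLEOTIDE = 2            # four bases = 2 bits each (no error correction)
GRAMS_PER_MOL_NUCLEOTIDE = 330.0   # rough average molar mass of one nucleotide

# Assumed, not from the article: a ~300 mL mug packed with dry DNA at ~1.7 g/mL.
mug_grams = 300 * 1.7

nucleotides = mug_grams / GRAMS_PER_MOL_NUCLEOTIDE * AVOGADRO
bytes_total = nucleotides * BITS_PER_NUCLEOTIDE / 8
print(f"{bytes_total / 1e21:.0f} zettabytes")   # on the order of hundreds of ZB
```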

In 2021, Banal and his postdoc advisor, Mark Bathe, an MIT professor of biological engineering, developed a way to store DNA in particles of silica, which could be labeled with tags that revealed the particles’ contents. That work led to a spinout called Cache DNA.

One downside to that storage system is that it takes several days to embed DNA into the silica particles. Furthermore, removing the DNA from the particles requires hydrofluoric acid, which can be hazardous to workers handling the DNA.

To come up with alternative storage materials, Banal began working with Johnson and members of his lab. Their idea was to use a type of polymer known as a degradable thermoset, which consists of polymers that form a solid when heated. The material also includes cleavable links that can be easily broken, allowing the polymer to be degraded in a controlled way.

“With these deconstructable thermosets, depending on what cleavable bonds we put into them, we can choose how we want to degrade them,” Johnson says.

For this project, the researchers decided to make their thermoset polymer from styrene and a cross-linker, which together form an amber-like thermoset called cross-linked polystyrene. This thermoset is also very hydrophobic, so it can prevent moisture from getting in and damaging the DNA. To make the thermoset degradable, the styrene monomers and cross-linkers are copolymerized with monomers called thionolactones. These links can be broken by treating them with a molecule called cysteamine.

Because styrene is so hydrophobic, the researchers had to come up with a way to entice DNA — a hydrophilic, negatively charged molecule — into the styrene.

To do that, they identified a combination of three monomers that they could turn into polymers that dissolve DNA by helping it interact with styrene. Each of the monomers has different features that cooperate to get the DNA out of water and into the styrene. There, the DNA forms spherical complexes, with charged DNA in the center and hydrophobic groups forming an outer layer that interacts with styrene. When heated, this solution becomes a solid glass-like block, embedded with DNA complexes.

The researchers dubbed their method T-REX (Thermoset-REinforced Xeropreservation). The process of embedding DNA into the polymer network takes a few hours, but that could become shorter with further optimization, the researchers say.

To release the DNA, the researchers first add cysteamine, which cleaves the bonds holding the polystyrene thermoset together, breaking it into smaller pieces. Then, a detergent called SDS can be added to remove the DNA from polystyrene without damaging it.

Storing information

Using these polymers, the researchers showed that they could encapsulate DNA of varying length, from tens of nucleotides up to an entire human genome (more than 50,000 base pairs). They were able to store DNA encoding the Emancipation Proclamation and the MIT logo, in addition to the theme music from “Jurassic Park.”

After storing the DNA and then removing it, the researchers sequenced it and found that no errors had been introduced, which is a critical feature of any digital data storage system.

The researchers also showed that the thermoset polymer can protect DNA from temperatures up to 75 degrees Celsius (167 degrees Fahrenheit). They are now working on ways to streamline the process of making the polymers and forming them into capsules for long-term storage.

Cache DNA, a company started by Banal and Bathe, with Johnson as a member of the scientific advisory board, is now working on further developing DNA storage technology. The earliest application they envision is storing genomes for personalized medicine, and they also anticipate that these stored genomes could undergo further analysis as better technology is developed in the future.

“The idea is, why don’t we preserve the master record of life forever?” Banal says. “Ten years or 20 years from now, when technology has advanced way more than we could ever imagine today, we could learn more and more things. We’re still in the very infancy of understanding the genome and how it relates to disease.”

The research was funded by the National Science Foundation.



from MIT News https://ift.tt/D2IdUQF

Symposium highlights scale of mental health crisis and novel methods of diagnosis and treatment

Digital technologies, such as smartphones and machine learning, have revolutionized education. At the McGovern Institute for Brain Research’s 2024 Spring Symposium, “Transformational Strategies in Mental Health,” experts from across the sciences — including psychiatry, psychology, neuroscience, computer science, and others — agreed that these technologies could also play a significant role in advancing the diagnosis and treatment of mental health disorders and neurological conditions.

Co-hosted by the McGovern Institute, MIT Open Learning, McLean Hospital, the Poitras Center for Psychiatric Disorders Research at MIT, and the Wellcome Trust, the symposium raised the alarm about the rise in mental health challenges and showcased the potential for novel diagnostic and treatment methods.

John Gabrieli, the Grover Hermann Professor of Health Sciences and Technology at MIT, kicked off the symposium with a call for an effort on par with the Manhattan Project, which in the 1940s saw leading scientists collaborate to do what seemed impossible. While the challenge of mental health is quite different, Gabrieli stressed, the complexity and urgency of the issue are similar. In his later talk, “How can science serve psychiatry to enhance mental health?,” he noted a 35 percent rise in teen suicide deaths between 1999 and 2000 and, between 2007 and 2015, a 100 percent increase in emergency room visits for youths ages 5 to 18 who experienced a suicide attempt or suicidal ideation.

“We have no moral ambiguity, but all of us speaking today are having this meeting in part because we feel this urgency,” said Gabrieli, who is also a professor of brain and cognitive sciences, the director of the Integrated Learning Initiative (MITili) at MIT Open Learning, and a member of the McGovern Institute. “We have to do something together as a community of scientists and partners of all kinds to make a difference.”

An urgent problem

In 2021, U.S. Surgeon General Vivek Murthy issued an advisory on the increase in mental health challenges in youth; in 2023, he issued another, warning of the effects of social media on youth mental health. At the symposium, Susan Whitfield-Gabrieli, a research affiliate at the McGovern Institute and a professor of psychology and director of the Biomedical Imaging Center at Northeastern University, cited these recent advisories, saying they underscore the need to “innovate new methods of intervention.”

Other symposium speakers also highlighted evidence of growing mental health challenges for youth and adolescents. Christian Webb, associate professor of psychology at Harvard Medical School, stated that by the end of adolescence, 15-20 percent of teens will have experienced at least one episode of clinical depression, with girls facing the highest risk. Most teens who experience depression receive no treatment, he added.

Adults who experience mental health challenges need new interventions, too. John Krystal, the Robert L. McNeil Jr. Professor of Translational Research and chair of the Department of Psychiatry at Yale University School of Medicine, pointed to the limited efficacy of antidepressants, which typically take about two months to have an effect on the patient. Patients with treatment-resistant depression face a 75 percent likelihood of relapse within a year of starting antidepressants. Treatments for other mental health disorders, including bipolar and psychotic disorders, have serious side effects that can deter patients from adherence, said Virginie-Anne Chouinard, director of research at McLean OnTrack™, a program for first-episode psychosis at McLean Hospital.

New treatments, new technologies

Emerging technologies, including smartphone technology and artificial intelligence, are key to the interventions that symposium speakers shared.

In a talk on AI and the brain, Dina Katabi, the Thuan and Nicole Pham Professor of Electrical Engineering and Computer Science at MIT, discussed novel ways to detect Parkinson’s and Alzheimer’s, among other diseases. Early-stage research involved developing devices that can analyze how movement within a space impacts the surrounding electromagnetic field, as well as how wireless signals can detect breathing and sleep stages.

“I realize this may sound like la-la land,” Katabi said. “But it’s not! This device is used today by real patients, enabled by a revolution in neural networks and AI.”

Parkinson’s disease often cannot be diagnosed until significant impairment has already occurred. In a set of studies, Katabi’s team collected data on nocturnal breathing and trained a custom neural network to detect occurrences of Parkinson’s. They found the network was over 90 percent accurate in its detection. Next, the team used AI to analyze two sets of breathing data collected from patients at a six-year interval. Could their custom neural network identify patients who did not have a Parkinson’s diagnosis on the first visit, but subsequently received one? The answer was largely yes: Machine learning identified 75 percent of patients who would go on to receive a diagnosis.

Detecting high-risk patients at an early stage could make a substantial difference for intervention and treatment. Similarly, research by Jordan Smoller, professor of psychiatry at Harvard Medical School and director of the Center for Precision Psychiatry at Massachusetts General Hospital, demonstrated that an AI-aided suicide risk prediction model could detect 45 percent of suicide attempts or deaths with 90 percent specificity, about two to three years in advance.
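
Figures like “45 percent of events detected at 90 percent specificity” describe operating a risk score at a chosen threshold: set the cutoff so that 90 percent of people without an event fall below it, then measure what fraction of people who do go on to have an event score above it. The Python sketch below illustrates that calculation on made-up scores; it is not the model or data from the study.

```python
import numpy as np

def sensitivity_at_specificity(y_true, scores, target_specificity=0.90):
    """Pick the threshold that keeps `target_specificity` of the negatives
    below it, then report sensitivity (recall) on the positives."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    negatives = scores[y_true == 0]
    threshold = np.quantile(negatives, target_specificity)
    sensitivity = np.mean(scores[y_true == 1] > threshold)
    return sensitivity, threshold

# Made-up risk scores, for illustration only: 1,000 people without an event,
# 50 with one, where the model tends to give the latter higher scores.
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(1000), np.ones(50)])
s = np.concatenate([rng.normal(0.0, 1.0, 1000), rng.normal(1.5, 1.0, 50)])
sens, thr = sensitivity_at_specificity(y, s)
print(f"sensitivity at 90% specificity: {sens:.2f} (threshold {thr:.2f})")
```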

Other presentations, including a series of lightning talks, shared new and emerging treatments, such as the use of ketamine to treat depression; the use of smartphones, including daily text surveys and mindfulness apps, in treating depression in adolescents; metabolic interventions for psychotic disorders; the use of machine learning to detect impairment from THC intoxication; and family-focused treatment, rather than individual therapy, for youth depression.

Advancing understanding

The frequency and severity of adverse mental health events for children, adolescents, and adults demonstrate the necessity of funding for mental health research — and the open sharing of these findings.

Niall Boyce, head of mental health field building at the Wellcome Trust — a global charitable foundation dedicated to using science to solve urgent health challenges — outlined the foundation’s funding philosophy of supporting research that is “collaborative, coherent, and focused” and centers on “What is most important to those most affected?” Wellcome research managers Anum Farid and Tayla McCloud stressed the importance of projects that involve people with lived experience of mental health challenges and “blue sky thinking” that takes risks and can advance understanding in innovative ways. Wellcome requires that all published research resulting from its funding be open and accessible in order to maximize its benefits.

Whether through therapeutic models, pharmaceutical treatments, or machine learning, symposium speakers agreed that transformative approaches to mental health call for collaboration and innovation.

“Understanding mental health requires us to understand the unbelievable diversity of humans,” Gabrieli said. “We have to use all the tools we have now to develop new treatments that will work for people for whom our conventional treatments don’t.”



from MIT News https://ift.tt/cJS2nC8

Just thinking about a location activates mental maps in the brain

As you travel your usual route to work or the grocery store, your brain engages cognitive maps stored in your hippocampus and entorhinal cortex. These maps store information about paths you have taken and locations you have visited before, so you can easily find your way the next time you go there.

New research from MIT has found that such mental maps also are created and activated when you merely think about sequences of experiences, in the absence of any physical movement or sensory input. In an animal study, the researchers found that the entorhinal cortex harbors a cognitive map of what animals experience while they use a joystick to browse through a sequence of images. These cognitive maps are then activated when thinking about these sequences, even when the images are not visible.

This is the first study to show the cellular basis of mental simulation and imagination in a nonspatial domain through activation of a cognitive map in the entorhinal cortex.

“These cognitive maps are being recruited to perform mental navigation, without any sensory input or motor output. We are able to see a signature of this map presenting itself as the animal is going through these experiences mentally,” says Mehrdad Jazayeri, an associate professor of brain and cognitive sciences, a member of MIT’s McGovern Institute for Brain Research, and the senior author of the study.

McGovern Institute Research Scientist Sujaya Neupane is the lead author of the paper, which appears today in Nature. Ila Fiete, a professor of brain and cognitive sciences at MIT, a member of MIT’s McGovern Institute for Brain Research, and director of the K. Lisa Yang Integrative Computational Neuroscience Center, is also an author of the paper.

Mental maps

A great deal of work in animal models and humans has shown that representations of physical locations are stored in the hippocampus, a small seahorse-shaped structure, and the nearby entorhinal cortex. These representations are activated whenever an animal moves through a space that it has been in before, just before it traverses the space, or when it is asleep.

“Most prior studies have focused on how these areas reflect the structures and the details of the environment as an animal moves physically through space,” Jazayeri says. “When an animal moves in a room, its sensory experiences are nicely encoded by the activity of neurons in the hippocampus and entorhinal cortex.”

In the new study, Jazayeri and his colleagues wanted to explore whether these cognitive maps are also built and then used during purely mental run-throughs or imagining of movement through nonspatial domains.

To explore that possibility, the researchers trained animals to use a joystick to trace a path through a sequence of images (“landmarks”) spaced at regular temporal intervals. During training, the animals were shown only a subset of the image pairs, not all of them. Once the animals had learned to navigate through the training pairs, the researchers tested whether they could also navigate between pairs they had never encountered before.

One possibility is that the animals do not learn a cognitive map of the sequence and instead solve the task by memorizing the trained pairs. If so, they would be expected to struggle with the new pairs. If, on the other hand, the animals rely on a cognitive map, they should be able to generalize their knowledge to the new pairs.

“The results were unequivocal,” Jazayeri says. “Animals were able to mentally navigate between the new pairs of images from the very first time they were tested. This finding provided strong behavioral evidence for the presence of a cognitive map. But how does the brain establish such a map?”

To address this question, the researchers recorded from single neurons in the entorhinal cortex as the animals performed this task. Neural responses had a striking feature: As the animals used the joystick to navigate between two landmarks, neurons featured distinctive bumps of activity associated with the mental representation of the intervening landmarks.

“The brain goes through these bumps of activity at the expected time when the intervening images would have passed by the animal’s eyes, which they never did,” Jazayeri says. “And the timing between these bumps, critically, was exactly the timing that the animal would have expected to reach each of those, which in this case was 0.65 seconds.”

The researchers also showed that the speed of the mental simulation was related to the animals’ performance on the task: When they were a little late or early in completing the task, their brain activity showed a corresponding change in timing. The researchers also found evidence that the mental representations in the entorhinal cortex don’t encode specific visual features of the images, but rather the ordinal arrangement of the landmarks.

A model of learning

To further explore how these cognitive maps may work, the researchers built a computational model to mimic the brain activity that they found and demonstrate how it could be generated. They used a type of model known as a continuous attractor model, which was originally developed to model how the entorhinal cortex tracks an animal’s position as it moves, based on sensory input.

The researchers customized the model by adding a component that learns the activity patterns generated by sensory input. The model could then use those patterns to reconstruct the same experiences later, when there was no sensory input.

“The key element that we needed to add is that this system has the capacity to learn bidirectionally by communicating with sensory inputs. Through the associational learning that the model goes through, it will actually recreate those sensory experiences,” Jazayeri says.
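
For readers unfamiliar with this class of models, the sketch below is a generic continuous (ring) attractor written in Python: a population of rate neurons with cosine-shaped recurrent connectivity whose activity bump tracks a slowly rotating external cue, read out as a population vector. It is a standard textbook construction, not the authors’ customized model, which additionally learns to store and replay sensory-driven activity patterns.

```python
import numpy as np

# Generic ring-attractor sketch (not the authors' customized model): N rate
# neurons on a ring with cosine-shaped recurrent connectivity. A weak rotating
# external cue steers a bump of activity, and the bump's position is decoded
# as a population vector -- a toy analogue of a map tracking a variable.

N = 128
theta = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
J0, J1 = -0.3, 1.0  # uniform inhibition, local excitation
W = (J0 + J1 * np.cos(theta[:, None] - theta[None, :])) * (2.0 * np.pi / N)

def rate(x):
    # Bounded, non-negative transfer function keeps firing rates finite.
    return np.tanh(np.maximum(0.0, x))

def simulate(steps=3000, dt=0.01, tau=0.1, cue_strength=0.2, cue_speed=0.5):
    r = np.zeros(N)
    decoded = []
    for t in range(steps):
        cue_angle = cue_speed * t * dt
        cue = cue_strength * (1.0 + np.cos(theta - cue_angle))
        r += (dt / tau) * (-r + rate(W @ r + cue))
        decoded.append(np.angle(np.sum(r * np.exp(1j * theta))))  # bump position
    return np.unwrap(np.array(decoded))

if __name__ == "__main__":
    pos = simulate()
    print(f"decoded bump rotated by about {pos[-1] - pos[0]:.1f} rad")
```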

The researchers now plan to investigate what happens in the brain if the landmarks are not evenly spaced, or if they’re arranged in a ring. They also hope to record brain activity in the hippocampus and entorhinal cortex as the animals first learn to perform the navigation task.

“Seeing the memory of the structure become crystallized in the mind, and how that leads to the neural activity that emerges, is a really valuable way of asking how learning happens,” Jazayeri says.

The research was funded by the Natural Sciences and Engineering Research Council of Canada, the Québec Research Funds, the National Institutes of Health, and the Paul and Lilah Newton Brain Science Award.



from MIT News https://ift.tt/1apeTCA

Nancy Kanwisher, Robert Langer, and Sara Seager named Kavli Prize Laureates

MIT faculty members Nancy Kanwisher, Robert Langer, and Sara Seager are among eight researchers worldwide to receive this year’s Kavli Prizes.

A partnership among the Norwegian Academy of Science and Letters, the Norwegian Ministry of Education and Research, and the Kavli Foundation, the Kavli Prizes are awarded every two years to “honor scientists for breakthroughs in astrophysics, nanoscience and neuroscience that transform our understanding of the big, the small and the complex.” The laureates in each field will share $1 million.

Understanding recognition of faces

Nancy Kanwisher, the Walter A. Rosenblith Professor of Brain and Cognitive Sciences and an investigator at the McGovern Institute for Brain Research, has been awarded the 2024 Kavli Prize in Neuroscience with Doris Tsao, professor in the Department of Molecular and Cell Biology at the University of California at Berkeley, and Winrich Freiwald, the Denise A. and Eugene W. Chinery Professor at the Rockefeller University.

Kanwisher, Tsao, and Freiwald discovered a specialized system within the brain that recognizes faces. Their discoveries have provided basic principles of neural organization and serve as a starting point for further research on how the processing of visual information is integrated with other cognitive functions.

Kanwisher was the first to prove that a specific area in the human neocortex is dedicated to recognizing faces, now called the fusiform face area. Using functional magnetic resonance imaging, she found individual differences in the location of this area and devised an analysis technique to effectively localize specialized functional regions in the brain. This technique is now widely used and applied to domains beyond the face recognition system. 

Integrating nanomaterials for biomedical advances

Robert Langer, the David H. Koch Institute Professor, has been awarded the 2024 Kavli Prize in Nanoscience with Paul Alivisatos, president of the University of Chicago and John D. MacArthur Distinguished Service Professor in the Department of Chemistry, and Chad Mirkin, professor of chemistry at Northwestern University.

Langer, Alivisatos, and Mirkin each revolutionized the field of nanomedicine by demonstrating how engineering at the nanoscale can advance biomedical research and applications. Their discoveries contributed foundationally to the development of therapeutics, vaccines, bioimaging, and diagnostics.

Langer was the first to develop nanoengineered materials that enabled the controlled release, or regular flow, of drug molecules. This capability has had an immense impact for the treatment of a range of diseases, such as aggressive brain cancer, prostate cancer, and schizophrenia. His work also showed that tiny particles, containing protein antigens, can be used in vaccination, and was instrumental in the development of the delivery of messenger RNA vaccines. 

Searching for life beyond Earth

Sara Seager, the Class of 1941 Professor of Planetary Sciences in the Department of Earth, Atmospheric and Planetary Sciences and a professor in the departments of Physics and of Aeronautics and Astronautics, has been awarded the 2024 Kavli Prize in Astrophysics along with David Charbonneau, the Fred Kavli Professor of Astrophysics at Harvard University.

Seager and Charbonneau are recognized for discoveries of exoplanets and the characterization of their atmospheres. They pioneered methods for the detection of atomic species in planetary atmospheres and the measurement of their thermal infrared emission, setting the stage for finding the molecular fingerprints of atmospheres around both giant and rocky planets. Their contributions have been key to the enormous progress seen in the last 20 years in the exploration of myriad exoplanets. 

Kanwisher, Langer, and Seager bring the number of all-time MIT faculty recipients of the Kavli Prize to eight. Prior winners include Rainer Weiss in astrophysics (2016), Alan Guth in astrophysics (2014), Mildred Dresselhaus in nanoscience (2012), Ann Graybiel in neuroscience (2012), and Jane Luu in astrophysics (2012).



from MIT News https://ift.tt/co0bWjg

Tuesday, June 11, 2024

Researchers use large language models to help robots navigate

Someday, you may want your home robot to carry a load of dirty clothes downstairs and deposit them in the washing machine in the far-left corner of the basement. The robot will need to combine your instructions with its visual observations to determine the steps it should take to complete this task.

For an AI agent, this is easier said than done. Current approaches often rely on multiple hand-crafted machine-learning models to tackle different parts of the task, and these models take a great deal of human effort and expertise to build. Such methods, which use visual representations to directly make navigation decisions, also demand massive amounts of visual data for training, which are often hard to come by.

To overcome these challenges, researchers from MIT and the MIT-IBM Watson AI Lab devised a navigation method that converts visual representations into pieces of language, which are then fed into one large language model that achieves all parts of the multistep navigation task.

Rather than encoding visual features from images of a robot’s surroundings as visual representations, which is computationally intensive, their method creates text captions that describe the robot’s point-of-view. A large language model uses the captions to predict the actions a robot should take to fulfill a user’s language-based instructions.

Because their method utilizes purely language-based representations, they can use a large language model to efficiently generate a huge amount of synthetic training data.

While this approach does not outperform techniques that use visual features, it performs well in situations that lack enough visual data for training. The researchers found that combining their language-based inputs with visual signals leads to better navigation performance.

“By purely using language as the perceptual representation, ours is a more straightforward approach. Since all the inputs can be encoded as language, we can generate a human-understandable trajectory,” says Bowen Pan, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this approach.

Pan’s co-authors include his advisor, Aude Oliva, director of strategic industry engagement at the MIT Schwarzman College of Computing, MIT director of the MIT-IBM Watson AI Lab, and a senior research scientist in the Computer Science and Artificial Intelligence Laboratory (CSAIL); Philip Isola, an associate professor of EECS and a member of CSAIL; senior author Yoon Kim, an assistant professor of EECS and a member of CSAIL; and others at the MIT-IBM Watson AI Lab and Dartmouth College. The research will be presented at the Conference of the North American Chapter of the Association for Computational Linguistics.

Solving a vision problem with language

Since large language models are the most powerful machine-learning models available, the researchers sought to incorporate them into the complex task known as vision-and-language navigation, Pan says.

But such models take text-based inputs and can’t process visual data from a robot’s camera. So, the team needed to find a way to use language instead.

Their technique utilizes a simple captioning model to obtain text descriptions of a robot’s visual observations. These captions are combined with language-based instructions and fed into a large language model, which decides what navigation step the robot should take next.

The large language model outputs a caption of the scene the robot should see after completing that step. This is used to update the trajectory history so the robot can keep track of where it has been.

The model repeats these processes to generate a trajectory that guides the robot to its goal, one step at a time.

To streamline the process, the researchers designed templates so observation information is presented to the model in a standard form — as a series of choices the robot can make based on its surroundings.

For instance, a caption might say “to your 30-degree left is a door with a potted plant beside it, to your back is a small office with a desk and a computer,” etc. The model chooses whether the robot should move toward the door or the office.
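
A rough sketch of that loop appears below. The `llm`, captioning, and option callables are placeholders for whatever models are plugged in, and the prompt wording and option format are illustrative assumptions rather than the researchers’ actual templates; in the full pipeline the language model also predicts the caption of the next view in order to extend the trajectory history.

```python
from typing import Callable, List

def navigate(instruction: str,
             get_caption: Callable[[], str],
             get_options: Callable[[], List[str]],
             llm: Callable[[str], str],
             max_steps: int = 20) -> List[str]:
    """Caption the current view, show the language model the instruction,
    the trajectory so far, and the candidate moves, then take its choice."""
    history: List[str] = []
    for _ in range(max_steps):
        caption = get_caption()   # e.g. from an off-the-shelf captioning model
        options = get_options()   # e.g. ["move toward the door", "stop", ...]
        prompt = (
            f"Instruction: {instruction}\n"
            f"Trajectory so far: {'; '.join(history) or 'none'}\n"
            f"Current view: {caption}\n"
            f"Options: {', '.join(options)}\n"
            "Reply with exactly one option."
        )
        action = llm(prompt).strip()
        history.append(f"{caption} -> {action}")
        if action == "stop":
            break
    return history
```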

“One of the biggest challenges was figuring out how to encode this kind of information into language in a proper way to make the agent understand what the task is and how they should respond,” Pan says.

Advantages of language

When they tested this approach, the researchers found that, while it could not outperform vision-based techniques, it offered several advantages.

First, because text requires fewer computational resources to synthesize than complex image data, their method can be used to rapidly generate synthetic training data. In one test, they generated 10,000 synthetic trajectories based on 10 real-world visual trajectories.

The technique can also bridge the gap that can prevent an agent trained with a simulated environment from performing well in the real world. This gap often occurs because computer-generated images can appear quite different from real-world scenes due to elements like lighting or color. But language that describes a synthetic versus a real image would be much harder to tell apart, Pan says. 

Also, the representations their model uses are easier for a human to understand because they are written in natural language.

“If the agent fails to reach its goal, we can more easily determine where it failed and why it failed. Maybe the history information is not clear enough or the observation ignores some important details,” Pan says.

In addition, their method could be applied more easily to varied tasks and environments because it uses only one type of input. As long as data can be encoded as language, they can use the same model without making any modifications.

But one disadvantage is that their method naturally loses some information that would be captured by vision-based models, such as depth information.

However, the researchers were surprised to see that combining language-based representations with vision-based methods improves an agent’s ability to navigate.

“Maybe this means that language can capture some higher-level information that cannot be captured with pure vision features,” he says.

This is one area the researchers want to continue exploring. They also want to develop a navigation-oriented captioner that could boost the method’s performance. In addition, they want to probe the ability of large language models to exhibit spatial awareness and see how this could aid language-based navigation.

This research is funded, in part, by the MIT-IBM Watson AI Lab.



from MIT News https://ift.tt/myjYuGg

Making climate models relevant for local decision-makers

Climate models are a key technology in predicting the impacts of climate change. By running simulations of the Earth’s climate, scientists and policymakers can estimate conditions like sea level rise, flooding, and rising temperatures, and make decisions about how to appropriately respond. But current climate models struggle to provide this information quickly or affordably enough to be useful on smaller scales, such as the size of a city. 

Now, authors of a new open-access paper published in the Journal of Advances in Modeling Earth Systems have found a method to leverage machine learning to utilize the benefits of current climate models, while reducing the computational costs needed to run them. 

“It turns the traditional wisdom on its head,” says Sai Ravela, a principal research scientist in MIT’s Department of Earth, Atmospheric and Planetary Sciences (EAPS) who wrote the paper with EAPS postdoc Anamitra Saha. 

Traditional wisdom

In climate modeling, downscaling is the process of using a global climate model with coarse resolution to generate finer details over smaller regions. Imagine a digital picture: A global model is a large picture of the world with a low number of pixels. To downscale, you zoom in on just the section of the photo you want to look at — for example, Boston. But because the original picture was low resolution, the new version is blurry; it doesn’t give enough detail to be particularly useful. 

“If you go from coarse resolution to fine resolution, you have to add information somehow,” explains Saha. Downscaling attempts to add that information back in by filling in the missing pixels. “That addition of information can happen two ways: Either it can come from theory, or it can come from data.” 

Conventional downscaling often involves using models built on physics (such as the process of air rising, cooling, and condensing, or the landscape of the area), and supplementing it with statistical data taken from historical observations. But this method is computationally taxing: It takes a lot of time and computing power to run, while also being expensive. 

A little bit of both 

In their new paper, Saha and Ravela have figured out a way to add that information another way. They employed a machine-learning technique called adversarial learning. It uses two machines: one generates data to fill in the photo, while the other judges the sample by comparing it to actual data. If the second machine thinks the image is fake, the first machine has to try again until it produces a convincing result. The end goal of the process is to create super-resolution data.
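
In code, those two machines are usually a generator and a discriminator trained against each other. The PyTorch sketch below shows the bare adversarial loop on placeholder grids, purely to illustrate the mechanics; it leaves out the simplified physics and the statistical corrections from historical data that the authors combine with it, and every architecture and shape here is an assumption.

```python
import torch
from torch import nn

# Bare-bones adversarial (generator vs. discriminator) training loop for
# upsampling coarse fields -- an illustration of the two-machine idea only.
# Grid sizes and architectures are placeholders, not the authors' setup.

coarse_dim, fine_dim = 16 * 16, 64 * 64

G = nn.Sequential(nn.Linear(coarse_dim, 512), nn.ReLU(), nn.Linear(512, fine_dim))
D = nn.Sequential(nn.Linear(fine_dim, 256), nn.ReLU(), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(coarse, fine_real):
    # Discriminator step: real fine-resolution samples -> 1, generated -> 0.
    fake = G(coarse).detach()
    d_loss = bce(D(fine_real), torch.ones(len(fine_real), 1)) + \
             bce(D(fake), torch.zeros(len(fake), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator label generated samples real.
    fake = G(coarse)
    g_loss = bce(D(fake), torch.ones(len(fake), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Dummy batch standing in for coarse model output and fine-scale observations.
coarse_batch = torch.randn(8, coarse_dim)
fine_batch = torch.randn(8, fine_dim)
print(train_step(coarse_batch, fine_batch))
```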

Using machine learning techniques like adversarial learning is not a new idea in climate modeling; where it currently struggles is in handling large amounts of basic physics, like conservation laws. The researchers discovered that simplifying the physics going in and supplementing it with statistics from historical data was enough to generate the results they needed.

“If you augment machine learning with some information from the statistics and simplified physics both, then suddenly, it’s magical,” says Ravela. He and Saha started with estimating extreme rainfall amounts by removing more complex physics equations and focusing on water vapor and land topography. They then generated general rainfall patterns for mountainous Denver and flat Chicago alike, applying historical accounts to correct the output. “It’s giving us extremes, like the physics does, at a much lower cost. And it’s giving us similar speeds to statistics, but at much higher resolution.” 

Another unexpected benefit of the results was how little training data was needed. “The fact that only a little bit of physics and a little bit of statistics was enough to improve the performance of the ML [machine learning] model … was actually not obvious from the beginning,” says Saha. The model takes only a few hours to train and can produce results in minutes, an improvement over the months other models take to run.

Quantifying risk quickly

Being able to run the models quickly and often is a key requirement for stakeholders such as insurance companies and local policymakers. Ravela gives the example of Bangladesh: By seeing how extreme weather events will impact the country, decisions about what crops should be grown or where populations should migrate to can be made considering a very broad range of conditions and uncertainties as soon as possible.

“We can’t wait months or years to be able to quantify this risk,” he says. “You need to look out way into the future and at a large number of uncertainties to be able to say what might be a good decision.”

While the current model only looks at extreme precipitation, training it to examine other critical events, such as tropical storms, winds, and temperature, is the next step of the project. With a more robust model, Ravela is hoping to apply it to other places like Boston and Puerto Rico as part of a Climate Grand Challenges project.

“We’re very excited both by the methodology that we put together, as well as the potential applications that it could lead to,” he says. 



from MIT News https://ift.tt/Z0R42vF

New algorithm discovers language just by watching videos

Mark Hamilton, an MIT PhD student in electrical engineering and computer science and affiliate of MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), wants to use machines to understand how animals communicate. To do that, he set out first to create a system that can learn human language “from scratch.”

“Funny enough, the key moment of inspiration came from the movie ‘March of the Penguins.’ There’s a scene where a penguin falls while crossing the ice, and lets out a little belabored groan while getting up. When you watch it, it’s almost obvious that this groan is standing in for a four-letter word. This was the moment where we thought, maybe we need to use audio and video to learn language,” says Hamilton. “Is there a way we could let an algorithm watch TV all day and from this figure out what we’re talking about?”

“Our model, ‘DenseAV,’ aims to learn language by predicting what it’s seeing from what it’s hearing, and vice versa. For example, if you hear the sound of someone saying ‘bake the cake at 350,’ chances are you might be seeing a cake or an oven. To succeed at this audio-video matching game across millions of videos, the model has to learn what people are talking about,” says Hamilton.

Once they trained DenseAV on this matching game, Hamilton and his colleagues looked at which pixels the model looked for when it heard a sound. For example, when someone says “dog,” the algorithm immediately starts looking for dogs in the video stream. By seeing which pixels are selected by the algorithm, one can discover what the algorithm thinks a word means.

Interestingly, a similar search process happens when DenseAV listens to a dog barking: It searches for a dog in the video stream. “This piqued our interest. We wanted to see if the algorithm knew the difference between the word ‘dog’ and a dog’s bark,” says Hamilton. The team explored this by giving DenseAV a “two-sided brain.” They found that one side of DenseAV’s brain naturally focused on language, like the word “dog,” and the other side focused on sounds like barking. This showed that DenseAV not only learned the meaning of words and the locations of sounds, but also learned to distinguish between these types of cross-modal connections, all without human intervention or any knowledge of written language.

One branch of applications is learning from the massive amount of video published to the internet each day: “We want systems that can learn from massive amounts of video content, such as instructional videos,” says Hamilton. “Another exciting application is understanding new languages, like dolphin or whale communication, which don’t have a written form of communication. Our hope is that DenseAV can help us understand these languages that have evaded human translation efforts since the beginning. Finally, we hope that this method can be used to discover patterns between other pairs of signals, like the seismic sounds the earth makes and its geology.” 

A formidable challenge lay ahead of the team: learning language without any text input. Their objective was to rediscover the meaning of language from a blank slate, avoiding using pre-trained language models. This approach is inspired by how children learn by observing and listening to their environment to understand language.

To achieve this feat, DenseAV uses two main components to process audio and visual data separately. This separation made it impossible for the algorithm to cheat by letting the visual side look at the audio, or vice versa. It forced the algorithm to recognize objects and to create detailed, meaningful features for both audio and visual signals. DenseAV learns by comparing pairs of audio and visual signals to find which signals match and which do not. This method, called contrastive learning, doesn’t require labeled examples, and allows DenseAV to figure out the important predictive patterns of language itself.

One major difference between DenseAV and previous algorithms is that prior works focused on a single notion of similarity between sound and images. An entire audio clip like someone saying “the dog sat on the grass” was matched to an entire image of a dog. This didn’t allow previous methods to discover fine-grained details, like the connection between the word “grass” and the grass underneath the dog. The team’s algorithm searches for and aggregates all the possible matches between an audio clip and an image’s pixels. This not only improved performance, but allowed the team to precisely localize sounds in a way that previous algorithms could not. “Conventional methods use a single class token, but our approach compares every pixel and every second of sound. This fine-grained method lets DenseAV make more detailed connections for better localization,” says Hamilton.
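
A simplified sketch of that idea in PyTorch appears below: compute a feature for every audio frame and every image patch, compare all pairs, aggregate the dense similarities into one score per audio-image pair, and train contrastively so matching pairs outscore mismatched ones. The shapes, the max-then-mean aggregation, and the symmetric cross-entropy loss are illustrative assumptions, not DenseAV’s exact formulation.

```python
import torch
import torch.nn.functional as F

def clip_similarity(audio_feats, image_feats):
    """audio_feats: [B, T, D] (one feature per audio frame),
    image_feats: [B, P, D] (one feature per image patch).
    Compare every frame to every patch, then aggregate to one score per pair."""
    a = F.normalize(audio_feats, dim=-1)
    v = F.normalize(image_feats, dim=-1)
    # Dense similarities between every clip i and every image j: [B, B, T, P]
    sims = torch.einsum("itd,jpd->ijtp", a, v)
    # For each audio frame take its best-matching patch, then average over time.
    return sims.max(dim=-1).values.mean(dim=-1)  # [B, B]

def contrastive_loss(audio_feats, image_feats, temperature=0.07):
    logits = clip_similarity(audio_feats, image_feats) / temperature
    targets = torch.arange(logits.size(0))  # matching pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Dummy features standing in for the outputs of audio and visual encoders.
B, T, P, D = 4, 50, 196, 64
loss = contrastive_loss(torch.randn(B, T, D), torch.randn(B, P, D))
print(float(loss))
```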

The researchers trained DenseAV on AudioSet, which includes 2 million YouTube videos. They also created new datasets to test how well the model can link sounds and images. In these tests, DenseAV outperformed other top models in tasks like identifying objects from their names and sounds, proving its effectiveness. “Previous datasets only supported coarse evaluations, so we created a dataset using semantic segmentation datasets. This helps with pixel-perfect annotations for precise evaluation of our model’s performance. We can prompt the algorithm with specific sounds or images and get those detailed localizations,” says Hamilton.

Due to the massive amount of data involved, the project took about a year to complete. The team says that transitioning to a large transformer architecture presented challenges, as these models can easily overlook fine-grained details. Encouraging the model to focus on these details was a significant hurdle.

Looking ahead, the team aims to create systems that can learn from massive amounts of video- or audio-only data. This is crucial for new domains where there’s lots of either mode, but not together. They also aim to scale this up using larger backbones and possibly integrate knowledge from language models to improve performance.

“Recognizing and segmenting visual objects in images, as well as environmental sounds and spoken words in audio recordings, are each difficult problems in their own right. Historically researchers have relied upon expensive, human-provided annotations in order to train machine learning models to accomplish these tasks,” says David Harwath, assistant professor in computer science at the University of Texas at Austin who was not involved in the work. “DenseAV makes significant progress towards developing methods that can learn to solve these tasks simultaneously by simply observing the world through sight and sound — based on the insight that the things we see and interact with often make sound, and we also use spoken language to talk about them. This model also makes no assumptions about the specific language that is being spoken, and could therefore in principle learn from data in any language. It would be exciting to see what DenseAV could learn by scaling it up to thousands or millions of hours of video data across a multitude of languages.”

Additional authors on a paper describing the work are Andrew Zisserman, professor of computer vision engineering at the University of Oxford; John R. Hershey, Google AI Perception researcher; and William T. Freeman, MIT electrical engineering and computer science professor and CSAIL principal investigator. Their research was supported, in part, by the U.S. National Science Foundation, a Royal Society Research Professorship, and an EPSRC Programme Grant Visual AI. This work will be presented at the IEEE/CVF Computer Vision and Pattern Recognition Conference this month.



from MIT News https://ift.tt/B6wvJrK