We present the research projects we offer students each semester. They span a range of interesting problems that students work on under the supervision of the PhD students and Prof. Venkat. Projects typically last one semester but can be extended depending on progress.
Research Projects for Fall 2025
Recent advances in machine learning have accelerated model discovery by deriving governing equations directly from observational data. While these so-called black-box models often achieve accurate predictions, they tend to overlook fundamental laws that are critical in chemical engineering applications. Here, we develop a hybrid framework that integrates first-principles-based feature engineering with data-driven techniques to uncover the underlying physicochemical mechanisms. We demonstrate its robust performance across diverse domains, including atmospheric chemistry, cellular signaling, and electrochemistry, using synthetically generated sparse and noisy data.
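A widely used data-driven technique for deriving governing equations is sparse regression over a library of candidate terms, as in SINDy. The sketch below assumes that technique purely for illustration; it is not necessarily the group's framework. First-principles feature engineering appears here only as the hand-chosen library `1, x, x^2`. The synthetic system (first-order decay, dx/dt = -k x), the noise level, and the pruning threshold are all hypothetical choices, and derivatives are taken analytically rather than estimated from noisy measurements.

```python
import math
import random

def solve(G, h):
    """Solve G c = h by Gaussian elimination with partial pivoting."""
    n = len(h)
    M = [row[:] + [hv] for row, hv in zip(G, h)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (M[r][n] - sum(M[r][k] * c[k] for k in range(r + 1, n))) / M[r][r]
    return c

def lstsq(rows, y):
    """Ordinary least squares via the normal equations (adequate for a few columns)."""
    m = len(rows[0])
    G = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    h = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(m)]
    return solve(G, h)

def stlsq(theta, dxdt, names, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares: refit, prune small terms, repeat."""
    active = list(range(len(names)))
    coef = [0.0] * len(names)
    for _ in range(max_iter):
        sol = lstsq([[row[j] for j in active] for row in theta], dxdt)
        coef = [0.0] * len(names)
        for j, cj in zip(active, sol):
            coef[j] = cj
        kept = [j for j in active if abs(coef[j]) >= threshold]
        if kept == active:
            break
        active = kept
    return dict(zip(names, coef))

# Hypothetical synthetic data: x(t) = 2 exp(-0.5 t), so dx/dt = -0.5 x, plus Gaussian noise.
rng = random.Random(0)
ts = [0.05 * i for i in range(101)]
xs = [2.0 * math.exp(-0.5 * t) for t in ts]
dxdt = [-0.5 * x + rng.gauss(0.0, 0.01) for x in xs]

names = ["1", "x", "x^2"]                # assumed candidate library
theta = [[1.0, x, x * x] for x in xs]
model = stlsq(theta, dxdt, names)
print(model)  # the "x" coefficient should be close to -0.5; the others pruned to 0.0
```

The thresholding step is what makes the recovered model sparse and therefore interpretable: terms that the data do not support are removed, and the surviving term is refit on its own.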
Statistical mechanics (or statistical thermodynamics) is the theory of interactions and emergent behavior of molecules. Game theory is the standard conceptual framework for modeling goal-driven strategic interactions. Combining these two can yield a comprehensive mathematical framework to explain and predict macroscopic behavior from microscopic properties. Our methodology extends our prior work in agent-based modeling of complex systems, where large populations of dynamical agents consume and dissipate energy, leading to emergent behaviors at the macroscopic scale.
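One standard bridge between the two fields, assumed here only for illustration, is the logit (Boltzmann) choice rule: each agent picks option j with probability proportional to exp(β u_j), where β plays the role of an inverse temperature. The sketch below computes the resulting macroscopic distribution for a hypothetical congestion game in which an option's utility `a_j - b x_j` falls as more agents choose it. The utilities, `b`, and `beta` are illustrative assumptions, not parameters from the group's models.

```python
import math

def logit_equilibrium(a, b, beta, iters=2000, damping=0.1):
    """Damped fixed-point iteration for x_j = exp(beta*u_j(x)) / sum_k exp(beta*u_k(x)),
    where u_j(x) = a_j - b*x_j is the utility of option j when a fraction x_j uses it."""
    k = len(a)
    x = [1.0 / k] * k                      # start from the uniform distribution
    for _ in range(iters):
        u = [aj - b * xj for aj, xj in zip(a, x)]
        m = max(u)                         # subtract the max for numerical stability
        w = [math.exp(beta * (uj - m)) for uj in u]
        z = sum(w)
        target = [wj / z for wj in w]      # Boltzmann (softmax) response to the current x
        # Damping keeps x a probability vector and helps the iteration settle.
        x = [(1 - damping) * xj + damping * tj for xj, tj in zip(x, target)]
    return x

shares = logit_equilibrium([2.0, 1.0], b=1.0, beta=1.0)
print(shares)
```

With these numbers the better option attracts roughly two thirds of the agents, fewer than the congestion-free logit share 1/(1 + e^-1) ≈ 0.73, because crowding lowers its utility. The macroscopic split emerges from individual utility-driven choices.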
Scientists must process thousands of unstructured documents to obtain key information and infer results when researching new drugs. These documents are rich in technical information and domain-specific terminology, which large language models such as ChatGPT, despite their success in other fields, have difficulty handling. We developed SUSIE, an ontology-based pharmaceutical information extraction tool built to extract semantic triples and present them to the user as knowledge graphs (KGs). The student will work with the Columbia Ontology of Pharmaceutical Engineering (COPE). We have previously explored different methods for interfacing the generated knowledge graphs with the ontology and identified the most effective knowledge graph embedding methods. We are now interested in populating the ontology with the KGs and using neural-symbolic models to query the populated ontology with first-order logic.
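In its purely symbolic form, querying a populated ontology with first-order logic can be pictured as forward chaining of Horn rules over (subject, predicate, object) triples. The sketch below is a minimal, hypothetical example: the predicate names `type` and `subClassOf` mimic RDFS vocabulary, the triples are illustrative rather than taken from COPE or SUSIE output, and the neural component of a neural-symbolic model is not shown.

```python
def forward_chain(triples):
    """Apply two RDFS-style Horn rules until no new triples appear:
    (A subClassOf B) and (B subClassOf C) -> (A subClassOf C)
    (x type A) and (A subClassOf B)        -> (x type B)
    """
    facts = set(triples)
    while True:
        sub = [(s, o) for s, p, o in facts if p == "subClassOf"]
        typ = [(s, o) for s, p, o in facts if p == "type"]
        new = {(a, "subClassOf", d) for a, b in sub for c, d in sub if b == c}
        new |= {(x, "type", b) for x, c in typ for a, b in sub if c == a}
        if new <= facts:          # fixed point reached: nothing new can be derived
            return facts
        facts |= new

def query(facts, predicate, obj):
    """Return every subject s such that (s, predicate, obj) holds."""
    return sorted(s for s, p, o in facts if p == predicate and o == obj)

kg = [  # hypothetical triples standing in for extracted knowledge-graph output
    ("ibuprofen", "type", "NSAID"),
    ("NSAID", "subClassOf", "AntiInflammatoryDrug"),
    ("AntiInflammatoryDrug", "subClassOf", "Drug"),
]
facts = forward_chain(kg)
print(query(facts, "type", "Drug"))  # ['ibuprofen']
```

The answer is not stated anywhere in the input triples; it is entailed by the rules, which is the kind of inference a populated ontology makes possible.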
This project aims to develop a deeper understanding of Large Language Models (LLMs) and the role of tokens. Findings from several research papers and from Anthropic's dictionary-learning work suggest that tokens are essentially data points lying in an extremely high-dimensional space. This vector-like property of tokens recalls the famous example “King - Man + Woman = Queen”. The objective of this project is to study token embedding vectors to understand the relationship between individual tokens and groups of tokens. Because “similar tokens appear closer” in the embedding space, these embedding vectors will be studied for their clustering ability. Finally, tokens will be modeled as a system using a game-theoretic framework.
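The analogy arithmetic and the “similar tokens appear closer” intuition can both be illustrated with cosine similarity. The three-dimensional vectors below are hand-made toy values whose axes loosely stand for royalty, maleness, and femaleness. They are not embeddings from any real model; real token embeddings have hundreds or thousands of dimensions.

```python
import math

# Hypothetical toy embeddings (axes: royalty, maleness, femaleness).
EMB = {
    "king":   [1.0, 1.0, 0.0],
    "queen":  [1.0, 0.0, 1.0],
    "man":    [0.0, 1.0, 0.0],
    "woman":  [0.0, 0.0, 1.0],
    "prince": [0.9, 1.0, 0.1],
    "apple":  [0.1, 0.5, 0.5],
}

def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest(vec, exclude=()):
    """Token whose embedding has the highest cosine similarity to vec."""
    return max((w for w in EMB if w not in exclude), key=lambda w: cosine(vec, EMB[w]))

def analogy(a, b, c):
    """Token closest to a - b + c, excluding the three query tokens."""
    target = [x - y + z for x, y, z in zip(EMB[a], EMB[b], EMB[c])]
    return nearest(target, exclude={a, b, c})

print(analogy("king", "man", "woman"))        # queen
print(nearest(EMB["king"], exclude={"king"}))  # prince
```

The same similarity function is the usual starting point for clustering: tokens whose pairwise cosine similarities are high end up in the same group.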
We will identify and quantify the minima of an energy-based model trained on the MNIST dataset via equilibrium propagation.
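Equilibrium propagation trains a continuous, Hopfield-like energy function, and at MNIST scale its state space is far too large to search exhaustively. As a simplified stand-in, the sketch below swaps in a small binary Hopfield network (for illustration only, not the model in this project) whose states can be enumerated. This lets it list every local minimum, meaning a state that no single-unit flip can lower, together with its energy. For a trained MNIST model one would likely instead relax the network from many initial states and group the fixed points it settles into.

```python
from itertools import product

def hebbian_weights(patterns):
    """Symmetric Hebbian weights W_ij = sum_p p_i p_j with zero diagonal."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W

def energy(s, W, b):
    """Hopfield energy E(s) = -1/2 s^T W s - b^T s for s in {-1, +1}^n."""
    n = len(s)
    pair = sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    return -0.5 * pair - sum(bi * si for bi, si in zip(b, s))

def local_minima(W, b):
    """Enumerate all states; keep those that no single-unit flip can lower."""
    n = len(b)
    minima = []
    for s in product((-1, 1), repeat=n):
        e = energy(s, W, b)
        flips = (s[:i] + (-s[i],) + s[i + 1:] for i in range(n))
        if all(energy(t, W, b) >= e for t in flips):
            minima.append((s, e))
    return minima

W = hebbian_weights([(1, -1, 1, -1)])  # one hypothetical stored pattern
for state, e in local_minima(W, [0.0] * 4):
    print(state, e)
```

Here the only minima are the stored pattern and its sign-flipped copy, each at energy -6. Counting minima and comparing their depths is one simple way to quantify an energy landscape.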
It is widely held that each layer in a fully connected neural network contributes incrementally by transforming complexly distributed data points into linearly separable ones, allowing the last layer to simply draw the classification boundary. Previous analysis by our group has shown that the weights of each layer in a fully trained neural network follow a log-normal distribution. Given that these log-normal weights represent the perfectly trained, or ideal, state, this project aims to understand how well these ideal layers transform complexly shaped data into linearly separable data points.
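Linear separability can be probed with a perceptron. On linearly separable data it is guaranteed to complete an error-free pass after finitely many updates, while on non-separable data it never does. The sketch below applies this test to XOR before and after a hand-picked ReLU layer. That layer's weights are illustrative assumptions, not trained or log-normal weights. Failing within `max_epochs` is only evidence of non-separability in general, although for XOR it is certain.

```python
def perceptron_separable(X, y, max_epochs=100):
    """Return True if the perceptron finds a separating hyperplane within max_epochs."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, t in zip(X, y):
            if t * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + t * xi for wi, xi in zip(w, x)]  # perceptron update
                b += t
                errors += 1
        if errors == 0:
            return True
    return False

def hidden_layer(x):
    """Hypothetical ReLU layer with hand-picked weights [[1, 1], [-1, -1]] and zero bias."""
    s = x[0] + x[1]
    return [max(0.0, s), max(0.0, -s)]

X = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
y = [-1, 1, 1, -1]  # XOR labels
raw = perceptron_separable(X, y)
mapped = perceptron_separable([hidden_layer(x) for x in X], y)
print(raw, mapped)  # False True
```

The same check can be run on the activations of each layer of a trained network to track how separability changes with depth.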
Research Projects for Fall 2022
Research Projects for Spring 2022
Research Projects for Fall 2021
Black-box model identification often involves intractable functional transformations. In many engineering domains, large classes of mechanistic models have been proposed in the literature, and it would be unwise not to leverage this information.
Aim:
- Estimate multiple input-output relationships from data
- Expand the codebase to include specific model forms for efficient model identification
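Leveraging known mechanistic forms can be as simple as fitting each candidate form and keeping the best. The sketch below assumes three hypothetical candidates (linear, power law, exponential) and fits the last two by linear regression after a log transform. It ranks them by sum of squared errors in the original space. Because every candidate has two parameters, this ranking is the same one AIC would give. It requires positive `x` and `y`, and the data are synthetic.

```python
import math

def linfit(u, v):
    """Least-squares intercept and slope of v against u."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((a - mu) ** 2 for a in u)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    slope = suv / suu
    return mv - slope * mu, slope

def fit_linear(x, y):
    c, m = linfit(x, y)
    return (lambda t: c + m * t), (c, m)

def fit_power(x, y):
    # y = a * x**m  <=>  ln y = ln a + m ln x
    c, m = linfit([math.log(t) for t in x], [math.log(v) for v in y])
    a = math.exp(c)
    return (lambda t: a * t ** m), (a, m)

def fit_exponential(x, y):
    # y = a * exp(m x)  <=>  ln y = ln a + m x
    c, m = linfit(x, [math.log(v) for v in y])
    a = math.exp(c)
    return (lambda t: a * math.exp(m * t)), (a, m)

CANDIDATES = {"linear": fit_linear, "power": fit_power, "exponential": fit_exponential}

def select_model(x, y):
    """Fit every candidate form; return (best name, its parameters, SSE per form)."""
    fits, sse = {}, {}
    for name, fitter in CANDIDATES.items():
        f, params = fitter(x, y)
        fits[name] = params
        sse[name] = sum((f(t) - v) ** 2 for t, v in zip(x, y))
    best = min(sse, key=sse.get)
    return best, fits[best], sse

x = [float(t) for t in range(1, 11)]
y = [2.0 * t ** 1.5 for t in x]  # synthetic power-law data
best, params, sse = select_model(x, y)
print(best, params)
```

Adding a new mechanistic form only requires registering one more fitting function in `CANDIDATES`, which is one way a codebase of specific model forms could grow.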
We have built natural language-based models for reaction modeling in both the forward and reverse directions. These models used context-free grammars (CFGs) to incorporate chemistry information. Our objective is to extend these ideas to build hybrid models that use ontologies for drug discovery and manufacturing.
Aim:
- Extend the NLP-based methods to molecule search and optimization
- Populate ontologies from unstructured datasets using named entity recognition and relation extraction
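Membership in a context-free grammar for molecules can be checked with a recursive-descent recognizer. The toy grammar below is a hypothetical, heavily simplified SMILES-like fragment. It covers only `C`, `N`, and `O` atoms plus parenthesized branches, with no bonds, rings, charges, or aromaticity. It is not the grammar used in the group's models.

```python
# Hypothetical toy grammar:
#   chain -> atom ( atom | "(" chain ")" )*
#   atom  -> "C" | "N" | "O"
ATOMS = {"C", "N", "O"}

def parse_chain(s, i):
    """Parse a chain starting at position i; return the position just after it."""
    if i >= len(s) or s[i] not in ATOMS:
        raise ValueError(f"expected atom at position {i}")
    i += 1
    while i < len(s):
        if s[i] in ATOMS:
            i += 1
        elif s[i] == "(":
            i = parse_chain(s, i + 1)           # recurse into the branch
            if i >= len(s) or s[i] != ")":
                raise ValueError(f"expected ')' at position {i}")
            i += 1
        else:
            break                               # let the caller handle ')' or junk
    return i

def is_valid(s):
    """True if the whole string is derivable from the start symbol 'chain'."""
    try:
        return parse_chain(s, 0) == len(s)
    except ValueError:
        return False

for smiles in ["CC(O)N", "C(C)(C)C", "C(", "()"]:
    print(smiles, is_valid(smiles))
```

Because the grammar is recursive, the nested branches that a regular expression cannot match are handled naturally. A generator for the same grammar would expand the rules instead of consuming characters.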