Projects
Want to learn about how this website works under the hood?
Developing a web application to help Chinese language learners practice conversation with a focus on Chinese cultural elements and practical situations.
Building a Kaggle-like website for prompt engineering competitions.
I led the development of an MVP for a qualitative data collection platform. The platform facilitated one-on-one conversations at scale between participants and an AI assistant, then processed and displayed all conversation and idea data for easy exploration.
This is a starter project for Django REST Framework applications, with Djoser and JWT for authentication.
This project analyzes online polarization surrounding the Israel-Palestine conflict by leveraging data from Reddit. LDA and BERTopic models were employed to categorize posts into key topics such as conflict violence and geopolitical discourse, followed by fine-tuning a Hugging Face XLNet model to classify polarization.
Can inserted errors improve the performance of LLMs? This project investigates how inserting errors into proposed coding solutions affects LLM performance, and whether doing so can improve it.
This project includes an algorithm and a visualization for the Vehicle Routing Problem with Drones (VRPD), using a dataset from the 2021 Amazon Last Mile Routing Research Challenge. The VRPD is a variant of the Vehicle Routing Problem (VRP) in which drones are also used to deliver packages.
In 2023, many large language models (LLMs) had limited context windows, which made it difficult to directly summarize lengthy texts, such as books. While this issue has been largely addressed in more recent models, it still persists in some smaller ones. This project aims to develop smart summarization techniques using LLMs to efficiently summarize large documents.
This project automates the extraction of Chinese text from video frames, translates that text, and then adds the extracted text, its Pinyin, and the translation as subtitles to the video.
Led a team of 2–3 people in developing the application for a blockchain-based decentralized patient identity system. The system used decentralized identifiers (DIDs) and the InterPlanetary File System (IPFS) to manage patient identities. I presented this work in Kyoto, Japan, at the 2023 International Conference on Medical and Health Informatics.
In this project, my team executed Spark jobs on AWS EMR to join a 26 GB traffic dataset with traffic incidents from S3, processed geographic data with GeoPandas, and used Plotly to visualize slow-response roads based on a weighted metric.
In this project, I scraped 7,000+ forum discussion posts, cleaned them with Pandas, and hand-labeled them for NER. I then analyzed the performance of multiple LLMs on NER and sentiment analysis.
This project applies topic modeling to posts from the LoseIt subreddit using LDA. spaCy handles text preprocessing by extracting key nouns, verbs, and adjectives, and common terms are then filtered out to create the corpus. The optimal number of topics is determined by coherence scores, and results are visualized with pyLDAvis.
My team developed a blockchain-based Le Chat Noir application. The game was programmed using Reach, a novel programming language for blockchain applications, and deployed as a web app using React.