Projects

Active Development · 2 min read
Personal Website

Want to learn about how this website works under the hood?

React
Next.js
TypeScript
OpenAI API
Active Development · 3 min read
Conversation-based LLM App for Practical Chinese Language Learning

Developing a web application to help Chinese language learners practice conversation with a focus on Chinese cultural elements and practical situations.

Python
Django
TypeScript
React
PostgreSQL
RESTful API
Redux
OpenAI API
Next.js
Active Development · 1 min read
Prompt Engineering Competition Web App

Building a Kaggle-like website for prompt engineering competitions.

Python
Django
PostgreSQL
RESTful API
OpenAI API
23 Aug 2024 · 10 min read
Tacit

I led the development of an MVP for a qualitative data collection platform. This platform facilitated one-on-one, scalable conversations between participants and an AI assistant. All conversation and idea data is processed and displayed for easy exploration.

Python
Django
React
PostgreSQL
Heroku
Amazon RDS
Redis
Celery
RESTful API
CI/CD
Redux
Pinecone
OpenAI API
JavaScript
20 Aug 2024 · 2 min read
Django Project Structure with Authentication

This is a starter project for Django REST Framework applications that includes Djoser and JWT for authentication.

Python
Django
RESTful API
30 Apr 2024 · 5 min read
NLP Analysis on Subreddit Polarization

This project analyzes online polarization surrounding the Israel-Palestine conflict by leveraging data from Reddit. LDA and BERTopic models were employed to categorize posts into key topics such as conflict violence and geopolitical discourse, followed by fine-tuning a Hugging Face XLNet model to classify polarization.

Python
Fine-tuning
Big Data
LangChain
NLTK
scikit-learn
24 Apr 2024 · 4 min read
Improving LLM Code Generation with Intentional Mistakes

Can inserted errors improve the performance of LLMs? This project investigates whether deliberately inserting errors into proposed coding solutions improves LLM performance on code generation.

OpenAI API
Python
Docker
22 Apr 2024 · 6 min read
Vehicle Routing Problem with Drones

This project includes an algorithm and a visualization for the Vehicle Routing Problem with Drones (VRPD), using a dataset from the 2021 Amazon Last Mile Routing Research Challenge. The VRPD is a variant of the Vehicle Routing Problem (VRP) in which drones are also used to deliver packages.

Python
Big Data
Pandas
Plotly
30 Nov 2023 · 4 min read
LLM Smart Summarization

In 2023, many large language models (LLMs) had limited context windows, which made it difficult to directly summarize lengthy texts, such as books. While this issue has been largely addressed in more recent models, it still persists in some smaller ones. This project aims to develop smart summarization techniques using LLMs to efficiently summarize large documents.

Python
LangChain
OpenAI API
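
One common way to summarize a document that exceeds a model's context window is a map-reduce pass: split the text into chunks, summarize each chunk, then summarize the combined summaries. The sketch below illustrates that idea only. The project description does not say which technique is used, and `summarize_fn`, `max_words`, and the word-based size limit are hypothetical stand-ins for a real LLM call and a real token budget.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def map_reduce_summarize(
    text: str,
    summarize_fn: Callable[[str], str],  # hypothetical wrapper around an LLM call
    max_words: int = 1000,               # assumed stand-in for the context limit
) -> str:
    """Summarize text of any length by summarizing chunks, then the summaries."""
    if len(text.split()) <= max_words:
        return summarize_fn(text)

    # Map step: summarize each chunk independently.
    partial = [summarize_fn(chunk) for chunk in split_into_chunks(text, max_words)]
    combined = " ".join(partial)

    # Guard against a summarizer that does not shrink its input,
    # which would otherwise recurse forever.
    if len(combined.split()) >= len(text.split()):
        raise ValueError("summarize_fn must return shorter text than its input")

    # Reduce step: recurse until the combined summaries fit in one call.
    return map_reduce_summarize(combined, summarize_fn, max_words)
```

In practice `summarize_fn` would wrap a model request, and the limit would be measured in tokens rather than words. Splitting on sentence or section boundaries instead of fixed word counts would likely preserve more context per chunk.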
21 Aug 2023 · 5 min read
Chinese Video Translation and Captioning

This project automates the extraction of Chinese text from video frames, translates that text, and then adds the extracted text, its Pinyin, and the translation as subtitles to the video.

Python
15 Jun 2023 · 5 min read
Blockchain-Based Patient Identity System

I led a team of two to three people in developing a blockchain-based decentralized patient identity system. The system used decentralized identifiers (DIDs) and the InterPlanetary File System (IPFS) to manage patient identities. I presented this work in Kyoto, Japan, at the 2023 International Conference on Medical and Health Informatics.

Solidity
React
JavaScript
02 May 2023 · 5 min read
Emergency Response Big Data Analytics

In this project, my team ran Spark jobs on Amazon EMR to join a 26 GB traffic dataset with traffic incident data from S3, processed geographic data with GeoPandas, and used Plotly to visualize roads with slow response times, ranked by a weighted metric.

Spark
Big Data
Amazon EMR
Amazon S3
Plotly
Pandas
25 Apr 2023 · 3 min read
Named Entity Recognition with LLMs

In this project, I scraped 7,000+ forum discussion posts, cleaned them with Pandas, and hand-labeled them for NER. I then analyzed the performance of multiple LLMs on NER and sentiment analysis.

Scrapy
Python
OpenAI API
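
Comparing model output against hand labels for NER usually comes down to precision, recall, and F1 over the extracted entities. The sketch below shows one exact-match way to compute them. The `(text, label)` tuple representation and the function name `score_entities` are assumptions for illustration, not the project's actual evaluation code.

```python
from typing import Dict, Set, Tuple

Entity = Tuple[str, str]  # (entity text, entity label), an assumed representation


def score_entities(gold: Set[Entity], predicted: Set[Entity]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 for one document's entities."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall > 0
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Example: one of two predictions matches one of two hand labels.
gold = {("Paris", "LOC"), ("Alice", "PER")}
predicted = {("Paris", "LOC"), ("Bob", "PER")}
scores = score_entities(gold, predicted)
```

Exact matching is strict: a prediction with the right span but the wrong label counts as a miss. A partial-match scheme would be a reasonable alternative when span boundaries are fuzzy.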
20 Mar 2023 · 6 min read
Topic Modeling with Gensim LDA

This project applies topic modeling to posts from the LoseIt subreddit using LDA. spaCy is used for text preprocessing: it extracts key nouns, verbs, and adjectives, and common terms are then filtered out to create the corpus. The optimal number of topics is determined by coherence scores, and results are visualized with pyLDAvis.

Python
spaCy
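
The common-term filtering step can be sketched with a document-frequency cutoff: drop any token that appears in more than a chosen share of documents. The threshold, the function name, and the pure-Python approach here are assumptions; the project may well rely on a library's built-in filtering instead.

```python
from collections import Counter
from typing import List


def filter_common_terms(
    docs: List[List[str]],
    max_doc_share: float = 0.5,  # assumed cutoff, not taken from the project
) -> List[List[str]]:
    """Remove tokens that occur in more than max_doc_share of the documents."""
    # Count each token once per document (document frequency).
    doc_freq = Counter(token for doc in docs for token in set(doc))
    limit = max_doc_share * len(docs)
    return [[tok for tok in doc if doc_freq[tok] <= limit] for doc in docs]


# Example: "lose" appears in 3 of 4 documents, so it is dropped.
docs = [
    ["lose", "weight", "diet"],
    ["lose", "weight", "run"],
    ["lose", "sleep"],
    ["eat", "sleep"],
]
filtered = filter_common_terms(docs)
```

Gensim's `Dictionary.filter_extremes` offers a similar cutoff through its `no_above` parameter, which may be the more natural choice in a Gensim LDA pipeline.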
15 Jun 2021 · 3 min read
Le Chat Noir with the Reach Blockchain Programming Language

My team developed a blockchain-based Le Chat Noir application. The game was programmed using Reach, a novel programming language for blockchain applications, and deployed as a web app using React.

Solidity
React
Docker
JavaScript