Mini Project: Data Cleaning App using ChatGPT API and Streamlit
Discover how I built an AI-powered data cleaning app using the ChatGPT API to automate CSV file cleaning. Learn about its features, challenges, benefits, and future improvements to make it production-ready.

In my previous blog, we learnt how to use the ChatGPT API and looked at the key parameters it takes.

Naturally, the next step was to put that knowledge into action by building something practical.
What did I build?
I created an AI-powered Data Cleaning App that allows users to clean their CSV files easily. The key features of the app include:
- Upload a CSV file and preview its contents.
- Generate AI-powered suggestions for potential data cleaning steps.
- Select the suggested cleaning steps you want to apply.
- Automatically apply the selected steps and download a clean, processed CSV.
Video Demo
Here's a short video of the app in action!
In the demo, I upload a sample dataset containing employee salary information. Once the file is uploaded, the app displays a preview of the data along with a summary of key information, including the number of rows and columns. For a more detailed overview of the dataset, I click the ‘Data Profiling’ option, which generates a quick profile showing column names, data types, missing values, and unique entries.
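For the curious, here's roughly what such a profile involves. This is a minimal standard-library sketch, not the app's actual code: the names `infer_type` and `profile_rows`, the sample data, and the rule that an empty cell counts as missing are all assumptions made for illustration. The app itself may well rely on pandas for this.

```python
import csv
import io

def infer_type(values):
    """Guess a column type from its non-missing string values."""
    if not values:
        return "empty"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue  # at least one value did not parse; try the next type
    return "text"

def profile_rows(csv_text):
    """Return {column: {"dtype", "missing", "unique"}} for a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    profile = {}
    for col in reader.fieldnames or []:
        # Treat None (short row) and blank strings as missing.
        present = [r[col].strip() for r in rows if r[col] and r[col].strip()]
        profile[col] = {
            "dtype": infer_type(present),
            "missing": len(rows) - len(present),
            "unique": len(set(present)),
        }
    return profile

sample = "name,salary\nAsha,52000\nBen,\nAsha,61000.5\n"
for column, stats in profile_rows(sample).items():
    print(column, stats)
```

Each column gets a guessed type, a count of missing cells, and a count of distinct values, which is the same information the 'Data Profiling' view shows.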
What are the benefits of the App?
Below are the reasons why an app like this is a must-have.
- No Coding Required - The app makes data cleaning accessible to everyone, regardless of their technical background.
- User Control and Flexibility - Users have full control over the cleaning process by choosing which suggestions to apply.
- Saves Time and Effort - No need to manually scan the data to work out what needs cleaning, or to spend hours on data prep.
- Clean and Ready-to-Use Data - After applying the selected cleaning steps, the app generates a clean version of the CSV, ready for analysis or further processing.
Now you may ask, why should I use an app like this when I can do the same tasks using ChatGPT directly?
While it’s true that you can perform data cleaning tasks by interacting with ChatGPT directly, here's why the App can be more beneficial:
- Automated Cleaning in One Go
Instead of asking ChatGPT multiple questions and manually applying suggestions to your data, the app automates the process. You just review and select the cleaning steps, and the app applies them instantly, generating a clean CSV for download.
- Reduces Cognitive Load
Instead of figuring out the right prompts for every data cleaning task, the app suggests steps automatically, freeing you up to focus on making decisions rather than crafting instructions.
In essence, while ChatGPT can assist with data cleaning, this app takes it a step further by offering speed, automation, and ease of use—making the entire process more efficient and hassle-free.
Challenges Along the Way:
While I’ve got myself a working version that does a decent job, getting there wasn’t exactly a walk in the park. Streamlit, for all its simplicity, requires the code to be in a .py file and run from a command prompt. I was used to working with Jupyter Notebooks, where everything gets saved in a .ipynb format. This mismatch had me stuck for a while until I stumbled upon this blog that showed me the easiest way to convert .ipynb files to .py. A simple fix, but it felt like a huge win at that point.
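If you hit the same wall, the usual route is the command `jupyter nbconvert --to script notebook.ipynb`. A notebook file is just JSON, so the idea can also be sketched in a few lines of Python. This is an illustration only, not necessarily the method from the blog I followed; the function name `notebook_to_script` and the tiny in-memory notebook are made up for the example.

```python
import json

def notebook_to_script(notebook_json):
    """Join the code cells of a notebook (given as a JSON string) into one script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip())
    return "\n\n".join(chunks) + "\n"

# A hypothetical three-cell notebook, built in memory for the demo.
demo = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Notes"]},
        {"cell_type": "code",
         "source": ["import streamlit as st\n", "st.title('Data Cleaning App')"]},
        {"cell_type": "code", "source": "print('done')"},
    ]
})
print(notebook_to_script(demo))
```

One caveat with this sketch: notebook-only syntax such as `%matplotlib inline` is copied as-is and would need removing by hand before the script runs.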
Next came the challenge of displaying the cleaning suggestions in a way that looked professional and intuitive. I didn’t just want to dump a list of suggestions — I wanted it to have clear headers, sub-headers, and checkboxes to allow users to select what they wanted to apply. This part was a lot of trial and error with the Streamlit UI. I played around with different layouts, experimented with various components, and spent way too much time tweaking the aesthetics until I finally got something that looked polished.
But the trickiest bit? Making sure that only the selected steps were applied when the “Apply” button was clicked. It wasn’t enough to just store the steps — I had to ensure that only the chosen ones were executed, which meant tracking user selections dynamically and applying them conditionally. This part took some serious debugging and patience, but seeing it all come together in the end made it worth the effort.
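One way to structure this kind of conditional application is a lookup table from step label to function, where only the ticked labels get run. The sketch below is a simplified illustration and not the app's actual code: `CLEANING_STEPS`, `apply_selected`, `render_step_picker`, and the three step labels are hypothetical, and rows are plain dicts to keep it dependency-free, where a real app would more likely work on a pandas DataFrame.

```python
def strip_whitespace(rows):
    """Trim leading/trailing spaces from every string value."""
    return [{k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
            for row in rows]

def drop_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, result = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

def drop_rows_with_missing(rows):
    """Remove rows that contain an empty or None value."""
    return [row for row in rows if all(v not in ("", None) for v in row.values())]

# Label shown in the UI -> function that performs the step.
# Trimming comes first so rows differing only by whitespace deduplicate.
CLEANING_STEPS = {
    "Trim whitespace": strip_whitespace,
    "Remove duplicate rows": drop_duplicates,
    "Drop rows with missing values": drop_rows_with_missing,
}

def apply_selected(rows, selected_labels):
    """Run only the chosen steps, in the registry's order."""
    for label, step in CLEANING_STEPS.items():
        if label in selected_labels:
            rows = step(rows)
    return rows

def render_step_picker(rows):
    """Streamlit wiring: one checkbox per step, applied on button click."""
    import streamlit as st  # lazy import so the logic above runs without Streamlit
    chosen = [label for label in CLEANING_STEPS
              if st.checkbox(label, key=f"step_{label}")]
    if st.button("Apply"):
        # Store the result so it survives Streamlit's script reruns.
        st.session_state["cleaned"] = apply_selected(rows, chosen)
    return st.session_state.get("cleaned")
```

Keeping `apply_selected` free of any UI code means the selection logic can be checked on its own, for example `apply_selected(rows, ["Trim whitespace"])`, before the checkboxes are wired up.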
Improvements to make it live (Production Ready)!
To turn this prototype into a fully functional app, there’s still plenty of work to be done. While the current version gets the job done with basic cleaning operations, the app can be made more dynamic and robust.
Here’s what’s on my mind:
- Make the cleaning function use the ChatGPT API as well: Right now, the apply_cleaning() function is relatively simple and can only handle basic operations. To enhance this, I could leverage the ChatGPT API again to dynamically generate and apply selected cleaning steps. This would allow the app to cater to a wider range of cleaning scenarios without hard-coding every possible step.
- Categorize Cleaning Suggestions: To give users more control and structure over the cleaning process, I’m thinking of introducing categories for the suggested cleaning steps. Something like:
  - Must Apply Steps – Essential operations like removing duplicates, handling missing values, and standardizing formats.
  - Nice-to-Have Options – Recommended steps such as trimming whitespace, correcting data types, or minor formatting tweaks.
  - Advanced Cleaning Options – More complex transformations like feature engineering, outlier removal, or deduplication based on fuzzy matching.
- User-Defined Rules and Custom Cleaning Logic: More often than we think, cleaning steps are based on custom business logic. Giving users the ability to define their own rules, such as filtering out rows where age is greater than 72, would make the app a lot more useful.
- Upload and Export Files in Multiple Formats: Currently the app only takes a CSV file as input and exports in CSV format as well. Offering options to import and export in different formats (CSV, Excel, JSON, etc.) would make the app more versatile and user-friendly.
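To make the user-defined rules idea concrete, here is a rough sketch of how a rule like "filter out rows where age is greater than 72" could be expressed as data (column, operator, value) rather than hard-coded. None of this exists in the app yet: `OPERATORS`, `filter_out`, and the sample rows are hypothetical, and keeping rows with unreadable values is a design assumption of mine.

```python
import operator

# Operator symbols a user could pick from in the UI.
OPERATORS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}

def filter_out(rows, column, op, threshold):
    """Drop rows where `row[column] <op> threshold` holds; keep the rest.

    Rows whose value is missing or not numeric are kept, so a bad cell
    never silently deletes a row.
    """
    compare = OPERATORS[op]
    kept = []
    for row in rows:
        try:
            matches = compare(float(row[column]), threshold)
        except (KeyError, TypeError, ValueError):
            matches = False
        if not matches:
            kept.append(row)
    return kept

people = [
    {"name": "Asha", "age": "34"},
    {"name": "Ben", "age": "75"},
    {"name": "Chen", "age": ""},
]
print(filter_out(people, "age", ">", 72))
```

Because the rule is just three values, it could come straight from a dropdown and two input boxes, with no code needed from the user.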
These enhancements would take the app from a basic prototype to a powerful, user-centric data cleaning tool. With a more dynamic backend, a refined UI, and advanced functionality, it could easily become a go-to solution for data preparation tasks.
What do you think? Is there anything else you wish the app had? 😊
Reach out on socials and let's collaborate or chat about it more!