Our project uses a stroke prediction dataset from Kaggle to predict a person's chance of having a stroke. For this prediction, our program uses a person's gender, age, marital and smoking status, presence of heart disease or hypertension, work and residence type, BMI, and average glucose level.
In the United States, someone has a stroke every 40 seconds, and someone dies of a stroke every 3 ½ minutes. Worldwide, 1 in 4 people over the age of 25 will experience a stroke. People who reach the emergency room within 3 hours of a stroke are much more likely to avoid serious disability. Our project aims to help people, especially in the healthcare industry, identify when someone is at higher risk of a stroke, so they can spot warning signs and symptoms and access treatment more quickly.
In 3 weeks, our final product will have completed an analysis of the characteristics associated with stroke, and it will make accurate predictions of whether a person will have a stroke in the near future based on these characteristics.
This project uses HTML, CSS, and Bootstrap for front-end development and website creation. Along with these, we used Python as our backbone: Pandas for data management, NumPy for mathematical operations, scikit-learn (sklearn) for machine learning, and Plotly for generating our interactive plots.

For the first week of AI camp, our team was introduced to Data Science and Data Analytics. We learned about the basics of the Python programming language and analysis. At the end of week 1, we explored various datasets and our team chose to do Stroke Prediction.

After choosing our dataset, we used the skills we learned in week 1 to clean our data and perform exploratory data analysis, making plots and graphs to look for patterns and trends and to determine which variables were correlated. We learned about the different types of machine learning models and implemented several classification models. We evaluated each model by checking its mean squared error and creating a confusion matrix, and then compared further metrics including precision, accuracy, F1 score, and recall.
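As a rough illustration of the cleaning and correlation steps described above, here is a minimal sketch using Pandas. The toy rows, the column names, and the choice to fill missing BMI values with the median are all assumptions made for this example, not a record of exactly what we ran.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Kaggle stroke dataset; the column
# names are assumptions based on the features listed in this write-up.
raw = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "age": [67.0, 61.0, 49.0, 35.0],
    "hypertension": [0, 0, 1, 0],
    "smoking_status": ["formerly smoked", "never smoked", "smokes", "never smoked"],
    "bmi": [36.6, None, 34.4, 24.0],
    "stroke": [1, 1, 0, 0],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing BMI values with the median and one-hot encode text columns."""
    out = df.copy()
    # Assumption: median imputation; other strategies (dropping rows, mean) also work.
    out["bmi"] = out["bmi"].fillna(out["bmi"].median())
    # get_dummies turns each category into its own 0/1 indicator column.
    return pd.get_dummies(out, columns=["gender", "smoking_status"], dtype=int)


cleaned = clean(raw)

# Correlation of every feature column with the stroke label.
correlations = cleaned.corr()["stroke"].drop("stroke")
print(correlations.sort_values(ascending=False))
```

With only four made-up rows the correlation values themselves mean nothing; the point is the shape of the workflow: impute, encode, then inspect correlations with the label.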

After interpreting our dataset and building the understanding we needed, we moved on to front-end development. In week 3, we learned about HTML (Hypertext Markup Language) and CSS (Cascading Style Sheets), which we used to start developing our website. We compiled all of our data and built our website into our final product.
Since the problem we are trying to solve is to predict whether or not a person will have a stroke, it is a classification problem.
Because we fed our model data that was already labeled, this problem is an example of supervised machine learning.
For the K Nearest Neighbors (KNN) model, we set n_neighbors and leaf_size to 10. We then fit the model to x_train and y_train and tested it. This gave us a minimum observed mean squared error, supporting its accuracy on our data.
For the logistic regression (LR) model, we set solver to 'liblinear' and random_state to 0. We then fit the model to x_train and y_train and tested it. This gave us the same minimum observed mean squared error, supporting its accuracy on our data.
For the support vector classifier (SVC) model, we set kernel to 'linear'. We then fit the model to x_train and y_train and tested it. This again gave us a minimum observed mean squared error, supporting its accuracy on our data.
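The three model setups above can be sketched with scikit-learn as follows. The hyperparameters are the ones named in the text; the synthetic data from make_classification, the 75/25 split, and the variable names other than x_train and y_train are assumptions made so the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption); the real project used the cleaned
# stroke dataset instead.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters as described in the text.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=10, leaf_size=10),
    "LR": LogisticRegression(solver="liblinear", random_state=0),
    "SVC": SVC(kernel="linear"),
}

scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    # score() returns mean accuracy on the held-out test set.
    scores[name] = model.score(x_test, y_test)

print(scores)
```

Keeping the models in a dictionary makes it easy to fit and score all of them with the same loop, so every model is compared on the same train/test split.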
Accuracy is calculated as (tp + tn) / (tp + fp + tn + fn), where tp, tn, fp, and fn are the numbers of true positives, true negatives, false positives, and false negatives. The closer accuracy is to 1, the better the model.
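MSE is calculated by taking the difference between each predicted value and the actual value, squaring it, and averaging the results. It tells us the average squared error, and we want it to be as close to 0 as possible. For 0/1 class labels, each squared error is either 0 or 1, so MSE equals the fraction of misclassified samples, which is 1 minus accuracy.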
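To make the two metrics concrete, here is a small worked example in plain Python. The labels in y_true and y_pred are made up for illustration, and the helper function names are hypothetical. Accuracy is computed as (tp + tn) / (tp + fp + tn + fn) and MSE as the mean of the squared differences between actual and predicted labels.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted labels."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels: 1 = stroke, 0 = no stroke.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]

print(confusion_counts(y_true, y_pred))    # (2, 4, 1, 1)
print(accuracy(y_true, y_pred))            # 0.75
print(mean_squared_error(y_true, y_pred))  # 0.25
```

Here 6 of the 8 predictions are correct, so accuracy is 0.75 and MSE is 0.25. The two values add up to 1 because the labels are 0/1, which is why the model with the lowest MSE is also the one with the highest accuracy.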
These graphs make it clear that the KNN model is best.
Our results show that the variables in our dataset that have a positive correlation with stroke include being female, having no heart disease, working in the private sector, having hypertension, never having smoked, being older, being married, and having a slightly higher body mass index. These variables indicate that a person may be at an increased risk of having a stroke. We found that the most accurate machine learning model, with the lowest mean squared error, was K Nearest Neighbors (KNN). We hope that our findings will aid the healthcare industry in the prediction and classification of strokes.
Greetings all! I am a sophomore Accounting & Finance major at the University of Kansas. I'm a Motorhead at heart and a K-Pop/Dramas fan, and I'm pursuing a future in Corporate Law.
Hi guys! I'm a rising freshman at Northwestern University planning on majors in data science and biology. I love coding games, biking, and reading.
I am a rising junior in high school who enjoys video games, playing piano, reading, and learning new things.
Hi, I am a college freshman majoring in Computer/Data Science with a minor in Bioinformatics. I enjoy playing Nintendo and piano, art, stargazing, and listening to lofi music.
Hi! I am a freshman attending Chabot Community College and will be a transfer student to the University of California. After college I hope to attend Medical School to become a Doctor.