DSC 340: Machine Learning and Neural Network Processing Final Project
Instead of a final exam, in this class, you will submit a final project consisting of two parts: a presentation that will be given during the last week of the course and a final paper due on the last day of the course.
Goals
Perform a machine learning analysis from start to finish
Create a written report of your work in the form of a scientific report
Present your work in the form of a conference presentation
This project can be completed individually or in groups of up to three people. If you choose to work in a group for this project, each person must contribute equally. Appendix B in the textbook Hands-On Machine Learning by Aurélien Géron has a thorough list of the steps that need to be taken to complete a machine learning project from start to finish.
Warning: This project cannot be completed the night before it is due. Instead, you are expected to work on this project for an hour or two every week. In addition, most of the post-class homework assignments will ask you to submit something relating to your final project. Sometimes this will be a component of the report or presentation, and sometimes just a brief (one-paragraph) update on how the project is progressing.
Final Project Topics
For this project, you must create a question you want to answer using machine learning. This could be analyzing a particular data set, building an image classifier, investigating reinforcement learning, or any other aspect of machine learning you find interesting. The project chosen must have sufficient length and complexity for both the final presentation and report and be different from work already done. You can draw inspiration from work you find online, but the project you submit must be original work.
The topic you choose for your final project could be based on something you find interesting from an academic standpoint (astronomy or ecology data sets, for example) or something you are interested in from hobbies or other aspects of your personal life (sports data or housing prices, for example). You will spend a reasonable amount of time on this project throughout the semester, so choosing a topic you are interested in will keep you motivated and make working on this project easier.
Final Presentation Guidelines
You will present your project to your classmates during the last week of class in a 10-minute presentation. You should explain the problem you are trying to solve, how you applied machine learning to solve it, your results, and an analysis of the results. You can use PowerPoint slides or a Python notebook as visuals to help with your presentation. If you are working in a group, every member must present some part of the work. The class will have two minutes after the presentation to ask the presenters questions about their work.
Submitting Code for Final Project
You must submit the code you used to analyze this project through D2L before the final class of the semester. The code does not have to be perfect, but you should attempt to use the best coding practices while creating it, and the code should be original to the fullest extent possible.
Final Paper Guidelines
Your final paper should follow the format of a scientific article. For example, you can follow the submission guidelines for the journal Nature. If you do not like the format of Nature articles, you can choose a journal from your field and use its submission guidelines instead.
The paper does not have a minimum or maximum length. It should be long enough to contain and thoroughly explain your project. A good length for a paper with a few figures is about five pages, but make it as long or as short as you need.
Your final paper must have the following sections: abstract, introduction, methodology, results and discussion, conclusion, and references.
Abstract: An abstract is a summary of your work in one paragraph. It could contain a couple of sentences introducing your problem and why it should be studied, a couple of sentences explaining how you did your machine learning analysis, and a couple of sentences explaining your results.
Introduction: The introduction to your paper is where you introduce the problem you are trying to solve and how you will solve it. You should explain the data set you are working with, why the problem you are trying to solve is important, and details about the machine learning method you chose.
Methodology: This section is also called “Methods” or “Experimental,” depending on the journal on which you are basing your report format. Here is where you explain how you carried out the machine learning analysis. It should be to the level of detail that someone can read this section and exactly recreate what you did. Explain how you formatted and split the data, what implementation of the machine learning algorithm you used, what hyperparameters were chosen, etc.
Results and Discussion: In the results and discussion section, you present the results of your analysis (i.e., how well the machine learning model performed) and discuss whether this is a good result. You can also use this space to explore and plot the data set if that helps explain your results, and to discuss factors such as the time taken for the machine learning analysis, changes that had to be made to the methodology to make it work, and any problems you encountered while performing the analysis. This section should contain graphs and/or tables with captions to help with your explanations.
Conclusion: The conclusion is where you reflect on whether you fully solved the problem you set out to solve. What other avenues of research could be pursued using the results you discovered? If you could not fully answer the question you set out to answer, what could future researchers do to improve on your results?
References: This should be a list of formatted citations for all references you used in this work. You should also include in-text citations where appropriate.
All images and tables included in the report must be numbered and captioned.
You must include references for all information and images you did not create directly. You must provide in-text citations and a reference list at the end of your paper.
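As an illustration of the level of detail the Methodology section asks for, here is a minimal sketch of one way to keep a record of your data split and hyperparameters while you work, so the numbers in your report match what you actually ran. Everything in it is an assumption made for the example: the seed, the 80/20 split, the 150-sample data set size, the model name, and the helper split_indices are all hypothetical and are not course requirements.

```python
import json
import random

# All names and values below are illustrative assumptions, not course requirements.
RUN_CONFIG = {
    "random_seed": 42,
    "test_fraction": 0.2,
    "model": "k-nearest neighbors",
    "hyperparameters": {"n_neighbors": 5},
}


def split_indices(n_samples, test_fraction, seed):
    """Shuffle row indices with a fixed seed and split them into train/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split repeatable
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(
    n_samples=150,  # hypothetical data set size
    test_fraction=RUN_CONFIG["test_fraction"],
    seed=RUN_CONFIG["random_seed"],
)

# Record the split sizes next to the settings so the report can quote them.
RUN_CONFIG["n_train"] = len(train_idx)
RUN_CONFIG["n_test"] = len(test_idx)

print(json.dumps(RUN_CONFIG, indent=2))
```

Saving a record like this alongside your code gives you the seed, split sizes, and hyperparameters to quote directly when you write the section.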
Final Project Deadlines
Data set and problem statement (DUE: September 8, 2023 BEFORE the start of class)
The problem statement should be ~one paragraph describing what data set you plan on using for this project and what problem you are trying to solve that arises from the data set.
Project analysis (DUE: September 22, 2023 BEFORE the start of class)
Your project analysis should be an approximately one-page document explaining what machine learning method you plan on using, how that method can be used to solve your proposed problem, how your work differs from similar work that has already been done, and what potential problems you foresee occurring as you work on this project.
Abstract (DUE: October 27, 2023 BEFORE the start of class)
This will be the abstract of your final report, as described in the paper guidelines section. However, since you will not have your final results yet, you can instead describe any initial results you have obtained or the results you intend to have before you submit your final project. The abstracts will be collected into one document and released to the class so everyone can see what will be presented during the final week of class.
Introduction to report (DUE: November 10, 2023 BEFORE the start of class)
You will submit the introduction section to your final report using the description in the paper guidelines section above. This is to give you an idea of whether your writing is sufficient for the final paper. The introduction draft you submit here does not have to be the final version of your paper.
Results graph (DUE: November 21, 2023 BEFORE 5pm)
You must submit one graph from the results section of your final project. It can be any graph you want, but it must be well formatted and convey the data it presents.
Presentation, report, and final code (DUE: December 4, 2023 BEFORE the start of class)
Your presentation, report, and final code need to be uploaded to D2L before the start of class on December 4. You will be given a presentation time during the week of December 4 to present your work to the class.
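For the results graph deadline above, the following is a minimal sketch of what "well formatted" could look like in practice: labeled axes, a title, a legend, and readable tick marks. It assumes you are plotting with matplotlib, which is only one option. The accuracy values are made-up placeholders rather than real results, and the output file name results_graph.png is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical accuracy values per training epoch, made up for illustration only.
epochs = [1, 2, 3, 4, 5]
train_accuracy = [0.62, 0.74, 0.81, 0.86, 0.89]
validation_accuracy = [0.60, 0.70, 0.76, 0.78, 0.79]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(epochs, train_accuracy, marker="o", label="Training")
ax.plot(epochs, validation_accuracy, marker="s", label="Validation")

# Label both axes and give the plot a title so it can be read on its own.
ax.set_xlabel("Epoch")
ax.set_ylabel("Accuracy")
ax.set_title("Classifier accuracy during training (placeholder data)")
ax.set_xticks(epochs)
ax.set_ylim(0.0, 1.0)
ax.legend(loc="lower right")
ax.grid(True, alpha=0.3)

fig.tight_layout()
fig.savefig("results_graph.png", dpi=200)  # hypothetical output file name
```

Whatever tool you use, a reader should be able to tell what is plotted, in what units, and which curve is which without reading the surrounding text.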
Final Project Grading
The final project is worth a total of 100 points, divided into the following categories:
Submitting the data set and the problem (5 points maximum): Full points will be awarded if a complete and precise description of the problem and data set to be used in the project is submitted, half points for a submission that is not clear or complete, and 0 points if no submission.
Submitting a project analysis (10 points maximum): Full points will be awarded if the project analysis is clear, complete, and contains all of the required information; 8 points will be awarded if there are minor problems with the analysis, 5 points if there are one or more significant problems, and 0 points if there is no submission.
Submitting an abstract (10 points maximum): Full points will be awarded for a complete and well-written abstract, half points for an abstract that is missing elements or unclear, and no points if there is no submission.
Submitting the report introduction (10 points maximum): Full points will be awarded if the introduction is complete and well written, 8 points if there are minor problems, 5 points if there are one or more significant problems, and no points if there is no submission.
Submitting a results graph (5 points maximum): Full points will be awarded if a well-formatted graph is submitted, half points if a graph is submitted but not well-formatted, and no points if there is no submission.
Submitting the final code (10 points maximum): Full points will be awarded if the code is submitted, complete, and original.
Final paper (25 points maximum): Your paper will be scored on the information it contains, whether or not it follows the guidelines listed above, and to a small extent on its grammar and readability. It should be evident that you have put some time into ensuring the paper is well-written and contains all the necessary information.
A score of 25 means the paper is well-written, clear, and contains all necessary information.
A score of 20 means a few minor problems with the paper, but otherwise, it is acceptable (multiple grammatical mistakes or apparent signs of not proofreading, some parts of the paper could be more straightforward, small amounts of missing information, etc.).
A score of 15 means one major problem with the paper (e.g., missing sections, lack of citations, an incomplete section, or very poor writing).
A score of 10 means two or more significant problems with the paper (multiple sections missing or incomplete, etc.).
A score of 5 means that minimal work was put into the paper that was turned in.
A score of 0 means that no paper was turned in.
Final presentation (25 points maximum): Your presentation will be graded on its content, organization, and the preparation of the presenters.
A score of 25 means the presentation contained all the needed information, was clear, and the presenters were well prepared.
A score of 20 means the presentation was good, but there were minor problems (missing small amounts of information, parts needed to be more explicit, or presenters seemed to know the material but were unprepared for the presentation).
A score of 15 means there was one major problem with the presentation (large amounts of missing information, most of the presentation was unclear, presenters did not seem to be familiar with the topic, the presentation was too short or went way too long, etc.).
A score of 10 means there were two or more significant problems.
A score of 5 means that minimal effort was put into the presentation.
A score of 0 means that the presentation was not given.
If any of the work you submit is found to be plagiarized or not original, you will receive an automatic zero for the project.
Project Examples and Inspiration
There are many data sets on www.kaggle.com that span a wide range of fields and machine learning paradigms (e.g., classification, regression, and clustering).
For those interested in biology and ecology, the iris data set is a famous machine learning data set that aims to classify irises into one of three species based on some measurements from each flower. There are data sets online for different types of plants and animals, so your project could be to build a species classification algorithm for a data set you find interesting. You could also perform clustering or dimensionality reduction to further your analysis.
Differential equations occur in all fields of science and engineering and can be used to model many exciting systems. Neural networks can be used to solve differential equations, sometimes with much better accuracy than numerical differential equation solvers.
Image classification has many applications, from nuclear and particle physics experiments to classifying images of animals and plants to classifying types of architecture or art. As a warning, if you take this path, image classification algorithms are time-consuming to train and require a large amount of training data that may have to be hand labeled.
There are many data sets based on different sports. You could attempt to predict batting averages for baseball players for the next season, predict if a college football player is a good choice for the NFL draft, or investigate a sport that interests you.
Recurrent neural networks can be used to predict future values of data that evolve over time, such as weather, stock markets, and populations.
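As one concrete starting point for the species-classification idea above, here is a minimal sketch of a baseline classifier on the iris data set. It assumes scikit-learn is installed and uses the copy of the data set bundled with that library. The choice of a k-nearest-neighbors model, the 80/20 split, and the specific settings are illustrative assumptions, not a recommended method for your project.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the four flower measurements (X) and the species labels (y).
X, y = load_iris(return_X_y=True)

# Hold out 20% of the flowers for testing; stratify keeps the species balanced.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Illustrative model choice: classify each flower by its 5 nearest neighbors.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Measure accuracy only on flowers the model never saw during training.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

A baseline like this is too small to be a final project on its own, but the same load, split, fit, and evaluate pattern carries over to a data set of your choosing.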