- Prof. David Beck (email@example.com)
- Prof. Stéphanie Valleau (firstname.lastname@example.org)
- Sabiha Rustam (email@example.com)
- Nisarg Joshi (firstname.lastname@example.org)
Logistics for 2021
Zoom meeting ID numbers can be found on the Canvas course webpages, under the Zoom tab
- Data Science Methods for Clean Energy Research (DSMCER, ChemE 545)
Also known as Molecular Data Science Survey
- Tue & Thu
- 11:30 - 12:50
- Software Engineering for Molecular Data Scientists (SEMDS, ChemE 546)
- Tue & Thu
- 2:30 - 3:50
Scientists, engineers, and other technical professionals require skills in computing and data analysis to do their jobs. We refer to these as data science skills.
These two courses teach graduate students the software engineering and molecular data science skills they need to be successful technical professionals in the 21st century. In particular, these courses teach how to approach computational research with reproducibility in mind: to create sharable and reusable research projects that incorporate both computation and data. The courses also provide students with a survey of machine learning methods, including supervised and unsupervised methods.
In SEMDS, students will learn the following skills:
- Developing software in a way that it can be used by others, including documentation, installing packages, automating setup, and running computational studies.
- Creating technical specifications for what a program should do (its use cases) and how this is accomplished (software design).
- Creating, updating, and sharing a project using version control (specifically GitHub) for collaborative software development.
- Programming using the Python scientific stack, including numpy, pandas, and matplotlib.
- Developing unit tests that validate important aspects of the project implementation, and, more broadly, using test-driven development to build software.
- Searching for, evaluating, and integrating externally developed Python packages into a project, as well as creating your own Python packages.
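As a rough illustration of the unit-testing skill listed above, here is a minimal sketch using Python's built-in `unittest` module. The function `mean_absolute_error` and its tests are hypothetical examples written for this page, not course material. In a test-driven workflow, the tests would typically be written first and the function filled in until they pass.

```python
import unittest


def mean_absolute_error(predicted, observed):
    """Return the mean absolute difference between two equal-length sequences.

    Hypothetical example function; not part of any course package.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if len(predicted) == 0:
        raise ValueError("inputs must not be empty")
    total = sum(abs(p - o) for p, o in zip(predicted, observed))
    return total / len(predicted)


class TestMeanAbsoluteError(unittest.TestCase):
    def test_known_value(self):
        # |1 - 2| = 1 and |2 - 4| = 2, so the mean is 1.5.
        result = mean_absolute_error([1.0, 2.0], [2.0, 4.0])
        self.assertAlmostEqual(result, 1.5)

    def test_length_mismatch_raises(self):
        # Inputs of different lengths should be rejected, not silently truncated.
        with self.assertRaises(ValueError):
            mean_absolute_error([1.0], [1.0, 2.0])

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            mean_absolute_error([], [])
```

Assuming the code is saved in a file such as `test_metrics.py` (a placeholder name), the tests can be run with `python -m unittest test_metrics`.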
In DSMCER, students will learn the following skills:
- Statistical reasoning and methods, including distributions, hypothesis testing, and error analysis for multiple data types.
- Modern data visualization methods.
- A wide range of machine learning methods with direct applications to problems in the design, synthesis, and characterization of molecules, molecular systems, and reactions.
- Hands-on experience with the Python scientific stack and machine learning tools like TensorFlow and PyTorch.
- Data management strategies such as relational data models and SQL.
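To give a flavor of the relational data model and SQL item above, here is a small sketch using Python's built-in `sqlite3` module. The table names, columns, and values are illustrative assumptions, not a schema used in the course. Two tables are linked by a foreign key and combined with a `JOIN`.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# One row per molecule (hypothetical schema).
conn.execute(
    "CREATE TABLE molecule (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# One row per measured property, linked to a molecule by molecule_id.
conn.execute(
    """
    CREATE TABLE measurement (
        id INTEGER PRIMARY KEY,
        molecule_id INTEGER NOT NULL REFERENCES molecule(id),
        property TEXT NOT NULL,
        value REAL NOT NULL
    )
    """
)

# Illustrative values only.
conn.executemany(
    "INSERT INTO molecule (id, name) VALUES (?, ?)",
    [(1, "water"), (2, "ethanol")],
)
conn.executemany(
    "INSERT INTO measurement (molecule_id, property, value) VALUES (?, ?, ?)",
    [(1, "boiling_point_C", 100.0), (2, "boiling_point_C", 78.4)],
)

# Join the two tables to list each molecule with its boiling point.
rows = conn.execute(
    """
    SELECT m.name, x.value
    FROM molecule AS m
    JOIN measurement AS x ON x.molecule_id = m.id
    WHERE x.property = ?
    ORDER BY x.value
    """,
    ("boiling_point_C",),
).fetchall()

for name, value in rows:
    print(name, value)

conn.close()
```

Keeping molecules and measurements in separate tables means each molecule name is stored once, and new properties can be added as rows without changing the schema.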
The courses emphasize a hands-on learning approach in which class time is often used for problem solving in small groups. The first 6 weeks teach the skills described above. The remaining weeks are devoted to each student's class project: a computational research project of their choosing.