Slack Canvas
Here we store the Slack Canvas that we used for distributing tasks throughout the course.
Week 1
- [x] Create a git repository (Lukas M.)
- [x] Make sure that all team members have write access to the github repository (Lukas M.)
- [x] Create a dedicated environment for your project to keep track of your packages (Everyone)
- [x] Create the initial file structure using cookiecutter (Lukas M.)
- [x] Fill out the make_dataset.py file such that it downloads whatever data you need and preprocesses it if necessary (Vraťa) (see the data sketch after this list)
- [x] Add a model file and a training script and get that running (Lukas R. + Liza + Weihang)
- [x] Use PyTorch Lightning (if applicable) to reduce the amount of boilerplate in your code (Lukas R. + Weihang) (see the training sketch after this list)
- [x] Use Weights & Biases to log training progress and other important metrics/artifacts in your code. (Lukas R.)
- [x] Use Hydra to load the configurations and manage your hyperparameters (dropped; replaced by Lightning CLI)
- [x] Remember to fill out the requirements.txt file with whatever dependencies you are using (Lukas M.)
- [x] Setup version control for your data or part of your data (Lukas R.)
- [x] Construct one or multiple docker files for your code (Lukas M.)
- [x] When you have something that works somewhat, remember at some point to do some profiling and see if you can optimize your code
- [x] Build the docker files locally and make sure they work as intended
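For reference, a minimal sketch of the make_dataset.py idea from above: download raw data and write a processed CSV into data/processed/. The download URL, the raw file name, and the dropna() step are placeholders for illustration, not the project's actual pipeline.

```python
# A minimal sketch, NOT the project's actual pipeline: the download URL, the
# raw file name, and the dropna() step are placeholders for illustration.
import urllib.request
from pathlib import Path

import pandas as pd

RAW_DIR = Path("data/raw")
PROCESSED_DIR = Path("data/processed")


def main() -> None:
    RAW_DIR.mkdir(parents=True, exist_ok=True)
    PROCESSED_DIR.mkdir(parents=True, exist_ok=True)

    # Placeholder source; the real script downloads whatever the project needs
    raw_file = RAW_DIR / "measurements.csv"
    urllib.request.urlretrieve("https://example.com/data.csv", raw_file)

    # Illustrative preprocessing: drop incomplete rows, write the processed CSV
    df = pd.read_csv(raw_file).dropna()
    df.to_csv(PROCESSED_DIR / "dataset_concatenated.csv", index=False)


if __name__ == "__main__":
    main()
```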
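And a hedged sketch of the Lightning + Weights & Biases combination from the training items: a small LightningModule whose self.log calls are forwarded to the W&B run. The module name, architecture, and feature count are illustrative assumptions; only the W&B project/entity names come from the monitoring link in Week 2.

```python
# A hedged sketch, not the project's real model: the module name, architecture,
# and feature count are illustrative; the W&B project/entity come from the
# monitoring link in Week 2.
import pytorch_lightning as pl
import torch
from pytorch_lightning.loggers import WandbLogger
from torch import nn


class WheelAssemblyClassifier(pl.LightningModule):  # hypothetical name
    def __init__(self, n_features: int = 8, lr: float = 1e-3):
        super().__init__()
        self.save_hyperparameters()  # hyperparameters show up in the W&B run
        self.net = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1)
        )

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self.net(x).squeeze(1)
        loss = nn.functional.binary_cross_entropy_with_logits(logits, y.float())
        self.log("train_loss", loss)  # forwarded to Weights & Biases
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)


if __name__ == "__main__":
    logger = WandbLogger(project="automatic-wheel-assembly-detection", entity="02476mlops")
    trainer = pl.Trainer(max_epochs=10, logger=logger)
    # trainer.fit(WheelAssemblyClassifier(), train_dataloaders=...)  # dataloaders omitted
```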
Week 2
- [x] Write unit tests related to the data part of your code (Lukas M.) (see the test sketch after this list)
- [x] Update cache command (Lukas M.)
- [x] Write unit tests related to model construction and/or model training (the tests go in the tests folder and the workflow in the .github/workflows folder) (Liza)
- [x] Read the CSV from the data folder (Lukas M.)
- [x] Consider running a hyperparameter optimization sweep. (Liza)
- [x] Update the dvc bucket with the new files (Lukas R.)
- [x] The path used when running `python src/data/make_dataset.py` is wrong inside conda (works for Windows/Ubuntu, though; would be nice if someone could double-check)
- [x] The columns for `src/model/train_model.py` are not the same as those inside `data/processed/dataset_concatenated.csv`
- [x] Calculate the coverage.
- [x] Get some continuous integration running on the github repository (Lukas M.)
- [x] Create a data storage in a GCP Bucket for your data and preferably link this with your data version control setup (Lukas R.)
- [x] Create a trigger workflow for automatically building your docker images (Lukas R.)
- [x] Get your model training in GCP using either the Engine or Vertex AI
- [x] Create a FastAPI application that can do inference using your model (Lukas R.) (see the FastAPI sketch after this list)
- [x] If applicable, consider deploying the model locally using torchserve (Liza)
- [x] Deploy your model in GCP using either Functions or Run as the backend
- [x] Wandb monitoring (Lukas R) https://wandb.ai/02476mlops/automatic-wheel-assembly-detection?workspace=user-lukyrasocha
- [x] Figure out wandb auth stuff so you can also monitor runs when training via docker (Lukas R)
- [x] LOGGING!!!! (Lukas R)
- [x] Save trained model locally (Lukas R.)
- [x] Save trained model in cloud (so that we can access models that were trained in cloud) (Lukas M.)
- [x] Hyperparameters (currently set at the beginning of the file; try calling the training via the CLI and not the file itself... try Lightning CLI? or Hydra?) → OmegaConf (Liza)
- [x] Try automatic hyperparameter tuning using Optuna/Lightning CLI/Forecasting → WandB (Liza) (see the sweep sketch below)
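A hedged sketch of what a data unit test from this list might look like (e.g. in tests/test_data.py); the assertions are examples only, not the project's actual test suite.

```python
# A hedged sketch of tests/test_data.py; the assertions are examples, and the
# real schema lives in data/processed/dataset_concatenated.csv.
from pathlib import Path

import pandas as pd
import pytest

DATA_PATH = Path("data/processed/dataset_concatenated.csv")


@pytest.mark.skipif(not DATA_PATH.exists(), reason="dataset not pulled (run `dvc pull`)")
def test_dataset_is_complete():
    df = pd.read_csv(DATA_PATH)
    assert len(df) > 0, "dataset should not be empty"
    assert not df.isna().any().any(), "dataset should contain no missing values"
```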
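A minimal sketch of the FastAPI inference idea; the TorchScript model path and the flat feature-vector input format are assumptions, not the deployed app's real contract.

```python
# A minimal sketch of the inference API; the model path/format and the flat
# feature-vector input are assumptions for illustration.
import torch
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = None


class Features(BaseModel):
    values: list[float]  # one flat feature vector (illustrative)


@app.on_event("startup")
def load_model() -> None:
    global model
    model = torch.jit.load("models/model.pt")  # placeholder path/format
    model.eval()


@app.post("/predict")
def predict(features: Features) -> dict:
    with torch.no_grad():
        logit = model(torch.tensor(features.values).unsqueeze(0))
    return {"assembled": bool(logit.item() > 0)}
```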
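And a hedged sketch of the W&B sweep mentioned in the last item; the swept parameter names, ranges, and the train() entry point are assumptions, not the project's real configuration (only the project name comes from the W&B link above).

```python
# A hedged sketch of a W&B sweep; the parameter names, ranges, and the train()
# body are assumptions, not the project's real configuration.
import wandb

sweep_config = {
    "method": "bayes",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "lr": {"distribution": "log_uniform_values", "min": 1e-4, "max": 1e-1},
        "batch_size": {"values": [16, 32, 64]},
    },
}


def train() -> None:
    with wandb.init() as run:
        cfg = run.config  # lr and batch_size are injected by the sweep agent
        ...  # build the model with cfg.lr / cfg.batch_size, fit, log val_loss


if __name__ == "__main__":
    sweep_id = wandb.sweep(sweep_config, project="automatic-wheel-assembly-detection")
    wandb.agent(sweep_id, function=train, count=10)
```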
Week 3
- [x] Create documentation using MkDocs (include your personal notes or the README from the docs folder there) (Lukas M.)
- [x] Answer the questions that are part of the report
- [x] Set up monitoring for the system telemetry of your deployed model
- [x] Set up monitoring for the performance of your deployed model (see the telemetry sketch below)
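A hedged sketch of the telemetry-monitoring item, using prometheus_client to expose request counts and latency from the FastAPI app; the metric names are illustrative, and the endpoint body is elided.

```python
# A hedged sketch of system telemetry via prometheus_client; the metric names
# are illustrative, and the endpoint body is elided.
from fastapi import FastAPI
from prometheus_client import Counter, Histogram, make_asgi_app

app = FastAPI()
app.mount("/metrics", make_asgi_app())  # Prometheus scrape endpoint

PREDICTIONS = Counter("predictions_total", "Number of /predict calls")
LATENCY = Histogram("prediction_latency_seconds", "Inference latency in seconds")


@app.post("/predict")
def predict() -> dict:
    PREDICTIONS.inc()
    with LATENCY.time():
        ...  # run the model here
    return {"status": "ok"}
```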
BRAINSTORM
- [x] Do we want to save and load checkpoints as well? (might be good practice for large-scale models)
- [x] Do we want to somehow optimize the parameter tuning (e.g. start tuning around the best parameters from previous runs)?
- [x] We are uploading and using the last trained model; should we use the best model instead? (see the checkpoint sketch after this list)
- [x] Should we keep just environment.yaml and remove requirements.txt, to reduce overhead?
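On the best-vs-last-model question, a minimal sketch using Lightning's ModelCheckpoint callback, which can keep the best checkpoint by a monitored validation metric instead of only the last one; the val_loss metric name is an assumption.

```python
# A minimal sketch using Lightning's ModelCheckpoint; "val_loss" is an assumed
# metric name that the module would have to log during validation.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_cb = ModelCheckpoint(
    dirpath="models/",    # where checkpoints are written
    monitor="val_loss",   # assumed validation metric
    mode="min",
    save_top_k=1,         # keep only the best checkpoint
    save_last=True,       # also keep the last one, e.g. for resuming
)

trainer = pl.Trainer(max_epochs=10, callbacks=[checkpoint_cb])
# After trainer.fit(...), checkpoint_cb.best_model_path points at the best model.
```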
CHECK BEFORE SUBMISSION
- [x] Remember to comply with good coding practices (PEP 8) while doing the project
- [x] Add some type hints and remember to document essential parts of your code
- [x] Check whether docker runs correctly if started from scratch
- [x] Save the Slack Canvas to the README
- [x] Update the folder structure in the README
- [x] Try creating a fresh conda environment and fix any missing or wrong requirements
- [x] Add branch protection rules that require all pytest checks to pass before merging
- [x] Delete useless data from GCP bucket
- [x] Revisit your initial project description. Did the project turn out as you wanted?
- [x] Make sure all group members have an understanding of all parts of the project
- [x] Check if all your code is uploaded to github
- [x] Change the default train flag to sweep? (currently it's only `-wandb_on`)
- [x] Check the coverage report (currently it does not work) (Lukas M.)
UNNECESSARY
- [x] If applicable, play around with distributed data loading
- [x] If applicable, play around with distributed model training
- [x] Check how robust your model is towards data drifting
- [x] Play around with quantization, compilation and pruning for your trained models to increase inference speed (see the sketch below)
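Finally, a hedged sketch of the quantization/compilation item; the toy model is a stand-in, and any real speedup would have to be measured on the actual trained model.

```python
# A hedged sketch on a toy model; real speedups would have to be measured on
# the actual trained model.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1)).eval()

# Dynamic quantization: Linear weights stored and computed in int8
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# Compilation (PyTorch >= 2.0): fuses kernels for faster forward passes
compiled = torch.compile(model)

with torch.no_grad():
    x = torch.randn(1, 8)
    print(quantized(x), compiled(x))
```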