Code Organization

#programming

Main Idea

This is how I wish I had always organized the code for my research projects. I develop in VS Code with fairly standard extensions. For each new idea, I almost always begin with a Jupyter notebook. This setup is great for quickly changing things and getting immediate feedback. Because I can run only the cells I need and inspect outputs and plots interactively, debugging is generally faster and a proof of concept comes together more easily. Most of these explorations stay small, so I lazily keep them in one large shared folder and borrow Python environments from other, larger projects. Once a notebook starts to get crowded, I pull the larger functions and classes out into separate .py files.

The next step is setting up a standalone project, which usually happens when there are more than a few .py files or, more likely, when hyperparameter tuning or reproducibility starts to demand closer attention. For reference, I have close to 50 "exploration notebooks", and maybe one third of them make it to this stage. Each of these gets its own Python environment (I used to use conda exclusively, but now I'd strongly recommend uv). In that environment I install the project as an editable package (pip install -e .) and create a repo on GitHub. I generally use a folder structure like the following (in order of priority or creation), all within the main folder project/:

README.md
pyproject.toml
.gitignore
src/
	project/
		__init__.py
		...
notebooks/
data/
tests/
scripts/
configs/
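
One practical consequence of this layout is that notebooks live one level below the data and config folders, so relative paths depend on where the kernel was started. A minimal sketch of one way around that, assuming the project root is whichever folder holds pyproject.toml (the helper name find_project_root is hypothetical, not part of any library):

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Optional[Path] = None, marker: str = "pyproject.toml") -> Path:
    """Walk upward from `start` until a folder containing `marker` is found.

    Hypothetical helper: it assumes the project root is the folder that
    holds pyproject.toml, as in the layout above.
    """
    here = Path(start) if start is not None else Path.cwd()
    here = here.resolve()
    # Check the starting folder first, then each parent up to the filesystem root.
    for candidate in (here, *here.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker} found in {here} or any parent folder")
```

In a notebook under notebooks/, something like `DATA_DIR = find_project_root() / "data"` then resolves to the same folder regardless of the working directory.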

After setting up a few projects this way, I wrote a helper script to automate most of it:

dir_name='project'
# Or set from an input argument
# dir_name=$1
# Set up the directory structure
# -p keeps the script re-runnable if a folder already exists
mkdir -p "src/$dir_name" tests notebooks data

# create a .gitignore file
# -f: fail on HTTP errors instead of saving an error page; -L: follow redirects
curl -fsSL -o .gitignore https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore

# create a pyproject.toml file for setup
touch pyproject.toml

touch "src/$dir_name/__init__.py"
touch tests/test_import.py
  
# edit the pyproject.toml file
cat <<EOT >> pyproject.toml
[build-system]
requires = ["setuptools>=64", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "$dir_name"
version = "0.1.0"
readme = "README.md"

[tool.pytest.ini_options]
minversion = "6.0"
addopts = ["-ra", "-q", "--color=yes"]
testpaths = ["tests"]
EOT
cat <<EOT >> tests/test_import.py
def test_import():
    # A failed import raises ImportError, which pytest reports as a failure.
    import $dir_name
EOT
# create a README.md file
touch README.md
cat <<EOT >> README.md
# $dir_name

## Project Description
TODO

## Project Structure
- \`src/\`: Source Code
- \`tests/\`: Tests, using pytest
- \`notebooks/\`: Jupyter Notebooks
- \`data/\`: Data

## Quick Start
Once an environment is created and activated with the proper dependencies, run the following commands
\`\`\`bash
git clone <repo_url>
cd $dir_name
pip install -e .
pytest
\`\`\`
## Github Setup
1. \`git remote add origin <repo_url>\`
2. \`git add .\`
3. \`git commit -m "Initial commit"\`
4. \`git push -u origin main\`
EOT
pip install -e .
pytest
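# name the initial branch "main" to match the push command in the README (needs Git >= 2.28)
git init -b main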
git add .
git commit -m "Initial commit"
echo "Setup complete. Now, set up a new repository on GitHub and run the following commands:"
# print only the four numbered steps, not the section heading
grep -A 4 "Github Setup" README.md | tail -n 4
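
The import test only confirms that the package can be imported, not where it is imported from. As a rough sanity check that the editable install actually points at the working copy, the following sketch asks Python where it would load a package from and whether that file sits under src/. The function names module_location and imports_from_src are hypothetical, and the check assumes the src/ layout used above.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_location(name: str) -> Optional[Path]:
    """Return the file a top-level package would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve()


def imports_from_src(name: str, project_root: Path) -> bool:
    """True if `name` resolves to a file somewhere under <project_root>/src."""
    location = module_location(name)
    if location is None:
        return False
    src = (Path(project_root) / "src").resolve()
    return src in location.parents
```

Run from the project folder, `imports_from_src("project", Path.cwd())` should be true after `pip install -e .`; a false result would suggest that a non-editable copy is installed in the environment.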