Integrate Output Strip for Jupyter Notebooks

Royi 3 weeks ago updated 2 weeks ago 3

Could you integrate something like `nbstripout` to handle Jupyter Notebooks?

Jupyter notebooks usually contain output data that is classified as binary (serialized plots, etc.).

People usually use a script like `nbstripout`, or something based on JQ, to strip the output cells before committing.

You may have a look at: Using IPython / Jupyter Notebooks Under Version Control.

I set this up on a repository (editing its config and attributes files, and adding JQ to the path through the system `.bashrc` file). It worked in the Git Shell, but it made no difference in the UI itself: the diff window still treated the files as binary.

It would be great to have that option.

Do you know Jupyter Notebooks?

OK, so we should start with that.

Jupyter Notebooks (previously known as IPython Notebooks) are interactive notebooks that are highly popular among data scientists and researchers.

The file format is basically a JSON file with 3 main types of cells:

1. Code.

2. Markdown.

3. Output.

The problem is that the file isn't version-control friendly, since the output cells may include binary data (images, plots, etc.).

Yet since the only data needed to recreate the output is the code from the previous cells, there is a workaround.

Before committing, you can clean up the notebook file.
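
To make the cleanup concrete, here is a minimal sketch in Python of what such a stripping step does. It is not how `nbstripout` or the JQ filter is implemented. It assumes the nbformat 4 layout, where each code cell carries its results in an `outputs` list, and the names `strip_outputs`, `clean_filter` and the `sample` notebook are made up for illustration.

```python
import io
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of the notebook with code-cell outputs removed.

    Assumption: nbformat 4 layout, with a top-level "cells" list and
    "outputs" / "execution_count" fields on code cells.
    """
    stripped = json.loads(json.dumps(notebook))  # deep copy of plain JSON data
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


def clean_filter(src, dst) -> None:
    """Read a notebook from src and write the stripped JSON to dst.

    A Git "clean" filter command has this shape: the file content comes in
    on stdin and the version to be stored goes out on stdout.
    """
    json.dump(strip_outputs(json.load(src)), dst, indent=1)
    dst.write("\n")


# Hypothetical two-cell notebook used only to exercise the functions.
sample = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "source": ["1 + 1"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 3,
                    "metadata": {},
                    "data": {"text/plain": ["2"]},
                }
            ],
        },
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = strip_outputs(sample)
print(cleaned["cells"][0]["outputs"])
```

Wired up as a filter, a script along these lines would call `clean_filter(sys.stdin, sys.stdout)`, so the working copy keeps its outputs and only the stripped version is committed.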

Currently the way to do so is as described in the links above.
Yet, for some reason, it has no effect on SmartGit itself.

So what I'd like, and what I think could be a killer feature of SmartGit, is built-in support for Jupyter Notebooks: integrating the stripping phase into SmartGit as a user option.

The easiest way to do so is to use JQ, as I linked above.

But the issue here is why adding a filter to the repository config file has no effect in SmartGit itself.