A complete suite of Open Source projects for Code as Data and Machine Learning on Code

Philosophy & Governance

At source{d} we are creating a suite of Open Source tools enabling “Code as Data” and “Machine Learning on Code”.

We are also great believers in Open Source and its philosophy. Not only is our source code developed out in the open and made available to all, but also our culture, guides, and even OKRs are openly accessible on GitHub.

Upon request from the community, we plan to hold an election for a Technical Steering Committee (TSC), which will act as an escalation point for potential conflicts within projects, encourage cross-project communication and coordination, and help with project governance.

Licensing

Our technology stack is currently fully open source under licenses such as Apache 2.0 (permissive) and GPL 3.0 (copyleft). We have released the source{d} Engine product for single-node deployments. Multi-node deployments of the source{d} Engine, which allow distributed computing over a vast number of repositories with a large number of concurrent users, are a proprietary product.

This allows us to charge enterprises that need a large number of nodes without putting individual developers or smaller organizations at a disadvantage when they use our technology.

How to contribute

You do not need to be an expert in Machine Learning to start contributing to the source{d} tech stack. As Open Source enthusiasts, we think everyone has a unique perspective and ideas that deserve to be heard, either online on GitHub or in person at meetups and conferences. From simple documentation improvements to more advanced pull requests for new features and help organizing #MLonCode meetups, we welcome all kinds of contributions from the broader community.

Start contributing today by:

  • Creating issues and submitting pull requests on GitHub
  • Discussing design & change proposals with the source{d} team on Slack
  • Sending an email to devrel@sourced.tech about organizing events & meetups

Projects

Open-source components that make machine learning on source code a reality

The source{d} stack is built on open-source components that cover datasets, models, data retrieval, language analysis, and machine learning tools, all of which are freely available.

OSS Projects Highlights

Code Analysis

Babelfish

Babelfish is a self-hosted server for source code parsing. It can parse any file, in any supported language, extracting an Abstract Syntax Tree (AST) from it and converting it into a Universal Abstract Syntax Tree (UAST).
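To make the idea of mapping a language-specific AST onto a uniform tree more concrete, here is a minimal sketch in Python. It does not use Babelfish or its API. It relies only on Python's standard `ast` module, and the node shape (`type`, `token`, `line`, `children`) and the helper names `to_generic`, `parse_to_generic`, and `collect_tokens` are assumptions made for illustration.

```python
import ast


def to_generic(node):
    """Convert a Python ast node into a plain dict with a uniform shape.

    The dict layout is hypothetical; it is not the UAST schema.
    """
    out = {"type": type(node).__name__, "children": []}
    # Keep the source line when the node carries one.
    if hasattr(node, "lineno"):
        out["line"] = node.lineno
    # Record an identifier-like token when the node has one.
    for attr in ("name", "id", "arg"):
        value = getattr(node, attr, None)
        if isinstance(value, str):
            out["token"] = value
            break
    for child in ast.iter_child_nodes(node):
        out["children"].append(to_generic(child))
    return out


def parse_to_generic(source):
    """Parse Python source text and return the generic tree."""
    return to_generic(ast.parse(source))


def collect_tokens(tree):
    """Walk the generic tree depth-first and gather identifier tokens."""
    tokens = []
    if "token" in tree:
        tokens.append(tree["token"])
    for child in tree["children"]:
        tokens.extend(collect_tokens(child))
    return tokens


if __name__ == "__main__":
    tree = parse_to_generic("def add(a, b):\n    return a + b\n")
    print(tree["children"][0]["type"], collect_tokens(tree))
```

Once every language is reduced to the same node shape, a consumer such as `collect_tokens` no longer needs to know which parser produced the tree.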

Code Retrieval

go-git

go-git is a highly extensible Git implementation written in pure Go.

Code Analysis

Gemini

Gemini is a tool for searching for similar ‘items’ in source code repositories. The supported granularity levels (items) are repositories, files, and functions.
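As a rough illustration of what searching for similar items can mean at file granularity, the sketch below compares files by the Jaccard similarity of their identifier sets. This is a swapped-in technique chosen for brevity, not a description of how Gemini works, and the function names and the `threshold` parameter are hypothetical.

```python
import re


def tokens(text):
    """Return the set of identifier-like words found in a source text."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_similar(items, threshold=0.5):
    """Return (name, name, score) for every pair scoring at least `threshold`.

    `items` maps an item name (for example a file path) to its source text.
    Pairs are sorted from most to least similar.
    """
    names = sorted(items)
    bags = {name: tokens(items[name]) for name in names}
    pairs = []
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            score = jaccard(bags[left], bags[right])
            if score >= threshold:
                pairs.append((left, right, score))
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs


if __name__ == "__main__":
    files = {
        "a.py": "def add(x, y): return x + y",
        "b.py": "def add(x, y): return y + x",
        "c.py": "print('hello world')",
    }
    print(find_similar(files))
```

The pairwise loop is quadratic in the number of items, so it only suits small collections. Searching across many repositories would need an indexing scheme, which this sketch leaves out.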

Datasets

  • Public Git Archive

Models

  • Topic Modeling
  • Identifier Embeddings
  • TF/IDF BoW

Code Retrieval Tools

  • go-git
  • Rovers
  • Borges

Code Analysis Tools

  • Babelfish
  • Gitbase
  • Engine
  • Lookout

Machine Learning

  • sourced.ml

Applications

  • Gemini
  • Hercules
