Week 1 – Logistic Regression block by block

Hello, this is Legg. I am a Fall 2020 Scholar at OpenAI, working with my mentor Gabriel Goh on the Clarity team. Between October 2020 and April 2021, my peers and I will be publishing blog posts biweekly to share our journeys as Scholars.

I tend to get questions about career transitions and whether one should invest in additional schooling or an industry research residency. Therefore, before diving into any technical discussions, I want to share a few things about myself so that readers of this blog understand where my baseline is. Hopefully, this information, together with the upcoming technical discussions, can help readers make better-informed decisions about their own careers.

  1. Between 2006 and 2017, I trained and practiced as an Architect and Urban Designer. I did not receive any quantitative or engineering training in my bachelor’s or first master’s degree. (I wouldn’t count Grasshopper scripting, 3D modeling, or rendering as technical for AI purposes.)
  2. Between 2017 and 2019, I completed a master’s degree in data science, a popular degree for folks who want to transition into the field. The coursework included statistics, machine learning, and NLP with deep learning. I also took two semesters off for full-time machine learning and data science internships.
  3. Between 2019 and 2020, I was an AI Resident at Microsoft Research, focusing on Vision-Language Navigation (VLN) and Instruction Generation.
  4. Before my start date at OpenAI, I took about ten days to complete the Deep Learning Specialization (deeplearning.ai, Coursera) to refresh my memory and fill any knowledge gaps in the basics.

In summary, I am still very new to the field of AI. Through this Scholars program, I hope to gain more experience in research engineering and to develop research intuition.

Back to documenting my first week at OpenAI: I took advantage of the general onboarding period to learn a little more about PyTorch. I have been using PyTorch for about 12 months, mostly extending research codebases such as ALFRED and Room2Room. I wanted to understand Autograd and custom functions better, so I coded up logistic regression from scratch and compared it with the off-the-shelf implementation on a toy problem. More specifically, I implemented the following:

  • Linear, Sigmoid, and loss layers as custom torch.autograd.Function subclasses
  • Gradient checking using torch.autograd.gradcheck
  • SGD optimizer as custom torch.optim.Optimizer
  • Toy dataset and dataloader using sklearn.datasets.make_classification, torch.utils.data.Dataset, and torch.utils.data.DataLoader
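To give a flavor of the first two bullets, here is a minimal sketch (my own, not the actual implementation linked below) of a Sigmoid layer written as a custom torch.autograd.Function, verified with torch.autograd.gradcheck:

```python
import torch
from torch.autograd import Function, gradcheck

class Sigmoid(Function):
    """Sigmoid as a custom autograd Function with a hand-written backward."""

    @staticmethod
    def forward(ctx, x):
        y = 1.0 / (1.0 + torch.exp(-x))
        ctx.save_for_backward(y)  # cache the output for the backward pass
        return y

    @staticmethod
    def backward(ctx, grad_output):
        (y,) = ctx.saved_tensors
        return grad_output * y * (1.0 - y)  # d(sigmoid)/dx = y * (1 - y)

# gradcheck compares the analytic backward against finite differences;
# it wants double-precision inputs with requires_grad=True.
x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
print(gradcheck(Sigmoid.apply, (x,), eps=1e-6, atol=1e-4))
```

gradcheck returns True when the hand-written gradient matches the numerical one (and raises otherwise), which is exactly the kind of check that catches backward-pass bugs early.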
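The optimizer bullet can be sketched in a similar spirit. Below is a bare-bones SGD subclass of torch.optim.Optimizer (the name PlainSGD and the stripped-down update rule are my own; the real torch.optim.SGD also handles momentum, weight decay, etc.):

```python
import torch
from torch.optim import Optimizer

class PlainSGD(Optimizer):
    """Minimal SGD: p <- p - lr * grad, nothing else."""

    def __init__(self, params, lr=0.1):
        if lr <= 0.0:
            raise ValueError(f"Invalid learning rate: {lr}")
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()  # parameter updates should not be tracked by autograd
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    p.add_(p.grad, alpha=-group["lr"])

# One toy update: gradient of (w ** 2).sum() is 2w.
w = torch.randn(3, requires_grad=True)
opt = PlainSGD([w], lr=0.1)
(w ** 2).sum().backward()
opt.step()
```

Subclassing Optimizer mostly means storing hyperparameters in `defaults` and walking `self.param_groups` in `step()`; everything else comes from the base class.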
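For the last bullet, a Dataset wrapping sklearn's make_classification might look like this (class name and parameter values are my own illustration):

```python
import torch
from torch.utils.data import Dataset, DataLoader
from sklearn.datasets import make_classification

class ToyDataset(Dataset):
    """A binary-classification toy dataset backed by sklearn."""

    def __init__(self, n_samples=200, n_features=5, seed=0):
        X, y = make_classification(
            n_samples=n_samples, n_features=n_features, random_state=seed
        )
        self.X = torch.from_numpy(X).float()
        self.y = torch.from_numpy(y).float()

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

# DataLoader handles batching and shuffling on top of the Dataset.
loader = DataLoader(ToyDataset(), batch_size=32, shuffle=True)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # torch.Size([32, 5]) torch.Size([32])
```

Implementing `__len__` and `__getitem__` is all a map-style Dataset needs; DataLoader takes care of the rest.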

The exercise took about two days. I spent most of the time reading the PyTorch documentation, coding up my implementation, and debugging when the losses of my implementation didn’t match those of the default implementation. The process was quite educational. The implementation is here (feel free to comment with questions/suggestions), along with links to all the references I used. Continuing this line of small exercises, I am interested in adapting a similar toy problem to train on multiple GPUs; there are at least three ways to do this in PyTorch. More updates on that in about two weeks 🙂
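For reference, the off-the-shelf side of such a comparison can be quite short. Here is a rough sketch (my own, not the linked implementation) of logistic regression using only built-in PyTorch modules on a synthetic toy target:

```python
import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(200, 5)
y = (X[:, 0] > 0).float()  # a linearly separable toy target

# nn.Linear + nn.Sigmoid + BCELoss is the stock logistic-regression stack.
model = nn.Sequential(nn.Linear(5, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.5)

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(1), y)
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.3f}")  # decreases steadily on this toy data
```

Running the hand-rolled layers and optimizer on the same data and seed, and checking that the per-step losses agree, is the debugging loop described above.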