Jan-Feb Learning 2019

What’s This?

I’m trying to give myself at least half an hour during the workday (or, failing that, blocking off two hours or so a week) to learn something new – namely taking classes/reviewing what I know on Treehouse, reading job-related articles, and reading career-related books. I’m tracking notables here on a monthly basis as a self-commitment, to retain things in memory, and as a reference. I put off posting this for the last six months – work and life have been insanely busy, and my notes are inconsistent across proprietary work versus my own – but it’s worth a round-up here. Posting with good intentions for next year. Reminding myself that if I don’t capture every bit I did, it’s alright. Just keep yourself accountable.

Books Read

Inspired: How To Create Products Customers Love

Some favorite quotes from Kindle highlights:

  • This means constantly creating new value for their customers and for their business. Not just tweaking and optimizing existing products (referred to as value capture) but, rather, developing each product to reach its full potential. Yet, many large, enterprise companies have already embarked on a slow death spiral. They become all about leveraging the value and the brand that was created many years or even decades earlier. The death of an enterprise company rarely happens overnight, and a large company can stay afloat for many years. But, make no mistake about it, the organization is sinking, and the end state is all but certain.
  • The little secret in product is that engineers are typically the best single source of innovation; yet, they are not even invited to the party in this process.
  • To summarize, these are the four critical contributions you need to bring to your team: deep knowledge (1) of your customer, (2) of the data, (3) of your business and its stakeholders, and (4) of your market and industry.
  • In the products that succeed, there is always someone like Jane, behind the scenes, working to get over each and every one of the objections, whether they’re technical, business, or anything else. Jane led the product discovery work and wrote the first spec for AdWords. Then she worked side by side with the engineers to build and launch the product, which was hugely successful.
  • Four key competencies: (1) team development, (2) product vision, (3) execution, and (4) product culture.
  • It’s usually easy to see when a company has not paid attention to the architecture when they assemble their teams—it shows up a few different ways. First, the teams feel like they are constantly fighting the architecture. Second, interdependencies between teams seem disproportionate. Third, and really because of the first two, things move slowly, and teams don’t feel very empowered.
  • I strongly prefer to provide the product team with a set of business objectives—with measurable goals—and then the team makes the calls as to what are the best ways to achieve those goals. It’s part of the larger trend in product to focus on outcome and not output.

In my experience working with companies, only a few companies are strong at both innovation and execution. Many are good at execution but weak at innovation; some are strong at innovation and just okay at execution; and a depressing number of companies are poor at both innovation and execution (usually older companies that lost their product mojo a long time ago, but still have a strong brand and customer base to lean on).


Articles Read

Machine learning – is the emperor wearing clothes?

  1. “The purpose of a machine learning algorithm is to pick the most sensible place to put a fence in your data.”
  2. Different algorithms – e.g. support vector classifiers, decision trees, neural networks – use different kinds of fences
  3. Neural networks give you a very flexible boundary, which is why they’re so hot right now

Some Key Machine Learning Definitions

  1. “A model is overfitting if it fits the training data too well and there is a poor generalization of new data.”
  2. Regularization adds a penalty on the model’s parameters to steer the model toward a preferred complexity, so that it generalizes well and avoids both overfitting and underfitting – at the cost of reducing the model’s freedom
  3. “Hyperparameters cannot be estimated from the training data. Hyperparameters of a model are set and tuned depending on a combination of some heuristics and the experience and domain knowledge of the data scientist.”
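As a sketch of the regularization idea (my own toy example, not from the article): a one-feature least-squares fit where an L2 penalty `lam` – a hyperparameter set by hand, not learned from the training data – shrinks the learned slope.

```python
# A one-feature least-squares fit with an L2 (ridge-style) penalty.
# Minimizing sum((y - w*x)^2) + lam * w^2 gives the closed form
# w = sum(x*y) / (sum(x*x) + lam). The penalty shrinks the slope
# and reduces the model's freedom to chase noise in the training data.

def fit_slope(xs, ys, lam=0.0):
    """Fit y ~ w*x, penalizing large w by lam (L2 regularization)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]            # roughly y = 2x with noise

w_free = fit_slope(xs, ys)           # lam = 0: fits the training data as closely as possible
w_reg = fit_slope(xs, ys, lam=10.0)  # the penalty shrinks the slope toward 0

print(w_free, w_reg)  # the regularized slope is smaller
```

Tuning `lam` is exactly the heuristic-plus-domain-knowledge exercise the quote above describes.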

Audiences-Based Planning versus Index-Based Planning

  • Index is the relative composition of a target audience within a specific program or network, compared to the average audience in the TV universe; it gives marketers/agencies a gauge of a program’s or network’s value relative to others, using the relative concentration of a specific target audience
  • Audience-based buying does not account for the relative composition of an audience or the context in which the audience is likely to be found; rather, it values the raw number of individuals in a target audience who watch a given program, their likelihood of being exposed to an ad, and the cost of reaching them with a particular spot. Really, it’s buying audiences versus buying a particular program
  • Index-based campaigns follow the traditional TV planning model – maximum number of impressions against a given audience at minimum price -> buy high-indexing media against a traditional age/demo. Note this doesn’t include precision index targeting
  • A huge issue is that TV is insanely fragmented, so even when campaigns hit their GRP targets, they do so by increasing frequency rather than total reach
  • Note: GRP (gross rating point) measures the size of an ad campaign by medium or schedule – not the size of the audience reached. GRPs quantify impressions as a percentage of the target population, so the total can exceed 100. It measures impressions in relation to the number of people and is the metric used to compare the strength of components in a media plan. There are several ways to calculate GRPs, e.g. GRP = Reach (%) × Average Frequency, or simply rating × spots: a TV rating of 4 placed on 5 episodes = 20 GRPs
  • Index-based planning is about impressions delivered over a balance of reach and frequency; audience-based planning is about reaching likely customers to drive results
  • DSPs, etc. should use optimized algorithms to assign users a probability of being exposed to a spot, maximizing the probability of reaching a specific target audience
  • Audience-based planning is about maximizing reach in the most efficient way possible, whereas index-based buying values audience composition ratios
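The GRP arithmetic can be made concrete with a short sketch (the example numbers are my own; the rating-times-episodes case matches the note above):

```python
# Illustrative GRP arithmetic. GRPs measure campaign weight, not unique
# audience reached: impressions as a percent of the target population,
# so the total can exceed 100.

def grps_from_reach(reach_pct, avg_frequency):
    """GRP = Reach (%) x Average Frequency."""
    return reach_pct * avg_frequency

def grps_from_ratings(rating, num_spots):
    """GRP = rating x number of spots (or episodes)."""
    return rating * num_spots

print(grps_from_reach(50, 4))   # reach 50% at avg frequency 4 -> 200 GRPs
print(grps_from_ratings(4, 5))  # a rating of 4 on 5 episodes -> 20 GRPs
```

Note that 200 GRPs could mean broad reach at low frequency or narrow reach at high frequency – which is exactly the fragmentation problem described above.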

Finding the metrics that matter for your product

  1. “Where most startups trip up is they don’t know how to ask the right questions before they start measuring.”
  2. HEART Framework Questions:
    • If we imagine an ideal customer who is getting value from our product, what actions are they taking?
    • What are the individual steps a user needs to take in our product in order to achieve a goal?
    • Is this feature designed to solve a problem that all our users have, or just a subset of our users?
  3. Key points in a customer journey:
    1. Intent to use: The action or actions customers take that tell us definitively they intend to use the product or feature.
    2. Activation: The point at which a customer first derives real value from the product or feature.
    3. Engagement: The extent to which a customer continues to gain value from the product or feature (how much, how often, over how long a period of time etc).

A Beginner’s Guide to Finding the Product Metrics That Matter

  1. It’s actually hard to find the metrics that matter, and there’s a trap of picking too many indicators
  2. Understand where your metrics fall, e.g. under the HEART framework: Happiness, Engagement, Adoption, Retention, Task Success
  3. Don’t measure everything you can, and don’t fall into the vanity-metrics trap; instead, examples of good customer-oriented metrics:
    • Customer retention
    • Net promoter score
    • Churn rate
    • Conversions
    • Product usage
    • Key user actions per session
    • Feature usage
    • Customer Acquisition costs
    • Monthly Recurring Revenue
    • Customer Lifetime Value

Algorithms and Data Structures

Intro to Algorithms

  • An algorithm is the set of steps a program takes to complete a task – the key skill to develop is being able to identify which algorithm or data structure is best for the task at hand
  • Algorithm:
    • Clearly defined problem statement, input, and output
    • Distinct steps that need to be in a specific order
    • Should produce a consistent result
    • Should finish in finite amount of time
  • Evaluating Linear and Binary Search Example
  • Correctness
    • 1) In every run against all possible input values, we always get the output we expect
    • 2) algorithm should always terminate
  • Efficiency:
  • Time Complexity: how long it takes
  • Space Complexity: amount of memory taken on computer
  • Best case, Average case, Worst case
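The linear vs. binary search comparison can be sketched in a few lines (my own minimal version): both are correct – same expected output for every input, and both terminate – but they differ in time complexity.

```python
# Linear search checks every element in order: O(n) worst case.
# Binary search halves a sorted range each step: O(log n) worst case.

def linear_search(items, target):
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1  # not found

def binary_search(items, target):
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            low = mid + 1   # discard the lower half
        else:
            high = mid - 1  # discard the upper half
    return -1  # not found

nums = list(range(100))  # binary search requires sorted input
print(linear_search(nums, 73), binary_search(nums, 73))    # 73 73
print(linear_search(nums, 500), binary_search(nums, 500))  # -1 -1
```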

Efficiency of an Algorithm

  • Worst case scenario/Order of Growth used to evaluate
  • Big O: a theoretical definition of the complexity of an algorithm as a function of input size n – the order of magnitude of complexity, e.g. O(n)
  • Logarithmic pattern: in general, for an input of size n, the worst-case number of tries for binary search is log2(n) + 1, i.e. O(log n)
  • Logarithmic or sublinear runtimes are preferred to linear because they are more efficient
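One quick way to see the log2(n) + 1 pattern (an illustrative sketch, not from the course): count how many halvings it takes binary search to exhaust a range of n items in the worst case.

```python
import math

# Each binary search guess halves the remaining range, so for n items
# the worst case takes floor(log2(n)) + 1 tries -- logarithmic growth:
# doubling the input adds only about one more try.

def worst_case_tries(n):
    """Count how many halvings it takes to shrink a range of n below 1."""
    tries = 0
    while n >= 1:
        n //= 2
        tries += 1
    return tries

for n in (10, 100, 1_000_000):
    assert worst_case_tries(n) == math.floor(math.log2(n)) + 1
    print(n, worst_case_tries(n))  # 10 -> 4, 100 -> 7, 1_000_000 -> 20
```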

Google Machine Learning Crash Course

Reducing Loss

  • Update model parameters by computing the gradient – the negative gradient tells us how to adjust model parameters to reduce loss
  • Gradient: derivative of loss with respect to the weights and biases
  • Taking small steps in the direction of the negative gradient to reduce loss is known as gradient descent
  • Neural nets: strong dependency on initial values
  • Stochastic Gradient Descent: one example at a time
  • Mini-Batch Gradient Descent: batches of 10-1000 – losses and gradients averaged over the batch
  • A machine learning model is trained by starting with an initial guess for the weights and bias and iteratively adjusting those guesses until the weights and bias have the lowest possible loss
  • Convergence refers to the state reached during training in which training loss and validation loss change very little or not at all with each additional iteration – further training on the current data set will not improve the model at this point
  • For regression problems, the resulting plot of loss vs. w1 will always be convex
  • Because calculating the loss for every conceivable value of w1 over an entire data set would be an inefficient way of finding the convergence point, gradient descent lets us find it iteratively
  • The first step is to pick a starting value for w1. The starting point doesn’t matter much, so many algorithms just use 0 or a random value.
  • The gradient descent algorithm calculates the gradient of the loss curve at the starting point as the vector of partial derivatives with respect to weights. Note that a gradient is a vector so it has both a direction and magnitude
  • The gradient always points in the direction of the steepest increase in the loss function and the gradient descent algorithm takes a step in the direction of the negative gradient in order to reduce loss as quickly as possible
  • The gradient descent algorithm adds some fraction of the gradient’s magnitude to the starting point to pick the next point, and repeats this process to step closer to the minimum
  • Gradient descent algorithms multiply the gradient by a scalar known as the learning rate (or step size)
    • eg. If the gradient magnitude is 2.5 and the learning rate is .01, the gradient descent algorithm will pick the next point .025 away from the previous point
  • Hyperparameters are the knobs that programmers tweak in machine learning algorithms. You want to pick a Goldilocks learning rate: too small and training will take too long; too large and the next point will bounce haphazardly and could overshoot the minimum
  • Batch is the total number of examples you use to calculate the gradient in a single iteration of gradient descent
  • Redundancy becomes more likely as the batch size grows, and after a while there are diminishing returns in smoothing out noisy gradients
  • Stochastic gradient descent (SGD) uses a batch size of 1 – a single example. With enough iterations this can work, but it is super noisy
  • Mini-batch stochastic gradient descent: a compromise between full-batch and SGD, usually 10 to 1,000 examples chosen at random; it reduces noise more than SGD but is more efficient than full-batch
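The gradient descent loop and the Goldilocks learning rate can be sketched on a toy convex loss (my own example, not from the course):

```python
# Gradient descent on the one-parameter convex loss L(w) = (w - 3)**2,
# whose gradient is dL/dw = 2 * (w - 3) and whose minimum is at w = 3.

def gradient(w):
    return 2 * (w - 3)

def gradient_descent(w_start, learning_rate, steps):
    w = w_start
    for _ in range(steps):
        w -= learning_rate * gradient(w)  # step against the gradient
    return w

# A well-chosen learning rate converges to the minimum...
print(gradient_descent(0.0, 0.1, 100))  # ~3.0
# ...while a too-large one bounces past the minimum and diverges.
print(gradient_descent(0.0, 1.1, 10))   # far from 3.0
```

Mini-batch SGD runs the same loop, but estimates the gradient from a small random batch of examples at each step instead of the full data set.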

First Steps with Tensorflow

  • TensorFlow is a framework for building ML models. TensorFlow provides toolkits that let you construct models at your preferred level of abstraction.
  • The Estimator class encapsulates logic that builds a TensorFlow graph and runs a TensorFlow session. A TensorFlow graph is a computation specification: nodes in the graph represent operations, and edges are directed, representing the passing of an operation’s result (a tensor) as an operand to another operation