Apr-June 2018 Learning

Books Read (related to work/professional development/betterment):

Articles:

Giving meaning to 100 billion analytics events a day

  1. Tracking events are sent by the browser over HTTP to a dedicated component that enqueues them in a Kafka topic. The Kafka stream then feeds BigQuery, which serves as the data warehouse system
  2. ‘When dealing with tracking events, the first problem you face is the fact that you have to process them unordered, with unknown delays.
    1. The difference between the time the event actually occurred (event time) and the time the event is observed by the system (processing time) ranges from the millisecond up to several hours.’
  3. The key for them was finding the ideal batch duration
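A minimal sketch of the event-time vs. processing-time distinction described above. All timestamps, IDs, and the 4000 ms window size are made up for illustration; the point is that events are batched by when they occurred, not when the system observed them.

```javascript
// Events arrive out of order with unknown delays.
const events = [
  { id: 1, eventTime: 1000, processingTime: 1005 }, // ~5 ms delay
  { id: 2, eventTime: 1001, processingTime: 9001 }, // observed ~8 s late
  { id: 3, eventTime: 5000, processingTime: 5002 },
];

const windowSize = 4000; // the batch duration -- the key tuning knob
const batches = {};
for (const e of events) {
  // Bucket by event time, so late arrivals still join the right batch.
  const windowStart = Math.floor(e.eventTime / windowSize) * windowSize;
  if (!batches[windowStart]) batches[windowStart] = [];
  batches[windowStart].push(e.id);
}
// Events 1 and 2 share a window even though event 2 was observed much later.
```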

What is a Predicate Pushdown?

  1. The basic idea is that certain parts of a SQL query (the predicates) can be “pushed” to where the data lives, reducing query/processing time by filtering out data earlier rather than later. This lets you optimize a query by filtering data before it is transferred over the network or loaded into memory, or by skipping entire files or chunks of files.
  2. ‘A “predicate” (in mathematics and functional programming) is a function that returns a boolean (true or false). In SQL queries predicates are usually encountered in the WHERE clause and are used to filter data.’
  3. Predicate pushdown filters data differently in various query environments, e.g. Hive, Parquet/ORC files, Spark, Redshift Spectrum, etc.
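A toy sketch of the idea, with hypothetical data and function names: the same predicate (`price > 100`) either runs after all rows cross the “network,” or is pushed down so only matching rows are ever transferred.

```javascript
const remoteRows = [
  { product: "A", price: 50 },
  { product: "B", price: 150 },
  { product: "C", price: 200 },
];
const predicate = (row) => row.price > 100;

// Without pushdown: transfer everything, then filter locally.
function scanThenFilter(rows, pred) {
  const transferred = [...rows]; // all rows cross the network
  return { result: transferred.filter(pred), rowsTransferred: transferred.length };
}

// With pushdown: the predicate runs where the data lives.
function filterAtSource(rows, pred) {
  const transferred = rows.filter(pred); // only matching rows cross
  return { result: transferred, rowsTransferred: transferred.length };
}

const naive = scanThenFilter(remoteRows, predicate);
const pushed = filterAtSource(remoteRows, predicate);
// Same result either way; pushdown just moves far less data.
```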

I hate MVPs. So do your customers. Make it SLC instead.

  1. Customers hate MVPs: too M and almost never V – simple, lovable, and complete (SLC) is the way to go
  2. The success loop of a product “is a function of love, not features”
  3. “An MVP that never gets additional investment is just a bad product. An SLC that never gets additional investment is a good, if modest, product.”

Documenting for Success

  1. Keeping User Stories Lean and Precise with:
    1. User Story Objectives
    2. Use Case Description
    3. User Interaction and Wireframing
    4. Validations
    5. Communication Scenarios
    6. Analytics
  2. Challenges
    1. Lack of participation
    2. Documentation can go sour
  3. Solutions
    1. Culture – tradition of open feedback
    2. Stay in Touch with teams for updates
    3. Documentation review and feedback prior to sprint starts
    4. Track your documents

WTF is Strategy

  1. Strategic thinking is what sets apart seniors from juniors
  2. Strategy needs
    1. Mission: Problem you’re trying to solve and who for
    2. Vision: Idealized solution
    3. Strategy: principles and decisions, informed by reality and caveated with assumptions, that you commit to ahead of development to increase the likelihood of achieving your vision
    4. Roadmap: Concrete steps
    5. Execution: Day-to-day activities
  3. “Strategy represents the set of guiding principles for your roadmapping and execution tasks to ensure they align with your mission and vision.”

Corporate Culture in Internet Time

  1. “The dirty little secret of the Internet boom,” says Christopher Meyer, who is the author of “Relentless Growth,” the 1997 management-based-on-Silicon-Valley-principles book, “is that neither startup wizards nor the venture capitalists who fund them know very much about managing in the trenches.”
  2. “The most critical factor in building a culture is the behavior of corporate leaders, who set examples for everyone else (by what they do, not what they say). From this perspective, the core problem faced by most e-commerce companies is not a lack of culture; it’s too much culture. They already have two significant cultures at play – one of hype and one of craft.”
  3. Leaders need to understand both craft and hype cultures since they have to rely on teams from both to deliver. They need to set up team cultures and infrastructure that support inter-team learning.

Do You Want to Be Known For Your Writing, or For Your Swift Email Responses? Or How the Patriarchy has fucked up your priorities

  1. Women are conditioned to keep proving themselves – our value is contingent on our ability to meet the expectations of others, or we will be discredited. This is often true, but do you want to be known as a reliable source of work or for answering e-mails?
  2. Stop trying to get an A+ in everything, it’s a handicap in making good work. “Again, this speaks most specifically to women, POC, queers, and other “marginalized” folks. I am going to repeat myself, but this shit bears repeating. Patriarchy (and institutional bigotry) conditions us to operate as if we are constantly working at a deficit. In some ways, this is true. You have to work twice as hard to get half the credit. I have spent most of my life trying to be perfect. The best student. The best dishwasher. The best waitress. The best babysitter. The best dominatrix. The best heroin addict. The best professor. I wanted to be good, as if by being good I might prove that I deserved more than the ephemeral esteem of sexist asshats.”

Listen to me: Being good is a terrible handicap to making good work. Stop it right now. Just pick a few secondary categories, like good friend, or good at karaoke. Be careful, however, of categories that take into account the wants and needs of other humans. I find opportunities to prove myself alluring. I spent a long time trying to maintain relationships with people who wanted more than I was capable of giving.

  3. Stop thinking of no as just no; think of it as saying yes to doing your best work

Dear Product Roadmap, I’m Breaking Up with You

  1. A major challenge is setting up roadmap priorities without real market feedback, especially in enterprise software
  2. Roadmaps should be planned with these assets in place, tied closely to business strategy:
    1. A clearly defined problem and solution
    2. Understanding of your users’ needs
    3. User Journeys for the current experience
    4. Vision -> Business Goals -> User Goals -> Product Goals -> Prioritize -> Roadmap
  3. Prioritization should be done through the following lens: feasibility, desirability, and viability

The 7 Steps of Machine Learning Google Video

  • Models are created via training
  • Training helps create accurate models that answer questions correctly most of the time
  • This requires data to train on
    • Defined features for telling apart beer and wine could be color and alcohol percentage
  • When gathering data, quality and quantity determine how good the model can be
  • Put the data together and randomize the order so that ordering doesn’t affect what the model learns (e.g., which drink it sees first)
  • Visualize and analyze during data prep to check if there’s an imbalance in the data for the model
  • The data needs to be split: most for training (70-80%) and some held out for evaluation to test accuracy (20-30%)
  • A big choice is choosing a model – e.g., some are better for images versus numerical data; the beer-or-wine example has only two features to weigh
  • Weights matrix (the m in a linear model)
  • Biases (the b in a linear model)
  • Start with random values to test – this creates iterations and cycles of training steps as the line moves to split wine vs. beer, and you can evaluate against the data
  • Parameter tuning: how many times we go through the set -> does that lead to higher accuracy? E.g., the learning rate controls how far we can shift the line in each step – tuning hyperparameters is an experimental process, a bit more art than science
  • 7 Steps: Gathering Data -> Preparing Data -> Choosing a Model -> Training -> Evaluation -> Hyperparameter Tuning -> Prediction
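The data-preparation steps above can be sketched in code. This is a minimal illustration, not from the video: the drinks data and the 80/20 ratio are made up, but it shows randomizing order (so ordering doesn’t bias training) and holding out a slice for evaluation.

```javascript
function shuffle(data) {
  const copy = [...data];
  for (let i = copy.length - 1; i > 0; i--) {
    // Fisher-Yates: swap each element with a random earlier one
    const j = Math.floor(Math.random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

function trainTestSplit(data, trainFraction = 0.8) {
  const shuffled = shuffle(data);
  const cut = Math.floor(shuffled.length * trainFraction);
  return { train: shuffled.slice(0, cut), test: shuffled.slice(cut) };
}

// e.g. ten drinks, each described by color and alcohol-percentage features
const drinks = Array.from({ length: 10 }, (_, i) => ({ id: i, color: i, abv: i }));
const { train, test } = trainTestSplit(drinks, 0.8);
```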

Qwik Start Baseline Infra Quest: 

  • Cloud Storage Google Console
  • Cloud IAM
  • Kubernetes Engine

Treehouse Learning:  

JavaScript OOP

  • In JavaScript, state is represented by object properties and behaviors are represented by object methods.
    • E.g., a radio has properties like station and volume and methods like turning off or changing the station
  • An object’s state is represented by its “properties” and its behaviors by its “methods.”
  • Putting properties and methods into a package and attaching it to a variable is called encapsulation.
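The radio example as a concrete object (a sketch; the property values are arbitrary): state lives in properties, behavior lives in methods, and bundling them in one object is the encapsulation the notes describe.

```javascript
const radio = {
  // state (properties)
  station: 88.5,
  volume: 3,
  isOn: false,
  // behavior (methods)
  turnOn() {
    this.isOn = true;
  },
  turnOff() {
    this.isOn = false;
  },
  changeStation(newStation) {
    this.station = newStation;
  },
};

radio.turnOn();
radio.changeStation(101.1);
```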

Intro SQL Window Functions

  • A function available in some variants of SQL that lets you analyze a row in the context of the entire result set – compare one row to other rows in a query, e.g., percent of total or a moving average
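What a window function computes can be mimicked in plain JavaScript (the rows here are hypothetical): every row keeps its identity but gains a value derived from the whole result set, like SQL’s `amount * 100.0 / SUM(amount) OVER ()`.

```javascript
const resultSet = [
  { region: "East", amount: 200 },
  { region: "West", amount: 300 },
  { region: "North", amount: 500 },
];

const total = resultSet.reduce((sum, row) => sum + row.amount, 0);

// Unlike GROUP BY, this keeps one output row per input row.
const withPercent = resultSet.map((row) => ({
  ...row,
  percentOfTotal: (row.amount / total) * 100,
}));
```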

Common Table Expressions using WITH

  • CTE – a SQL query that you name and reuse within a longer query, a temporary result set
  • You place a CTE at the beginning of a complete query using a simple syntax
-- create CTEs using the WITH statement
WITH cte_name AS (
  -- select query goes here
)

-- use CTEs like a table
SELECT * FROM cte_name
  • CTE name is like an alias for the results returned by the query, you can then use the name just like a table name in the queries that follow the CTE
WITH product_details AS (
  SELECT ProductName, CategoryName, UnitPrice, UnitsInStock
  FROM Products
  JOIN Categories ON Products.CategoryID = Categories.ID
  WHERE Products.Discontinued = 0
)

SELECT * FROM product_details
ORDER BY CategoryName, ProductName

-- a second query using the same CTE
SELECT CategoryName, COUNT(*) AS unique_product_count,
SUM(UnitsInStock) AS stock_count
FROM product_details
GROUP BY CategoryName
ORDER BY unique_product_count
  • CTEs make code more readable and organize queries into reusable modules; you can combine multiple CTEs into a single query, and they can better match how we think of result sets in the real world
    • all orders in past month-> all active customers -> all products and categories
    • Each would be a CTE
  • Like subqueries, CTEs create result sets that look just like a table and can be joined to other tables
WITH all_orders AS (
  SELECT EmployeeID, Count(*) AS order_count
  FROM Orders
  GROUP BY EmployeeID
),
late_orders AS (
    SELECT EmployeeID, COUNT(*) AS order_count
    FROM Orders
    WHERE RequiredDate <= ShippedDate
    GROUP BY EmployeeID
)
SELECT Employees.ID, LastName,
all_orders.order_count AS total_order_count,
late_orders.order_count AS late_order_count
FROM Employees
JOIN all_orders ON Employees.ID = all_orders.EmployeeID
JOIN late_orders ON Employees.ID = late_orders.EmployeeID
  • Remember, one useful feature of CTEs is that you can reference them in later CTEs, e.g. revenue_by_employee below pulling from all_sales
  • You can only reference a CTE created earlier in the query, e.g. the first CTE can’t reference the third
WITH
all_sales AS (
  SELECT Orders.Id AS OrderId, Orders.EmployeeId,
  SUM(OrderDetails.UnitPrice * OrderDetails.Quantity) AS invoice_total
  FROM Orders
  JOIN OrderDetails ON Orders.id = OrderDetails.OrderId
  GROUP BY Orders.Id
),
revenue_by_employee AS (
  SELECT EmployeeId, SUM(invoice_total) AS total_revenue
  FROM all_sales
  GROUP BY EmployeeID
),
sales_by_employee AS (
  SELECT EmployeeID, COUNT(*) AS sales_count
  FROM all_sales
  GROUP BY EmployeeID
)
SELECT revenue_by_employee.EmployeeId,
Employees.LastName,
revenue_by_employee.total_revenue,
sales_by_employee.sales_count,
revenue_by_employee.total_revenue/sales_by_employee.sales_count AS avg_revenue_per_sale
FROM revenue_by_employee
JOIN sales_by_employee ON revenue_by_employee.EmployeeID = sales_by_employee.EmployeeID
JOIN Employees ON revenue_by_employee.EmployeeID = Employees.Id
ORDER BY total_revenue DESC

Weekly Data Viz Decomp: The Guardian’s Premier League Transfer Window Summer 2016

Weekly data visualization decomps to keep a look out for technique and learning.

This week’s viz: Premier League: transfer window summer 2016 – interactive

Decomposition of a Visualization:

  • What are the:
    • Variables (Data points, where they are, and how they’re represented):
      • Bubble size for size of transfer
      • Color hue denoting transfer or out of team
      • Position for date close to transfer window
    • Data Types (Quantitative, Qualitative, Categorical, Continuous, etc.):
      • Qualitative and categorical
    • Encodings (Shape, Color, Position, etc.):
      • Shape, position, size, color hue
  • What works well here?
    • Showing a small multiples type view for each team and their transfers
  • What does not work well and what would I improve?
    • Having the totals summary numbers on the side of the charts is a little unorthodox and unintuitive
    • Bubbles seem to be placed arbitrarily without thought to the y-axis, even though the x-axis has meaning
    • Not immediately clear why some players are featured and noted in tooltips versus those that are not
  • What is the data source?  Do I see any problems with how it’s cited/used?
    • Seems to be original Guardian data collected about the English Premier League, but not as clearly stated as I’d like to expect
  • Any other comments about what I learned?
    • Example of something pleasing to the eye in terms of color hue and perhaps some flash factor, but perhaps not that functional to explore upon closer examination.
      • It certainly makes sense for the Guardian’s purposes in putting out this story, though, and is a technique I’d borrow if I had a use case
      • Good for showing a bigger picture view
    • Probably not worth the incremental work it would take, as filters are difficult to get working and can be computationally expensive, but the nerd in me would have liked to be able to search for a player

More MLS 2015 Visual Exploration Tools

[image: mlsmore.PNG]

Previously, I created an interactive view just looking at goal breakdowns by Major League Soccer teams overall for the 2015 season.  I’ve added several more frames to this new view to look at breakdowns of the same dataset in an exploratory way, to see which variables correspond and which don’t.

MOTIVATION

Verbatim from my previous post:

I’ve been really into soccer in general, especially international play, since my grad school project where I worked at the Annenberg Innovation Lab collaborating with Havas Sports and Entertainment and IBM on a research project studying soccer fans (see Fans.Passions.Brands, Fan Favorites, and Sports Fan Engagement Dashboard). I also did my degree practicum on Marketing to Female Sports Fans.

I’m now in another universe creating data visualization at an advertising agency and am trying to combine the geeky fandom with practical practice related to my daily work.

 

DESIGN

I deliberately tried to use Tableau components and styling that were out of the ordinary, with mixed success.  I put a custom color palette in the Tableau repository preferences file.  I also tried to take advantage of context filters (when you click on one bar of a team’s graph, for example, the other charts instantly show stats for just that team), scale filters, and pivoting the data on a third dimension, using both a color gradient and the size of a value in a chart, for instance.

TECH

Trying to stretch the design and exploratory strengths of Tableau here.  The one knock I give Tableau in 2016 is that it doesn’t present data in as sexy a way as JavaScript-based visuals.  On the other hand, none of the JavaScript-based tools democratizes creating explorable views with some pretty heavy statistical tools; some of the basic functions many people use SPSS for might be better handled with Tableau as one integrated tool.  In particular, I used r-squared and p-values to show correlations across different metrics that might matter, or that might be interesting to see hold for some teams and not others.  There’s not much of a correlation between Corner Kicks and Goals for teams overall, for instance, but there is a greater negative correlation for those teams with more losses.

Esc the City Un-Conference Talk: Simple Slide Design and Data Viz Crash Course

I’m currently attending a program called Escape the City which “helps mainstream professionals make proactive & entrepreneurial career changes.”  We’re part of the “Founding Members” cohort as the first iteration of this program stateside.

I’m not looking to leave my current job, which I’m pretty happy with. I definitely did want to meet other ambitious people and unconventional thinkers.  A mentor who had done the program in London, where Escape started, said I’d benefit.  I have been loving it so far.  I’ve found it beneficial to being more present, proactive, and creative at work and outside of it.

One part of the program we did last night was the Un-Conference, where individuals in the program presented on different topics: everything from Learnings from Training for Endurance Races, Self-Acceptance and How to Love Yourself, Web Development 101, and Relaxation with Origami, to name just a few.

As part of building on my knowledge and sharing it, I did a talk on Simple Design with a Data Visualization Crash Course below.  I hope my fellow participants found it useful, especially since many of them are thinking about starting and pitching their own businesses.

After 180 Days of Data Viz Learning #jfdi #dataviz #done

I noticed this when I logged into my WordPress account and realized I really needed to do this debrief, now that I’ve more than properly decompressed and feel a surge of inspiration from attending OpenVis Conf.

[image: afterlearning.png]

A summary of what I did:

I read:

  • The Functional Art by Alberto Cairo
  • The Visual Display of Quantitative Information by Edward Tufte
  • Data Points: Visualization That Means Something by Nathan Yau
  • Visualize This: The Flowing Data Guide to Design, Visualization, and Statistics by Nathan Yau
  • The Wall Street Journal Guide to Information Graphics by Dona M. Wong
  • The Best American Infographics 2014 by Gareth Cook
  • Show Me The Numbers by Stephen Few

[image: 2016-03-21 09.13.12.jpg]

I studied:

  • Knight Center Course on D3.js and Data Visualization
  • Treehouse D3 Course
  • Data Visualization and D3.js on Udacity

I coded/created:

  • Tableau business dashboards on digital marketing attribution, including customized DMA maps and other features beyond the typical drag and drop.
  • D3 scatterplots, scatterplot matrices, node-link trees, hierarchical force layouts, Sankey diagrams, bar charts, bubble charts, sunbursts, histograms, and even pie charts

I accomplished:

  • Gaining a design sensibility for data visualization
  • Understanding data connections and issues around them (e.g. Vertica, Cubes, SQL, S3, etc.)
  • Solid foundation of D3
  • Strong skills in Tableau
  • Conceptual understanding of visualization suites in general, such as R libraries, other JavaScript libraries, and other enterprise BI tools (QlikView, Power BI)
  • Being the thought leader in data visualization in my organization

To take a step back, I embarked on this journey because I got a new role with the job title of Data Visualization Manager.  I talked about this in my first post and embarked on 180 Days of Data Viz Learning, inspired by Jennifer Dewalt’s Project 180 Websites in 180 Days.  It’s been a journey with highs and lows; good, bad, and ugly.  I walked away with a strong design and business sensibility, new hard skills and knowledge, and an understanding of data visualization at both the enterprise and the open source level.

Creating a Design, Business Intelligence, and Product Sensibility

One big thing I set out on as a differentiator was that I didn’t just want to learn to code or just be able to make pretty visualizations.  There are many people who can code better than me and will always be able to code better than me.  There are also many people who can make much more beautiful visuals than me.  I’m not naturally inclined toward art or computer science in terms of innate talent or passion, but I recognized how important bridging those two disciplines would be for this endeavor in my career.  I don’t consider coding my passion.  I’m also no designer or artist; I’ve never considered myself one of the “creatives.”  I consider communication and storytelling my passion, and code is a means to construct that narrative.  Being a bridger of ideas and practices is a number one priority in my life.

The Good

The process really forced me to learn and focus, even if in the end it took far longer than 180 days – roughly seven months.  Not bad, I think, considering I went on two overseas vacations and did a cross-country move during that time.  I sincerely think I would not have gotten so much done had I not felt compelled to do some work every day.

For my own practical purposes as the “bridger,” I wanted to make sure I had a strong background in design concepts related to data visualization and also gained proficiency in the tools required for my role.  Tying that all together is what I wanted to develop as a strength.  I can talk intelligently about how performance issues in a dashboard can be influenced by projections in HP Vertica, or how the data needs to be prepared in a Hadoop cluster first and then queried into the right format for a visualization.  I can talk visual encodings and perception from a design perspective, the grammar of graphics and all that.  And I can talk about the strengths and weaknesses of Tableau and other enterprise tools and what libraries we can use to scale D3.  I can talk about these things, and I slowly get better at doing them every day.

Doing a little bit of these things every day really pushed me in a way I don’t think I would have pushed myself otherwise.  It ranged from five minutes of extra reading at night to five-hour work sessions.  Ironically, doing the 180 Days had a great side effect of making me aware of a larger strategic view at my work, which I realize I lost when I stopped.  I also inadvertently lost 10 lbs and started reading more every day, because this whole endeavor made me much more mindful.

The Bad

Learning theory and application at once isn’t easy from a motivational perspective.  I’m the only person at work using open source tools and doing data visualization beyond dashboards or business-analyst graphs (e.g. Excel-type charts), but I had to do a lot of that in my day-to-day as well.  It can really grind on you to read and see beautiful visualizations and then take over ten hours to produce one successful visualization in D3.  Prepare for that.

There’s a flipside to doing something every day: by having to do a little bit daily, it can become a quantity-over-quality game.  I had more nights than I wanted that ended up late because I rushed to read and learn before going to bed.  It might have made more sense on some days to do a longer, focused block than to try to do something EVERY DAY.  I’m now trying to learn JavaScript and front-end development in general, along with a couple of other topics in my life, and I’m not going about it the same way I did with the 180 Days of Data Viz Learning.

Lessons Learned and Tips

  1. Really go for easy wins if you do try to get in some work every day.  My main goal was to have three takeaways from every lecture session where I was watching videos online, or from the reading I did that day, to absorb the information.  Decomposing visuals was especially helpful and is a good process to learn for when you need to build your own.
  2. Find partners in your journey.  Up until OpenVisConf last week, I had no barometer to measure how much I knew and learned.  I got down on myself more often than I needed to, given how much I did learn and all the balls I had in the air at once.  Having a support group/sounding board would have made the journey better, and I would have learned more.
  3. It was hard to learn theory and application at once – mainly because you’ll be making so little progress at first.  It was a bummer, and is still a bummer, to see how far I am from being able to do the work of people I admire.  Also, unlike other skills I have (e.g. I’ve been doing advertising and project management for years), data viz is still new to me.  I have bigger ambitions than what I can make; that is reality and that is ok, but it’s hard to accept sometimes.  Maybe this is a mental thing for me, but I mention it as I’m sure it’s something someone else may run into in their process.
  4. Procrastination is normal.  Figure out if you’re tired out/burned out or else just annoyed and anxious.  If annoyed and anxious, just try working for five minutes.  If you can keep working, chances are you aren’t tired or burned out and have some internal dialogue to work through or some process to improve on.
  5. Data viz burnout.  It can happen, and I definitely felt it, which is why I actually advise against 180 days straight; instead, work on a few projects concurrently and go through books and tutorials at a more deliberate pace rather than trying to pick up bits and pieces every day.  I got to the point where I was doing data viz at work and going home to do more data viz, until I didn’t like something I used to enjoy anymore.  On this note, also take time to just enjoy other data viz projects like a regular audience member would, rather than critiquing or studying them.  Burnout is normal; rest accordingly.
  6. Don’t give up!  I didn’t get to the creative or technical skillset I had hoped for after 180 days, and I’m still not there, but I progressed significantly in what I know and enjoyed the journey.

 

Day 180 of 180 Days of Data Viz Learning #jfdi #done #dataviz

I’m doing some form of data visualization learning for 180 days because I need to #JFDI.

See post explaining how and why I’m doing this.

Guess what?  It took longer than 180 days, but it’s been a pretty cool journey.  I did my daily learning and will post a debrief early next week.  This has been quite the intellectual and emotional exercise for me.  I learned so much about data viz and more.

Elijah Meeks D3.js in Action

Chapter 5 Layouts
Some last takeaways
  • One key with generators, e.g. d3.svg.arc, is that they have particular settings p 144
  • “One of the core uses of a layout in D3 is to update the graphical chart. All we need to do is make changes to the data or layout and then rebind the data to the existing graphical elements” p 146
  • If transitions are distorted because of default data-binding keys, you may need to change sorts, eg pieChart.sort(null); in conjunction with exit and update behavior p 147
  • Layouts in D3 expect a default representation of the data, usually a JSON object array where the child elements in a hierarchy are stored in a children attribute that points to an array.  p 149
  • You can either munge the data or get used to using accessor functions p 149
  • Pack layouts have a built-in .padding() function to adjust spacing and a .value() function to size out spacing and influence size of parent nodes p 151

Reading and Learning Data Visualization Theoretically/Critically:

Show Me the Numbers by Stephen Few

Three Takeaways Today

Chapter 5 Visual Perception and Graphical Communication
  • “Our perception of space is primarily two dimensional.  We perceive differences in vertical position (up and down) and in horizontal position (left and right) clearly and accurately. We also perceive a third dimension, depth, but not nearly as well.” p 71
  • “We perceive hues only as categorically different, not quantitatively different; one hue is not more or less than another, they’re just different.  In contrast, we perceive color intensity quantitatively, from low to high” p 71
  • Neither size nor color intensity is the best way to encode quantitative values.  The eye is not good at matching a number to a relative size or color intensity -> use length or position instead if possible p 73