Lorenzo Alberton

Reading / London, UK

Hi! I'm an experienced Software Engineering Manager and Technical Architect, currently CTO at DataSift. I build efficient Engineering organisations and large-scale data platforms (at Twitter, Facebook and LinkedIn scale). I occasionally speak at international conferences, contribute to open source projects and mentor other startups.

When I'm not at work, I love studying art history, reading, playing the piano, travelling and doing outdoor activities.


  Latest articles  

On batching vs. latency, and jobqueue models

A little journey exploring job queue models and debunking some programming folklore around the effects of batching on latency.  

14 November 2012

Updated Kafka PHP client library

New PHP client library for the Apache Kafka project. Full support for Kafka 0.7+, with robust socket handling, complete test suite, Zookeeper-based consumer and many other improvements.  

18 September 2012

Musings on some technical papers I read this weekend: Google Dremel, NoSQL comparison, Gossip Protocols

Some random comments on Dremel and a benchmark on Key-Value stores. How to evaluate technical papers and read between the lines.  

3 September 2012

Historical Twitter access - A journey into optimising Hadoop jobs

A journey into optimising Hadoop jobs: the strategies to scan and filter a PetaByte of archived data, schedule new jobs and deliver data fast.  

7 August 2012

Kafka proposed as Apache incubator project

Kafka is a distributed publish-subscribe messaging system developed at LinkedIn, designed to support a very high throughput, persistent messages and parallel loading into Hadoop. A proposal has been submitted to make Kafka an Apache incubator project.  

24 June 2011

  Portfolio  

BBC Homepage

https://www.bbc.co.uk/     Broadcasting / Media

BBC, a world-renowned media company, needs no introduction. After redesigning the home page of their web site in December 2007, they decided to rewrite the underlying code and architecture.

BBC iPlayer

https://www.bbc.co.uk/iplayer/     Broadcasting / Media

The BBC iPlayer, simply known as the iPlayer, is an internet television, radio, and cable television service, developed by the BBC and offering high-definition media streams. The new generation of the site (released in summer 2010) brings a new, cleaner interface, and integration with various social networking sites to the TV on-demand service.  

BBC Sport

https://www.bbc.co.uk/sport     Broadcasting / Media

BBC, a world-renowned media company, needs no introduction. I helped several BBC teams (Sport, Music, ...) with the migration to the new web framework and with website performance.

Channel 5

https://www.channel5.com/     Broadcasting / Media

Channel 5 (aka Five.tv) is the UK's fifth terrestrial TV channel. Ibuildings was one of the few tech partners selected to drive their effort towards a renewed and stronger web presence.

AVG

https://www.avg.com/     Security

AVG is one of the world's most recognizable names in online threat protection, with more than 110 million users protected by their software and excellent ratings from the independent antivirus testing labs.  

Facebook Topic Data

https://www.facebook.com/business/news/topic-data     Media, Advertising, Content Planning

At DataSift, we built the next-generation, privacy-first analysis platform for Facebook Topic Data. Facebook Topic Data shows marketers what audiences are saying on Facebook about events, brands, subjects and activities, all in a way that keeps personal information private. Marketers use the information from topic data to make better decisions about how they market on Facebook and other channels, and to build product roadmaps.

LinkedIn Engagement Insights

PYLON for LinkedIn Engagement Insights enables companies to discover what professionals are reading, sharing, and saying about products, industries, brands, and news on the world’s largest professional network.  

  Open Source Projects  

Memcache mover - migrate contents between clusters

This is a simple tool to help with copying the contents of a memcache cluster into a new one, helping with migrations.  

Golang client library for Apache Kafka v.0.5 - 0.7.x.

Apache Kafka is a high-throughput distributed publish-subscribe messaging system written by the LinkedIn Data Team. I contributed a Golang client providing both a Producer and a Consumer for v.0.5 - 0.7.x.  

Layered Dependency Solver

Layer-based scheduling algorithm for parallel tasks with dependencies. Determines which tasks can be executed in parallel, by evaluating dependencies. Given a list of entries (each with its own dependency list), it can sort them in layers of execution, where all entries in the same layer can be executed in parallel, and have no other dependency than the previous layer.  
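The layering idea described above can be sketched in a few lines of Go. This is a minimal illustration, not the project's actual code: the function name `Layers` and the `map[string][]string` input shape are assumptions made for the example. Each pass collects every entry whose dependencies were all placed in earlier layers; if a pass finds nothing, the remaining entries form a cycle.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Layers is a hypothetical helper: deps maps each entry to the entries
// it depends on. It returns the entries grouped in layers, where every
// entry depends only on entries placed in earlier layers.
func Layers(deps map[string][]string) ([][]string, error) {
	placed := make(map[string]bool, len(deps))
	var layers [][]string

	for len(placed) < len(deps) {
		var layer []string
		for name, ds := range deps {
			if placed[name] {
				continue
			}
			ready := true
			for _, d := range ds {
				if _, known := deps[d]; !known {
					return nil, fmt.Errorf("%q depends on unknown entry %q", name, d)
				}
				if !placed[d] {
					ready = false
					break
				}
			}
			if ready {
				layer = append(layer, name)
			}
		}
		if len(layer) == 0 {
			return nil, errors.New("circular dependency detected")
		}
		sort.Strings(layer) // deterministic order within a layer
		// Mark entries as placed only after the pass, so that two entries
		// in the same layer can never depend on each other.
		for _, name := range layer {
			placed[name] = true
		}
		layers = append(layers, layer)
	}
	return layers, nil
}
```

Calling `Layers` with entries `a`, `b` (depends on `a`), `c` (depends on `a`) and `d` (depends on `b` and `c`) yields three layers: `a`, then `b` and `c` together, then `d`.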

Task / worker pool manager in Go

  • Start CLI tasks automatically
  • Maintain the desired number of worker processes for each task
  • Handle automatic restarts when a worker dies or stalls

The task manager is able to start any CLI (shell) script from the chosen directory. For tasks that are long-running and meant to be monitored continuously, each worker process should send regular keep-alive messages via a ZeroMQ PUB-SUB channel to communicate its health, and should handle SIGTERM signals when asked to terminate. If the worker doesn't respond to a SIGTERM signal, it will be killed with SIGKILL after a (configurable) grace period. The number of workers stalled/stopped since the task manager was started is reported in the task status.
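As a rough illustration of the keep-alive monitoring described above, the sketch below tracks the last keep-alive seen from each worker and reports the ones that have gone quiet. It is not taken from the project: the `HeartbeatTracker` type and its methods are hypothetical names, and the ZeroMQ transport and signal handling are left out so the logic stays self-contained.

```go
package main

import (
	"sort"
	"sync"
	"time"
)

// HeartbeatTracker is a hypothetical helper that remembers when each
// worker last sent a keep-alive message.
type HeartbeatTracker struct {
	mu       sync.Mutex
	timeout  time.Duration
	lastSeen map[string]time.Time
}

// NewHeartbeatTracker returns a tracker that considers a worker stalled
// once its last keep-alive is older than timeout.
func NewHeartbeatTracker(timeout time.Duration) *HeartbeatTracker {
	return &HeartbeatTracker{
		timeout:  timeout,
		lastSeen: make(map[string]time.Time),
	}
}

// Beat records a keep-alive received from worker id at the given time.
func (t *HeartbeatTracker) Beat(id string, at time.Time) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.lastSeen[id] = at
}

// Stalled returns the sorted ids of workers whose last keep-alive is
// older than the timeout, measured against now.
func (t *HeartbeatTracker) Stalled(now time.Time) []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	var stalled []string
	for id, seen := range t.lastSeen {
		if now.Sub(seen) > t.timeout {
			stalled = append(stalled, id)
		}
	}
	sort.Strings(stalled)
	return stalled
}
```

In a manager built along these lines, the subscriber loop would call `Beat` for every keep-alive message received, and a periodic check would pass the result of `Stalled` to the code that sends SIGTERM and, after the grace period, SIGKILL.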

Golang client library for Statsd

Golang client library for StatsD. Contains a direct and a buffered client. The buffered version will hold and aggregate values for the same key in memory before flushing them at the defined frequency.  
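The buffering behaviour can be illustrated with a small sketch. It is an assumption-laden example rather than the library's API: `CounterBuffer`, `Incr` and `Flush` are hypothetical names, only counters are handled, and the network send is omitted. Increments for the same key are summed in memory, and a flush emits one line per key in the StatsD counter format (`key:value|c`).

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// CounterBuffer is a hypothetical type that aggregates counter
// increments per key in memory until they are flushed.
type CounterBuffer struct {
	mu     sync.Mutex
	counts map[string]int64
}

// NewCounterBuffer returns an empty buffer.
func NewCounterBuffer() *CounterBuffer {
	return &CounterBuffer{counts: make(map[string]int64)}
}

// Incr adds delta to the pending total for key.
func (b *CounterBuffer) Incr(key string, delta int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.counts[key] += delta
}

// Flush returns one StatsD counter line per key ("key:value|c"),
// in lexical order of the lines, and resets the buffer.
func (b *CounterBuffer) Flush() []string {
	b.mu.Lock()
	defer b.mu.Unlock()
	lines := make([]string, 0, len(b.counts))
	for key, n := range b.counts {
		lines = append(lines, fmt.Sprintf("%s:%d|c", key, n))
	}
	sort.Strings(lines)
	b.counts = make(map[string]int64)
	return lines
}
```

A real buffered client would typically call `Flush` from a `time.Ticker` loop at the configured frequency and write the resulting lines to the StatsD server over UDP, so that many increments cost a single packet.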

  Presentations  

Scaling teams, processes and architectures

Meltwater Entrepreneurial School of Technology (MEST), Accra, Ghana, 9 December 2017

Talk about the soft side of scalability, covering team management, process implementation and some solid technology-related principles. Based on 10 years of experience building scalable teams and scalable data platforms.  

Monitoring at scale - Intuitive dashboard design

PHPDay 2013, Verona, Italy, 17 May 2013

At a certain scale, millions of events happen every second, and all of them are important to evaluate the health of the system. If not handled correctly, such a volume of information can overwhelm both the infrastructure that needs to support it and the people who have to make sense of thousands of signals and make decisions based on them, fast. By understanding how our rational mind works and how people process information, we can present data so it's more evident and intuitive. This talk will explain how to collect useful metrics and how to create the perfect monitoring dashboard to organise and display them, letting our intuition operate automatically and quickly, and saving attention and mental effort for the activities that demand them.

Monitoring at scale - Intuitive dashboard design

Atmosphere Conference 2013, Poznan, Poland, 14 May 2013

At a certain scale, millions of events happen every second, and all of them are important to evaluate the health of the system. If not handled correctly, such a volume of information can overwhelm both the infrastructure that needs to support it and the people who have to make sense of thousands of signals and make decisions based on them, fast. By understanding how our rational mind works and how people process information, we can present data so it's more evident and intuitive. This talk will explain how to collect useful metrics and how to create the perfect monitoring dashboard to organise and display them, letting our intuition operate automatically and quickly, and saving attention and mental effort for the activities that demand them.

Scaling teams, processes and architectures

Atmosphere Conference 2013, Poznan, Poland, 13 May 2013

When we think about scalability, we tend to focus only on the technical details, forgetting two equally important aspects: people and processes. In this talk we'll cover the fundamental elements of scalability, both organisational and technical, with sound and proven principles and some advice on how to shape your organisation, set the right processes and design your application.

Monitoring at scale - Intuitive dashboard design

PHP UK Conference 2013, London, 23 February 2013

At a certain scale, millions of events happen every second, and all of them are important to evaluate the health of the system. If not handled correctly, such a volume of information can overwhelm both the infrastructure that needs to support it and the people who have to make sense of thousands of signals and make decisions based on them, fast. By understanding how our rational mind works and how people process information, we can present data so it's more evident and intuitive. This talk will explain how to collect useful metrics and how to create the perfect monitoring dashboard to organise and display them, letting our intuition operate automatically and quickly, and saving attention and mental effort for the activities that demand them.


Experience  

Chief Technology Officer

DataSift

Direction of the Engineering/QA/Operations groups (~60 people) and part of the Executive Team. Design and hands-on implementation of the DataSift Data Platform, a SaaS platform providing state-of-the-art, privacy-safe data analysis, filtering and aggregation technology, handling billions of user-generated messages and events in real time every day. We built the data business for the world’s biggest social networks, and pioneered privacy-safe analytics models for analyzing human-created data.

Highlights:   Facebook Topic Data  |  LinkedIn Engagement Insights

October 2010 - February 2019

Sr Director of Engineering

Meltwater

Direction of the Platform Engineering/QA/Operations groups. Promoted to lead the entire Engineering organisation responsible for the delivery of the fairhair.ai platform (4+ worldwide teams) within 5 months of DataSift’s acquisition, thanks to a demonstrated ability to deliver on Product and Technology missions. Worked on an internal reorganisation of the Engineering and R&D department to improve delivery capabilities.

January 2018 - February 2019

Technical Team Lead

Ibuildings UK

Technical Lead for various teams and different projects, specialised in the design and optimisation of business-critical, scalable web architectures for large enterprise companies (BBC, Channel 5, Ladbrokes, AVG, Sybase, Cable & Wireless, Dennis Publishing, etc.), always delivering on time and with outstanding quality: in processes, design, code, tests, business analysis, communication and security.

May 2008 - September 2010

Machine Learning / Natural Language Processing Researcher

CELI S.r.l.

Natural Language Processing R&D role, funded by a grant from the Research Consortium of Turin Polytechnic. Developed several Automatic Text Classifiers (with a focus on opinion mining and sentiment analysis), an Information Extraction system and many language processing modules. SVM, Neural Networks, Naïve Bayes, Associative Classifiers.

May 2007 - April 2008

Consultant - Software Engineer

IT consulting for various companies. Designer and developer of Intranet applications and web sites. Active contributor to various popular PHP & Go open-source projects and frameworks. Author of technical articles.

January 2000 - June 2007

Education

Online Courses (Coursera / Udacity)

  • Stanford University:
    • Artificial Intelligence (advanced track, with prof. Sebastian Thrun and Peter Norvig)
    • Machine Learning (advanced track, with prof. Andrew Ng)
    • Programming a Robotic Car (prof. Sebastian Thrun) - highest distinction
    • Building a Search Engine (prof. David Evans) - highest distinction
  • Princeton University: Sociology (advanced track, with prof. Mitchell Duneier)
  • Duke University: Behavioral Economics (prof. Dan Ariely)
  • University of Michigan: Model Thinking (prof. Scott E. Page)
  • University of Toronto: GIS, Mapping, and Spatial Analysis Specialization (prof. Don Boyes):
    • Introduction to GIS Mapping
    • GIS Data Acquisition and Map Design
    • Spatial Analysis and Satellite Imagery in a GIS
    • GIS, Mapping, and Spatial Analysis Capstone
  • University of California, Irvine: An Introduction to Programming the Internet of Things (IoT) Specialization (prof. Ian G. Harris)
  • Yale University: Roman Architecture (prof. Diana E.E. Kleiner)
  • École Polytechnique Fédérale de Lausanne: Functional Programming Principles in Scala (prof. Martin Odersky) - 70/70 with distinction
  • Udacity: Intro to the Design of Everyday Things (prof. Don Norman) - with highest distinction
  • Udacity/Google: Website Performance Optimization (with Ilya Grigorik)
  • Google: Industrial IoT on Google Cloud Platform
  • Università la Sapienza di Roma: Early Renaissance Architecture in Italy (prof. Francesco Paolo Fiore) - with distinction
  • Khan Academy: Art history

Politecnico di Torino - B.S. and Master's degree

Computer Science / Software Engineering
Master's thesis on Associative Text Classifiers and novel feature weighting algorithms. Winner of a research grant.

Liceo Classico "Cesare Balbo", Chieri, Torino

Classical Studies

Selected to participate in regional Math Olympics and in international Latin translation competitions (Certamen Ciceronianum Arpinas and Certamen Bugellensis).


Skills

Programming Languages & Tools

Professional Interests
  • Building Efficient Engineering Organisations
  • Technical Leadership
  • Cross-Team Coordination Focused on Delivery
  • Problem Decomposition, Software Design
  • Algorithms and Data Structures
  • Scalable Architectures
  • Database Design
  • Performance Testing and Profiling
  • Automation, Continuous Integration and Delivery
  • Operational Monitoring and Alerting
  • Natural Language Processing, Machine Learning

Certifications & Patents

  • Filed three patents (US 15611517, 15588306, 15445904) on complex streaming data joins for data analysis at scale
  • Filed a patent (US 15462369) on policies for privacy-safe data analysis at scale, and helped draft several others
  • Sun Certified MySQL 5.0 Developer (07/2010) · License: SUN644757
  • Zend PHP5 Certified Engineer (10/2008) · License: ZEND009122
  • Winner of research fund (Research Consortium of Turin Polytechnic) in the NLP / Sentiment Analysis field
  • Winner of grant for R&D project (Progetto Alfieri 2007/2008, Fondazione CRT) on text mining/classification in the archaeological domain