Hi! I'm an experienced Software Engineering Manager and Technical Architect, currently CTO at DataSift.
I build efficient Engineering organisations and large-scale data platforms (at Twitter, Facebook and LinkedIn scale).
I occasionally speak at international conferences, contribute to open source projects and mentor other startups.
When I'm not at work, I love studying art history, reading, playing the piano, travelling and doing outdoor activities.
A little journey exploring job queue models and debunking some programming folklore around the effects of batching on latency.
New PHP client library for the Apache Kafka project. Full support for Kafka 0.7+, with robust socket handling, complete test suite, Zookeeper-based consumer and many other improvements.
Some random comments on Dremel and a benchmark on Key-Value stores. How to evaluate technical papers and read between the lines.
A journey into optimising Hadoop jobs: strategies to scan and filter a petabyte of archived data, schedule new jobs and deliver data fast.
Kafka is a distributed publish-subscribe messaging system developed at LinkedIn, designed to support a very high throughput, persistent messages and parallel loading into Hadoop. A proposal has been submitted to make Kafka an Apache incubator project.
BBC, a world-renowned media company, needs no introduction. After redesigning the home page of their website in December 2007, they decided to rewrite the underlying code and architecture.
The BBC iPlayer, simply known as the iPlayer, is an internet television, radio, and cable television service, developed by the BBC and offering high-definition media streams. The new generation of the site (released in summer 2010) brings a new, cleaner interface, and integration with various social networking sites to the TV on-demand service.
BBC, a world-renowned media company, needs no introduction. I helped several BBC teams (Sports, Music, ...) with the migration to the new web framework and with website performance.
Channel 5 (aka Five.tv) is the UK's fifth terrestrial TV channel. Ibuildings was one of the few tech partners selected to drive their effort towards a renewed and stronger web presence.
AVG is one of the world's most recognizable names in online threat protection, with more than 110 million users protected by their software and excellent ratings from the independent antivirus testing labs.
At DataSift, we built the next generation, privacy-first analysis platform for Facebook Topic Data. Facebook Topic Data shows marketers what audiences are saying on Facebook about events, brands, subjects and activities, all in a way that keeps personal information private. Marketers use the information from topic data to make better decisions about how they market on Facebook and other channels, and build product roadmaps.
PYLON for LinkedIn Engagement Insights enables companies to discover what professionals are reading, sharing, and saying about products, industries, brands, and news on the world’s largest professional network.
This is a simple tool to help with copying the contents of a memcache cluster into a new one, helping with migrations.
Apache Kafka is a high-throughput distributed publish-subscribe messaging system written by the LinkedIn Data Team. I contributed a Golang client providing both a Producer and a Consumer for v.0.5 - 0.7.x.
Layer-based scheduling algorithm for parallel tasks with dependencies. Determines which tasks can be executed in parallel, by evaluating dependencies. Given a list of entries (each with its own dependency list), it can sort them in layers of execution, where all entries in the same layer can be executed in parallel, and have no other dependency than the previous layer.
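The layering idea described above can be illustrated with a minimal sketch. This is not the library's actual code: the function name `layers` and the dictionary-of-sets input format are assumptions made for illustration, and here each layer depends only on tasks placed in earlier layers.

```python
from typing import Dict, List, Set


def layers(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into layers; tasks in one layer can run in parallel."""
    remaining = {task: set(d) for task, d in deps.items()}
    done: Set[str] = set()
    result: List[List[str]] = []
    while remaining:
        # A task is ready once all of its dependencies are already scheduled.
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            # Nothing can progress: a cycle or a dependency on an unknown task.
            raise ValueError("circular or unresolved dependency")
        result.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return result


example = {"a": set(), "b": set(), "c": {"a"}, "d": {"a", "b"}, "e": {"c", "d"}}
print(layers(example))  # [['a', 'b'], ['c', 'd'], ['e']]
```

Tasks within a layer are sorted only to make the output deterministic; the order inside a layer carries no meaning.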
* Start CLI tasks automatically
* Maintain the desired number of worker processes for each task
* Handle automatic restarts when a worker dies or stalls

The task manager is able to start any CLI (shell) script from the chosen directory. For tasks that are long-running and meant to be monitored continuously, each worker process should send regular keep-alive messages via a ZeroMQ PUB-SUB channel to communicate its health, and should handle SIGTERM signals when asked to terminate. If the worker doesn't respond to a SIGTERM signal, it will be killed with SIGKILL after a (configurable) grace period. The number of workers stalled/stopped since the task manager was started is reported in the task status.
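The termination sequence (SIGTERM first, SIGKILL after a grace period) can be sketched as follows. This is an illustration under assumptions, not the task manager's own code: `stop_worker` and `grace_period` are hypothetical names, the keep-alive channel is left out, and the signal behaviour described applies to POSIX systems.

```python
import subprocess
import sys


def stop_worker(proc: subprocess.Popen, grace_period: float = 5.0) -> str:
    """Ask a worker to stop; force-kill it if it ignores the request."""
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_period)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()  # reap the process so it does not linger as a zombie
        return "killed"


if __name__ == "__main__":
    # Hypothetical worker: a child process that just sleeps.
    worker = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_worker(worker, grace_period=2.0))
```

Returning the outcome lets the caller count how many workers had to be killed, which is the kind of figure a task status report could include.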
Golang client library for StatsD. Contains a direct and a buffered client. The buffered version will hold and aggregate values for the same key in memory before flushing them at the defined frequency.
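The buffering behaviour can be sketched as below. The library itself is written in Go; this Python version is only an illustration, and `BufferedCounter`, `incr`, `flush` and the `send` callback are hypothetical names rather than the library's API. It handles counters only and leaves the timer that triggers the flush to the caller.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict


class BufferedCounter:
    """Aggregate counter increments per key, emitting one line per key on flush."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send  # e.g. a function writing to a UDP socket
        self._lock = threading.Lock()
        self._counts: Dict[str, int] = defaultdict(int)

    def incr(self, key: str, value: int = 1) -> None:
        with self._lock:
            self._counts[key] += value

    def flush(self) -> None:
        # Swap the buffer under the lock, then send outside it.
        with self._lock:
            pending, self._counts = self._counts, defaultdict(int)
        for key, total in sorted(pending.items()):
            self._send(f"{key}:{total}|c")  # StatsD counter line format


sent = []
client = BufferedCounter(sent.append)
client.incr("hits")
client.incr("hits", 2)
client.incr("errors")
client.flush()
print(sent)  # ['errors:1|c', 'hits:3|c']
```

Three increments become two messages here; the trade-off is that values reach the StatsD server only at the next flush.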
Talk about the soft side of scalability, covering team management, process implementation and some solid technology-related principles. Based on 10 years of experience building scalable teams and scalable data platforms.
At a certain scale, millions of events happen every second, and all of them are important to evaluate the health of the system. If not handled correctly, such a volume of information can overwhelm both the infrastructure that needs to support it and the people who have to make sense of thousands of signals and act on them, fast. By understanding how our rational mind works and how people process information, we can present data so it's more evident and intuitive. This talk will explain how to collect useful metrics and how to create the perfect monitoring dashboard to organise and display them, letting our intuition operate automatically and quickly, and saving attention and mental effort for activities that demand it.
When we think about scalability, we tend to focus only on the technical details, forgetting two equally important aspects: people and processes. In this talk we'll cover the fundamental elements of scalability, both organisational and technical, with sound and proven principles and some advice on how to shape your organisation, set the right processes and design your application.
Direction of the Engineering/QA/Operations groups (~60 people) and part of the Executive Team. Design and hands-on implementation of the DataSift Data Platform - a SaaS platform producing state-of-the-art, privacy-safe data analysis, filtering and aggregation technology, handling billions of user-generated messages and events in real time every day. We built the data-business for the world’s biggest social networks, and pioneered privacy-safe analytics models for analyzing human-created data.
Highlights: Facebook Topic Data | LinkedIn Engagement Insights
Direction of the Platform Engineering/QA/Operations groups. Promoted to leading the entire Engineering organisation responsible for the delivery of the fairhair.ai platform (4+ worldwide teams) within 5 months of DataSift’s acquisition, thanks to demonstrated ability to deliver on Product and Technology missions. Worked on internal reorg of the Engineering and R&D department to improve delivery capabilities.
Technical Lead for various teams and different projects, specialised in the design and optimisation of business-critical, scalable web architectures for large enterprise companies (BBC, Channel 5, Ladbrokes, AVG, Sybase, Cable & Wireless, Dennis Publishing, etc.), always delivering on time and with outstanding quality: in processes, design, code, tests, business analysis, communication and security.
Natural Language Processing R&D job, fund granted by the Research Consortium of Turin Polytechnic. Developed several Automatic Text Classifiers (with focus on opinion mining and sentiment analysis), an Information Extraction system and many language processing modules. SVM, Neural Networks, Naïve Bayes, Associative Classifiers.
IT consulting for various companies. Designer and developer of Intranet applications and web sites. Active contributor to various popular PHP & Go open-source projects and frameworks. Author of technical articles. Some notable projects:
Selected to participate in regional Math Olympics and in international Latin translation competitions (Certamen Ciceronianum Arpinas and Certamen Bugellensis).