Build Your Own LinkedIn Analytics Part 11: Key Takeaways and Lessons Learned

Originally published on Medium on 5 January 2026

If you’ve followed along and made it this far in the blog series, congratulations. You have a working LinkedIn analytics pipeline: from ingestion of Excel exports all the way to a production-style dashboard, complete with orchestration, maintainability and observability.

Now it’s time to recap and review.

TL;DR

  • Observability turns this LinkedIn analytics pipeline from a black box into a predictable data product with clear Service Level Agreements (SLAs) for reliability.
  • Operational observability ensures you know when jobs and tasks fail, who needs to act, and how alerts flow via Databricks notifications.
  • Data observability tracks quality dimensions like completeness, timeliness and uniqueness using Databricks’ built-in profiling, so you catch silent failures, not just broken runs.
  • For a single-creator LinkedIn stack, Databricks’ native observability is sufficient; platform-neutral stacks (OpenTelemetry, Prometheus, Grafana) pay off once you’re coordinating many pipelines across teams and platforms.
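
The quality dimensions named above can be expressed as simple checks. Here is a minimal pure-Python sketch with hypothetical column names; on Databricks these would come from built-in profiling and expectations rather than hand-rolled code:

```python
# Illustrative data-quality checks for completeness, uniqueness and
# timeliness. Column names (post_url, impressions, export_date) are
# hypothetical stand-ins for the real silver-layer schema.
from datetime import date, timedelta

rows = [
    {"post_url": "https://lnkd.in/a", "impressions": 10, "export_date": date(2026, 1, 4)},
    {"post_url": "https://lnkd.in/b", "impressions": None, "export_date": date(2026, 1, 4)},
]

# Completeness: share of rows with a non-null impressions value.
completeness = sum(r["impressions"] is not None for r in rows) / len(rows)

# Uniqueness: no two rows share the same post URL.
uniqueness = len({r["post_url"] for r in rows}) == len(rows)

# Timeliness: the latest export landed within the last 7 days of "today".
today = date(2026, 1, 5)
timeliness = max(r["export_date"] for r in rows) >= today - timedelta(days=7)
```

A failing check would then feed the notification flow described above, so a half-empty export is flagged even when the job itself ran green.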

I. Recap: What Did We Build?

a. The context

In the opening post of the series, I outlined why I wanted to build my own LinkedIn analytics data product instead of relying on existing tools. To sum up: 

  • Native LinkedIn analytics were too limited and clunky for the type of analysis I wanted to do.
  • Third‑party analytics tools were powerful but expensive overkill for a single‑creator workflow.
  • Databricks Free Edition offered a realistic way to prototype a small, end‑to‑end production‑style data product.
  • The project would double as a reference implementation that budding data engineers could learn from, not just a personal dashboard.
The chosen method for ingesting LinkedIn data from the second post.

I explored my options for data sources in the second post, and eventually chose manual Excel exports as “the least bad” approach for an individual creator. The third post built on that decision by articulating a medallion-style data architecture that would serve as the scaffold for everything else in the series.

b. From raw exports to insights

The next few posts focused on bringing the raw data in the exported LinkedIn Excel files through a medallion-style data journey all the way to insights on a dashboard.

The data flow for the LinkedIn analytics data product (summary graphic of complete data architecture from the eighth post).
  • The fourth post detailed the process of ingesting the raw Excel files into the bronze layer, with minimal transformations. I also augmented the Excel files with additional data from crawling the LinkedIn post links embedded in those files as well as manual patching.
  • The fifth post expounded on the transformation of bronze data into silver-layer tables that are cleaned and standardised.
  • In the sixth post, I modelled the silver data into fact and dimension tables (implemented as materialised views) in the gold layer for my analytical needs.
  • Then in the seventh post, I built a dashboard on top of those gold-layer tables that finally provided me with LinkedIn insights.
A dashboard panel screencap showing LinkedIn insights from the seventh post.
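
The bronze-to-silver-to-gold journey above can be sketched in simplified form. This is a pure-Python illustration with hypothetical field names and cleaning rules; the real pipeline operates on Spark tables in Databricks, not in-memory lists:

```python
# Illustrative medallion flow: raw export rows -> bronze -> silver -> gold.
# Field names and rules are hypothetical stand-ins for the real schema.

def bronze_ingest(raw_rows):
    """Bronze: land raw export rows with minimal transformation."""
    return [dict(row, _source="excel_export") for row in raw_rows]

def silver_clean(bronze_rows):
    """Silver: standardise types and drop rows without a post URL key."""
    cleaned = []
    for row in bronze_rows:
        if not row.get("post_url"):
            continue  # cannot join or deduplicate without a key
        cleaned.append({
            "post_url": row["post_url"].strip(),
            "impressions": int(row.get("impressions", 0)),
            "reactions": int(row.get("reactions", 0)),
        })
    return cleaned

def gold_fact(silver_rows):
    """Gold: one fact row per post, ready to serve a dashboard."""
    return {
        r["post_url"]: {
            "impressions": r["impressions"],
            "engagement_rate": r["reactions"] / max(r["impressions"], 1),
        }
        for r in silver_rows
    }

raw = [
    {"post_url": " https://lnkd.in/abc ", "impressions": "120", "reactions": "6"},
    {"post_url": "", "impressions": "50"},  # dropped at the silver layer
]
fact = gold_fact(silver_clean(bronze_ingest(raw)))
```

Each layer has one job, which is the separation of concerns revisited later in this post.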

This is where most guides to building personal projects stop, but for the benefit of budding data engineers, I wanted to go further than just a dashboard.

c. Making it production-ready

Once the proof-of-concept was in place, it was time to make the pipeline production-ready. What did that mean in practice?

The complete setup for the LinkedIn analytics pipeline from the eighth post.
  • I orchestrated and automated the pipeline by adding file arrival triggers and linking the individual pieces of code into Databricks Jobs and Pipelines, as detailed in the eighth post in the series.
  • Then I made the pipeline maintainable in the ninth post by creating a Databricks Asset Bundle within a Git repository that ensured that everything (or as much as possible) was version controlled and documented.
  • Finally, in the tenth post, I outlined the steps taken to improve the observability of the pipeline so that maintainers can be alerted to any issues and triage them in a timely manner.

II. What Went Well?

a. Personal learnings from the project

  • I set out to explore Databricks Free Edition as a way to build an end-to-end, production-like stack, and for the most part, it proved a good fit for a solo, learning-oriented project.
  • Though I use Databricks regularly in my day job, this was still my first time interacting with some of its features such as Dashboards and Databricks Asset Bundles, and it was a great opportunity to deepen my experience with this major data platform.

b. Teachable moments in the blog series

Throughout the series, I highlighted best practices in production and enterprise environments. My aim was to show how these ideas appear in real code and architecture diagrams rather than in abstract checklists.

  • I established a clear separation of concerns with my medallion architecture (bronze ingestion and patching; single source of truth in the silver layer; gold semantic modelling).
  • I repeatedly called out where the constraints of Databricks Free Edition and manual Excel exports differ from an enterprise setup, so readers could see how to adapt the same patterns to a more robust environment.
  • I used real‑world patterns, such as fan‑in orchestration, materialised views for gold tables and idempotent pipeline design, to connect everyday LinkedIn analytics to the kinds of workflows data engineers encounter in production.
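
The idempotent pipeline design mentioned above boils down to a keyed upsert: re-running the same load leaves the target unchanged. A minimal pure-Python sketch, where a dict stands in for the target table (the real pipeline would use a Delta Lake MERGE keyed on the post URL):

```python
# Illustrative idempotent upsert: re-processing the same batch is a no-op,
# so a retried or re-triggered run never duplicates rows.

def upsert(target, batch, key="post_url"):
    """Merge batch rows into target keyed on `key` (last write wins)."""
    for row in batch:
        target[row[key]] = row
    return target

table = {}
batch = [{"post_url": "https://lnkd.in/abc", "impressions": 120}]
upsert(table, batch)
upsert(table, batch)  # second run changes nothing
```

This is what makes file-arrival triggers safe: if LinkedIn re-exports overlapping date ranges, the pipeline converges on the same state rather than double-counting.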

III. What Can Be Done Better?

a. Technical limitations

  • Because I am an individual user rather than a business, I could only do manual LinkedIn Excel exports and uploads. There is scope here for a business process automation or even an agentic workflow to remove this manual element, but it needs to be explored with care, especially around LinkedIn’s terms of service and rate limits.
  • Databricks Free Edition has constraints such as table limits, a single region and workspace, and limited serverless compute, which meant that I couldn’t fully demonstrate what an enterprise-level deployment would look like. If there is sufficient interest, I may explore that in a follow-up series using an Enterprise workspace instead of the Free Edition.
  • Some Databricks capabilities are still maturing, even beyond Free Edition constraints; Dashboards, the IDENTIFIER clause and Databricks Asset Bundles are prime examples, with feature gaps that Databricks is actively closing.

b. Potential technical improvements

  • A lot more can be done on the analytics front, especially as new data comes in, from further exploration of the data to machine learning models for forecasting, classification and segmentation (for example, predicting post performance or clustering post themes).
  • On a related note, the dashboard itself can be expanded upon or modified as new actionable insights become apparent.
  • I did not pull in multimedia data; some of my posts include images that can be highly relevant to their performance. Neither did I ingest my LinkedIn articles associated with some of my posts, even though article-level metrics and metadata can give important context to post performance.
  • Generative AI was barely touched on in this blog series, but there is a substantial scope for integrating it into the data product. Databricks offers strong support here, from Genie spaces and the Databricks One interface for ‘chatting’ with your data, to a wide range of foundational models you can deploy for tasks such as content summarisation or recommendation.

c. Potential blog series improvements

  • Some of my posts were code-heavy, which may be intimidating to budding data engineers. I plan an open-source repository to make it easier to follow along and reuse the code patterns; more on that later.
  • Related to that, several posts make for heavy reading due to length and/or technical density. There is potentially room for tighter editing to make the writing more cogent and accessible.

IV. Review: Objectives

So did I meet the objectives set out at the very beginning of the series? I’d say yes, but let’s review.

a. To better understand my LinkedIn statistics

I have indeed achieved a better understanding of my LinkedIn statistics; for instance, I can see the impact of days of the week and posting frequency over a longer time horizon and at a per-post level, in a way that the native LinkedIn dashboard does not easily surface.

Objective met: Yes

b. To guide others in analyzing their own LinkedIn statistics

This comprehensive guide will, I hope, help those among you keen to analyse your own LinkedIn statistics. If you are less technical and want something more plug-and-play, the planned open-source repository will include instructions to get you started.

Objective met: Yes (and will be strengthened once the open-source repository is live)

c. To further explore Databricks’ capabilities and limitations

This objective has been thoroughly met, as reflected in the personal learnings and technical limitations discussed earlier in this post.

Objective met: Yes

d. To provide a reference end-to-end data product for budding data professionals

Once the open-source repository is released, this objective can be considered thoroughly met; even now, this blog series already serves as a comprehensive reference, with an end-to-end walkthrough and various tips and tricks of the trade.

Objective met: Yes (and will be strengthened once the open-source repository is live)

V. What’s Next?

In the next and final post of this series, I will outline my future follow-ups, as well as launch a public Git repository with the complete demonstration code that was built throughout the series. 

If you’ve built your own variant or plan to, I’d love to hear what worked, what broke, and what you’d like to see next.

Until then, stay tuned.

Yingzhao Ouyang is an AI and data engineering specialist with a distinctive blend of humanities, business, and technical expertise, bringing a uniquely holistic perspective to enterprise data challenges that others with purely technical backgrounds miss. To find out more, follow his LinkedIn profile at https://www.linkedin.com/in/yzouyang/
