Build Your Own LinkedIn Analytics Part 12: What’s Next? Open Sourcing and Community

Originally published on Medium on 19 January 2026

This is the final part of my blog series on building a LinkedIn analytics pipeline, from ingestion of Excel exports all the way to a production-style dashboard, complete with orchestration, maintainability and observability. In the previous post, I went over the key takeaways and lessons learned when building this data product. Now it’s time to look to the future.

TL;DR

  • The full Databricks‑based LinkedIn analytics pipeline is now open‑sourced, so you can clone it, plug in your own exports, and start analysing your content end‑to‑end.
  • Future work spans richer data (articles, multimedia, post features), deeper analytics (DS/ML), and automation (GenAI assistants, CI/CD with GitHub Actions).
  • This installment is an invitation: fork the project, adapt it to your context, and share back improvements or findings with the community.

I. Open Source Repository

First, here’s the link to the repository that you have all been waiting for.

GitHub: https://github.com/KunojiLym/databricks-linkedin-analytics

This is where you’ll find the Databricks notebooks, table and pipeline configurations, and dashboard definitions demonstrated throughout the series. The repository is meant to complement the blog posts while still standing on its own: anyone can clone or fork the codebase to experiment with their own LinkedIn analytics, or adapt it to create a different data product entirely.

If you encounter any issues, or have ideas for improvement, feel free to open an issue or contribute directly to the repository. See CONTRIBUTING.md for more details.

II. Future Plans

There is a lot of work that can still be done with this project, some of which could warrant its own blog series. The following list is not comprehensive and is in no particular order of importance or urgency; some of this was touched on in the previous post reflecting on lessons learned.

  1. Enhancements to data: There are post features that I did not ingest, such as post type or holidays; I also did not ingest LinkedIn articles or any associated multimedia.
  2. Enhancements to ingestion: The current method of uploading files directly in the Databricks UI is not ideal; there are ways to ease the pain, including building an app on Databricks to simplify the upload process.
  3. Enhancements to dashboard: The current dashboard has plenty of room for improvement, especially as I discover additional requirements for my own analysis.
  4. Enhancements to documentation: Since I intend the repository to be able to stand on its own, eventually I will want to include a full tutorial/walkthrough in the repository documentation.
  5. Data science and machine learning: The amount of data I collect will eventually grow to the point where deeper exploratory analytics makes sense. Machine learning techniques can then be leveraged: for example, clustering to surface groups of related topics, and predictive models to forecast how well different types of posts might fare or to identify the best times to post.
  6. CI/CD with GitHub Actions: I alluded to this when discussing improvements to maintainability, but the CI/CD of this data product is at present triggered via the Databricks UI and could be further automated by GitHub Actions.
  7. Generative AI: These days, dashboards are not the only way to derive insights from the data. Building a chatbot with Databricks Genie can prove to be a fruitful endeavour. There are other potential uses for GenAI, especially when one considers the potential of using an agentic framework to suggest topics for exploration.
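As a small illustration of easing the ingestion pain mentioned in item 2, one first step could be a pre-upload check that flags which local LinkedIn export files have not yet been pushed to Databricks. This is a minimal sketch under assumed conventions: the filename pattern and the helper name are hypothetical, not part of the repository, and the actual upload would still happen through Databricks (for example via the Databricks SDK or a Databricks app, as suggested above).

```python
from pathlib import Path
import re

# Hypothetical filename convention for LinkedIn content exports,
# e.g. "Content_2025-12-01_2025-12-31.xlsx" -- adjust to match your own exports.
EXPORT_PATTERN = re.compile(r"Content_\d{4}-\d{2}-\d{2}_\d{4}-\d{2}-\d{2}\.xlsx$")

def find_new_exports(local_dir: str, already_uploaded: set[str]) -> list[str]:
    """Return export filenames in local_dir that are not yet uploaded.

    already_uploaded would typically be populated by listing the target
    Databricks volume; here it is just a plain set of filenames.
    """
    candidates = sorted(
        p.name
        for p in Path(local_dir).iterdir()
        if p.is_file() and EXPORT_PATTERN.match(p.name)
    )
    return [name for name in candidates if name not in already_uploaded]
```

A script like this could run locally before each upload session, so only genuinely new exports get pushed, rather than relying on manually tracking which files went through the UI.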

III. Final Thoughts

Over the past twelve parts, this project has grown from a simple personal experiment into a reusable reference stack for building creator‑grade analytics on Databricks. Rather than declaring it “done,” open‑sourcing the code is a way to keep it alive in the hands of other practitioners who see different problems and opportunities than one person ever could. 

If you adapt this pipeline, whether for your own LinkedIn, for a team’s content program, or as a teaching tool, I would love to hear what you discover and what you change. 

What would you build next on top of this: richer data, ML‑driven insights, or GenAI‑powered assistants?

Yingzhao Ouyang is an AI and data engineering specialist with a distinctive blend of humanities, business, and technical expertise, bringing a uniquely holistic perspective to enterprise data challenges that others with purely technical backgrounds miss. To find out more, follow his LinkedIn profile at https://www.linkedin.com/in/yzouyang/
