Layline Datasets

I started Project Layline as a research initiative that leverages high-performance and cloud computing to create publicly accessible datasets for research in financial economics. My goal is to lower the barriers to entry for empirical research in finance by democratizing access to data. I also hope that the project will bring greater transparency to the field by facilitating replication studies.


The complete list of datasets is available at layline.org.

Layline insider trading dataset

Available via the Harvard Dataverse with monthly updates and on Kaggle with daily updates.

This dataset captures insider trading activity at publicly traded companies. The Securities and Exchange Commission has made these insider trading reports available on its website in a structured format since mid-2003. However, most academic papers rely on proprietary commercial databases rather than on the regulatory filings themselves. This makes replication challenging, because the data manipulation and aggregation steps in commercial databases are opaque and the data provider may alter historical records over time. To overcome these limitations, this dataset is built from the original regulatory filings; it is updated daily and includes all information reported by insiders without alteration.

By using this dataset, you agree to cite my publication describing it:
Balogh, A. Insider trading. Scientific Data 10, 237 (2023). doi: 10.1038/s41597-023-02147-6


Layline institutional holding reports

Available via the Harvard Dataverse with monthly updates and on Kaggle with daily updates.

This dataset captures the quarterly investment holdings of institutional investment managers and maps the ownership structure of public firms. These Form 13F reports are filed with the Securities and Exchange Commission each quarter by all institutional investment managers that exercise investment discretion over at least $100 million in Section 13(f) securities. Most academic research examining the common ownership of corporations and the portfolio holdings of large investment managers relies on proprietary commercial databases. This hinders the replication of prior work, both because access to these subscriptions is unequal and because the data manipulation steps in commercial databases are often opaque. To overcome these limitations, this dataset is built from the original regulatory filings; it is updated daily and includes all information reported by investment managers without alteration.

By using this dataset, you agree to cite my publication describing it:
Balogh, A. Mapping the ownership of public firms. Working paper. (2024). doi: 10.2139/ssrn.4378976


Layline corporate filings dataset

Available via the Harvard Dataverse with monthly updates and on Kaggle with daily updates.

This dataset contains metadata for regulatory filings obtained from the SEC's EDGAR system.

By using this dataset, you agree to cite my publication describing it:
Balogh, A. Finding the right fit: Value creation in activism through human capital. Working paper. (2023). doi: 10.2139/ssrn.4378976


Layline shareholder activism dataset

Available via the Harvard Dataverse with monthly updates and on Kaggle with daily updates.

This is a collection of datasets containing metadata on passive and active blockholders, corporate filings, institutional investor filings, and non-management proxy filings.

By using this dataset, you agree to cite my publication describing it:
Balogh, A. Finding the right fit: Value creation in activism through human capital. Working paper. (2023). doi: 10.2139/ssrn.4378976