Transparency about potential bias is essential for building trust in AI systems.
When users understand an AI system's limitations, they form a more critical and informed user base that can drive demand for more transparent, fair, and accountable AI systems. This, in turn, can incentivize developers to prioritize ethical considerations and invest in techniques to identify and mitigate bias throughout the AI lifecycle.
This note describes the theoretical sources of bias in a risk engine that analyzes historical company data, like the one developed by Transparently. AI trained on company data carries different kinds of bias from AI trained on data about people, and these biases, if uncorrected, can lead to costly decisions.
In our previous discussion of AI bias, we identified eight possible sources of bias: training data bias, historical bias, selection bias, algorithmic bias, confirmation bias, interaction bias, stereotyping bias and exclusion bias.
Distinguishing company data bias from human data bias
AI bias is often associated with models trained on human data - language, demographics, or personal behavior - where issues like underrepresentation or historical prejudice can lead to discriminatory outcomes. But AI trained on company data carries a different set of challenges.
The Transparently Risk Engine is trained on decades of structured financial statements and disclosures from over 85,000 listed companies. It looks for subtle, data-driven patterns of accounting manipulation, for example accrual anomalies, irregular revenue recognition, and balance sheet distortions. These are signals rooted in the operational and reporting behaviors of firms, not people.
Because of this, the nature of bias is different:
- Human data bias often reflects inequities in how individuals are represented in the data. This can lead to unfair treatment or flawed predictions when applied to real-world decisions about people.
- Company data bias, by contrast, emerges from outdated accounting standards, industry-specific quirks, uneven regulatory regimes, or over-representation of certain firm types. These factors shape what the model learns—and if left uncorrected, can lead to distorted risk assessments.
In short, the patterns a model learns from historical company data may not fully reflect today’s reporting realities. This article explores how those distortions can occur, and how Transparently’s engineers work to minimize their effects without compromising the system’s ability to detect risk early and credibly.
Training data bias
If the historical financial data used to train the engine contains inherent biases (e.g., reflecting past reporting practices, industry norms, or even undetected historical manipulations), the model could learn and perpetuate these biases.
Let's consider a concrete example related to revenue recognition practices.
Scenario: In the distant past (say, 10-15 years ago), it was a common, though aggressive, industry practice for software companies to recognize a significant portion of their multi-year software license revenue upfront, even if the associated services or updates were to be delivered over several years. This practice, while perhaps compliant with accounting standards at the time, might have been less transparent or carried higher inherent risk compared to current, more stringent revenue recognition standards (like ASC 606 or IFRS 15).
How the bias occurs: If the Transparently Risk Engine was primarily trained on a large dataset of historical financial statements where this aggressive upfront revenue recognition was prevalent among "healthy" or "non-manipulative" companies, the model would learn to associate this pattern with low accounting risk. The AI would essentially "learn" that this specific revenue recognition behavior is normal and acceptable, as it was frequently observed in companies that did not subsequently experience fraud or financial failure within the historical training period.
Perpetuation of bias: When the engine analyzes a current software company that might be using a similar aggressive upfront revenue recognition method (even if it's now considered more risky or less transparent under newer standards), the model, due to its historical training, might underestimate the accounting manipulation risk. It would perpetuate the bias from the older data, failing to flag this practice as a significant concern because it was "normalized" by the historical training data. This could lead to a biased assessment where a company engaging in less transparent but historically common practices receives a lower risk score than it might warrant under contemporary scrutiny.
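To make the mechanism concrete, here is a minimal sketch on synthetic data; the feature names, numbers and model are hypothetical illustrations, not the engine's actual design. When aggressive upfront revenue recognition is common among the non-manipulative firms in the training window, a simple classifier learns to give it essentially no weight:

```python
# Minimal sketch (synthetic data, hypothetical feature names): if aggressive
# upfront revenue recognition is widespread among non-manipulative firms in the
# training window, a classifier learns to treat it as benign.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Share of multi-year license revenue recognized upfront (0..1).
upfront_share = rng.uniform(0.0, 1.0, n)
# Receivables growth minus revenue growth (a classic manipulation signal).
receivables_gap = rng.normal(0.0, 0.1, n)

# Historical labels driven only by the receivables gap, because upfront
# recognition was "normal" practice throughout the training period.
p = 1.0 / (1.0 + np.exp(-(20.0 * receivables_gap - 2.0)))
label = rng.binomial(1, p)

X = np.column_stack([upfront_share, receivables_gap])
model = LogisticRegression().fit(X, label)
print(dict(zip(["upfront_share", "receivables_gap"], model.coef_[0].round(2))))
# The weight on upfront_share comes out near zero: the model carries the
# historical "normality" of upfront recognition forward, even after standards
# such as ASC 606 / IFRS 15 have made the practice more questionable.
```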
Historical bias
If past economic conditions, regulatory environments, or accounting standards differ significantly from current or future ones, the predictions of a model trained on these outdated patterns might be less accurate or biased towards past patterns.
Let's consider an example related to lease accounting standards to see how changes in these standards could introduce bias.
Scenario: Before 2019, many companies used "operating leases" which were treated as off-balance sheet arrangements. This meant that significant lease obligations for assets like buildings, vehicles, or equipment were not recorded as liabilities on the company's balance sheet. This practice often made companies appear to have lower debt and better financial ratios (e.g., lower debt-to-equity) than if these obligations were fully recognized. For many industries, this was a common and accepted accounting practice.
In 2019, new lease accounting standards (IFRS 16 and ASC 842) came into effect, requiring companies to recognize most operating leases on their balance sheets as "right-of-use" assets and corresponding lease liabilities. This significantly increased the reported assets and liabilities for many companies, particularly those with extensive leased operations.
How the bias occurs: If, for example, the Transparently Risk Engine was primarily trained on historical financial data from periods before 2019, it would have learned that companies with substantial operational assets (like a large retail chain with many leased stores) but minimal reported lease liabilities were financially "normal" and generally low-risk in terms of gearing. The model would have associated this financial structure with good accounting quality, as it was the prevailing standard.
If the Transparently Risk Engine, still heavily influenced by its pre-2019 training data, were to analyze a company reporting under post-2019 standards, it might exhibit a bias. A company that now correctly reports its substantial lease liabilities on the balance sheet would suddenly appear to have significantly higher debt and leverage compared to the "normal" patterns the model learned from the old data.
The model, biased towards the old context where such liabilities were off-balance sheet, might incorrectly flag this company as having deteriorating accounting quality or higher gearing risk, simply because its reported financial structure now looks different from the historical "normal" due to a change in accounting standards, not due to actual manipulation. The model's predictions would be less accurate because the underlying rules of financial reporting have changed, but its learned patterns are still rooted in the past.
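A small worked example, with hypothetical figures rather than any real company's numbers, shows the size of the distortion:

```python
# Illustrative numbers (hypothetical retailer) before and after IFRS 16 / ASC 842
# bring operating leases onto the balance sheet.
debt = 400.0             # reported financial debt, $m
equity = 500.0           # shareholders' equity, $m
lease_liability = 350.0  # present value of operating lease commitments, $m

pre_2019_gearing = debt / equity                       # leases off balance sheet
post_2019_gearing = (debt + lease_liability) / equity  # leases capitalized

print(f"pre-IFRS 16 debt/equity:  {pre_2019_gearing:.2f}")   # 0.80
print(f"post-IFRS 16 debt/equity: {post_2019_gearing:.2f}")  # 1.50
# Nothing about the underlying business changed, yet reported leverage nearly
# doubles. A model whose sense of "normal" gearing was learned on pre-2019 data
# can misread this as deterioration rather than a change in accounting standards.
```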
Selection bias
If the dataset used for training is not fully representative of the entire universe of publicly-listed companies (e.g., over-representing certain regions, industries, or company sizes), the engine's performance might be skewed when applied to under-represented groups.
Let's consider a concrete example of selection bias in the Transparently Risk Engine, focusing on how an unrepresentative training dataset could skew performance for under-represented groups.
Scenario: Imagine the Transparently Risk Engine's training dataset was predominantly composed of financial data from: (1) large, established companies in developed markets (e.g., S&P 500 companies in the US and FTSE 100 companies in the UK); and (2) traditional industries like manufacturing, finance, and retail, which tend to have stable business models, predictable revenue streams, and well-defined asset structures.
This training data would teach the model what "normal" accounting practices and "red flags" look like for these types of companies. For instance, it might learn that high levels of R&D expenditure relative to revenue are unusual for mature manufacturing firms and could signal aggressive accounting. If we now applied this engine to small-cap, high-growth biotechnology companies in emerging markets, we should expect considerable bias because these companies often have very different financial characteristics than established companies in developed economies. For example:
- They might have negative or low revenue for many years while investing heavily in research and development (R&D) for future products;
- Their asset base might be heavily weighted towards intangible assets (patents, intellectual property) rather than tangible property, plant, and equipment;
- They might rely on frequent equity financing rounds rather than traditional debt;
- Accounting practices and regulatory oversight in emerging markets might also differ from developed markets.
How the bias occurs: Due to selection bias, the model, trained on data from large, mature, traditional companies, might misinterpret the financial characteristics of the small-cap biotech company. For example:
- The model might incorrectly flag the biotech company's high R&D expenses (which are normal for its industry and growth stage) as a significant risk factor, because it learned that such high R&D is "abnormal" for the large, mature companies it was trained on;
- The model might struggle to accurately assess the quality of the biotech company's intangible-heavy asset base, as its training data provided limited examples of how to evaluate such assets in a high-growth, R&D-intensive context. It might incorrectly assign higher risk due to a lack of tangible assets, which is a "normal" characteristic for the over-represented group;
- The model might incorrectly flag a high reliance on non-traditional financing methods, or heavy use of options issuance for employee compensation as suspicious because these methods would not be common in the training data set;
- If the biotech company has unique revenue recognition patterns (e.g., milestone payments from drug development, which are common in biotech but rare in traditional manufacturing), the model might misinterpret these as aggressive or manipulative simply because they don't fit the patterns learned from the over-represented industries.
How the bias skews performance: In essence, the engine's performance would be skewed because it is applying "rules" learned from one type of company to a fundamentally different type of company, leading to biased risk assessments for the under-represented group. The system would have an in-built bias against small, high-growth companies reliant on unconventional financing.
In fact, users often question whether the Transparently Risk Engine has an in-built bias against small companies. Because the system is trained on a vast number of smaller companies of all types, there are far more small companies than large ones in the training data set, which mitigates the risk of selection bias against small companies.
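One generic safeguard, shown here only as an illustrative sketch with hypothetical column names rather than a description of the engine's internals, is to audit training-set composition and error rates by group, so that under-represented segments are visible rather than hidden inside an overall average:

```python
# Sketch of a representation audit over a labeled training or backtest set.
# Assumes (hypothetically) a true label column 'manipulator' (0/1) and a model
# output column 'predicted' (0/1).
import pandas as pd

def representation_audit(df: pd.DataFrame, group_cols: list[str]) -> pd.DataFrame:
    def summarize(g: pd.DataFrame) -> pd.Series:
        negatives = g[g["manipulator"] == 0]
        positives = g[g["manipulator"] == 1]
        return pd.Series({
            "n_companies": len(g),
            "share_of_data": len(g) / len(df),
            "false_positive_rate": (negatives["predicted"] == 1).mean() if len(negatives) else float("nan"),
            "false_negative_rate": (positives["predicted"] == 0).mean() if len(positives) else float("nan"),
        })
    return df.groupby(group_cols).apply(summarize).sort_values("share_of_data")

# Usage (hypothetical columns): representation_audit(train_df, ["region", "size_bucket", "sector"])
# Groups with a tiny share of the data and elevated error rates are candidates for
# re-weighting, targeted data collection, or more cautious interpretation of scores.
```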
However, we should always be aware that companies in small and unusual industries, or companies subject to unique accounting standards, require special attention from the user. For example, in some emerging markets where founders own a large percentage of a company, it is common to expense capital expenditure that should rightly be capitalized. While this understates earnings, it also reduces tax expense. This practice, compounded over decades in the right market conditions, can greatly enhance shareholder value. The risk engine would recognize irregularities, and hence risk, in this practice even though it might be advantageous to shareholders.
Algorithmic bias
The design of the algorithms themselves, including how features are weighted or how thresholds are set, could inadvertently introduce or amplify biases, even if the input data seems balanced.
Let's consider a concrete example of algorithmic bias related to how the Transparently Risk Engine might process and interpret a common financial metric, even if the underlying data appears balanced.
Scenario: The Transparently Risk Engine uses a feature called "Days Sales Outstanding (DSO)," which measures the average number of days it takes a company to collect its accounts receivable. A higher DSO can sometimes indicate aggressive revenue recognition (booking sales before cash is collected) or issues with collections.
Suppose the algorithm is designed to assign a higher accounting risk score if a company's DSO exceeds a fixed threshold, say, 60 days. This threshold might have been determined during the model's development based on an overall average or optimization across a broad historical dataset. Furthermore, the algorithm might apply a uniform weighting to this "DSO risk" across all companies.
Further suppose the training dataset used to build the engine contains a balanced representation of companies from various industries, including:
- Software-as-a-Service (SaaS) companies: These often have subscription models with predictable, recurring revenue and typically very low DSOs (e.g., 10-20 days) as payments are often upfront or automated;
- Heavy manufacturing companies: These companies might sell large, complex machinery or equipment to other businesses, often involving long sales cycles, customized contracts, and extended payment terms, leading to naturally higher DSOs (e.g., 90-120 days).
How the bias occurs: Even if the training data itself is balanced in terms of the number of companies from each industry, the algorithm's fixed threshold of 60 days for DSO can introduce bias:
- A manufacturing company with a healthy and normal DSO of 95 days (due to its industry's standard payment terms) would be flagged as high-risk by the algorithm because it exceeds the 60-day threshold. The algorithm, due to its design, inadvertently penalizes companies in industries with naturally longer collection cycles, even if their practices are legitimate;
- Conversely, a SaaS company that normally has a DSO of 15 days but suddenly sees it jump to 45 days (a significant deterioration for its industry, potentially signaling aggressive revenue recognition) might not be flagged as high-risk by the algorithm because 45 days is still below the 60-day threshold. The algorithm's uniform threshold fails to capture the relative change and industry-specific context, potentially missing emerging risks in industries with inherently lower DSOs.
In this example, the algorithmic design choice of a fixed, universal threshold for DSO, rather than an adaptive or industry-specific one, inadvertently introduces a bias that can lead to inaccurate risk assessments for companies whose normal operating characteristics fall outside that universal threshold.
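The contrast can be sketched in a few lines; the thresholds, DSO values and functions below are hypothetical illustrations, not the engine's actual logic:

```python
# Fixed 60-day DSO cutoff versus an industry-relative flag based on each
# sector's own distribution. DSO = accounts receivable / revenue * 365.
import statistics

industry_dso = {
    "saas":          [12, 15, 18, 20, 14, 16, 45],     # last value: a sharp deterioration
    "manufacturing": [90, 95, 100, 110, 120, 105, 95],  # last value: business as usual
}

def fixed_flag(dso: float, threshold: float = 60.0) -> bool:
    return dso > threshold

def relative_flag(dso: float, peers: list[float], k: float = 2.0) -> bool:
    mu, sigma = statistics.mean(peers), statistics.stdev(peers)
    return dso > mu + k * sigma

for sector, history in industry_dso.items():
    peers, latest = history[:-1], history[-1]
    print(sector, "fixed:", fixed_flag(latest), "relative:", relative_flag(latest, peers))

# fixed:    flags the healthy 95-day manufacturer and misses the SaaS jump to 45 days
# relative: passes the manufacturer and flags the SaaS outlier against its own peers
```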
Confirmation bias
While primarily a human bias, if the model's learning process is not rigorously designed, it could inadvertently prioritize patterns that confirm initial assumptions about what constitutes "risk," potentially overlooking novel forms of manipulation.
Let's consider an example of confirmation bias, focusing on how the model's learning process might inadvertently prioritize known patterns of risk, potentially overlooking novel forms of manipulation.
Scenario: Historically, a very common and well-documented form of accounting manipulation involves aggressive revenue recognition, particularly through "channel stuffing" (shipping excess inventory to distributors to book sales early) or prematurely recognizing revenue from long-term contracts. The Transparently Risk Engine, having been trained on numerous historical fraud cases, has learned to strongly associate specific financial signals (e.g., rapidly increasing accounts receivable relative to sales, declining inventory turnover, or unusual spikes in revenue at quarter-end) with this type of aggressive revenue recognition and, consequently, high accounting risk.
During its development and training, the model's designers might have implicitly or explicitly focused on ensuring the model accurately identifies these well-known aggressive revenue recognition patterns. The model's algorithms are optimized to detect these strong, clear signals, and they become highly weighted in its risk assessment. The "assumption" is that these are the primary ways revenue manipulation occurs.
Now, imagine a new, more subtle form of manipulation emerges, perhaps related to complex financial instruments or off-balance sheet entities that are used to obscure liabilities or inflate asset values, indirectly impacting the perception of financial health without directly manipulating revenue figures in the traditional sense. For example, a company might use complex special purpose entities (SPEs) to hide debt or transfer risky assets, making its balance sheet appear stronger. The signals for this new type of manipulation might be more diffuse, involving subtle shifts in cash flow classifications, unusual related-party transactions, or complex footnotes that are harder for the model to directly link to "risk" based on its primary training.
How the bias occurs: If the model's learning process was not rigorously designed to continuously adapt and seek out all potential risk indicators, it could exhibit confirmation bias:
- When analyzing a company, the model might heavily prioritize the strong, clear signals it's been trained to recognize (e.g., looking for signs of aggressive revenue recognition);
- Because the new manipulation (e.g., hidden liabilities via SPEs) doesn't perfectly align with the highly weighted "aggressive revenue recognition" patterns, the model might inadvertently downplay or overlook these novel, more subtle signals. The model's "attention" is biased towards confirming its existing understanding of risk, rather than actively searching for and learning from new, less familiar patterns.
Consequently, a company engaging in novel forms of manipulation might receive a better risk score than warranted, simply because the model's internal "risk radar" is tuned to detect older, more familiar forms of manipulation, confirming its initial assumptions about what "risk" looks like.
In this example, the confirmation bias isn't about human prejudice, but about the model's learned tendency to prioritize and interpret new data through the lens of its most strongly reinforced historical patterns, potentially leading it to miss evolving or less conventional forms of accounting manipulation.
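A toy illustration, using hypothetical weights and feature values rather than the engine's actual parameters, shows how a score dominated by familiar signals can stay low even when newer, under-weighted signals are glaring:

```python
# Toy scoring model whose weights were reinforced by historical revenue-
# recognition frauds (all numbers hypothetical).
learned_weights = {
    "receivables_growth_vs_sales": 0.45,   # classic channel-stuffing signal
    "quarter_end_revenue_spike":   0.35,
    "inventory_turnover_decline":  0.15,
    "related_party_transactions":  0.03,   # weakly reinforced during training
    "unusual_cash_flow_reclass":   0.02,
}

# A firm hiding debt through special purpose entities: its revenue signals look
# clean, while the subtle, under-weighted signals are elevated.
spe_company = {
    "receivables_growth_vs_sales": 0.1,
    "quarter_end_revenue_spike":   0.0,
    "inventory_turnover_decline":  0.1,
    "related_party_transactions":  0.9,
    "unusual_cash_flow_reclass":   0.8,
}

score = sum(learned_weights[f] * spe_company[f] for f in learned_weights)
print(f"risk score: {score:.2f}")  # about 0.10, despite strong SPE-style warning signs
```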
Interaction bias
The Transparently Risk Engine is a pre-computed system: it does not train on real-time user interaction. In this case, interaction bias would more likely arise from how the developers' own perspectives influenced the initial design and refinement of the model, producing a system that performs differently on new or evolving accounting practices.
Let's consider an example of interaction bias in the context of the Transparently Risk Engine's development and its impact on evolving accounting practices.
Scenario: Imagine the initial development team for the Transparently Risk Engine primarily consisted of financial experts and data scientists who had extensive experience from the traditional software industry (e.g., companies selling perpetual software licenses, where a large portion of revenue was recognized upfront upon sale and delivery). Their collective expertise and the historical data they initially focused on would naturally shape their understanding of "normal" and "risky" accounting behaviors.
During the design and refinement phases, the developers, influenced by their background in traditional software, might have:
- Prioritized specific "red flags": They might have heavily weighted features related to the timing of large, one-time revenue recognition events, or anomalies in deferred revenue accounts (which were less significant in perpetual license models).
- Developed validation metrics: The metrics used to evaluate the model's performance (e.g., how well it predicted past fraud in traditional software companies) would reinforce its ability to detect these specific types of manipulation.
- Implicitly defined "normal": Their mental models of a "healthy" software company's financials would be based on the patterns observed in traditional licensing.
Over time, the software industry has shifted towards Software-as-a-Service (SaaS) models. In SaaS, revenue is typically recognized over the subscription period (e.g., monthly or annually), even if the customer pays upfront for a longer term. This leads to:
- Significant deferred revenue on the balance sheet (as cash is received before it's earned);
- A more gradual revenue stream compared to large, lumpy upfront license sales; and
- Different patterns in working capital and cash flow.
How interaction bias occurs: When the Risk Engine, designed with the "traditional software" lens, is applied to a modern SaaS company, the interaction bias from its development could cause the engine to:
- Misinterpret deferred revenue: The model might incorrectly flag a large and growing deferred revenue balance (which is perfectly normal and healthy for a SaaS company) as a potential risk, because its algorithms were tuned to see unusual deferred revenue patterns in the context of perpetual license models. The developers' initial focus on traditional revenue recognition might have led them to design features that misinterpret the healthy growth of deferred revenue in a SaaS context.
- Overlook new risks: Conversely, the model might be less sensitive to subtle manipulation tactics unique to SaaS, such as aggressive capitalization of sales commissions or overly optimistic estimates for customer churn, because these weren't prominent "red flags" in the traditional software data that heavily influenced its initial design. The developers' perspectives, shaped by older industry norms, might have led them to inadvertently exclude or under-weight features that would detect these newer forms of manipulation.
In this example, the "interaction bias" isn't about a user's real-time input, but about how the collective perspectives and historical focus of the development team, during the crucial design and refinement stages, inadvertently built a system that performs differently (and potentially less accurately) when confronted with new accounting practices that evolved after its initial conception.
Stereotyping bias
The model might inadvertently create "stereotypes" of risky companies based on learned patterns from historical cases. This could lead to misclassifying companies that deviate from these learned patterns but are not actually engaging in manipulation.
Let's consider an example of stereotyping bias in the Transparently Risk Engine, where the model inadvertently creates a "stereotype" of a risky company and then misclassifies a legitimate one.
Scenario: Imagine that a significant portion of the historical accounting manipulation cases in the Transparently Risk Engine's training data involved companies that exhibited a specific "stereotype":
- Rapid, unsustainable revenue growth, often achieved through aggressive sales tactics or premature revenue recognition;
- High levels of accounts receivable growing much faster than revenue, indicating difficulty in collecting payments;
- Negative operating cash flow despite reported profits, meaning the company isn't generating cash from its core operations; and
- Frequent need for external financing, whether raising debt or equity to fund operations, rather than relying on internal cash generation.
The model learns to strongly associate this combination of characteristics with a "risky company" stereotype, as these patterns were consistently present in past manipulative firms.
Now, consider a legitimate, high-growth biotechnology startup that is developing a groundbreaking new drug. This company might genuinely exhibit some of these characteristics for entirely legitimate reasons:
- Rapid revenue growth: After years of R&D, they might have just launched a successful product, leading to genuinely explosive, but initially lumpy, revenue growth;
- High accounts receivable: They might be selling to large hospital systems or government entities with long, but secure, payment cycles, leading to naturally high DSOs;
- Negative operating cash flow: They are still heavily investing in R&D for future pipeline drugs and scaling up manufacturing, meaning they are burning cash even with initial sales;
- Frequent external financing: They require significant capital injections (equity rounds) to fund ongoing R&D and clinical trials, which is typical for biotech companies before profitability.
How stereotyping bias occurs: The Transparently Risk Engine, having learned the "risky company" stereotype from historical manipulation cases, might apply this stereotype to the legitimate biotech startup:
- Despite the biotech company's legitimate business reasons for its financial patterns, the model's learned stereotype might cause it to incorrectly classify it as high-risk for accounting manipulation. The model sees the rapid revenue growth, high receivables, negative cash flow, and frequent financing, and these characteristics strongly match its "manipulative company" stereotype;
- The model, due to this bias, might not adequately weigh the context (e.g., industry-specific norms for biotech, the nature of R&D investments, or the typical financing cycles for startups). It focuses on the surface-level financial patterns that align with its learned stereotype of risk, rather than understanding the underlying legitimate business drivers.
In this example, the stereotyping bias leads to a "false positive" – a legitimate company is misclassified as risky because its normal, industry-specific financial behavior happens to resemble the patterns of past manipulative companies that formed the model's "stereotype" of risk.
Exclusion bias
If certain types of companies, industries, or accounting practices are underrepresented or entirely absent from the training data, the model may not perform well or could be biased against these excluded groups when assessing their risk.
Let's consider an example of exclusion bias in the Transparently Risk Engine, focusing on a niche or emerging industry that might be underrepresented in its training data.
Scenario: Imagine the Transparently Risk Engine was primarily trained on historical financial data from a broad range of established industries like manufacturing, retail, traditional technology (software/hardware), and services. These industries typically have well-understood revenue models, tangible assets, and conventional debt structures.
Now let’s introduce some excluded or underrepresented groups such as Decentralized Autonomous Organizations (DAOs) or Crypto-Native Companies (if publicly listed and reporting financials).
Companies whose primary operations revolve around a DAO or a crypto-native business model will have unique financial characteristics that differ vastly from traditional companies:
- Revenue might come from transaction fees on a blockchain, tokenomics, staking rewards, or treasury management, rather than traditional product sales or service subscriptions;
- A significant portion of "assets" might be held in volatile cryptocurrencies, NFTs, or governance tokens, rather than traditional cash, inventory, or property, plant, and equipment;
- The capital structure might involve token issuance, decentralized lending protocols, or community treasuries, which don't map directly to conventional debt or equity; and
- The accounting standards for these nascent and rapidly evolving business models are still developing and can be highly complex, often involving fair value accounting for digital assets, or unique revenue recognition for token sales.
How exclusion bias occurs: If the Transparently Risk Engine's training data contained very few (or zero) examples of crypto-native companies, the model would not have learned patterns for what constitutes "normal" or "risky" financial behavior in this context. It wouldn't understand the typical volatility of crypto assets, the nature of token-based revenue, or the unique capital structures.
When the engine encounters a crypto-native company, it would try to apply the learned patterns from traditional industries. This would lead to:
- The inherent volatility of crypto assets on the balance sheet might be incorrectly flagged as extreme asset quality risk or unusual financial fluctuations, simply because it deviates wildly from the stability expected of traditional assets;
- Revenue derived from tokenomics or transaction fees might not fit the model's learned revenue recognition patterns, leading to false positives for aggressive revenue manipulation;
- The model might struggle to assess true gearing or liquidity when assets and liabilities are primarily in digital, volatile forms, potentially leading to an overly conservative or overly optimistic risk assessment.
The result is a bias against this excluded group. The model might consistently assign higher (or sometimes lower, if it misinterprets a lack of traditional "red flags" as health) risk scores to crypto-native companies, not because they are inherently more manipulative, but because their financial characteristics are so fundamentally different from the "norm" established by the training data. The model simply lacks the necessary context and learned understanding to accurately assess their unique risk profile.
In this example, the exclusion bias arises directly from the absence of relevant data during training, leading to a model that is ill-equipped to perform accurately or without bias on companies operating outside its learned domain.
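One generic way to surface this problem, offered purely as an illustrative sketch with hypothetical features rather than a description of the engine's internals, is a coverage check that measures how far a company sits from the training distribution before its score is trusted:

```python
# Flag features that lie far outside the range seen in training; many flags
# suggest the model has little relevant experience with this kind of company.
import numpy as np

def out_of_distribution_flags(x: np.ndarray, train: np.ndarray, k: float = 3.0) -> np.ndarray:
    mu = train.mean(axis=0)
    sigma = train.std(axis=0) + 1e-9
    return np.abs(x - mu) / sigma > k

# Hypothetical feature order: asset volatility, intangible share, token-based revenue share.
train = np.array([[0.05, 0.20, 0.00],   # rows: traditional companies
                  [0.08, 0.30, 0.00],
                  [0.06, 0.25, 0.01]])
crypto_native = np.array([0.60, 0.22, 0.85])

print(out_of_distribution_flags(crypto_native, train))  # [ True False  True]
```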
Conclusion
Potential bias in AI systems has become a hot topic. Like all AI models, the Transparently Risk Engine is subject to this danger and we have attempted to demonstrate the various ways in which bias might be introduced.
Needless to say, our engineers have been at pains to minimize each potential source of bias. Four aspects of the Transparently system minimize bias with respect to size, industry and other common descriptors.
First, the engine's core function is to detect specific accounting manipulation patterns. These patterns are often based on financial ratios and relationships between accounts, which are inherently less sensitive to a company's absolute size. A manipulation signal, such as unusual revenue recognition or aggressive accruals, can manifest in both large and small companies.
Second, the system provides "rank percent" (percentile rank among all companies in the same year) and "rank percent within sector" (percentile rank among companies in the same sector and year). These relative rankings inherently normalize for company size, as they compare a company's risk profile against its peers, rather than against an absolute scale that might disproportionately affect smaller or larger entities. This allows for a fair comparison of accounting quality within relevant peer groups.
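A minimal sketch, with hypothetical company names and scores, of how such relative rankings can be computed:

```python
# Percentile rank across all companies in the same year, and within sector-year.
import pandas as pd

scores = pd.DataFrame({
    "company": ["A", "B", "C", "D", "E", "F"],
    "year":    [2023] * 6,
    "sector":  ["retail", "retail", "retail", "biotech", "biotech", "biotech"],
    "risk":    [0.12, 0.45, 0.30, 0.70, 0.25, 0.55],
})

scores["rank_percent"] = scores.groupby("year")["risk"].rank(pct=True)
scores["rank_percent_within_sector"] = scores.groupby(["year", "sector"])["risk"].rank(pct=True)
print(scores)
# Each company is compared against companies scored in the same year, and
# separately against its own sector, rather than against an absolute scale.
```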
Third, the model interprets factors holistically, meaning it considers the unique combinations of factors for each company and year. This approach allows the system to account for different financial characteristics and operational realities that might naturally vary with company size, ensuring that the assessment is tailored rather than applying a one-size-fits-all approach.
Fourth, while external interpretation thresholds might suggest different levels of concern for large-cap versus smaller firms (e.g., >30–40 % for large-caps vs. >50 % for smaller firms), this reflects a practical guideline for users based on typical risk profiles, not an inherent bias in the model's underlying calculation of accounting manipulation risk. The engine's design aims to detect manipulation signals regardless of the company's market capitalization.