The data is usually incomplete. Some DOIs cannot be found at all in some APIs, and for those that are found, metadata such as authors, abstracts or reference-lists is often missing or sometimes incorrect. For the local citation network and 'Top Cited', the completeness of the reference-lists matters most.
The estimated completeness of the reference-lists is shown above the search bar in the "Seed Articles" tab and is calculated as follows. For OpenAlex, it is the fraction of Seed Articles that have reference-lists themselves (multiplied by the fraction of specified DOIs found, in the case of a custom input list).
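As a rough illustration, the OpenAlex estimate described above could be sketched in Python as follows. The function name, its parameters and the input shape (one reference-list length per Seed Article) are assumptions made for this example and are not taken from the app's actual code.

```python
from typing import Optional, Sequence


def openalex_completeness(
    seed_reference_list_lengths: Sequence[int],
    dois_specified: Optional[int] = None,
    dois_found: Optional[int] = None,
) -> float:
    """Hypothetical helper: estimate reference-list completeness for OpenAlex.

    seed_reference_list_lengths holds one entry per Seed Article: the length
    of its reference-list, with 0 meaning the article has no reference-list.
    dois_specified and dois_found only apply to a custom input list of DOIs.
    """
    if not seed_reference_list_lengths:
        return 0.0

    # Fraction of Seed Articles that have reference-lists themselves.
    with_lists = sum(1 for n in seed_reference_list_lengths if n > 0)
    estimate = with_lists / len(seed_reference_list_lengths)

    # For a custom input list, multiply by the fraction of specified DOIs found.
    if dois_specified and dois_found is not None:
        estimate *= dois_found / dois_specified

    return estimate
```

For example, four Seed Articles of which three have reference-lists give an estimate of 0.75. If those four were found from a custom list of five DOIs, the estimate is further multiplied by 4/5.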
Semantic Scholar and Crossref allow a finer-grained calculation, because they often also provide the total reference count, which is often larger than the reference-list of DOIs (older references often lack DOIs, as do some books, papers, conference abstracts and web links). The estimated completeness is thus calculated as the product of three fractions:
- Source reference completeness: (Number of Seed Articles found in API) / (Total reference count of source or total number of lines in custom input list)
- Fraction of Seed Articles with reference-lists: (Number of Seed Articles (excluding source) that have reference-lists themselves) / (Number of Seed Articles excluding source)
- Average Seed Articles' reference completeness among those Seed Articles that do have reference-lists: (Length of reference-list) / (Total reference count)
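
The product of the three fractions above could be sketched in Python roughly as follows. This is a minimal illustration, not the app's actual code: the `SeedArticle` class, its field names and the function signature are hypothetical. The sketch also assumes that the "average" is the mean of per-article ratios, and that a total reference count smaller than the reference-list (or missing, given as 0) is replaced by the length of the reference-list.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SeedArticle:
    """Hypothetical record for one Seed Article (excluding the source)."""
    reference_list_length: int   # number of references with DOIs returned by the API
    total_reference_count: int   # total reference count reported by the API (0 if unknown)


def estimated_completeness(
    seeds_found: int,
    source_total: int,
    seeds: Sequence[SeedArticle],
) -> float:
    """Product of the three fractions described in the list above.

    source_total is the total reference count of the source, or the total
    number of lines in a custom input list.
    """
    if source_total <= 0 or not seeds:
        return 0.0

    # 1. Source reference completeness.
    source_fraction = seeds_found / source_total

    # 2. Fraction of Seed Articles that have reference-lists themselves.
    with_lists = [s for s in seeds if s.reference_list_length > 0]
    if not with_lists:
        return 0.0
    has_list_fraction = len(with_lists) / len(seeds)

    # 3. Average completeness among Seed Articles that do have reference-lists.
    #    Assumption: mean of per-article ratios, with each ratio capped at 1.
    ratios = [
        s.reference_list_length
        / max(s.total_reference_count, s.reference_list_length)
        for s in with_lists
    ]
    average_ratio = sum(ratios) / len(ratios)

    return source_fraction * has_list_fraction * average_ratio
```

As a worked case: if 8 of the source's 10 references are found, 3 of 4 Seed Articles have reference-lists, and those three list 10 of 20, 30 of 40 and 5 of 5 references, the estimate is 0.8 × 0.75 × 0.75, or about 0.45.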