
Disambiguating USPTO inventor names with semantic fingerprinting and DBSCAN clustering

Purpose: The aim of this study is to present a novel approach to inventor disambiguation that converts inventor records into 128-bit semantic fingerprints and clusters them with density-based spatial clustering of applications with noise (DBSCAN). Inventor disambiguation is the task of discovering the unique set of underlying inventors and mapping a set of patents to their corresponding inventors. Resolving the ambiguities between inventors is necessary to improve the quality of the patent database and to ensure accurate entity-level analysis. Most existing methods are based on machine learning and, while they often perform well, this comes at the cost of time, computational power and storage space.
Design/methodology/approach: Using semantic fingerprinting, the meta and textual data in inventor records are converted into 128-bit semantic fingerprints. Rather than using string comparison or cosine similarity to calculate the distance between pairs of fingerprints, a binary number comparison function is used in DBSCAN. DBSCAN then clusters the inventor records based on this distance to disambiguate inventor names.
Findings: Experiments conducted on the PatentsView campaign database of the United States Patent and Trademark Office show that this method disambiguates inventor names with recall greater than 99 per cent, in less time and with substantially smaller storage requirements than existing methods.
Research limitations/implications: A better semantic fingerprint algorithm and a better distance function may improve precision. Setting different clustering parameters for each block, or using other clustering algorithms, will be considered to further improve the accuracy of the disambiguation results.
Originality/value: Compared with existing methods, the proposed method does not rely on feature selection or complex feature comparison computation. Most importantly, running time and storage requirements are drastically reduced.
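The pipeline described in the abstract (fingerprint each record, compare fingerprints as binary numbers, cluster with DBSCAN) can be illustrated with a minimal Python sketch. The abstract does not say how the fingerprints are computed or which binary comparison is used, so everything here is an assumption: `simhash128` is a SimHash-style stand-in for the semantic fingerprint, `hamming` (XOR followed by a bit count) stands in for the binary number comparison function, and the records, `eps` and `min_pts` values are hypothetical.

```python
import hashlib
from collections import deque

BITS = 128  # fingerprint width stated in the abstract


def simhash128(tokens):
    """SimHash-style 128-bit fingerprint (assumed stand-in, not the paper's algorithm)."""
    counts = [0] * BITS
    for tok in tokens:
        # MD5 yields exactly 128 bits per token; each bit votes +1 or -1.
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest(), "big")
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if counts[i] > 0)


def hamming(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def dbscan(points, eps, min_pts, dist):
    """Plain DBSCAN; returns one label per point, with -1 marking noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Includes point i itself, since dist(p, p) == 0.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so grow the cluster through it
    return labels


# Hypothetical inventor records, already tokenised (name, location, patent keywords).
records = [
    ["john", "smith", "austin", "tx", "semiconductor", "etching"],
    ["smith", "john", "austin", "tx", "etching", "semiconductor"],
    ["john", "smith", "boston", "ma", "protein", "assay"],
]
fingerprints = [simhash128(r) for r in records]
print(dbscan(fingerprints, eps=20, min_pts=1, dist=hamming))
```

Each record is stored as a single 128-bit integer and each comparison is one XOR plus a bit count, which is consistent with the storage and running-time savings the abstract reports. Because this fingerprint ignores token order, the first two records receive identical fingerprints; whether the third joins them depends on the chosen `eps`.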

Related Results

The current state of the plant nomenclature in crop production on the example of dissertation titles
Aim. The aim of this article is to analyze the current state of plant nomenclature in agricultural practice. Methods. The analysis of literary sources, mathematical analysis. Resul...
Logical analysis of built-in DBSCAN Functions in Popular Data Science Programming Languages
DBSCAN algorithm is a location-based clustering approach; it is used to find relationships and patterns in geographical data. Because of its widespread application, several data s...
Reservoir Architecture and Fluid Connectivity in an Abu Dhabi Oil Accumulation
Summary: Developing an understanding of reservoir architecture and fluid connectivity is a challenging, but essential task for well, reservoir and facilities manageme...
A Semantic Orthogonal Mapping Method Through Deep-Learning for Semantic Computing
In order to realize an artificial intelligent system, a basic mechanism should be provided for expressing and processing the semantic. We have presented semantic computing models i...
The Kernel Rough K-Means Algorithm
Background: Clustering is one of the most important data mining methods. The k-means (c-means) and its derivative methods are the hotspot in the field of clustering research in re...
Parallel density clustering algorithm based on MapReduce and optimized cuckoo algorithm
In the process of parallel density clustering, the boundary points of clusters with different densities are blurred and there is data noise, which affects the clustering performanc...
IDCUP Algorithm to Classifying Arbitrary Shapes and Densities for Center-based Clustering Performance Analysis
Aim/Purpose: The clustering techniques are normally considered to determine the significant and meaningful subclasses purposed in datasets. It is an unsupervised type of Machine Le...
A Study of Filtering Method for Accurate Indoor Positioning System Using Bluetooth Low Energy Beacons
Fingerprinting technique is an essential element in the indoor positioning system (IPS). Common methods utilize Wi-Fi signals. However, most of the Wi-Fi, because it is pre-install...
