arXiv:2505.24640v1 Announce Type: cross
Abstract: Labor market analysis relies on extracting insights from job advertisements, which provide valuable yet unstructured information on job titles and corresponding skill requirements. While state-of-the-art methods for skill extraction achieve strong performance, they depend on large language models (LLMs), which are computationally expensive and slow. In this paper, we propose ConTeXT-match, a novel contrastive learning approach with token-level attention that is well-suited for the extreme multi-label classification task of skill classification. ConTeXT-match significantly improves skill extraction efficiency and performance, achieving state-of-the-art results with a lightweight bi-encoder model. To support robust evaluation, we introduce Skill-XL, a new benchmark with exhaustive, sentence-level skill annotations that explicitly address the redundancy in the large label space. Finally, we present JobBERT V2, an improved job title normalization model that leverages extracted skills to produce high-quality job title representations. Experiments demonstrate that our models are efficient, accurate, and scalable, making them ideal for large-scale, real-time labor market analysis.
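
To make the idea concrete, below is a minimal sketch of a bi-encoder that scores a skill label against a job-ad sentence via token-level attention and trains with an in-batch contrastive loss. It uses a toy transformer encoder and hypothetical names (ToyEncoder, token_attention_score); it is an illustration of the general technique, not the authors' ConTeXT-match implementation, whose architecture, pooling, and loss may differ.

# Hedged sketch: toy bi-encoder with token-level attention for skill matching.
# All names and hyperparameters here are illustrative assumptions, not the
# ConTeXT-match implementation described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Tiny token encoder standing in for a pretrained transformer."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_ids):            # (batch, seq_len) -> (batch, seq_len, dim)
        return self.encoder(self.emb(token_ids))

def token_attention_score(sentence_tokens, skill_vec):
    """Score a skill against a sentence by attending over sentence tokens.
    sentence_tokens: (batch, seq_len, dim); skill_vec: (batch, dim)."""
    logits = torch.einsum("bld,bd->bl", sentence_tokens, skill_vec)   # per-token relevance
    attn = F.softmax(logits, dim=-1)                                  # token-level attention
    pooled = torch.einsum("bl,bld->bd", attn, sentence_tokens)        # attention-weighted sentence vector
    return F.cosine_similarity(pooled, skill_vec, dim=-1)             # final match score

# One contrastive training step with in-batch negatives (InfoNCE-style).
encoder = ToyEncoder()
sent_ids = torch.randint(0, 1000, (8, 20))     # 8 job-ad sentences (random toy tokens)
skill_ids = torch.randint(0, 1000, (8, 5))     # their 8 positive skill labels
sent_tok = encoder(sent_ids)                                    # (8, 20, dim)
skill_vec = encoder(skill_ids).mean(dim=1)                      # (8, dim) pooled skill embeddings

# Score every sentence against every skill in the batch; diagonal entries are positives.
scores = torch.stack([
    token_attention_score(sent_tok, skill_vec[j].expand(sent_tok.size(0), -1))
    for j in range(skill_vec.size(0))
], dim=1)                                                       # (8, 8) similarity matrix
loss = F.cross_entropy(scores / 0.07, torch.arange(8))          # temperature-scaled contrastive loss
loss.backward()
print(f"contrastive loss: {loss.item():.3f}")

In such a setup, skill label embeddings can be precomputed once, so classifying a new sentence reduces to cheap attention-plus-similarity scoring, which is what makes a lightweight bi-encoder attractive compared with running an LLM per job ad.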