A Big Data Approach to Gathering CSR Data

Sep 26, 2012 11:30 AM ET

As previously seen on the CSRHub blog.

The following is part 2 of a 3-part series on “Big Data.”

By Bahar Gidwani

We have previously defined “Big Data” and shown how we feel it could help address some problems that exist in collecting corporate social responsibility (CSR) and sustainability data on companies.  We have also further described the problems with the currently dominant method of gathering this data—an analyst-based method.

CSRHub uses input from investor-driven sources (known as “ESG” for Environment, Social, and Governance or “SRI” for Socially Responsible Investment), non-governmental organizations, government organizations, and “crowd sources” to construct a 360-degree view of a company’s sustainability performance.  To better understand this process, let’s consider an example.

Hewlett Packard is a heavily tracked company. We have 56 sources of data for this company that together contribute 494 different rating elements.  We map each of these elements into one of twelve different CSR subcategories.  For instance, here are mappings for 20 of the elements that contribute to the Hewlett Packard rating:

| Description of Data Element | Subcategory Mapping | Source |
|---|---|---|
| Participant in the Walmart Sustainability Assessment | Environment Policy & Reporting | Carbon Disclosure Project 2010 Full Data |
| Better World product rating | Product | Better World Companies |
| Board Structure/Board Diversity | Board | Thomson Reuters Asset4 |
| Commitment to Society and to Human Rights Protection Policies | Leadership Ethics | ISOS Group Assessments |
| Committed to improving sustainability performance | Human Rights & Supply Chain | BSR Member |
| Corporate Governance Rank | Transparency & Reporting | CR’s 100 Best Corporate Citizens 2011 |
| Green House Gas (GHG) Footprint | Energy & Climate Change | Trucost |
| Human Rights/ Child and Forced Labor Issues | Community Dev & Philanthropy | MSCI ESG Intangible Value Assessment |
| Member of the Electronic Industry Citizenship Coalition | Human Rights & Supply Chain | Electronic Industry Citizenship Coalition |
| Most Admired Companies for Minority Professionals in 2011 | Diversity & Labor Rights | BlackEngineer Most Admired Companies 2011 |
| North America 300 Carbon Rank | Energy & Climate Change | Environmental Investment Organisation |
| Number of corporate sustainability reports issued | Transparency & Reporting | CorporateRegister.com |
| Number of EPEAT certified products | Environment Policy & Reporting | EPEAT |
| On FCPA Corporate Investigations List | Leadership Ethics | FCPA Corporate Investigations |
| Same-sex benefits | Compensation & Benefits | IW Financial |
| Statement references corruption | Leadership Ethics | UN Global Compact 2010 |
| Top 100 most accountable companies according to AccountAbility | Transparency & Reporting | AccountAbility |
| Top 50 Socially Responsible | Environment Policy & Reporting | Top 50 Socially Responsible |
| Supports UN Drugs and Crime Anti-Corruption Measures | Leadership Ethics | UN Office on Drugs and Crime Anti-Corruption Measures |
| Working Mother list 2010 | Compensation & Benefits | Working Mother List 2010 |

Some of these data elements could map to more than one subcategory.  For instance, a company that is on the list of “Best Workplaces for Commuters” would get credit both for its energy-saving effort (in “Energy & Climate Change”) and for the benefit its programs bring to its employees (in “Compensation & Benefits”).

The list above includes examples of each of the three main types of contributors to the system: Investment-related sources (Asset4/Thomson Reuters, Carbon Disclosure Project, GovernanceMetrics International/Corporate Library, IW Financial, MSCI, Trucost, Vigeo); Activists and NGOs (AccountAbility, BSR, CorporateRegister, CR 100, EIO, FCPA, Top 50 Socially Responsible); and Government & Consumer (Better World, BlackEngineer, EICC, EPEAT, UN Global Compact, UNODC, Working Mother).  The completed mapping process connects the 494 data elements from the 56 sources for HP into our twelve subcategories in 971 different ways.
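To make the difference between "elements" and "ways" concrete, here is a minimal Python sketch of the mapping step. It is an illustration only, not CSRHub's actual code: the record layout and the `count_links` helper are hypothetical, and the source type assigned to the "Best Workplaces for Commuters" element is an assumption. Each element carries a list of subcategories, so one element can produce more than one link.

```python
from collections import Counter

# Hypothetical records: (element description, source type, mapped subcategories).
# The first two rows come from the mapping table above; the third follows the
# "Best Workplaces for Commuters" example, and its source type is assumed.
ELEMENTS = [
    ("Board Structure/Board Diversity", "Investment-Related", ["Board"]),
    ("Number of EPEAT certified products", "Government & Consumer",
     ["Environment Policy & Reporting"]),
    ("Best Workplaces for Commuters", "Government & Consumer",
     ["Energy & Climate Change", "Compensation & Benefits"]),
]


def count_links(elements):
    """Count element-to-subcategory links per (subcategory, source type)."""
    links = Counter()
    for _description, source_type, subcategories in elements:
        for subcategory in subcategories:
            links[(subcategory, source_type)] += 1
    return links


links = count_links(ELEMENTS)
total_links = sum(links.values())
print(len(ELEMENTS), "elements ->", total_links, "links")
```

Because an element may appear under several subcategories, the number of links can exceed the number of elements, which is how 494 elements can be connected in 971 ways.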

 

| Subcategory | Investment-Related | Activists & NGOs | Government & Consumer | Total By Subcategory |
|---|---|---|---|---|
| Board | 67 | 12 | 1 | 80 |
| Community Dev & Philanthropy | 39 | 16 | 10 | 65 |
| Compensation & Benefits | 34 | 6 | 6 | 46 |
| Diversity & Labor Rights | 51 | 8 | 13 | 72 |
| Energy & Climate Change | 37 | 44 | 15 | 96 |
| Environment Policy & Reporting | 46 | 69 | 14 | 129 |
| Human Rights & Supply Chain | 40 | 19 | 8 | 67 |
| Leadership Ethics | 80 | 27 | 14 | 121 |
| Product | 56 | 8 | 6 | 70 |
| Resource Management | 46 | 31 | 11 | 88 |
| Training, Health & Safety | 30 | 6 | 2 | 38 |
| Transparency & Reporting | 51 | 30 | 18 | 99 |
| Total By Type | 577 | 276 | 118 | 971 |

 

While investment-related sources contribute more data elements overall than the other types, there are at least some of each type present in each subcategory.  Another way to look at this is to see that many sources contribute to each subcategory:

 

| Subcategory | Number of Sources | Total Elements |
|---|---|---|
| Board | 11 | 80 |
| Community Dev & Philanthropy | 21 | 65 |
| Compensation & Benefits | 18 | 46 |
| Diversity & Labor Rights | 23 | 72 |
| Energy & Climate Change | 24 | 96 |
| Environment Policy & Reporting | 25 | 129 |
| Human Rights & Supply Chain | 23 | 67 |
| Leadership Ethics | 29 | 121 |
| Product | 14 | 70 |
| Resource Management | 24 | 88 |
| Training, Health & Safety | 13 | 38 |
| Transparency & Reporting | 25 | 99 |

 

Each value from each data element is converted into a zero to 100 rating (zero = lowest, 100 = highest).  These scores are then adjusted by comparing them to each other.  In the example above, there are 11 sources for HP’s board performance.  Suppose three of them gave it a great rating, six a medium rating, and two a poor one.  Computer analytics would guess that the six scores that agree are correct and that HP’s board rating is in the medium range.  The assumption is that three sources tended to be biased towards high scores and two towards low scores.  This chart shows the actual distribution of scores at the subcategory level, along with a calculation of the “normal” error curve that results.
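The scoring step described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not CSRHub's method: the linear rescaling in `to_rating` and the use of the median as the "scores that agree" are both stand-ins chosen for clarity, and the eleven board scores are invented to match the three-high, six-medium, two-low example.

```python
from statistics import median


def to_rating(value, worst, best):
    """Linearly rescale a raw value to 0-100 (0 = lowest, 100 = highest).

    `worst` and `best` are the raw values that should map to 0 and 100.
    Passing worst > best handles scales where a lower raw value is better.
    """
    if best == worst:
        raise ValueError("best and worst must differ")
    rating = 100.0 * (value - worst) / (best - worst)
    return max(0.0, min(100.0, rating))  # clamp out-of-range inputs


# Hypothetical board ratings from 11 sources: three high, six medium, two low.
board_scores = [90, 88, 92, 55, 52, 50, 54, 51, 53, 20, 25]

# The median is robust to the high and low outliers, so it lands in the
# medium cluster where most sources agree.
consensus = median(board_scores)
print(consensus)
```

With these made-up inputs the consensus falls inside the medium group, which matches the intuition in the paragraph above that the three high and two low sources are the biased ones.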

 

 

When the analysis is repeated across thousands of companies, a picture emerges as to which sources tend to be overly positive or negative and which tend to predict the “mean” of the other sources.  All sources can be adjusted, based on this feedback—moving them up or down so they more accurately match the opinion of all other sources.  After a large number of iterations in this process, there is a consensus score for each subcategory for each company analyzed.
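One simple way to picture this feedback loop is an alternating estimate: compute a consensus for each company from bias-corrected scores, then re-estimate each source's bias as its average deviation from that consensus, and repeat. The sketch below assumes a purely additive bias per source and uses plain means; the source does not describe the actual adjustment CSRHub applies, and the function name, data layout, and sample ratings are all hypothetical.

```python
def consensus_scores(ratings, iterations=20):
    """Estimate per-company consensus and per-source bias.

    ratings: {source: {company: score on a 0-100 scale}}
    Returns (consensus, bias), where bias[source] is how far that source
    tends to sit above (+) or below (-) the consensus of all sources.
    """
    bias = {source: 0.0 for source in ratings}
    consensus = {}
    for _ in range(iterations):
        # Step 1: consensus per company = mean of bias-corrected scores.
        totals, counts = {}, {}
        for source, scores in ratings.items():
            for company, score in scores.items():
                totals[company] = totals.get(company, 0.0) + score - bias[source]
                counts[company] = counts.get(company, 0) + 1
        consensus = {c: totals[c] / counts[c] for c in totals}
        # Step 2: bias per source = its average deviation from consensus.
        for source, scores in ratings.items():
            if not scores:
                continue  # a source with no ratings keeps its current bias
            deviations = [score - consensus[c] for c, score in scores.items()]
            bias[source] = sum(deviations) / len(deviations)
    return consensus, bias


# Hypothetical ratings: source A runs 10 points high, source C 10 points low.
ratings = {
    "A": {"X": 60, "Y": 40},
    "B": {"X": 50, "Y": 30},
    "C": {"X": 40, "Y": 20},
}
consensus, bias = consensus_scores(ratings)
print(consensus, bias)
```

Note that in this sketch a bias is only defined relative to the average of all sources, which is the same limitation the text implies: sources are moved toward "the opinion of all other sources," not toward an independent ground truth.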

 

By making a few assumptions about how the errors in data are distributed, one can assess the accuracy of ratings.  In a previous post, we showed that CSRHub’s overall rating accurately represents the values that underlie it to within 1.8 points at a 95% confidence level.
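As a rough illustration of this kind of accuracy estimate, the sketch below computes the half-width of a 95% confidence interval for the mean of a set of scores. It assumes independent, roughly normally distributed errors (hence the factor 1.96) and uses invented scores; it is not the calculation behind the 1.8-point figure, which is described in the earlier post.

```python
from math import sqrt
from statistics import stdev


def margin_of_error_95(scores):
    """Half-width of a 95% confidence interval for the mean of `scores`.

    Assumes independent, roughly normal errors (z = 1.96); both are
    simplifying assumptions made for this illustration.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    return 1.96 * stdev(scores) / sqrt(len(scores))


# Hypothetical adjusted scores underlying one company's rating.
scores = [48, 52, 55, 50, 47, 53, 51, 49, 54, 50, 52, 49]
print(round(margin_of_error_95(scores), 2))
```

The margin shrinks as more sources agree and as more of them are available, which is one reason a rating built from many data elements can be stated with a narrow error band.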

In our next post, we will discuss the benefits and drawbacks of using this complex and data-intensive approach to measuring company CSR performance.

Bahar Gidwani is a Cofounder and CEO of CSRHub. Formerly, he was the CEO of New York-based Index Stock Imagery, Inc., from 1991 through its sale in 2006. He has built and run large technology-based businesses and has experience building a multi-million visitor Web site. Bahar holds a CFA, was a partner at Kidder, Peabody & Co., and worked at McKinsey & Co. Bahar has consulted for both large companies such as Citibank, GE, and Acxiom and a number of smaller software and Web-based companies. He has an MBA (Baker Scholar) from Harvard Business School and a BS in Astronomy and Physics (magna cum laude) from Amherst College. Bahar races sailboats, plays competitive bridge, and is based in New York City.

CSRHub provides access to corporate social responsibility and sustainability ratings and information on nearly 5,000 companies from 135 industries in 65 countries. By aggregating and normalizing the information from over 170 data sources, CSRHub has created a broad, consistent rating system and a searchable database that links millions of rating elements back to their source. Managers, researchers and activists use CSRHub to benchmark company performance, learn how stakeholders evaluate company CSR practices and seek ways to change the world.