<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1d1 20130915//EN" "http://jats.nlm.nih.gov/publishing/1.1d1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JEF</journal-id>
<journal-title-group>
<journal-title>Journal of Economic and Financial Sciences</journal-title>
</journal-title-group>
<issn pub-type="ppub">1995-7076</issn>
<issn pub-type="epub">2312-2803</issn>
<publisher>
<publisher-name>AOSIS</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JEF-12-476</article-id>
<article-id pub-id-type="doi">10.4102/jef.v12i1.476</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Original Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A gamma generalised linear model as an alternative to log linear real estate price functions</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-8775-6844</contrib-id>
<name>
<surname>Bax</surname>
<given-names>Dane</given-names>
</name>
<xref ref-type="aff" rid="AF0001">1</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-1503-8055</contrib-id>
<name>
<surname>Zewotir</surname>
<given-names>Temesgen</given-names>
</name>
<xref ref-type="aff" rid="AF0001">1</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-8073-5290</contrib-id>
<name>
<surname>North</surname>
<given-names>Delia</given-names>
</name>
<xref ref-type="aff" rid="AF0001">1</xref>
</contrib>
<aff id="AF0001"><label>1</label>School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Durban, South Africa</aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><bold>Corresponding author:</bold> Dane Bax, <email xlink:href="danebax@gmail.com">danebax@gmail.com</email></corresp>
</author-notes>
<pub-date pub-type="epub"><day>05</day><month>12</month><year>2019</year></pub-date>
<pub-date pub-type="collection"><year>2019</year></pub-date>
<volume>12</volume>
<issue>1</issue>
<elocation-id>476</elocation-id>
<history>
<date date-type="received"><day>28</day><month>04</month><year>2019</year></date>
<date date-type="accepted"><day>02</day><month>08</month><year>2019</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2019. The Authors</copyright-statement>
<copyright-year>2019</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Licensee: AOSIS. This work is licensed under the Creative Commons Attribution License.</license-p>
</license>
</permissions>
<abstract>
<sec id="st1">
<title>Orientation</title>
<p>Residential property markets play an important role in economies, informing policy development and decision-making. However, measuring quality-adjusted growth is difficult because of the heterogeneity of properties. Hedonic regression is frequently used in real estate econometric studies as a quality-adjusted technique to estimate residential property prices for the development of price indices. Log linear models are typically used to derive these hedonic price functions.</p>
</sec>
<sec id="st2">
<title>Research purpose</title>
<p>This article develops hedonic pricing functions using generalised linear models for South African residential property listings over a 5-year period.</p>
</sec>
<sec id="st3">
<title>Motivation for the study</title>
<p>A parametric alternative to the log linear model is investigated to address the limited number of hedonic price studies conducted in South Africa. An important feature of this study is the inclusion of different property types and the geographic scope.</p>
</sec>
<sec id="st4">
<title>Research approach/design and method</title>
<p>The data set consisted of 415 200 residential properties from all over South Africa. The data spanned a period from January 2013 to August 2017. Several generalised linear models were developed and compared.</p>
</sec>
<sec id="st5">
<title>Main findings</title>
<p>The gamma generalised linear model provided the best overall fit, generalising well to the unseen validation data. An added benefit of this model is that estimates remain on the original scale, which avoids the need for back transformation. A dummy locational variable was shown to account for the spatial dependency in the data.</p>
</sec>
<sec id="st6">
<title>Practical/managerial implications</title>
<p>This framework provides property market participants with the ability to quantify the utility derived over the marginal distribution of the physical characteristics of properties. This research presents the groundwork to create a property price index where index number theory could be applied to the counterfactual predicted values obtained from hedonic price models to measure price inflation over time.</p>
</sec>
<sec id="st7">
<title>Contribution/value-add</title>
<p>This study analysed the South African residential property market based on an online company&#x2019;s data, purportedly covering the entire market. No real estate hedonic price studies have been identified in South Africa with this level of scope. The gamma generalised linear model is a novel candidate to develop parametric real estate hedonic price functions.</p>
</sec>
</abstract>
<kwd-group>
<kwd>generalised linear models</kwd>
<kwd>real estate economics</kwd>
<kwd>model comparison and evaluation</kwd>
<kwd>spatial modelling</kwd>
<kwd>hedonic price functions</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s0001">
<title>Introduction</title>
<p>Measuring residential property price inflation is of paramount importance to households and economies; however, the heterogeneity of properties makes such measurement difficult (De Haan &#x0026; Diewert <xref ref-type="bibr" rid="CIT0010">2011</xref>). Log linear models have been used extensively in real estate economics to estimate property prices and inflation. This study investigates generalised linear models as an alternative to the typical log linear approach. Cross-sectional hedonic price functions are developed and compared for the South African residential property market over a 5-year period.</p>
</sec>
<sec id="s0002">
<title>Background and objective</title>
<p>Residential property is an important component of individual and national wealth: it is capitalised on household balance sheets and informs economic policy formulation (Hill <xref ref-type="bibr" rid="CIT0020">2013</xref>). Goodhart and Hofmann (<xref ref-type="bibr" rid="CIT0014">2008</xref>) conducted a study providing evidence of relationships between house prices, credit and broad money. Using vector auto-regression fitted with ordinary least squares, their research showed statistically significant relationships between home prices and the macro economy. Bordo and Jeanne (<xref ref-type="bibr" rid="CIT0003">2002</xref>) found an increased likelihood of a financial crisis occurring when real estate prices reached a peak or shortly after a bust, in a study of advanced economies spanning 1970 to 2001. This resonates with the views of De Haan and Diewert (<xref ref-type="bibr" rid="CIT0010">2011</xref>), who assert that sharp declines in home prices can adversely affect the debt-to-equity ratio and credit ratings. Residential property has an important role in economies and understanding price inflation is imperative; however, measuring price inflation is difficult because of infrequent transactions and the heterogeneity of properties.</p>
<p>Hedonic regression is ubiquitous in the construction of residential property price indices, where log linear models are commonly developed in the price estimation procedure (De Haan &#x0026; Diewert <xref ref-type="bibr" rid="CIT0010">2011</xref>; Jiang et al. <xref ref-type="bibr" rid="CIT0021">2015</xref>). Hedonic regression has been found useful as a quality-adjusted methodology because it measures pure price changes and not simply changes in the composition of samples in different periods (Shimizu, Nishimura &#x0026; Watanabe <xref ref-type="bibr" rid="CIT0033">2010</xref>). Hedonic pricing measures the price of an item through its utility bearing characteristics, where the price of the item is determined by the vector of its characteristics (Rosen <xref ref-type="bibr" rid="CIT0032">1974</xref>). Hedonic pricing describes the functional relationship between the price of a heterogeneous item and its implicit attributes:
<disp-formula id="FD1"><alternatives><mml:math display="block" id="M1"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-e001.tif"/></alternatives><label>[Eqn 1]</label></disp-formula>
where <italic>P</italic><sub><italic>j</italic></sub> is the price of the <italic>j</italic>th item which is a function of a set of characteristics <italic>Z</italic><sub><italic>j</italic></sub> (Goodman <xref ref-type="bibr" rid="CIT0015">1978</xref>). Hedonic pricing is useful when estimating the price of heterogeneous goods. Heterogeneous or differentiated goods are goods that differ in their respective composition of characteristics; however, consumers consider the set of characteristics closely related, defining it as a single item (Day <xref ref-type="bibr" rid="CIT0009">2003</xref>). Hedonic pricing mathematically models residential property prices as a function of structural and location characteristics (Lyons <xref ref-type="bibr" rid="CIT0024">2015</xref>). Ordinary least squares models are typically employed to estimate the marginal contributions of each characteristic, taking the form:
<disp-formula id="FD2"><alternatives><mml:math display="block" id="M2"><mml:mrow><mml:mi>&#x03B8;</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>&#x03BC;</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mtext>X</mml:mtext><mml:mi>&#x03B2;</mml:mi></mml:mrow></mml:math><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-e002.tif"/></alternatives><label>[Eqn 2]</label></disp-formula>
where <italic>&#x03B2;</italic> is a vector of <italic>p &#x003C; n</italic> unknown parameters, the matrix <italic>X</italic><sub><italic>n</italic>&#x00D7;<italic>p</italic></sub> contains the known independent variables and <italic>X&#x03B2;</italic> is the linear structure (Lindsey 2005). The implicit price for characteristic <italic>i</italic> of property <italic>j</italic> is calculated by taking the partial derivative of the price function with respect to that characteristic. Because of the positive domain and positively skewed nature of residential property prices, log linear models are often adopted as a quality-adjusted technique that controls for changes in the quality of properties transacted in different periods, whilst reducing heteroscedasticity in the residuals (Silver <xref ref-type="bibr" rid="CIT0034">2016</xref>). Day (<xref ref-type="bibr" rid="CIT0009">2003</xref>), Bourassa, Cantoni and Hoesli (<xref ref-type="bibr" rid="CIT0004">2007</xref>) and Els and Von Fintel (<xref ref-type="bibr" rid="CIT0012">2010</xref>) conducted separate hedonic price studies for different property markets using log linear models. Els and Von Fintel (<xref ref-type="bibr" rid="CIT0012">2010</xref>) found that the assumption of a linear functional form was violated and that quantile regression was more appropriate to capture the hedonic price function. A potential problem with transforming property prices to the log scale is that exponentiation of the fitted values produces geometric mean estimates and not arithmetic mean estimates (Olivier, Johnson &#x0026; Marshall <xref ref-type="bibr" rid="CIT0029">2008</xref>). Another potential concern is the assumption that property prices are lognormal when a different distribution family may represent the data better. Generalised linear models accommodate distributions from the exponential family, which facilitates modelling the response on the original scale.
The objective of this study is to investigate generalised linear models as an alternative framework to the log linear model in the development of hedonic price functions for the South African residential property market.</p>
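<p>The back-transformation concern can be illustrated with a minimal Python sketch that uses only the standard library. It simulates positively skewed prices and compares the exponentiated mean of the log prices (the geometric mean) with the arithmetic mean. The simulated data, the seed, the lognormal parameters and the function name are assumptions made for illustration and are not taken from the study data.</p>
<preformat preformat-type="code">
```python
import math
import random


def geometric_and_arithmetic_mean(prices):
    """Return (exp(mean(log p)), mean(p)) for a list of positive prices."""
    n = len(prices)
    log_mean = sum(math.log(p) for p in prices) / n
    return math.exp(log_mean), sum(prices) / n


# Hypothetical, positively skewed prices drawn from a lognormal distribution.
rng = random.Random(42)
prices = [rng.lognormvariate(14.0, 0.8) for _ in range(10000)]

geo_mean, arith_mean = geometric_and_arithmetic_mean(prices)
print(f"geometric mean:  {geo_mean:,.0f}")
print(f"arithmetic mean: {arith_mean:,.0f}")
```
</preformat>
<p>Because the logarithm is a concave function, the exponentiated mean of the logs can never exceed the arithmetic mean, so exponentiated fitted values from a log linear model tend to understate the expected price unless a correction is applied.</p>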
</sec>
<sec id="s0003">
<title>Generalised linear models</title>
<p>Generalised linear models are a natural extension of classical linear models, with which they share properties such as linearity and a similar approach to computing parameter estimates (McCullagh &#x0026; Nelder <xref ref-type="bibr" rid="CIT0025">1989</xref>). Generalised linear models are characterised by three components. Firstly, a stochastic or random component representing a response variable <italic>y</italic>, consisting of independent observations (<italic>y</italic><sub><italic>1</italic></sub>,<italic>&#x202F;y</italic><sub><italic>2</italic></sub>,<italic>&#x202F;&#x2026;&#x202F;,&#x202F;y</italic><sub><italic>n</italic></sub>), whose distribution belongs to the exponential family and takes the form:
<disp-formula id="FD3"><alternatives><mml:math display="block" id="M3"><mml:mrow><mml:mi>f</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>y</mml:mi><mml:mo>;</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x03D5;</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>exp</mml:mi><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>&#x03B8;</mml:mi><mml:mi>y</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>b</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>&#x03D5;</mml:mi></mml:mfrac><mml:mo>+</mml:mo><mml:mi>c</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x03D5;</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:math><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-e003.tif"/></alternatives><label>[Eqn 3]</label></disp-formula>
where <italic>&#x03D5;</italic> is a dispersion parameter, <italic>b</italic>(.) and <italic>c</italic>(.) are known functions and the range of <italic>Y</italic> does not depend on <italic>&#x03B8;</italic> or <italic>&#x03D5;</italic>. For a random response variable <italic>Y</italic> with a distribution of the form in Equation 3, <italic>E</italic>(<italic>Y</italic>) <italic>= &#x03BC;</italic>. Secondly, a systematic component that consists of a set of covariates (<italic>x</italic><sub><italic>1</italic></sub>,<italic>&#x202F;x</italic><sub><italic>2</italic></sub>,<italic>&#x202F;&#x2026;&#x202F;,&#x202F;x</italic><sub><italic>p</italic></sub>) which combine linearly with the coefficients to produce the linear predictor <italic>&#x03B7;</italic>, so that <italic>&#x03B7; = X&#x03B2;</italic>. Finally, a link function that connects the stochastic and systematic components through <italic>&#x03B7; = g</italic>(<italic>&#x03BC;</italic>).</p>
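<p>As a worked example, the gamma density can be written in the form of Equation 3. The mean and shape parameterisation used here, and the symbol <italic>&#x03BD;</italic> for the shape parameter, are assumptions introduced for illustration; <italic>&#x03D5;</italic> denotes the dispersion parameter of Equation 3:
<disp-formula><tex-math><![CDATA[
\begin{aligned}
f(y;\mu,\nu) &= \frac{1}{\Gamma(\nu)}\left(\frac{\nu}{\mu}\right)^{\nu} y^{\nu-1}\exp\left(-\frac{\nu y}{\mu}\right), \qquad y > 0,\\
&= \exp\left\{\frac{\theta y - b(\theta)}{\phi} + c(y,\phi)\right\},\\
\theta &= -\frac{1}{\mu},\quad b(\theta) = -\log(-\theta),\quad \phi = \frac{1}{\nu},\quad c(y,\phi) = \nu\log(\nu y) - \log y - \log\Gamma(\nu),\\
E(Y) &= b'(\theta) = \mu,\qquad \operatorname{Var}(Y) = \phi\, b''(\theta) = \phi\mu^{2}.
\end{aligned}
]]></tex-math></disp-formula>
Under this parameterisation the variance is proportional to the square of the mean, which is one reason the gamma family is often considered for positive, right-skewed responses whose spread grows with their level.</p>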
<p>This generalisation takes the form:
<disp-formula id="FD4"><alternatives><mml:math display="block" id="M4"><mml:mrow><mml:msub><mml:mi>&#x03B7;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>g</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>&#x03BC;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-e004.tif"/></alternatives><label>[Eqn 4]</label></disp-formula>
where <italic>g</italic>(.) denotes the link function, through which <italic>&#x03B7;</italic> is related to <italic>&#x03BC;</italic>. The link function relates the conditional mean to the systematic component, namely the covariates. The classical linear model is the special case of a normal response with the identity link; more generally, the response may follow any distribution in the exponential family, such as the Poisson, binomial or gamma, and the link function may be any monotonic differentiable function (McCullagh &#x0026; Nelder <xref ref-type="bibr" rid="CIT0025">1989</xref>). This means that generalised linear models are suitable for modelling continuous data as well as count and binary data.</p>
<p>Generalised linear models obtain maximum likelihood estimates of the parameters of an exponential family distribution using the iteratively reweighted least squares algorithm, where the link function makes the systematic effects linear (Nelder &#x0026; Wedderburn <xref ref-type="bibr" rid="CIT0028">1972</xref>). Maximum likelihood estimates are the parameter values that make the observed data most probable under the assumed model (Lindsey 2005).</p>
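<p>The fitting algorithm can be sketched for a gamma response with a log link and a single covariate. The following Python code uses only the standard library and is a minimal illustration rather than the implementation used in this study, which was carried out in R. The function name, the simulated floor sizes, the true coefficients and the shape parameter are all assumptions. For this family and link the working weights are constant, so each iteration reduces to an unweighted least squares fit of the working response.</p>
<preformat preformat-type="code">
```python
import math
import random


def fit_gamma_log_irls(x, y, max_iter=50, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i for a gamma response by IRLS.

    With the log link and the gamma variance function V(mu) = mu^2 the
    working weights are all equal to 1, so every iteration is an
    unweighted least squares fit of the working response z on x.
    """
    n = len(y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    eta = [math.log(yi) for yi in y]  # start from mu = y
    b0 = b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(e) for e in eta]
        # Working response: z = eta + (y - mu) / mu, since d(eta)/d(mu) = 1 / mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        z_bar = sum(z) / n
        new_b1 = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
        new_b0 = z_bar - new_b1 * x_bar
        change = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        eta = [b0 + b1 * xi for xi in x]
        if tol >= change:
            break
    return b0, b1


# Hypothetical data: log floor size as the covariate, gamma distributed prices.
rng = random.Random(1)
x = [math.log(rng.uniform(35, 500)) for _ in range(2000)]
true_b0, true_b1, shape = 9.0, 0.9, 4.0
# gammavariate(alpha, beta) has mean alpha * beta, so beta = mu / shape.
y = [rng.gammavariate(shape, math.exp(true_b0 + true_b1 * xi) / shape) for xi in x]

b0, b1 = fit_gamma_log_irls(x, y)
print(round(b0, 3), round(b1, 3))
```
</preformat>
<p>Because the mean is modelled directly, the fitted values exp(<italic>b</italic><sub>0</sub> + <italic>b</italic><sub>1</sub><italic>x</italic>) are estimates of the arithmetic mean price on the original scale.</p>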
<p>The primary goodness-of-fit measure for generalised linear models is the deviance, which is based on the logarithm of a ratio of likelihoods (McCullagh &#x0026; Nelder <xref ref-type="bibr" rid="CIT0025">1989</xref>). The analysis of deviance makes model assessment and comparison possible in terms of the choice of covariates. Given a set of data, two extreme models are possible. Firstly, a null model with one parameter, which represents a common <italic>&#x03BC;</italic> for all the <italic>y</italic>s. Secondly, a saturated (or complete) model with <italic>n</italic> parameters, one for each of the <italic>y</italic>s, which matches the data completely. A fitted model with more than one but fewer than <italic>n</italic> parameters lies between these extremes and can be compared with the null model (Dobson &#x0026; Barnett 2018). The parameters are estimated by maximising the likelihood, which is equivalent to minimising the deviance, the form of which depends on the distribution.</p>
<p>For the normal distribution, the deviance is simply the residual sum of squares, as in ordinary least squares. Fitting a normal distribution with the identity link function, either to the response or to the natural logarithm of the response, is therefore equivalent to fitting a linear or log linear ordinary least squares model. For generalised linear models, the fitted model should have a lower residual deviance than the null model, indicating that the inclusion of the covariates provides a better fit. Guisan and Zimmermann (<xref ref-type="bibr" rid="CIT0016">2000</xref>) note that, just as variance reduction is a desired characteristic of goodness of fit in least squares models, the deviance reduction of a generalised linear model can be converted to an equivalent <italic>R</italic><sup><italic>2</italic></sup> statistic:
<disp-formula id="FD5"><alternatives><mml:math display="block" id="M5"><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mtext>Null deviance</mml:mtext><mml:mo>&#x2212;</mml:mo><mml:mtext>Residual deviance</mml:mtext><mml:mo stretchy="false">)</mml:mo><mml:mo>/</mml:mo><mml:mtext>Null deviance</mml:mtext></mml:mrow></mml:math><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-e005.tif"/></alternatives><label>[Eqn 5]</label></disp-formula>
where <italic>D</italic><sup><italic>2</italic></sup> is the deviance explained or the amount of deviance accounted for by the model. Naturally, this leads to an understanding of the residuals of generalised linear models where the deviance residuals are reported as a measure of discrepancy. Deviance residuals are calculated as follows:
<disp-formula id="FD6"><alternatives><mml:math display="block" id="M6"><mml:mrow><mml:mi mathvariant="normal">sign</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>&#x03BC;</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:msqrt><mml:mrow><mml:msubsup><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:msqrt></mml:mrow></mml:math><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-e006.tif"/></alternatives><label>[Eqn 6]</label></disp-formula></p>
<p>This formulation shows that deviance residuals are calculated by taking the signed square root of the contribution of the <italic>i</italic>th observation to the total model deviance. Deviance residuals help to assess the quality of fit, which reflects the choice of the link function and linear predictor (Nelder &#x0026; Wedderburn <xref ref-type="bibr" rid="CIT0028">1972</xref>). McCullagh and Nelder (<xref ref-type="bibr" rid="CIT0025">1989</xref>) state that, with an appropriate link function and a linear systematic component, plots of the deviance residuals should resemble normal theory residual plots, with certain exceptions such as the case of binomial errors. Standardised deviance residuals are approximately normal, which makes them preferable to Pearson residuals, which tend to reflect any skewness of the underlying distribution. Plotting the standardised deviance residuals against the fitted values can provide an informal check of the goodness of fit depending on the type of generalised linear model, where any curvature could suggest the incorrect choice of link function, omitted independent variables or the omission of quadratic terms in the independent variables (Davison &#x0026; Snell <xref ref-type="bibr" rid="CIT0008">1991</xref>).</p>
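<p>Equations 5 and 6 can be sketched for the gamma family using the standard gamma unit deviance, 2[(<italic>y</italic> &#x2212; <italic>&#x03BC;</italic>)/<italic>&#x03BC;</italic> &#x2212; log(<italic>y</italic>/<italic>&#x03BC;</italic>)]. The Python code below uses only the standard library; the function names and the small example values are hypothetical, and the null model is taken to be the sample mean, which is an assumption of this sketch.</p>
<preformat preformat-type="code">
```python
import math


def gamma_unit_deviance(y, mu):
    """Contribution of one observation to the gamma deviance."""
    return 2.0 * ((y - mu) / mu - math.log(y / mu))


def gamma_deviance(ys, mus):
    """Total deviance: the sum of the unit deviances."""
    return sum(gamma_unit_deviance(y, m) for y, m in zip(ys, mus))


def deviance_residuals(ys, mus):
    """Signed square roots of the unit deviances (Equation 6)."""
    return [math.copysign(math.sqrt(max(gamma_unit_deviance(y, m), 0.0)), y - m)
            for y, m in zip(ys, mus)]


def d_squared(ys, mus):
    """Deviance explained (Equation 5); the null model uses the sample mean."""
    y_bar = sum(ys) / len(ys)
    null_dev = gamma_deviance(ys, [y_bar] * len(ys))
    return (null_dev - gamma_deviance(ys, mus)) / null_dev


# Hypothetical observed prices (in millions) and fitted means.
observed = [1.0, 2.0, 4.0]
fitted = [1.2, 1.9, 3.8]
print(d_squared(observed, fitted))
print(deviance_residuals(observed, fitted))
```
</preformat>
<p>A fitted model that reproduces the null model gives a value of zero for the deviance explained, and a model that matches every observation gives a value of one.</p>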
<p>The selection of generalised linear models in this study involved choosing the appropriate distribution of <italic>Y</italic> and choosing the relationship between <italic>&#x03B7;</italic> and <italic>&#x03BC;</italic>. Three candidate combinations of model families and link functions were fitted to the data, specifically the gamma log model, the normal log model and the lognormal identity model. These models are hereafter referred to as the gamma model, the normal model and the lognormal model.</p>
</sec>
<sec id="s0004">
<title>Spatial dependency</title>
<p>Prices of adjacent properties are often related, which can lead to correlation in the residuals of regression models, violating the assumption of independence (Bourassa et al. <xref ref-type="bibr" rid="CIT0004">2007</xref>). Spatial autocorrelation or dependency is a challenging problem in real estate modelling because the correlation manifests in two-dimensional space, unlike serial correlation, which is one-dimensional. Bourassa et al. (<xref ref-type="bibr" rid="CIT0004">2007</xref>) found that the inclusion of a submarket dummy variable accounted for spatial autocorrelation and outperformed geostatistical and lattice approaches. A similar approach was adopted in this study where a factor area variable was included to account for spatial dependence in listing prices.</p>
<p>Variograms were utilised to understand the spatial autocorrelation structure. Variograms display the dissimilarity of observations that vary in space as a function of the distance between them (Ploner <xref ref-type="bibr" rid="CIT0030">1999</xref>). The sill is the level at which the variogram flattens out, and the range is the distance at which the sill is reached, beyond which sample locations are no longer spatially autocorrelated. A variogram will be flat when no correlation or low correlation is present, which indicates randomness in the structure (Chiles &#x0026; Delfiner <xref ref-type="bibr" rid="CIT0006">1999</xref>). The nugget effect is an important concept in variograms and describes the variability between closely spaced observations, which could be inherent in the data or a result of sampling (Clark <xref ref-type="bibr" rid="CIT0007">2010</xref>). Therefore, in the context of this study, a large nugget effect could be the product of closely clustered properties with residuals of similar sign and order of magnitude, which would overestimate the amount of spatial dependency. A widely used test developed by Moran (<xref ref-type="bibr" rid="CIT0026">1950</xref>) is a two-dimensional specification test for spatial autocorrelation, analogous to a test of univariate time series correlation (Anselin <xref ref-type="bibr" rid="CIT0001">2006</xref>):
<disp-formula id="FD7"><alternatives><mml:math display="block" id="M7"><mml:mrow><mml:mi>I</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>e</mml:mi><mml:mo>&#x2032;</mml:mo><mml:mi>W</mml:mi><mml:mi>e</mml:mi><mml:mo>/</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>e</mml:mi><mml:mo>&#x2032;</mml:mo><mml:mtext>&#x2009;</mml:mtext><mml:mi>e</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>/</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:math><graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-e007.tif"/></alternatives><label>[Eqn 7]</label></disp-formula>
where <italic>e</italic> represents the regression model residuals, <italic>W</italic> is the spatial weighting matrix, <italic>S</italic><sub><italic>0</italic></sub> is the standardisation factor equal to the sum of weights for the nonzero cross-products and <italic>n</italic> is the number of observations. This statistic is used to test for the presence of spatial autocorrelation.</p>
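<p>Equation 7 can be computed directly from a vector of residuals and a spatial weighting matrix. The following Python sketch uses only the standard library; the function name, the four-property example and the binary neighbour weights are hypothetical and are not the weighting scheme used in this study.</p>
<preformat preformat-type="code">
```python
def morans_i(residuals, weights):
    """Moran's I = (e'We / S0) / (e'e / n) for residuals e and weights W."""
    n = len(residuals)
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross = sum(weights[i][j] * residuals[i] * residuals[j]
                for i in range(n) for j in range(n))
    return (cross / s0) / (sum(e * e for e in residuals) / n)


# Hypothetical example: four properties along a street, where a weight of 1
# marks two properties that are direct neighbours.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
clustered = [1.0, 1.0, -1.0, -1.0]    # similar residuals sit next to each other
alternating = [1.0, -1.0, 1.0, -1.0]  # neighbouring residuals have opposite signs

print(morans_i(clustered, W))
print(morans_i(alternating, W))
```
</preformat>
<p>Clustered residuals give a positive statistic and alternating residuals a negative one, which is the pattern the test is designed to detect.</p>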
</sec>
<sec id="s0005">
<title>Research design</title>
<p>The open source programming language R was used to perform the statistical analysis in this research (R Core Team <xref ref-type="bibr" rid="CIT0031">2018</xref>). The data were provided by an online residential property portal that aggregates listings from real estate agencies throughout South Africa. The period of the data is from January 2013 to August 2017, and <xref ref-type="table" rid="T0001">Table 1</xref> describes the data set.</p>
<table-wrap id="T0001">
<label>TABLE 1</label>
<caption><p>Description of the data.</p></caption>
<table frame="hsides" rules="groups">
<thead valign="top">
<tr>
<th align="left">Variable</th>
<th align="left">Type</th>
<th align="left">Description</th>
<th align="left">Summary statistics (min; mean; max)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Listing price</td>
<td align="left">Market price</td>
<td align="left">The advertised price of the property</td>
<td align="left">1 000; 2 461 210; 200 000 000</td>
</tr>
<tr>
<td align="left">Size</td>
<td align="left">Structural</td>
<td align="left">The size of the structure in square metres</td>
<td align="left">2; 260; 85 102</td>
</tr>
<tr>
<td align="left">Lot</td>
<td align="left">Structural</td>
<td align="left">The land size in square metres</td>
<td align="left">2; 1 163; 99 999</td>
</tr>
<tr>
<td align="left">Bedrooms</td>
<td align="left">Structural</td>
<td align="left">The number of bedrooms</td>
<td align="left">0; 3; 78</td>
</tr>
<tr>
<td align="left">Bathrooms</td>
<td align="left">Structural</td>
<td align="left">The number of bathrooms</td>
<td align="left">0; 2; 78</td>
</tr>
<tr>
<td align="left">Property type</td>
<td align="left">Structural</td>
<td align="left">The type of property e.g. house</td>
<td align="left">Not applicable</td>
</tr>
<tr>
<td align="left">Suburb</td>
<td align="left">Locational</td>
<td align="left">The suburb in which the property is located</td>
<td align="left">Not applicable</td>
</tr>
<tr>
<td align="left">Province</td>
<td align="left">Locational</td>
<td align="left">The province in which the property is located</td>
<td align="left">Not applicable</td>
</tr>
<tr>
<td align="left">Listing date</td>
<td align="left">Time</td>
<td align="left">The date the property was advertised</td>
<td align="left">Not applicable</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>Note: The variables have been rounded to the nearest whole number.</p></fn>
</table-wrap-foot>
</table-wrap>
<p>The data were inspected for consistency: categorical variables were standardised and each variable was constrained to the correct data type. Property listings were duplicated by different agencies, resulting in many properties being captured more than once. This could result in biased estimates, and therefore duplicate properties were identified and removed using row-wise string matching. Observations with missing values were removed. The summary statistics show that the spread of the numeric characteristic variables was large, with implausible minimum and maximum values.</p>
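<p>The de-duplication step can be sketched as follows. This Python code, which uses only the standard library, is one possible reading of row-wise string matching and is not the procedure used in this study: the field names, the normalisation rule and the example listings are all hypothetical.</p>
<preformat preformat-type="code">
```python
def normalise(value):
    """Lower-case a field and collapse whitespace so equal listings compare equal."""
    return " ".join(str(value).lower().split())


def drop_duplicates(listings, fields):
    """Keep the first listing for each distinct row-wise key string."""
    seen = set()
    unique = []
    for row in listings:
        if any(row.get(f) is None for f in fields):
            continue  # observations with missing values are removed
        key = "|".join(normalise(row[f]) for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical listings: the first two describe the same property, advertised
# by different agencies, and the third has a missing size.
listings = [
    {"suburb": "Umhlanga", "type": "House", "size": 250, "price": 3500000, "agency": "A"},
    {"suburb": "umhlanga ", "type": "house", "size": 250, "price": 3500000, "agency": "B"},
    {"suburb": "Berea", "type": "Apartment", "size": None, "price": 900000, "agency": "A"},
]
cleaned = drop_duplicates(listings, ["suburb", "type", "size", "price"])
print(len(cleaned), "listing(s) kept")
```
</preformat>
<p>The agency field is deliberately left out of the key in this sketch, since the same property advertised by two agencies should count once.</p>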
<p>Real estate agents populate data into automated feeds, which could result in incorrect data capture and anomalous data. To deal with the anomalous data, an autoencoder was developed using the H2O open source machine learning framework (Ledell et al. <xref ref-type="bibr" rid="CIT0013">2019</xref>). An autoencoder is a deep learning neural network aimed at reducing the feature space, which can be viewed as a non-linear alternative to principal component analysis (Hastie, Tibshirani &#x0026; Wainwright <xref ref-type="bibr" rid="CIT0019">2015</xref>). Given enough data, the network learns to reproduce its input via a non-linear reduced representation of the original data (Candel et al. 2018). A high reconstruction error for a data point indicates that the data point does not match the learned pattern and is anomalous. Lower limits were set on the listing price, size and lot variables: listing price was set to &#x2265; ZAR 200 000, and size and lot were set to &#x2265; 35 m<sup>2</sup>. These figures were chosen using the ABSA Bank property price index as a guideline (Luus <xref ref-type="bibr" rid="CIT0023">2002</xref>). Properties with a reconstruction mean squared error &#x2265; 9.39e-07 were discarded; that is, based on the results of the autoencoder, properties in the top 5% of the reconstruction error distribution were treated as anomalous.</p>
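<p>The filtering rules described above can be sketched in Python using only the standard library. The autoencoder itself is not reproduced here: the reconstruction errors are hypothetical numbers, and the function names, the nearest-rank percentile rule and the treatment of ties are assumptions of this sketch rather than details reported for the study.</p>
<preformat preformat-type="code">
```python
def percentile_cutoff(values, percent):
    """Nearest-rank percentile of a list of numbers (percent is an integer)."""
    ordered = sorted(values)
    # Integer ceiling of percent * n / 100, with a minimum rank of 1.
    rank = max(1, -(-percent * len(ordered) // 100))
    return ordered[rank - 1]


def passes_lower_limits(listing):
    """Lower limits quoted in the text: price of ZAR 200 000, size and lot of 35 m^2."""
    return (listing["price"] >= 200000
            and listing["size"] >= 35
            and listing["lot"] >= 35)


def flag_anomalies(errors, percent=95):
    """Mark reconstruction errors above the chosen percentile as anomalous."""
    cutoff = percentile_cutoff(errors, percent)
    return [e > cutoff for e in errors]


# Hypothetical reconstruction errors for 20 listings (not study data).
errors = [i * 1e-8 for i in range(1, 21)]
flags = flag_anomalies(errors)
print(sum(flags), "of", len(flags), "listings flagged")
```
</preformat>
<p>Expressing the threshold as a percentile of the reconstruction error, rather than as a fixed value, keeps the share of discarded listings constant if the autoencoder is retrained.</p>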
<p>The final data set consisted of 415 200 properties, and the spread of the variable distributions was greatly reduced after the anomalous data points were removed, evident in <xref ref-type="table" rid="T0002">Table 2</xref>.</p>
<table-wrap id="T0002">
<label>TABLE 2</label>
<caption><p>Final data summary statistics.</p></caption>
<table frame="hsides" rules="groups">
<thead valign="top">
<tr>
<th align="left">Variable</th>
<th align="center">Listing price</th>
<th align="center">Size</th>
<th align="center">Lot</th>
<th align="center">Bedrooms</th>
<th align="center">Bathrooms</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Minimum</td>
<td align="center">200 000</td>
<td align="center">35</td>
<td align="center">35</td>
<td align="center">1</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">Mean</td>
<td align="center">2 159 173</td>
<td align="center">231</td>
<td align="center">752</td>
<td align="center">3</td>
<td align="center">2</td>
</tr>
<tr>
<td align="left">Maximum</td>
<td align="center">19 700 000</td>
<td align="center">2080</td>
<td align="center">10 365</td>
<td align="center">13</td>
<td align="center">12</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>Note: The variables have been rounded to the nearest whole number.</p></fn>
</table-wrap-foot>
</table-wrap>
<p>Although the lot variable was used to detect anomalous data, it was excluded from the modelling because lot size does not apply to apartments and was often omitted from listings. For apartments, lot was set equal to the size variable during the autoencoder learning stage.</p>
<p>The chosen design develops a separate cross-sectional model for each year in the data, in contrast to building a single model on pooled cross-sectional data. This framework will facilitate the development of a property price index in which previous estimates do not change when future periods are introduced. A pooled approach would add new samples to the original sample, which would change previous estimates when constructing a residential property price index (De Haan &#x0026; Diewert <xref ref-type="bibr" rid="CIT0010">2011</xref>). Therefore, for each year, various generalised linear models were developed in which listing price was regressed on the physical and locational attributes of residential properties. Tabular and graphical summaries of the model fit for each of the candidate models are presented for comparative purposes. Thereafter, based on generalisability and goodness of fit, the best model is selected and expounded upon.</p>
<sec id="s20006">
<title>Ethical considerations</title>
<p>Ethical clearance was obtained from the Research Ethics Committee of the University of KwaZulu-Natal, protocol reference number: HSS/0209/016M.</p>
</sec>
</sec>
<sec id="s0007">
<title>Results and discussion</title>
<p>The data were split into two sets, training and validation, where 70&#x0025; of the data were used for training and 30&#x0025; of the data were used for validating the models. This was done to test model generalisability on unseen data for the development of future models. The holdout data for a given model provide a more robust estimate of the generalisation error than the training error does (Blum, Kalai &#x0026; Langford <xref ref-type="bibr" rid="CIT0002">1999</xref>). Partitioning the data into training and holdout sets for each year involved writing a function to ensure that the splits were random and that the distribution of the response in each split was similar to that of the other split and of the original data. The function also ensured that each area factor level was present in each split. Model performance and generalisation were tested using the root mean squared error (RMSE), which is a measure of spread that compares the closeness of the model outcomes to the observed data (Gujarati <xref ref-type="bibr" rid="CIT0017">2004</xref>). A lower RMSE is indicative of less variability between model estimates and the observed data. The Akaike information criterion (AIC) was also computed. When comparing models, the AIC is useful for model selection as it provides an assessment of the quality of different models given a set of data (Greene <xref ref-type="bibr" rid="CIT0018">2003</xref>). A lower AIC is indicative of better fit. The AIC simultaneously considers goodness of fit, through the likelihood function, and model complexity, which it penalises through the number of parameters. Model selection was based on a combination of reported statistics, namely deviance explained, holdout RMSE, AIC and model fit based on diagnostic residual plots. <xref ref-type="table" rid="T0003">Table 3</xref> details the results of each yearly model fit.</p>
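<p>A partitioning function of this kind might look like the following Python sketch (the study itself used R). It assumes each listing is a record with a hypothetical <monospace>area</monospace> field, splits at random within each area so that every area with at least two listings appears in both sets, and adds a small RMSE helper. It does not explicitly balance the distribution of the response, which the function described above also checked.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
import random
from collections import defaultdict


def split_by_area(rows, train_fraction=0.7, seed=0):
    """Randomly split rows into training and holdout sets within each area.

    Every area with at least two rows contributes to both sets, so each
    area factor level is seen in training and in validation.
    """
    rng = random.Random(seed)
    by_area = defaultdict(list)
    for row in rows:
        by_area[row["area"]].append(row)
    train, holdout = [], []
    for area in sorted(by_area):
        group = list(by_area[area])
        rng.shuffle(group)
        if len(group) < 2:
            train.extend(group)  # a single row cannot appear in both sets
            continue
        n_train = round(train_fraction * len(group))
        n_train = min(max(n_train, 1), len(group) - 1)
        train.extend(group[:n_train])
        holdout.extend(group[n_train:])
    return train, holdout


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical listings: three areas with ten rows each.
rows = [{"area": a, "price": 500_000 + 10_000 * i}
        for a in ("A", "B", "C") for i in range(10)]
train, holdout = split_by_area(rows)
```
]]></preformat>
<p>Splitting within each area, rather than over the whole data set at once, is what keeps every area level available to the model at both the training and the validation stage.</p>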
<table-wrap id="T0003">
<label>TABLE 3</label>
<caption><p>Model summaries.</p></caption>
<table frame="hsides" rules="groups">
<thead valign="top">
<tr>
<th align="left">Year</th>
<th align="center">Deviance explained</th>
<th align="center">Training RMSE</th>
<th align="center">Holdout RMSE</th>
<th align="center">AIC</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" colspan="5"><bold>Gamma model summary statistics</bold></td>
</tr>
<tr>
<td align="left">2013</td>
<td align="center">0.89</td>
<td align="center">708 816</td>
<td align="center">719 286</td>
<td align="center">665 059</td>
</tr>
<tr>
<td align="left">2014</td>
<td align="center">0.87</td>
<td align="center">762 918</td>
<td align="center">764 730</td>
<td align="center">1 689 332</td>
</tr>
<tr>
<td align="left">2015</td>
<td align="center">0.87</td>
<td align="center">768 004</td>
<td align="center">772 905</td>
<td align="center">1 808 202</td>
</tr>
<tr>
<td align="left">2016</td>
<td align="center">0.87</td>
<td align="center">731 159</td>
<td align="center">746 554</td>
<td align="center">2 557 727</td>
</tr>
<tr>
<td align="left">2017</td>
<td align="center">0.88</td>
<td align="center">723 854</td>
<td align="center">724 390</td>
<td align="center">1 779 451</td>
</tr>
<tr>
<td align="left" colspan="5"><bold>Normal model summary statistics</bold></td>
</tr>
<tr>
<td align="left">2013</td>
<td align="center">0.83</td>
<td align="center">666 427</td>
<td align="center">704 715</td>
<td align="center">692 589</td>
</tr>
<tr>
<td align="left">2014</td>
<td align="center">0.82</td>
<td align="center">724 525</td>
<td align="center">743 688</td>
<td align="center">1 748 933</td>
</tr>
<tr>
<td align="left">2015</td>
<td align="center">0.83</td>
<td align="center">721 649</td>
<td align="center">743 687</td>
<td align="center">1 866 417</td>
</tr>
<tr>
<td align="left">2016</td>
<td align="center">0.83</td>
<td align="center">685 171</td>
<td align="center">710 324</td>
<td align="center">2 637 126</td>
</tr>
<tr>
<td align="left">2017</td>
<td align="center">0.84</td>
<td align="center">682 036</td>
<td align="center">693 513</td>
<td align="center">1 835 507</td>
</tr>
<tr>
<td align="left" colspan="5"><bold>Lognormal model summary statistics</bold></td>
</tr>
<tr>
<td align="left">2013</td>
<td align="center">0.89</td>
<td align="center">709 765</td>
<td align="center">716 993</td>
<td align="center">664 303</td>
</tr>
<tr>
<td align="left">2014</td>
<td align="center">0.88</td>
<td align="center">766 117</td>
<td align="center">762 544</td>
<td align="center">1 687 050</td>
</tr>
<tr>
<td align="left">2015</td>
<td align="center">0.87</td>
<td align="center">767 251</td>
<td align="center">774 243</td>
<td align="center">1 805 578</td>
</tr>
<tr>
<td align="left">2016</td>
<td align="center">0.88</td>
<td align="center">727 294</td>
<td align="center">741 349</td>
<td align="center">2 553 572</td>
</tr>
<tr>
<td align="left">2017</td>
<td align="center">0.88</td>
<td align="center">724 097</td>
<td align="center">726 167</td>
<td align="center">1 777 340</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>Note: The deviance explained figures are rounded to two decimal places. The other figures are rounded to the nearest whole number.</p></fn>
<fn><p>RMSE, root mean squared error; AIC, Akaike information criterion.</p></fn>
</table-wrap-foot>
</table-wrap>
<p>Each model produced deviance explained statistics that were consistent across years, with the gamma and lognormal models sharing the highest deviance explained. Moreover, the gamma and lognormal models were very similar in terms of holdout RMSE and AIC statistics. The AICs produced by the lognormal models were not directly comparable to those of the other models as the response variable was on the logarithmic scale. They were made comparable by subtracting the sum of the logarithms of the response variable from the log-likelihood. Based solely on the AICs, the lognormal models appear to fit the data best, as they consistently produced the lowest AIC statistics. Considering only the holdout RMSE statistics, the normal models outperformed the other two models, with consistently lower RMSE statistics each year. No evidence of overfitting is present as the training and holdout RMSEs are quite similar, indicating that the models generalise to unseen data. This suggests model robustness to the introduction of future periods.</p>
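<p>The AIC adjustment for the lognormal models can be illustrated with a short Python sketch. It relies on the standard change-of-variables result that the log-likelihood of the prices equals the Gaussian log-likelihood of the log prices minus the sum of the log prices, so the AIC on the price scale is the log-scale AIC plus twice that sum. The prices, the intercept-only model and the function names are hypothetical and serve only to show the arithmetic.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood


def normal_loglik(values, mean, sd):
    """Gaussian log-likelihood of `values` for a common mean and sd."""
    return sum(-0.5 * math.log(2 * math.pi * sd ** 2)
               - (v - mean) ** 2 / (2 * sd ** 2) for v in values)


def lognormal_loglik_price_scale(prices, mean_log, sd_log):
    """Log-likelihood of prices under a lognormal model, on the price scale.

    Equals the Gaussian log-likelihood of log(price) minus sum(log(price)),
    the Jacobian term of the log transformation.
    """
    logs = [math.log(p) for p in prices]
    return normal_loglik(logs, mean_log, sd_log) - sum(logs)


prices = [450_000.0, 800_000.0, 1_200_000.0, 2_500_000.0]  # hypothetical
logs = [math.log(p) for p in prices]
mean_log = sum(logs) / len(logs)
sd_log = math.sqrt(sum((x - mean_log) ** 2 for x in logs) / len(logs))

# AIC as reported for a model fitted to log(price) ...
aic_log_scale = aic(normal_loglik(logs, mean_log, sd_log), n_params=2)
# ... and the comparable AIC on the original price scale.
aic_price_scale = aic(
    lognormal_loglik_price_scale(prices, mean_log, sd_log), n_params=2)
```
]]></preformat>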
<p>Discerning the best model based solely on the goodness-of-fit measures reported above is difficult, and a graphical examination of the residuals is necessary. The goodness-of-fit residual diagnostic plots for each yearly model are illustrated in <xref ref-type="fig" rid="F0001">Figures 1</xref> and <xref ref-type="fig" rid="F0002">2</xref> from left to right, beginning with the gamma model, followed by the normal model and finally the lognormal model. <xref ref-type="fig" rid="F0001">Figure 1</xref> presents the residuals versus fitted values where the y-axis represents the deviance residuals and x-axis represents the fitted values. <xref ref-type="fig" rid="F0002">Figure 2</xref> presents the quantile&#x2013;quantile (Q&#x2013;Q) plots for normality.</p>
<fig id="F0001">
<label>FIGURE 1</label>
<caption><p>Model fitted versus residual plots.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-g001.tif"/>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-g001a.tif"/>
</fig>
<fig id="F0002">
<label>FIGURE 2</label>
<caption><p>Model quantile&#x2013;quantile plots.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-g002.tif"/>
</fig>
<p>The fitted versus residual diagnostic plots for the gamma and lognormal models are very similar and do not indicate any discernible pattern in the deviance residuals, one of the required assumptions. However, the normal model shows signs of heteroscedasticity at the upper quantiles, violating the assumption of constant variance.</p>
<p>None of the plots is perfectly normal, with deviations at the upper and lower quantiles; the S-shaped curves indicate heavy tailed residual distributions. The gamma and lognormal Q&#x2013;Q plots appear the best behaved in terms of the normality assumption. The normal model appears to fit the data poorly in terms of the diagnostic plots, whilst the gamma and lognormal models appear to represent the data the best.</p>
<p>A possible caveat of using the lognormal model for modelling listing prices is that the expected values are on the log scale and back transformation is necessary. Transforming expected values from the log scale back to the original scale by exponentiation results in geometric mean estimates and not arithmetic mean estimates (Olivier et al. <xref ref-type="bibr" rid="CIT0029">2008</xref>). However, the natural logarithm is monotonic, and the back transformed estimates are equivalent to median estimates if the distribution of log<italic>(x)</italic> is symmetric (Musset <xref ref-type="bibr" rid="CIT0027">2006</xref>). An appealing feature of the gamma and normal models is that expected values are kept on the original scale, where arithmetic mean expected values are computed. For this reason, the lognormal models are excluded from the candidate model selection. The gamma models are chosen over the normal models based on the diagnostic plots, lower AICs and similar holdout RMSEs. A discussion of the gamma modelling results follows.</p>
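<p>The back transformation issue can be made concrete with a small Python sketch. Exponentiating the mean of the logged values returns the geometric mean, which is below the arithmetic mean for any non-constant set of positive values; for a lognormal variable with log-scale mean <monospace>mu</monospace> and standard deviation <monospace>sigma</monospace>, exp(mu) is the median whereas the arithmetic mean is exp(mu + sigma<sup>2</sup>/2). The prices and parameter values below are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def arithmetic_mean(values):
    """Ordinary average of the values."""
    return sum(values) / len(values)


def back_transformed_mean(values):
    """exp(mean(log(values))): the geometric mean of the values."""
    return math.exp(arithmetic_mean([math.log(v) for v in values]))


prices = [400_000.0, 900_000.0, 1_500_000.0, 6_000_000.0]  # hypothetical
geo = back_transformed_mean(prices)  # what naive exponentiation estimates
ari = arithmetic_mean(prices)        # what a model on the price scale targets

# Lognormal case: exp(mu) is the median, not the arithmetic mean.
mu, sigma = 14.0, 0.8  # hypothetical log-scale parameters
lognormal_median = math.exp(mu)
lognormal_mean = math.exp(mu + sigma ** 2 / 2)
```
]]></preformat>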
<p>The property type factor variable included six levels, namely apartment, cluster, duplex, house, simplex and townhouse. The property type apartment was used as the reference level, so the other property types are compared to this level. <xref ref-type="table" rid="T0004">Table 4</xref> tabulates the beta coefficient estimates for each covariate along with the corresponding <italic>p</italic>-values. To make reporting succinct, <xref ref-type="table" rid="T0004">Table 4</xref> omits the area (factor variable) coefficients, as there were over 2000 factor levels present in the data and these varied between years. The area variable was used as a control variable to account for variability amongst listing prices and for the spatial dependency in the data.</p>
<table-wrap id="T0004">
<label>TABLE 4</label>
<caption><p>Gamma model results summary.</p></caption>
<table frame="hsides" rules="groups">
<thead valign="top">
<tr>
<th align="left" rowspan="2">Year</th>
<th align="center" colspan="2">2013<hr/></th>
<th align="center" colspan="2">2014<hr/></th>
<th align="center" colspan="2">2015<hr/></th>
<th align="center" colspan="2">2016<hr/></th>
<th align="center" colspan="2">2017<hr/></th>
</tr>
<tr>
<th align="center"><inline-formula id="ID1"><alternatives><mml:math display="inline" id="I1"><mml:mover accent="true"><mml:mi>&#x03B2;</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:math><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-i001.tif"/></alternatives></inline-formula></th>
<th align="center"><italic>p</italic></th>
<th align="center"><inline-formula id="ID2"><alternatives><mml:math display="inline" id="I2"><mml:mover accent="true"><mml:mi>&#x03B2;</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:math><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-i001.tif"/></alternatives></inline-formula></th>
<th align="center"><italic>p</italic></th>
<th align="center"><inline-formula id="ID3"><alternatives><mml:math display="inline" id="I3"><mml:mover accent="true"><mml:mi>&#x03B2;</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:math><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-i001.tif"/></alternatives></inline-formula></th>
<th align="center"><italic>p</italic></th>
<th align="center"><inline-formula id="ID4"><alternatives><mml:math display="inline" id="I4"><mml:mover accent="true"><mml:mi>&#x03B2;</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:math><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-i001.tif"/></alternatives></inline-formula></th>
<th align="center"><italic>p</italic></th>
<th align="center"><inline-formula id="ID5"><alternatives><mml:math display="inline" id="I5"><mml:mover accent="true"><mml:mi>&#x03B2;</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:math><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-i001.tif"/></alternatives></inline-formula></th>
<th align="center"><italic>p</italic></th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Intercept</td>
<td align="center">9.913</td>
<td align="center">2e-9</td>
<td align="center">11.013</td>
<td align="center">2e-9</td>
<td align="center">10.932</td>
<td align="center">2e-9</td>
<td align="center">11.305</td>
<td align="center">2e-9</td>
<td align="center">11.148</td>
<td align="center">2e-9</td>
</tr>
<tr>
<td align="left">Log (size)</td>
<td align="center">0.664</td>
<td align="center">2e-9</td>
<td align="center">0.626</td>
<td align="center">2e-9</td>
<td align="center">0.558</td>
<td align="center">2e-9</td>
<td align="center">0.479</td>
<td align="center">2e-9</td>
<td align="center">0.512</td>
<td align="center">2e-9</td>
</tr>
<tr>
<td align="left">Bedrooms</td>
<td align="center">0.003</td>
<td align="center">0.313</td>
<td align="center">0.017</td>
<td align="center">2e-9</td>
<td align="center">0.021</td>
<td align="center">2e-9</td>
<td align="center">0.034</td>
<td align="center">2e-9</td>
<td align="center">0.025</td>
<td align="center">2e-9</td>
</tr>
<tr>
<td align="left">Bathrooms</td>
<td align="center">0.111</td>
<td align="center">2e-9</td>
<td align="center">0.096</td>
<td align="center">2e-9</td>
<td align="center">0.112</td>
<td align="center">2e-9</td>
<td align="center">0.117</td>
<td align="center">2e-9</td>
<td align="center">0.112</td>
<td align="center">2e-9</td>
</tr>
<tr>
<td align="left">Cluster</td>
<td align="center">0.090</td>
<td align="center">2e-9</td>
<td align="center">0.136</td>
<td align="center">2e-9</td>
<td align="center">0.146</td>
<td align="center">2e-9</td>
<td align="center">0.187</td>
<td align="center">2e-9</td>
<td align="center">0.187</td>
<td align="center">2e-9</td>
</tr>
<tr>
<td align="left">Duplex</td>
<td align="center">0.003</td>
<td align="center">0.874</td>
<td align="center">0.025</td>
<td align="center">0.104</td>
<td align="center">0.035</td>
<td align="center">0.024</td>
<td align="center">0.086</td>
<td align="center">2e-9</td>
<td align="center">0.079</td>
<td align="center">2e-9</td>
</tr>
<tr>
<td align="left">House</td>
<td align="center">0.027</td>
<td align="center">3e-4</td>
<td align="center">0.063</td>
<td align="center">2e-9</td>
<td align="center">0.103</td>
<td align="center">2e-9</td>
<td align="center">0.158</td>
<td align="center">2e-9</td>
<td align="center">0.141</td>
<td align="center">2e-9</td>
</tr>
<tr>
<td align="left">Simplex</td>
<td align="center">0.061</td>
<td align="center">4e-5</td>
<td align="center">0.068</td>
<td align="center">5e-7</td>
<td align="center">0.078</td>
<td align="center">2e-9</td>
<td align="center">0.117</td>
<td align="center">2e-9</td>
<td align="center">0.087</td>
<td align="center">2e-9</td>
</tr>
<tr>
<td align="left">Townhouse</td>
<td align="center">0.050</td>
<td align="center">0.064</td>
<td align="center">0.063</td>
<td align="center">2e-9</td>
<td align="center">0.077</td>
<td align="center">2e-9</td>
<td align="center">0.090</td>
<td align="center">2e-9</td>
<td align="center">0.099</td>
<td align="center">2e-9</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>Note: Numbers were rounded to three decimal places and scientific notation was adopted for brevity.</p></fn>
</table-wrap-foot>
</table-wrap>
<p>The coefficients given by <inline-formula id="ID6"><alternatives><mml:math display="inline" id="I6"><mml:mover accent="true"><mml:mi>&#x03B2;</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:math><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-i001.tif"/></alternatives></inline-formula> are expressed as percentage effects. The covariates log<italic>(Size)</italic> and the number of bathrooms were statistically significant in every year. The natural logarithm was applied to the size covariate to improve linearity; <xref ref-type="fig" rid="F0003">Figure 3</xref> shows no discernible pattern in the residuals, indicating that this transform was appropriate. The coefficients can be interpreted as follows:</p>
<list list-type="bullet">
<list-item><p>A 1&#x0025; increase in size (square metres), on average, increased the listing price of a residential property by approximately <inline-formula id="ID7"><alternatives><mml:math display="inline" id="I7"><mml:mover accent="true"><mml:mi>&#x03B2;</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:math><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-i001.tif"/></alternatives></inline-formula> (&#x0025;) for a given year, because both size and the expected listing price enter the model on the logarithmic scale.</p></list-item>
<list-item><p>Each additional bedroom, on average, increased the listing price of a residential property by <inline-formula id="ID8"><alternatives><mml:math display="inline" id="I8"><mml:mover accent="true"><mml:mi>&#x03B2;</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:math><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-i001.tif"/></alternatives></inline-formula> &#x00D7; 100 (&#x0025;) for a given year.</p></list-item>
<list-item><p>Each additional bathroom, on average, increased the listing price of a residential property by <inline-formula id="ID9"><alternatives><mml:math display="inline" id="I9"><mml:mover accent="true"><mml:mi>&#x03B2;</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:math><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-i001.tif"/></alternatives></inline-formula> &#x00D7; 100 (&#x0025;) for a given year.</p></list-item>
<list-item><p>The property type coefficients in <xref ref-type="table" rid="T0004">Table 4</xref> are percentage differences relative to apartments (the reference level): the listing price of a given property type was, on average, <inline-formula id="ID10"><alternatives><mml:math display="inline" id="I10"><mml:mover accent="true"><mml:mi>&#x03B2;</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:math><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-i001.tif"/></alternatives></inline-formula> &#x00D7; 100 (&#x0025;) higher or lower than that of apartments, depending on the sign of <inline-formula id="ID11"><alternatives><mml:math display="inline" id="I11"><mml:mover accent="true"><mml:mi>&#x03B2;</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:math><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-i001.tif"/></alternatives></inline-formula>.</p></list-item>
</list>
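<p>The percentage interpretations can be checked numerically with the Python sketch below. It assumes the gamma models use a log link (an assumption here, suggested by the magnitude of the intercepts in <xref ref-type="table" rid="T0004">Table 4</xref>). Under that assumption a one-unit increase in a covariate multiplies the expected price by exp(&#x03B2;), so the exact effect is (exp(&#x03B2;) &#x2212; 1) &#x00D7; 100 (&#x0025;), which is close to &#x03B2; &#x00D7; 100 (&#x0025;) when &#x03B2; is small; for the log-transformed size covariate, &#x03B2; acts as an elasticity. The function names are illustrative.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def unit_effect_percent(beta):
    """Exact % change in expected price for a one-unit increase (log link)."""
    return (math.exp(beta) - 1.0) * 100.0


def elasticity_effect_percent(beta, pct_increase=1.0):
    """Exact % change in expected price when a log-transformed covariate
    rises by `pct_increase` percent (log link)."""
    return ((1.0 + pct_increase / 100.0) ** beta - 1.0) * 100.0


# 2017 gamma coefficients taken from Table 4.
bathrooms_2017 = 0.112
cluster_2017 = 0.187
log_size_2017 = 0.512

bathroom_effect = unit_effect_percent(bathrooms_2017)    # vs 11.2 from beta * 100
cluster_effect = unit_effect_percent(cluster_2017)       # vs 18.7 from beta * 100
size_effect = elasticity_effect_percent(log_size_2017)   # for a 1% larger property
```
]]></preformat>
<p>The gap between the exact and the approximate percentage effect widens as the coefficient grows, so the approximation is tightest for the smaller coefficients such as bedrooms.</p>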
<fig id="F0003">
<label>FIGURE 3</label>
<caption><p>Gamma model residuals against transformed size covariate.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-g003.tif"/>
</fig>
<p>It is evident from <xref ref-type="table" rid="T0004">Table 4</xref> that each additional bathroom, on average, contributes more to the listing prices of homes than each additional bedroom. An appealing feature of this parametric framework is the transparency and interpretability of the model coefficients. Property market participants are able to make informed decisions about renovating their homes or making comparative buying decisions by examining the marginal utility of different characteristics.</p>
<p>Plotting the residuals against individual covariates of the linear predictor should result in a null pattern, like the residual versus fitted values plot (McCullagh &#x0026; Nelder <xref ref-type="bibr" rid="CIT0025">1989</xref>). For the log-transformed size covariate, <xref ref-type="fig" rid="F0003">Figure 3</xref> shows no discernible pattern in the plots, indicating that the transform was appropriate.</p>
<fig id="F0004">
<label>FIGURE 4</label>
<caption><p>Gamma model variogram plots. (a) variogram 2013 model, (b) variogram 2014 model, (c) variogram 2015 model, (d) variogram 2016 model, (e) variogram 2017 model.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="JEF-12-476-g004.tif"/>
</fig>
<p>The analysis of deviance presented in <xref ref-type="table" rid="T0005">Table 5</xref> indicates that the residual deviance of each yearly gamma model was consistently lower than the null deviance. This means that the covariates explained more deviance than an intercept-only model, which indicates a good fit.</p>
<table-wrap id="T0005">
<label>TABLE 5</label>
<caption><p>Analysis of deviance.</p></caption>
<table frame="hsides" rules="groups">
<thead valign="top">
<tr>
<th align="left">Year</th>
<th align="center">Residual deviance</th>
<th align="center">Null deviance</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">2013</td>
<td align="center">1470.18</td>
<td align="center">13 394.60</td>
</tr>
<tr>
<td align="left">2014</td>
<td align="center">4056.69</td>
<td align="center">31 487.73</td>
</tr>
<tr>
<td align="left">2015</td>
<td align="center">4448.07</td>
<td align="center">33 272.34</td>
</tr>
<tr>
<td align="left">2016</td>
<td align="center">6030.46</td>
<td align="center">45 648.77</td>
</tr>
<tr>
<td align="left">2017</td>
<td align="center">4008.25</td>
<td align="center">32 260.69</td>
</tr>
</tbody>
</table>
</table-wrap>
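<p>The link between <xref ref-type="table" rid="T0005">Table 5</xref> and the deviance explained column of <xref ref-type="table" rid="T0003">Table 3</xref> can be verified with a short Python calculation, taking deviance explained as one minus the ratio of residual to null deviance. The function name is illustrative; the figures are copied from <xref ref-type="table" rid="T0005">Table 5</xref>.</p>
<preformat preformat-type="code"><![CDATA[
```python
def deviance_explained(residual_deviance, null_deviance):
    """Proportion of the null deviance accounted for by the covariates."""
    return 1.0 - residual_deviance / null_deviance


# (residual deviance, null deviance) per year, copied from Table 5.
table5 = {
    2013: (1470.18, 13394.60),
    2014: (4056.69, 31487.73),
    2015: (4448.07, 33272.34),
    2016: (6030.46, 45648.77),
    2017: (4008.25, 32260.69),
}

explained = {year: round(deviance_explained(res, null), 2)
             for year, (res, null) in table5.items()}
```
]]></preformat>
<p>Rounded to two decimal places, the results are 0.89, 0.87, 0.87, 0.87 and 0.88 for 2013 to 2017, which agree with the gamma rows of <xref ref-type="table" rid="T0003">Table 3</xref>.</p>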
<p>The modelling of spatial data in this study required the assessment of the assumption of independence, which was investigated using several plots and by performing hypothesis tests. Variograms quantify the spatial dependence in data by describing the spatial variance. The yearly gamma models&#x2019; residuals were plotted using spherical variograms and are presented in <xref ref-type="fig" rid="F0004">Figure 4</xref> where similarities were found between all models. The ranges, distances beyond which the data are no longer correlated, are quite long, which suggests spatial autocorrelation is not an issue in the modelling results. The nugget effects as a percentage of the total sills are quite large which could indicate some variation at a small scale.</p>
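<p>For readers unfamiliar with the terminology, the Python sketch below evaluates a spherical semivariogram model and the nugget-to-sill ratio. The parameter values are hypothetical and are not the fitted values behind <xref ref-type="fig" rid="F0004">Figure 4</xref>; the sketch assumes the common parameterisation in which the sill is the total sill (nugget plus partial sill).</p>
<preformat preformat-type="code"><![CDATA[
```python
def spherical_variogram(h, nugget, sill, range_):
    """Spherical semivariogram value at lag distance h.

    `sill` is the total sill (nugget plus partial sill); `range_` is the
    distance beyond which the semivariance stays at the sill.
    """
    if h < 0:
        raise ValueError("lag distance must be non-negative")
    if h == 0:
        return 0.0  # by convention the semivariance at zero lag is zero
    if h >= range_:
        return sill
    ratio = h / range_
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)


def nugget_to_sill_percent(nugget, sill):
    """Share of the total sill attributable to the nugget effect."""
    return 100.0 * nugget / sill


# Hypothetical parameters, chosen only to illustrate the shape of the curve.
nugget, sill, range_ = 0.06, 0.10, 50.0
lags = (0, 10, 25, 50, 80)
curve = [spherical_variogram(h, nugget, sill, range_) for h in lags]
nugget_share = nugget_to_sill_percent(nugget, sill)
```
]]></preformat>
<p>The curve jumps to the nugget just above zero lag, rises towards the sill and stays flat beyond the range; a large nugget share means that much of the variation occurs at distances shorter than the smallest lag.</p>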
<p>A permutation test for Moran&#x2019;s I was applied to formally test for the presence of spatial autocorrelation where, under the null hypothesis, the data are randomly dispersed. The Moran&#x2019;s I statistic ranges between &#x2212;1 and 1, where &#x2212;1 indicates perfect negative spatial autocorrelation and 1 indicates perfect positive spatial autocorrelation. A total of 999 permutations were run for each yearly gamma model. The results of the tests, presented in <xref ref-type="table" rid="T0006">Table 6</xref>, indicate a weak negative correlation. Formally, at an alpha of 0.05, there is not enough evidence to reject the null hypothesis of no spatial autocorrelation for any of the yearly gamma models. This is consistent with the findings of Bourassa et al. (<xref ref-type="bibr" rid="CIT0004">2007</xref>), in which the addition of a location dummy variable adequately accounted for spatial dependence.</p>
<table-wrap id="T0006">
<label>TABLE 6</label>
<caption><p>Permutation test for Moran&#x2019;s I.</p></caption>
<table frame="hsides" rules="groups">
<thead valign="top">
<tr>
<th align="left">Year</th>
<th align="center">Statistic</th>
<th align="center"><italic>p</italic></th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">2013</td>
<td align="center">&#x2212;0.0312</td>
<td align="center">0.999</td>
</tr>
<tr>
<td align="left">2014</td>
<td align="center">&#x2212;0.0267</td>
<td align="center">0.999</td>
</tr>
<tr>
<td align="left">2015</td>
<td align="center">&#x2212;0.0129</td>
<td align="center">0.999</td>
</tr>
<tr>
<td align="left">2016</td>
<td align="center">&#x2212;0.0207</td>
<td align="center">0.999</td>
</tr>
<tr>
<td align="left">2017</td>
<td align="center">&#x2212;0.0345</td>
<td align="center">0.999</td>
</tr>
</tbody>
</table>
</table-wrap>
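<p>The mechanics of a permutation test for Moran&#x2019;s I can be sketched in Python as follows. This is a simplified illustration, not the procedure used in this study: it assumes a binary neighbour weight matrix on a toy line of six locations and a one-sided alternative of positive spatial autocorrelation, and it computes the pseudo <italic>p</italic>-value as (number of permuted statistics at least as large as the observed one + 1) / (number of permutations + 1).</p>
<preformat preformat-type="code"><![CDATA[
```python
import random


def morans_i(values, weights):
    """Moran's I for `values` given a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def permutation_test(values, weights, n_perm=999, seed=0):
    """Observed Moran's I and a one-sided pseudo p-value (positive alternative)."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between value and location
        if morans_i(shuffled, weights) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical example: six locations on a line; adjacent locations are neighbours.
n = 6
weights = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
           for i in range(n)]
trend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]          # similar values sit together
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # neighbours always differ

observed, p_value = permutation_test(trend, weights)
negative_case = morans_i(alternating, weights)
```
]]></preformat>
<p>With 999 permutations the pseudo <italic>p</italic>-value moves in steps of 0.001, so its largest attainable value below 1 is 0.999.</p>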
</sec>
<sec id="s0008">
<title>Conclusion</title>
<p>Residential property is a barometer of individual and collective wealth and acts as a measure of financial stability in an economy. Measuring residential property prices is difficult because of the heterogeneity of properties. The estimation of residential property prices using hedonic modelling is pervasive in the real estate economics literature, where log linear models are typically employed. This article investigated generalised linear models as an alternative to log linear models to develop hedonic price functions to estimate residential property listing prices in South Africa over a 5-year period. The gamma generalised linear model provided the best fit and good generalisability whilst keeping the expected values on the original scale, which makes it an appealing alternative to log linear models. The spatial dependence of residential properties was effectively accounted for by including an area factor variable; this was supported by variograms and Moran&#x2019;s I permutation tests, which showed no evidence to reject the null hypothesis of no spatial autocorrelation. This framework provides property market participants with the ability to quantify the utility derived over the marginal distribution of the physical characteristics of residential properties. This research presents the groundwork to create a property price index where index number theory could be applied to the hedonic price models to measure price inflation over time.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgements</title>
<sec id="s20009" sec-type="COI-statement">
<title>Competing interests</title>
<p>The authors have declared that no competing interests exist.</p>
</sec>
<sec id="s20010">
<title>Authors&#x2019; contributions</title>
<p>D.B. contributed to the conceptual design, research methodology, cleaning and analysing the data, developing the models and visualisations using R, a statistical programming language. This article forms part of his PhD degree which he is currently pursuing. T.Z. contributed towards the conceptual design and research methodology as the supervisor of D.B. D.N. contributed towards the conceptual design and research methodology as the supervisor of D.B.</p>
</sec>
<sec id="s20011">
<title>Funding information</title>
<p>This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.</p>
</sec>
<sec id="s20012">
<title>Data availability statement</title>
<p>Data sharing is not applicable to this article as no new data were created or analysed in this study.</p>
</sec>
<sec id="s20013">
<title>Disclaimer</title>
<p>The views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors.</p>
</sec>
</ack>
<ref-list id="references">
<title>References</title>
<ref id="CIT0001"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Anselin</surname>, <given-names>L.</given-names></string-name></person-group>, <year>2006</year>, &#x2018;<chapter-title>Spatial econometrics</chapter-title>&#x2019;, in <person-group person-group-type="editor"><string-name><given-names>T.C.</given-names> <surname>Mills</surname></string-name> &#x0026; <string-name><given-names>K.</given-names> <surname>Patterson</surname></string-name> (eds.)</person-group>, <source><italic>Palgrave handbook of econometrics Vol 1, econometric theory</italic></source>, <publisher-name>Palgrave Macmillan</publisher-name>, <publisher-loc>New York</publisher-loc>, pp. <fpage>901</fpage>&#x2013;<lpage>941</lpage>.</mixed-citation></ref>
<ref id="CIT0002"><mixed-citation publication-type="conference"><person-group person-group-type="author"><string-name><surname>Blum</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Kalai</surname>, <given-names>A.</given-names></string-name> &#x0026; <string-name><surname>Langford</surname>, <given-names>J.</given-names></string-name></person-group>, <year>1999</year>, &#x2018;<article-title>Beating the hold-out: Bounds for K-fold and progressive cross-validation</article-title>&#x2019;, <conf-name>COLT &#x2018;99 Proceedings of the twelfth annual conference on Computational learning theory</conf-name>, <conf-date>July 07&#x2013;09</conf-date>, <conf-loc>Santa Cruz, CA</conf-loc>, pp. <fpage>202</fpage>&#x2013;<lpage>208</lpage>.</mixed-citation></ref>
<ref id="CIT0003"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Bordo</surname>, <given-names>M.D.</given-names></string-name> &#x0026; <string-name><surname>Jeanne</surname>, <given-names>O.</given-names></string-name></person-group>, <year>2002</year>, <source><italic>Boom-busts in asset prices, economic instability, and monetary policy</italic></source>, <comment>CEPR Discussion Paper 3398</comment>, <publisher-name>Centre for Economic Policy Research</publisher-name>, <publisher-loc>London</publisher-loc>, <comment>viewed 12 February 2019, from <ext-link ext-link-type="uri" xlink:href="https://www.nber.org/papers/w8966">https://www.nber.org/papers/w8966</ext-link>.</comment></mixed-citation></ref>
<ref id="CIT0004"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Bourassa</surname>, <given-names>S.C.</given-names></string-name>, <string-name><surname>Cantoni</surname>, <given-names>E.</given-names></string-name> &#x0026; <string-name><surname>Hoesli</surname>, <given-names>M.</given-names></string-name></person-group>, <year>2007</year>, &#x2018;<article-title>Spatial dependence, housing submarkets, and house price prediction</article-title>&#x2019;, <source><italic>Journal of Real Estate Finance and Economics</italic></source> <volume>35</volume>(<issue>1</issue>), <fpage>142</fpage>&#x2013;<lpage>160</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/s11146-007-9036-8">https://doi.org/10.1007/s11146-007-9036-8</ext-link></comment></mixed-citation></ref>
<ref id="CIT0005"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Candel</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>LeDell</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Parmar</surname>, <given-names>V.</given-names></string-name> &#x0026; <string-name><surname>Arora</surname>, <given-names>A.</given-names></string-name></person-group>, <year>2017</year>, <source><italic>Deep learning with H2O</italic></source>, <publisher-name>H2O.ai Inc.</publisher-name>, <publisher-loc>CA</publisher-loc>, <comment>viewed 20 February 2019, from <ext-link ext-link-type="uri" xlink:href="http://docs.h2o.ai/h2o/latest-stable/h2o-docs/booklets/DeepLearningBooklet.pdf">http://docs.h2o.ai/h2o/latest-stable/h2o-docs/booklets/DeepLearningBooklet.pdf</ext-link>.</comment></mixed-citation></ref>
<ref id="CIT0006"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Chiles</surname>, <given-names>J.</given-names></string-name> &#x0026; <string-name><surname>Delfiner</surname>, <given-names>P.</given-names></string-name></person-group>, <year>1999</year>, <source><italic>Geostatistics: Modeling spatial uncertainty</italic></source>, p. <fpage>695</fpage>, <publisher-name>John Wiley &#x0026; Sons</publisher-name>, <publisher-loc>New York</publisher-loc>.</mixed-citation></ref>
<ref id="CIT0007"><mixed-citation publication-type="conference"><person-group person-group-type="author"><string-name><surname>Clark</surname>, <given-names>I.</given-names></string-name></person-group>, <year>2010</year>, &#x2018;<article-title>Statistics or geostatistics? Sampling error or nugget effect?</article-title>&#x2019;, <source><italic>Fourth World Conference on Sampling and Blending</italic></source>, vol. <volume>110</volume>, <conf-loc>Geostokos Ltd, Scotland</conf-loc>, <conf-date>October 21&#x2013;23, 2009</conf-date>, pp. <fpage>13</fpage>&#x2013;<lpage>18</lpage>.</mixed-citation></ref>
<ref id="CIT0008"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Davison</surname>, <given-names>A.C.</given-names></string-name> &#x0026; <string-name><surname>Snell</surname>, <given-names>E.J.</given-names></string-name></person-group>, <year>1991</year>, &#x2018;<chapter-title>Residuals and diagnostics</chapter-title>&#x2019;, in <person-group person-group-type="editor"><string-name><given-names>D.V.</given-names> <surname>Hinkley</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Reid</surname></string-name> &#x0026; <string-name><given-names>E.J.</given-names> <surname>Snell</surname></string-name> (eds.)</person-group>, <source><italic>Statistical theory and modelling: In honour of Sir David Cox</italic></source>, pp. <fpage>83</fpage>&#x2013;<lpage>106</lpage>, <publisher-name>Chapman and Hall</publisher-name>, <publisher-loc>London</publisher-loc>.</mixed-citation></ref>
<ref id="CIT0009"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Day</surname>, <given-names>B.</given-names></string-name></person-group>, <year>2003</year>, <source><italic>Submarket identification in property markets: A hedonic housing price model for Glasgow</italic></source>, <comment>Working Paper</comment>, <publisher-name>The Centre for Social and Economic Research on the Global Environment, School of Environmental Science, University of East Anglia</publisher-name>, <publisher-loc>Norwich</publisher-loc>.</mixed-citation></ref>
<ref id="CIT0010"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>De Haan</surname>, <given-names>J.</given-names></string-name> &#x0026; <string-name><surname>Diewert</surname>, <given-names>E.</given-names></string-name></person-group>, <year>2011</year>, <source><italic>Handbook on residential property indices</italic></source>, <publisher-name>Eurostat European Commission</publisher-name>, <comment>viewed 12 February 2019, from <ext-link ext-link-type="uri" xlink:href="https://ec.europa.eu/eurostat/documents/3859598/5925925/KS-RA-12-022-EN.PDF">https://ec.europa.eu/eurostat/documents/3859598/5925925/KS-RA-12-022-EN.PDF</ext-link>.</comment></mixed-citation></ref>
<ref id="CIT0011"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Dobson</surname>, <given-names>A.</given-names></string-name> &#x0026; <string-name><surname>Barnett</surname>, <given-names>A.</given-names></string-name></person-group>, <year>2008</year>, &#x2018;<chapter-title>An introduction to generalized linear models</chapter-title>&#x2019;, in <person-group person-group-type="editor"><string-name><given-names>P.C.</given-names> <surname>Bradley</surname></string-name>, <string-name><given-names>J.F.</given-names> <surname>Julian</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Martin</surname></string-name> &#x0026; <string-name><given-names>Z.</given-names> <surname>Jim</surname></string-name> (eds.)</person-group>, <source><italic>Texts in statistical science series</italic></source>, vol. <volume>77</volume>, <edition>3rd</edition> edn., <publisher-name>Chapman &#x0026; Hall/CRC Press</publisher-name>, <publisher-loc>Boca Raton, FL</publisher-loc>.</mixed-citation></ref>
<ref id="CIT0012"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Els</surname>, <given-names>M.</given-names></string-name> &#x0026; <string-name><surname>Von Fintel</surname>, <given-names>D.</given-names></string-name></person-group>, <year>2010</year>, &#x2018;<article-title>Residential property prices in a submarket of South Africa: Separating real returns from attribute growth</article-title>&#x2019;, <source><italic>South African Journal of Economics</italic></source> <volume>78</volume>(<issue>4</issue>), <fpage>418</fpage>&#x2013;<lpage>436</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1111/j.1813-6982.2010.01244.x">https://doi.org/10.1111/j.1813-6982.2010.01244.x</ext-link></comment></mixed-citation></ref>
<ref id="CIT0013"><mixed-citation publication-type="web"><person-group person-group-type="author"><string-name><surname>LeDell</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Gill</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Aiello</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Fu</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Candel</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Click</surname>, <given-names>C.</given-names></string-name> <etal>et al.</etal></person-group>, <year>2019</year>, <source><italic>h2o: R Interface for &#x2018;H2O&#x2019;</italic></source>, <comment>R package version 3.22.1.1, viewed 01 February 2019, from <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=h2o">https://CRAN.R-project.org/package=h2o</ext-link>.</comment></mixed-citation></ref>
<ref id="CIT0014"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Goodhart</surname>, <given-names>C.</given-names></string-name> &#x0026; <string-name><surname>Hofmann</surname>, <given-names>B.</given-names></string-name></person-group>, <year>2008</year>, &#x2018;<article-title>House prices, money, credit, and the macroeconomy</article-title>&#x2019;, <source><italic>Oxford Review of Economic Policy</italic></source> <volume>24</volume>(<issue>1</issue>), <fpage>180</fpage>&#x2013;<lpage>205</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/oxrep/grn009">https://doi.org/10.1093/oxrep/grn009</ext-link></comment></mixed-citation></ref>
<ref id="CIT0015"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Goodman</surname>, <given-names>A.C.</given-names></string-name></person-group>, <year>1978</year>, &#x2018;<article-title>Hedonic prices, price indices and housing markets</article-title>&#x2019;, <source><italic>Journal of Urban Economics</italic></source> <volume>5</volume>(<issue>4</issue>), <fpage>471</fpage>&#x2013;<lpage>484</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/0094-1190(78)90004-9">https://doi.org/10.1016/0094-1190(78)90004-9</ext-link></comment></mixed-citation></ref>
<ref id="CIT0016"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Guisan</surname>, <given-names>A.</given-names></string-name> &#x0026; <string-name><surname>Zimmermann</surname>, <given-names>N.E.</given-names></string-name></person-group>, <year>2000</year>, &#x2018;<article-title>Predictive habitat distribution models in ecology</article-title>&#x2019;, <source><italic>Ecological Modelling</italic></source> <volume>135</volume>, <fpage>147</fpage>&#x2013;<lpage>186</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/S0304-3800(00)00354-9">https://doi.org/10.1016/S0304-3800(00)00354-9</ext-link></comment></mixed-citation></ref>
<ref id="CIT0017"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Gujarati</surname>, <given-names>D.N.</given-names></string-name></person-group>, <year>2004</year>, <source><italic>Basic econometrics</italic></source>, <edition>4th</edition> edn., <publisher-name>Tata McGraw-Hill</publisher-name>, <publisher-loc>New York</publisher-loc>.</mixed-citation></ref>
<ref id="CIT0018"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Greene</surname>, <given-names>W.H.</given-names></string-name></person-group>, <year>2003</year>, <source><italic>Econometric analysis</italic></source>, <edition>5th</edition> edn., <publisher-name>Prentice Hall</publisher-name>, <publisher-loc>Upper Saddle River, NJ</publisher-loc>.</mixed-citation></ref>
<ref id="CIT0019"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Hastie</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Tibshirani</surname>, <given-names>R.</given-names></string-name> &#x0026; <string-name><surname>Wainwright</surname>, <given-names>M.</given-names></string-name></person-group>, <year>2015</year>, <source><italic>Statistical learning with sparsity: The lasso and generalizations</italic></source>, <publisher-name>CRC Press</publisher-name>, <publisher-loc>Boca Raton, FL</publisher-loc>.</mixed-citation></ref>
<ref id="CIT0020"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Hill</surname>, <given-names>R.J.</given-names></string-name></person-group>, <year>2013</year>, &#x2018;<article-title>Hedonic price indexes for residential housing: A survey, evaluation and taxonomy</article-title>&#x2019;, <source><italic>Journal of Economic Surveys</italic></source> <volume>27</volume>(<issue>5</issue>), <fpage>879</fpage>&#x2013;<lpage>914</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1111/j.1467-6419.2012.00731.x">https://doi.org/10.1111/j.1467-6419.2012.00731.x</ext-link></comment></mixed-citation></ref>
<ref id="CIT0021"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Jiang</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Phillips</surname>, <given-names>P.C.</given-names></string-name> &#x0026; <string-name><surname>Yu</surname>, <given-names>J.</given-names></string-name></person-group>, <year>2015</year>, &#x2018;<article-title>New methodology for constructing real estate price indices applied to the Singapore residential market</article-title>&#x2019;, <source><italic>Journal of Banking &#x0026; Finance</italic></source> <volume>61</volume>, <fpage>121</fpage>&#x2013;<lpage>131</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.jbankfin.2015.08.026">https://doi.org/10.1016/j.jbankfin.2015.08.026</ext-link></comment></mixed-citation></ref>
<ref id="CIT0022"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Lindsey</surname>, <given-names>J.K.</given-names></string-name></person-group>, <year>1997</year>, <source><italic>Applying generalized linear models</italic></source>, <publisher-name>Springer Science &#x0026; Business Media</publisher-name>, <publisher-loc>New York</publisher-loc>.</mixed-citation></ref>
<ref id="CIT0023"><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Luus</surname>, <given-names>C.</given-names></string-name></person-group>, <year>2002</year>, <source><italic>The ABSA residential property market database for South Africa &#x2013; Key data trends and implications</italic></source>, <comment>BIS Papers No. 21</comment>.</mixed-citation></ref>
<ref id="CIT0024"><mixed-citation publication-type="web"><person-group person-group-type="author"><string-name><surname>Lyons</surname>, <given-names>R.C.</given-names></string-name></person-group>, <year>2015</year>, <source><italic>Measuring house prices in the long run: Insights from Dublin, 1900&#x2013;2015</italic></source>, <comment>viewed 29 April 2018, from <ext-link ext-link-type="uri" xlink:href="http://eh.net/eha/wp-content/uploads/2015/05/Lyons.pdf">http://eh.net/eha/wp-content/uploads/2015/05/Lyons.pdf</ext-link>.</comment></mixed-citation></ref>
<ref id="CIT0025"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>McCullagh</surname>, <given-names>P.</given-names></string-name> &#x0026; <string-name><surname>Nelder</surname>, <given-names>J.</given-names></string-name></person-group>, <year>1989</year>, <source><italic>Generalized linear models</italic></source>, vol. <volume>37</volume>, <publisher-name>CRC Press</publisher-name>, <publisher-loc>London</publisher-loc>.</mixed-citation></ref>
<ref id="CIT0026"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Moran</surname>, <given-names>P.</given-names></string-name></person-group>, <year>1950</year>, &#x2018;<article-title>A test for the serial independence of residuals</article-title>&#x2019;, <source><italic>Biometrika</italic></source> <volume>37</volume>(<issue>1&#x2013;2</issue>), <fpage>178</fpage>&#x2013;<lpage>181</lpage>.</mixed-citation></ref>
<ref id="CIT0027"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Musset</surname>, <given-names>L.</given-names></string-name></person-group>, <year>2006</year>, <source><italic>OECD environment health and safety publications series on testing and assessment</italic></source>, no. <volume>54</volume> [pdf], <comment>viewed 12 January 2019, from <ext-link ext-link-type="uri" xlink:href="http://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=env/jm/mono(2006)18&#x0026;doclanguage=en">http://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=env/jm/mono(2006)18&#x0026;doclanguage=en</ext-link>.</comment></mixed-citation></ref>
<ref id="CIT0028"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Nelder</surname>, <given-names>J.A.</given-names></string-name> &#x0026; <string-name><surname>Wedderburn</surname>, <given-names>R.W.M.</given-names></string-name></person-group>, <year>1972</year>, &#x2018;<article-title>Generalized linear models</article-title>&#x2019;, <source><italic>Journal of the Royal Statistical Society Series A</italic></source> <volume>135</volume>, <fpage>370</fpage>&#x2013;<lpage>384</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.2307/2344614">https://doi.org/10.2307/2344614</ext-link></comment></mixed-citation></ref>
<ref id="CIT0029"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Olivier</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Johnson</surname>, <given-names>W.</given-names></string-name> &#x0026; <string-name><surname>Marshall</surname>, <given-names>G.</given-names></string-name></person-group>, <year>2008</year>, &#x2018;<article-title>The logarithmic transformation and the geometric mean in reporting experimental IgE results: What are they and when and why to use them?</article-title>&#x2019;, <source><italic>Annals of Allergy, Asthma &#x0026; Immunology</italic></source> <volume>100</volume>, <fpage>333</fpage>&#x2013;<lpage>338</lpage>, <fpage>625</fpage>&#x2013;<lpage>626</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/S1081-1206(10)60595-9">https://doi.org/10.1016/S1081-1206(10)60595-9</ext-link></comment></mixed-citation></ref>
<ref id="CIT0030"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ploner</surname>, <given-names>A.</given-names></string-name></person-group>, <year>1999</year>, &#x2018;<article-title>The use of the variogram cloud in geostatistical modelling</article-title>&#x2019;, <source><italic>Environmetrics</italic></source> <volume>10</volume>(<issue>4</issue>), <fpage>413</fpage>&#x2013;<lpage>437</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1002/(SICI)1099-095X(199907/08)10:4&#x0025;3C413::AID-ENV365&#x0025;3E3.0.CO;2-U">https://doi.org/10.1002/(SICI)1099-095X(199907/08)10:4&#x0025;3C413::AID-ENV365&#x0025;3E3.0.CO;2-U</ext-link></comment></mixed-citation></ref>
<ref id="CIT0031"><mixed-citation publication-type="book"><person-group person-group-type="author"><collab>R Core Team</collab></person-group>, <year>2018</year>, <source><italic>R: A language and environment for statistical computing</italic></source>, <publisher-name>R Foundation for Statistical Computing</publisher-name>, <publisher-loc>Vienna, Austria</publisher-loc>, <comment>viewed n.d., from <ext-link ext-link-type="uri" xlink:href="https://www.R-project.org/">https://www.R-project.org/</ext-link>.</comment></mixed-citation></ref>
<ref id="CIT0032"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Rosen</surname>, <given-names>S.</given-names></string-name></person-group>, <year>1974</year>, &#x2018;<article-title>Hedonic prices and implicit markets: Product differentiation in pure competition</article-title>&#x2019;, <source><italic>Journal of Political Economy</italic></source> <volume>82</volume>(<issue>1</issue>), <fpage>34</fpage>&#x2013;<lpage>55</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1086/260169">https://doi.org/10.1086/260169</ext-link></comment></mixed-citation></ref>
<ref id="CIT0033"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Shimizu</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Nishimura</surname>, <given-names>K.</given-names></string-name> &#x0026; <string-name><surname>Watanabe</surname>, <given-names>T.</given-names></string-name></person-group>, <year>2010</year>, &#x2018;<article-title>Housing prices in Tokyo: A comparison of hedonic and repeat sales measures</article-title>&#x2019;, <source><italic>Journal of Economics and Statistics</italic></source> <volume>230</volume>, <fpage>792</fpage>&#x2013;<lpage>813</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1515/jbnst-2010-0612">https://doi.org/10.1515/jbnst-2010-0612</ext-link></comment></mixed-citation></ref>
<ref id="CIT0034"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Silver</surname>, <given-names>M.</given-names></string-name></person-group>, <year>2016</year>, <source><italic>How to better measure hedonic residential property price indexes</italic></source>, <comment>IMF Working Paper, WP/16/213</comment>, <publisher-name>IMF</publisher-name>, <publisher-loc>Washington, DC</publisher-loc>.</mixed-citation></ref>
</ref-list>
<fn-group>
<fn><p><bold>How to cite this article:</bold> Bax, D., Zewotir, T. &#x0026; North, D., 2019, &#x2018;A gamma generalised linear model as an alternative to log linear real estate price functions&#x2019;, <italic>Journal of Economic and Financial Sciences</italic> 12(1), a476. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.4102/jef.v12i1.476">https://doi.org/10.4102/jef.v12i1.476</ext-link></p></fn>
</fn-group>
</back>
</article>